cpdetector is a small yet clever framework for
codepage detection that integrates different
strategies. It can be used as a library by
third-party software that accesses textual data
over a network. It also includes a best-practice
implementation in the form of a command-line tool that
allows sorting and transforming large collections
of documents based on their codepage. Available
strategies include: jchardet (exclusion, frequency
analysis, and guessing), detection of the HTML
charset property, and detection of the XML
encoding declaration.
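
The two declaration-based strategies, and the idea of trying several
strategies in turn, can be illustrated with a minimal Java sketch. This is
not cpdetector's actual API: the class name DeclaredCharsetSketch, the
detect method, and the regular expressions are hypothetical and chosen only
for illustration. The sketch assumes the document starts in an
ASCII-compatible encoding with no byte order mark, so its first bytes can be
read safely as ISO-8859-1. It does not cover statistical guessing of the
kind jchardet performs.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of cpdetector.
class DeclaredCharsetSketch {

    // Matches the encoding pseudo-attribute of an XML declaration at the start of the text.
    private static final Pattern XML_DECL = Pattern.compile(
            "^\\s*<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z0-9._\\-]+)[\"']");

    // Matches charset=... inside a <meta> tag, covering both
    // <meta charset="..."> and <meta http-equiv=... content="...; charset=...">.
    private static final Pattern HTML_META = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9._\\-]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the first capture group of the first match, if any.
    private static Optional<String> firstGroup(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Strategies are tried in order; the first one that yields a name wins.
    private static final List<Function<String, Optional<String>>> STRATEGIES = List.of(
            text -> firstGroup(XML_DECL, text),
            text -> firstGroup(HTML_META, text));

    // head: the first bytes of the document (a few kilobytes are usually enough).
    static Optional<String> detect(byte[] head) {
        // ISO-8859-1 maps every byte to one char, so decoding never fails.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        for (Function<String, Optional<String>> strategy : STRATEGIES) {
            Optional<String> result = strategy.apply(text);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc/>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(xml).orElse("unknown"));
    }
}
```

The returned name is the declared label exactly as written in the document;
a caller would still need to check it, for example with
java.nio.charset.Charset.isSupported, and fall back to a guessing strategy
when no declaration is found.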