Posted to dev@cocoon.apache.org by Stephen Zisk <sz...@mediabridge.net> on 2000/09/01 19:16:37 UTC
Re: [Cocoon Devel] Re: How to determine encoding?
> >>Is there a way to determine what encoding was specified for an
> >>XML document after parsing?
>
>Ricardo,
>Did you ever work out how to do this?
>How?
>
>thanks Jeremy
Am I missing something? I would have said that if there is no explicit
encoding information, there is no way to accurately derive the encoding.
The ISO-8859-x character encodings, the Windows code pages, and even
UTF-8 all represent the ASCII character set using the same one-byte
values as ASCII itself. So unless you propose matching accented characters
against a known language, how can you distinguish among any of these in a
file when most of the characters are plain ASCII?
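
To illustrate the point, here is a minimal Python sketch. The sample string and the list of encodings are assumptions chosen for the example, not anything from Cocoon or the parser under discussion. It encodes one ASCII-only string several ways and checks whether the resulting bytes differ.

```python
# Hypothetical sample text; any ASCII-only string behaves the same way.
text = '<?xml version="1.0"?><doc>plain ASCII content</doc>'

# A few encodings that share the ASCII range (list chosen for illustration).
encodings = ["ascii", "iso-8859-1", "iso-8859-15", "cp1252", "utf-8"]
encoded = {name: text.encode(name) for name in encodings}

# Every encoding yields the same bytes, so the bytes alone cannot
# tell us which encoding the author intended.
all_identical = len(set(encoded.values())) == 1
print(all_identical)
```

With input like this, the byte stream carries no evidence either way; only an explicit declaration can settle the question.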
You might have a chance of distinguishing UTF-8 from the others by recognizing
its multi-byte sequences, but in all of the one-byte encodings, most of
the non-ASCII byte values represent meaningful characters, so almost any byte
sequence is valid in each of them. This is especially true for minor
variants like ISO-8859-1 vs ISO-8859-15.
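
A rough Python sketch of that asymmetry follows. The helper name looks_like_utf8 and the sample bytes are hypothetical, and ISO-8859-15 is used here as an assumed example of a close variant of ISO-8859-1. Strict UTF-8 decoding rejects most one-byte-encoded text, while a lone high byte decodes without complaint in either ISO-8859 variant.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: valid UTF-8 that contains at least one non-ASCII byte."""
    try:
        data.decode("utf-8")  # strict decoding raises on malformed sequences
    except UnicodeDecodeError:
        return False
    return any(b >= 0x80 for b in data)

utf8_bytes = "caf\u00e9".encode("utf-8")         # e-acute becomes two bytes
latin1_bytes = "caf\u00e9".encode("iso-8859-1")  # e-acute becomes one byte

print(looks_like_utf8(utf8_bytes))    # valid multi-byte sequence
print(looks_like_utf8(latin1_bytes))  # malformed as UTF-8

# The same byte is a legal character in both one-byte encodings,
# so decoding succeeds either way and gives no hint which was meant.
ambiguous = b"\xa4"
as_latin1 = ambiguous.decode("iso-8859-1")   # U+00A4 CURRENCY SIGN
as_latin9 = ambiguous.decode("iso-8859-15")  # U+20AC EURO SIGN
```

This is only a heuristic: a False result rules out UTF-8 for non-ASCII data, but a True result does not prove the author intended UTF-8, and nothing here separates one single-byte encoding from another.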
Stephen Zisk
----------
Stephen Zisk                     MediaBridge Technologies
email: szisk@mediabridge.net     100 Nagog Park
tel: 978-795-7040                Acton, MA 01720 USA
fax: 978-795-7100                http://www.mediabridge.net