Posted to j-users@xerces.apache.org by "Lemmin, Harald" <Ha...@softwareag.com> on 2002/09/09 14:32:14 UTC
Encoding detection
Hello,
My task is as follows:
If the parser fails to parse an XML document into a DOM, the document has
to be read as a String in the correct encoding instead. The document comes
from a byte stream, so I have to detect the encoding myself.
I found some pieces of encoding detection in the XMLEntityManager:
auto-detection from the first four bytes, creation of the reader.
But I could not find where the encoding name is read from the xml-header.
Is there any other ready-to-use code that does this job?
Any other suggestions?
Kind regards,
Harald
---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org
Re: Encoding detection
Posted by Andy Clark <an...@apache.org>.
Lemmin, Harald wrote:
> I found some pieces of encoding detection in the XMLEntityManager:
> auto-detection from the first four bytes, creation of the reader.
> But I could not find where the encoding name is read from the xml-header.
That code tries to auto-detect the encoding from the
first bytes in the file. The code to check the encoding
specified in the XMLDecl/TextDecl is in the scanner
classes. I think "XMLScanner" has what you're looking
for.
--
Andy Clark * andyc@apache.org
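
For readers who need this outside the parser, the two steps discussed in
this thread (auto-detection from the first bytes, then reading the encoding
name from the XMLDecl) can be sketched roughly as below. This is a minimal
standalone sketch and not the Xerces code: the class EncodingSniffer and its
methods are hypothetical names, it only covers UTF-8, UTF-16 and
ASCII-compatible encodings (no UTF-32 or EBCDIC), and it assumes the whole
document is already in a byte array.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Xerces.
final class EncodingSniffer {

    // Matches the encoding pseudo-attribute of an XML declaration at the
    // very start of the text, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
    private static final Pattern ENCODING = Pattern.compile(
        "^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Step 1: look at the first bytes. Returns null when they only tell us
    // that the document is in some ASCII-compatible encoding.
    static String fromFirstBytes(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";                       // UTF-8 byte order mark
        }
        if (b.length >= 2) {
            int b0 = b[0] & 0xFF, b1 = b[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";   // BOM
            if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";   // BOM
        }
        if (b.length >= 4) {
            // "<?" without a BOM, two bytes per character
            if (b[0] == 0 && b[1] == 0x3C && b[2] == 0 && b[3] == 0x3F) return "UTF-16BE";
            if (b[0] == 0x3C && b[1] == 0 && b[2] == 0x3F && b[3] == 0) return "UTF-16LE";
        }
        return null;
    }

    // Step 2: if the first bytes were inconclusive, read the declared name.
    static String detect(byte[] bytes) {
        String sniffed = fromFirstBytes(bytes);
        if (sniffed != null) return sniffed;
        // ISO-8859-1 maps every byte to one char, so the ASCII part of the
        // declaration can be read before the real encoding is known.
        int len = Math.min(bytes.length, 512);
        String head = new String(bytes, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        if (m.find() && Charset.isSupported(m.group(1))) return m.group(1);
        return "UTF-8";   // XML default when nothing else is specified
    }

    // Decode the whole document, dropping a leading byte order mark if any.
    static String decode(byte[] bytes) {
        String s = new String(bytes, Charset.forName(detect(bytes)));
        return s.startsWith("\uFEFF") ? s.substring(1) : s;
    }
}
```

In the fallback case from the original question, one would catch the parse
exception and then call EncodingSniffer.decode(bytes) on the same bytes.
Giving the byte-level result priority over the declared name is a design
choice of this sketch, made because a UTF-16 declaration cannot be read as
single bytes in the first place.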