Posted to j-users@xerces.apache.org by "Lemmin, Harald" <Ha...@softwareag.com> on 2002/09/09 14:32:14 UTC

Encoding detection

Hello,

My task is as follows:
If the parser fails to parse an XML document into the DOM, the document has
to be read as a String with the correct encoding. The document is read from
a byte stream. For that I have to detect the encoding.

I found some pieces of encoding detection in the XMLEntityManager:
auto-detection from the first four bytes, creation of the reader.
But I could not find where the encoding name is read from the xml-header.

Is there any other ready-to-use code that does this job?
Any other suggestions?
  
Kind regards,
Harald


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: Encoding detection

Posted by Andy Clark <an...@apache.org>.
Lemmin, Harald wrote:
> I found some pieces of encoding detection in the XMLEntityManager:
> auto-detection from the first four bytes, creation of the reader.
> But I could not find where the encoding name is read from the xml-header.

That code tries to auto-detect the encoding from the
first bytes in the file. The code to check the encoding
specified in the XMLDecl/TextDecl is in the scanner
classes. I think "XMLScanner" has what you're looking
for.
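
For readers who only need the encoding name and not a full parse, the two steps discussed in this thread (a guess from the leading bytes, then the encoding pseudo-attribute of the XMLDecl) can be sketched in standalone Java. This is not the Xerces code: the class EncodingSniffer and its detect method are hypothetical names, and the sketch assumes the caller passes in the first hundred or so bytes of the stream, that UTF-32 and EBCDIC inputs do not occur, and that UTF-8 is the fallback when nothing is declared.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Xerces.
final class EncodingSniffer {

    // Encoding pseudo-attribute of an XML declaration at the very start of the input.
    private static final Pattern DECL = Pattern.compile(
        "^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Guesses an encoding name from the first bytes of a document. */
    static String detect(byte[] head) {
        int b0 = byteAt(head, 0);
        int b1 = byteAt(head, 1);
        int b2 = byteAt(head, 2);
        int b3 = byteAt(head, 3);

        // Step 1a: byte order marks (assumption: UTF-32 is not handled).
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";

        // Step 1b: no BOM, but "<?" encoded as two-byte code units.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";

        // Step 2: ASCII-compatible family. ISO-8859-1 maps every byte to a
        // character, so the declaration can be read without decoding errors.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(text);
        if (m.find()) {
            return m.group(1);
        }

        // Nothing declared: fall back to UTF-8 (assumption of this sketch).
        return "UTF-8";
    }

    // Returns the unsigned byte at index i, or -1 if the array is too short.
    private static int byteAt(byte[] data, int i) {
        return i < data.length ? data[i] & 0xFF : -1;
    }
}
```

One way to use it is to wrap the stream in a BufferedInputStream, call mark(), read the leading bytes, pass them to EncodingSniffer.detect, call reset(), and then build the String with the returned name. The name comes straight from the document, so the caller should still be prepared for Charset.forName to reject it.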

-- 
Andy Clark * andyc@apache.org

