
Posted to j-users@xerces.apache.org by Greg Hess <gh...@wrappedapps.com> on 2003/04/17 20:29:51 UTC

How is the encoding attribute used?

Hi All,
 
How does the Xerces parser handle the document declaration <?xml
version="1.0" encoding="utf-8" ?> and is it used by the parser?
Does it automatically encode and decode values inserted into and fetched
from text nodes? Is the programmer required to read this attribute and
decide how to decode the XML document? If it is not encoded/decoded
automatically, are there any tools that provide this for ISO-8859-1 and
UTF-8 character data?
 
Many thanks,
 
Greg

Re: How is the encoding attribute used?

Posted by Andy Clark <an...@apache.org>.

Greg Hess wrote:
> How does the Xerces parser handle the document declaration <?xml 
> version="1.0" encoding="utf-8" ?> and is it used by the parser?

It's handled by the parser. The XML specification
requires every parser to support UTF-8 and UTF-16,
and recommends that any other encoding be named in
the declaration by its registered IANA name.
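A small sketch of what "handled by the parser" means in practice. It assumes the standard JAXP API (`javax.xml.parsers`) from a current JDK, whose built-in parser is derived from Xerces, rather than the Xerces API of 2003; the class name `EncodingAutodetect` and the sample document are made up for illustration. The same text, stored as UTF-8, as UTF-16 with a byte order mark, and as ISO-8859-1 with a matching declaration, should come back as the same Java `String`:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class EncodingAutodetect {

    // Parses raw bytes; the parser itself works out the encoding from the
    // byte order mark and/or the encoding declaration.
    static String rootText(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes))
                    .getDocumentElement()
                    .getTextContent(); // an ordinary, already-decoded Java String
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String body = "<greeting>caf\u00e9</greeting>";
        // Without a declaration the document must be UTF-8, or UTF-16 with a BOM.
        byte[] utf8 = body.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = body.getBytes(StandardCharsets.UTF_16); // Java writes a big-endian BOM first
        // With a declaration the parser reads it and switches to the named decoder.
        byte[] latin1 = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);

        for (byte[] doc : new byte[][] {utf8, utf16, latin1}) {
            System.out.println(rootText(doc).equals("caf\u00e9")); // true for all three
        }
    }
}
```

The application never inspects the declaration itself: by the time `getTextContent()` returns, the bytes have already been decoded into characters.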

Xerces supports all of the encodings with available
decoders in the Java runtime. If you find an encoding
that doesn't work but is present in Java, then tell
us so that we can add the appropriate encoding name
mapping to the parser.
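To see whether the Java runtime you are on has a decoder for a given encoding name, `java.nio.charset.Charset.isSupported` is a reasonable first check. This is a sketch with a hypothetical class name, `RuntimeCharsetCheck`; it only tells you what the JVM can decode, not whether a particular Xerces release maps that IANA name onto the Java charset, which is the kind of mismatch worth reporting to the list:

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class RuntimeCharsetCheck {

    // True if this JVM has a decoder registered under the given name or alias.
    static boolean jvmCanDecode(String encodingName) {
        try {
            return Charset.isSupported(encodingName);
        } catch (IllegalCharsetNameException e) {
            return false; // not even a syntactically legal charset name, e.g. contains a space
        }
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "UTF-16", "ISO-8859-1", "US-ASCII", "no-such-encoding"};
        for (String name : names) {
            System.out.println(name + " -> " + jvmCanDecode(name));
        }
    }
}
```

UTF-8, UTF-16, ISO-8859-1 and US-ASCII are among the charsets every Java platform is required to provide, so they should always report `true`.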

What does this mean to you?

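You should always let Xerces work out which encoding
to use and decode the document for you. This means
giving the parser an InputStream, not a Reader: a
Reader delivers characters that have already been
decoded, so the parser cannot use the encoding
declaration. Only if you know for certain the
encoding of the file being parsed should you use a
Reader. Either way, the text you get back from the
parser is ordinary Java Strings, so you never decode
node values yourself.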
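The difference between the two input styles can be sketched with `org.xml.sax.InputSource`, which accepts either a byte stream or a character stream. Again this assumes the JAXP API from a current JDK, and `StreamVersusReader` and its helper methods are hypothetical names. The same ISO-8859-1 bytes are parsed three ways: as an `InputStream`, through a `Reader` that guesses the encoding correctly, and through one that guesses wrong:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class StreamVersusReader {

    // Shared helper: parse an InputSource and return the root element's text.
    static String rootText(InputSource source) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source)
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Recommended: hand over raw bytes so the parser honours the declaration.
    static String viaInputStream(byte[] xml) {
        return rootText(new InputSource(new ByteArrayInputStream(xml)));
    }

    // Only when you are certain of the encoding: the Reader decodes the bytes
    // before the parser sees them, so the declaration is not used for decoding.
    static String viaReader(byte[] xml, Charset assumedEncoding) {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xml), assumedEncoding);
        return rootText(new InputSource(reader));
    }

    public static void main(String[] args) {
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><greeting>caf\u00e9</greeting>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(viaInputStream(latin1).equals("caf\u00e9"));                         // true
        System.out.println(viaReader(latin1, StandardCharsets.ISO_8859_1).equals("caf\u00e9")); // true: right guess
        System.out.println(viaReader(latin1, StandardCharsets.UTF_8).equals("caf\u00e9"));      // false: wrong guess
    }
}
```

In the last case the parser never sees the original bytes, so the `encoding="ISO-8859-1"` declaration cannot rescue it; `InputStreamReader` replaces the malformed byte with U+FFFD and the text silently changes.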

-- 
Andy Clark * andyc@apache.org
