Posted to j-users@xerces.apache.org by Shekhar Karani <2k...@sun20.datamatics.com> on 2003/04/01 07:51:46 UTC

Re: UTF-8 Encoding

Thanks a lot guys. I will surely try this out.

Shekhar
----- Original Message -----
From: Andy Clark <an...@apache.org>
To: <xe...@xml.apache.org>
Sent: Monday, March 31, 2003 11:33 PM
Subject: Re: UTF-8 Encoding


> Michael Glavassevich wrote:
> > If you absolutely cannot alter your input document, you can try setting
> > your own character reader on the input source. This will force the parser
> > to use your own reader. If you have an InputStream to the document you can
> > easily get one for ISO-8859-1 using an InputStreamReader.
>
> Michael is right. If you know the actual encoding of the
> document, then you can follow this approach and it will
> always work because the parser will not try to perform
> any auto-detection. For example:
>
>    InputStream stream = /* ... */;
>    Reader reader = new InputStreamReader(stream, "ISO-8859-1");
>
>    InputSource source = new InputSource(reader);
>    // NOTE: Also set the system id so that the parser can
>    //       resolve relative URIs.
>
> However, in general, you should let the parser do the
> auto-detection of the character encoding. But if you're
> stuck in the situation where someone has given you a
> document that is not well-formed because the specified
> encoding is wrong, then use this method to work around
> the problem.
>
> --
> Andy Clark * andyc@apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-user-help@xml.apache.org
>
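
For anyone who wants to try the suggestion end to end, here is a minimal
sketch of the approach Andy describes, written against the JAXP
DocumentBuilder API. The class name, the parseAsLatin1 helper, the sample
document, and the system id are all hypothetical and only there for
illustration. The sample bytes are encoded as ISO-8859-1 while the XML
declaration claims UTF-8, which is the mismatch under discussion.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ForcedEncodingExample {

    // Hypothetical helper: decodes the bytes as ISO-8859-1 ourselves and
    // hands the parser a character stream, so the encoding named in the
    // XML declaration is not used to decode the input.
    static Document parseAsLatin1(InputStream stream, String systemId) throws Exception {
        Reader reader = new InputStreamReader(stream, StandardCharsets.ISO_8859_1);

        InputSource source = new InputSource(reader);
        // Set the system id so the parser can resolve relative URIs.
        source.setSystemId(systemId);

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input: declared as UTF-8 but actually encoded as ISO-8859-1.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>caf\u00e9</name>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        // The system id below is a placeholder, not a real location.
        Document doc = parseAsLatin1(new ByteArrayInputStream(bytes), "file:///hypothetical/doc.xml");

        // Print the text content of the root element as decoded by our reader.
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Passing the same bytes to the parser as a plain InputStream would leave
decoding to the parser, which would then trust the UTF-8 declaration and
most likely reject the lone 0xE9 byte as malformed.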

