Posted to j-dev@xerces.apache.org by David Bourget <ro...@dbourget.com> on 2000/04/23 22:53:32 UTC

Re: Xerces parsing - how to get the character set ?

Hi Francois,

1- You can choose the output encoding by constructing an XMLSerializer with
an OutputFormat that specifies your desired encoding.
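A minimal sketch of the same idea. Assumption: it uses the standard JAXP Transformer (javax.xml.transform) rather than Xerces's own XMLSerializer/OutputFormat classes, which later Xerces-J releases deprecated; the OutputKeys.ENCODING output property plays the role of the encoding passed to OutputFormat. The class name SerializeWithEncoding is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SerializeWithEncoding {

    // Serializes a DOM tree to bytes in the requested encoding.
    // The identity transformer also writes that encoding name into the
    // XML declaration, so the output can be re-parsed consistently.
    static byte[] serialize(Document doc, String encoding) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader("<note>caf\u00e9</note>")));
        // In ISO-8859-1 the accented character is written as a single byte.
        byte[] bytes = serialize(doc, "ISO-8859-1");
        System.out.println(new String(bytes, "ISO-8859-1"));
    }
}
```

Passing "UTF-8" instead produces a document that any conformant parser must accept.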

2- You can get the input encoding from an InputSource using the
getEncoding() method.
However, my opinion is that in a perfect world this should not be done,
because the document data should not be bound to any particular encoding.
But I guess the recipients of your documents can only parse them in a
specific encoding. You might want to send your documents encoded in UTF-8,
as the XML 1.0 specification requires every conformant processor to accept
UTF-8 (and UTF-16).
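A hedged aside: according to the SAX API documentation, InputSource.getEncoding() only reports an encoding the application set on the InputSource itself (it returns null otherwise), so it may not reveal what the parser read from the XML declaration. DOM Level 3, which postdates this thread, added Document.getXmlEncoding() for the declared encoding. The sketch below assumes the JAXP parser bundled with the JDK rather than a standalone Xerces-J install; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DetectEncoding {

    // Parses raw bytes and returns the encoding named in the XML
    // declaration (DOM Level 3 Document.getXmlEncoding()), or null
    // if the document has no encoding declaration.
    static String declaredEncoding(byte[] xml) throws Exception {
        InputSource source = new InputSource(new ByteArrayInputStream(xml));
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().parse(source);
        // source.getEncoding() is still null here: the application never
        // set an encoding on this InputSource.
        return doc.getXmlEncoding();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
            .getBytes("ISO-8859-1");
        System.out.println(declaredEncoding(xml)); // prints ISO-8859-1
    }
}
```

The returned name can then be handed to whatever serializer you use, so the re-exported file declares the same character set it was read with.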

I hope this helps. Please let me know if you find a better solution.

David

----- Original Message -----
From: "Francois Granade" <fr...@viafone.com>
To: <xe...@xml.apache.org>
Sent: Monday, April 24, 2000 2:44 PM
Subject: Xerces parsing - how to get the character set ?


> I'm not sure this is the right list to ask this question, please don't
> hesitate to redirect me to the right list...
>
> When I'm parsing a document with a character set encoding like this:
> <?xml version="1.0" encoding="ISO-8859-1"?>
> then Xerces handles the char set properly.
>
> Nevertheless, once it's parsed, how can I know what character set this
> document is using - and how can I re-export my "org.w3c.dom.Document"
> tree into a stream (or file) with the right character set ?
> If I don't set the right character set, I may not be able to re-parse
> my document without getting errors...
>
> It seems that the Xerces parser strips the encoding information
> from the tree - this information is only used and managed by the parser;
> it does not appear as a "org.apache.xerces.dom.ProcessingInstruction" in
> the tree...
>
> How can I know what charset was used ?
>
> Is it a known issue with the DOM API ? or am I missing something ?
>
> Francois
>