Posted to c-dev@xerces.apache.org by Jo...@HypoVereinsbank.DE on 2000/12/12 17:45:47 UTC

How can I tell the DOMParser to use encoding ISO-8859-1

Hi,

I have a problem with encoding. I'm working with the Solaris OS.

When the encoding attribute is present in the xml header everything is ok.
e.g.  <?xml version="1.0" encoding="ISO-8859-1"?>

Unfortunately, the encoding is omitted in most cases,
e.g.  <?xml version="1.0"?>
and so we assume the XML is encoded in ISO-8859-1.

Now my question: How can I tell the DOMParser that it should use ISO-8859-1
and not UTF-8, which seems to be the default?

I tried XMLDecl and TextDecl, but I got a segmentation fault when I called
them before calling parse.

Do I have to correct the xml header by inserting encoding? (Which would
solve my problem, but it's not very nice!! :-()

Thanks for any suggestion.

Regards
Jörg

Re: How can I tell the DOMParser to use encoding ISO-8859-1

Posted by Khaled Noaman <kn...@ca.ibm.com>.
Before parsing a document, the parser tries to detect the encoding the XML
document is in and sets the encoding accordingly. When an XML declaration is
parsed and it contains an encoding string, the parser tries to switch to the
encoding specified in the declaration. If that fails, or if the XML
declaration has no encoding, the detected encoding is used.
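
To illustrate the detection step, here is a minimal sketch of guessing an
encoding from the first bytes of a document, loosely following the
non-normative heuristic in Appendix F of the XML 1.0 specification. It is not
the Xerces-C implementation; the function name guessXmlEncoding is
hypothetical, and the UCS-4 and EBCDIC cases are left out for brevity.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Xerces-C: guess the encoding family of an
// XML entity from its leading bytes (simplified from XML 1.0, Appendix F).
std::string guessXmlEncoding(const unsigned char* data, std::size_t size)
{
    // Byte order marks.
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";

    // No byte order mark: look at how "<?" is laid out.
    if (size >= 4 && data[0] == 0x00 && data[1] == 0x3C &&
        data[2] == 0x00 && data[3] == 0x3F)
        return "UTF-16BE";
    if (size >= 4 && data[0] == 0x3C && data[1] == 0x00 &&
        data[2] == 0x3F && data[3] == 0x00)
        return "UTF-16LE";

    // ASCII-compatible start (or anything unrecognized): without an encoding
    // declaration the document is taken to be UTF-8.
    return "UTF-8";
}
```

An ISO-8859-1 file that starts with <?xml version="1.0"?> looks identical to
UTF-8 in its first bytes, which is why the detected encoding ends up being
UTF-8 when the declaration names no encoding.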

There is no method to tell the parser which encoding to use. The author of the
document is responsible for informing the parser of the proper encoding (via
the XML declaration).
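
If the documents cannot be fixed at the source, the workaround mentioned in
the question (inserting the encoding into the header) can be done on an
in-memory copy before parsing. The following is only a sketch under the
assumption that the input is an ASCII-compatible byte stream; the helper name
withDeclaredEncoding is hypothetical and not part of Xerces-C.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: return a copy of `xml` whose XML declaration names
// `encoding`. An encoding that is already declared is left untouched.
std::string withDeclaredEncoding(const std::string& xml,
                                 const std::string& encoding)
{
    const std::string attr = "encoding=\"" + encoding + "\"";

    // No XML declaration at all ("<?xml" must be followed by whitespace,
    // otherwise it is some other processing instruction): prepend one.
    const bool hasDecl = xml.compare(0, 5, "<?xml") == 0 && xml.size() > 5 &&
                         std::isspace(static_cast<unsigned char>(xml[5]));
    if (!hasDecl)
        return "<?xml version=\"1.0\" " + attr + "?>" + xml;

    const std::string::size_type end = xml.find("?>");
    if (end == std::string::npos)
        return xml;  // malformed declaration: let the parser report it

    if (xml.find("encoding") < end)
        return xml;  // the author already declared an encoding

    // The encoding must come after version and before standalone.
    std::string result = xml;
    const std::string::size_type standalone = xml.find("standalone");
    if (standalone < end)
        result.insert(standalone, attr + " ");
    else
        result.insert(end, " " + attr);
    return result;
}
```

The patched buffer would then be handed to the parser from memory (for
example through a MemBufInputSource) instead of parsing the original file
directly.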

One suggestion is to create an XMLTranscoder object for the 'ISO-8859-1'
encoding and use it to transcode the DOM data after parsing the document, to
see if that solves your problem.
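
As a rough picture of what transcoding to ISO-8859-1 involves, here is a
hand-written stand-in that does not use the Xerces-C XMLTranscoder API. It
assumes C++11 and that the DOM text is available as UTF-16 code units; the
function name toLatin1 is hypothetical.

```cpp
#include <string>

// Hypothetical stand-in for a transcoder: convert UTF-16 code units to
// ISO-8859-1 bytes. ISO-8859-1 maps one-to-one onto U+0000..U+00FF, so any
// code unit above 0xFF cannot be represented and is replaced with '?'.
std::string toLatin1(const std::u16string& text)
{
    std::string bytes;
    bytes.reserve(text.size());
    for (char16_t unit : text) {
        if (unit <= 0xFF)
            bytes.push_back(static_cast<char>(static_cast<unsigned char>(unit)));
        else
            bytes.push_back('?');
    }
    return bytes;
}
```

With this sketch, the string u"J\u00F6rg" becomes four bytes, with the
second one being 0xF6.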

Regards,
Khaled Noaman
XML Parser Dev.- IBM Toronto Lab
knoaman@ca.ibm.com
