Posted to c-dev@xerces.apache.org by Ma...@VerizonWireless.com on 2005/08/09 16:19:07 UTC

problem with XML encoding

We are using the Xerces SAX parser to parse incoming XML. In some cases the
XML contains characters that were copied and pasted from an MS Word
document. It seems that the character set should be "windows-1252" in this
case.

If such an XML document is parsed with "utf-8" encoding, both Internet Explorer
and our application report the same error: an invalid character was
encountered. When the XML is parsed with "windows-1252", IE is able to
display it properly, but our application does not. The character set in our
application is set to 1252.

Why are we not able to display the characters properly? Does anybody know
the solution to this?

Attached are a sample XML file and a Word document with screenshots of
the problem in our application.

 
Thanks,
Marina
908 607 8580
