Posted to c-dev@xerces.apache.org by po...@xania.demon.co.uk on 2004/01/30 12:59:15 UTC

numerical character references

I've been using Xerces-C effectively for quite some time but am having a problem with numerical character references.  

I've implemented the DefaultHandler interface, and my handler class is set as the SAX2XMLReader's error and content handler. I override various methods such as endDocument() and characters() - this all works very well.

The exception is numerical character references for characters in the Latin character set, such as E with an acute accent, which is represented in the document as &#201;. When the characters() method is invoked, Xerces doesn't seem to convert the reference to the correct character (I call XMLString::transcode() on the data). Furthermore, any character data that follows the numerical character reference in the same XML element gets skipped by the parser, and I don't understand why this is happening.

However, other references DO work: for example, &#38; gets converted to the correct character.

The XML documents are encoded as ISO-8859-1. My platform is Linux/gcc-3.0 and I've been testing this with a locale (LC_ALL) of iso_8859_1 and iso-8859-1.  
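To make the expected result concrete, here is a minimal sketch of the conversion I would expect for this data. It assumes that characters() delivers UTF-16 code units (which is what XMLCh holds) and that the target is ISO-8859-1, where every code unit up to 0xFF maps directly to one byte. The helper utf16ToLatin1 is a hypothetical name for illustration and is not part of Xerces-C; it only shows what &#201; (U+00C9) should end up as, independent of the locale that XMLString::transcode() relies on.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Xerces-C function): converts a buffer of
// UTF-16 code units to ISO-8859-1 bytes without consulting the locale.
// Code units above 0xFF have no Latin-1 equivalent and are replaced;
// a surrogate pair therefore yields two replacement characters.
std::string utf16ToLatin1(const std::uint16_t* chars,
                          std::size_t length,
                          char replacement = '?')
{
    std::string out;
    out.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        const std::uint16_t unit = chars[i];
        if (unit <= 0xFF) {
            // U+0000..U+00FF share their numeric value with ISO-8859-1.
            out.push_back(static_cast<char>(static_cast<unsigned char>(unit)));
        } else {
            out.push_back(replacement);
        }
    }
    return out;
}
```

With this, a buffer holding U+00C9 produces the single byte 0xC9, which is what I would expect to see for &#201; in an ISO-8859-1 environment. Since a parser may also deliver the text of one element through more than one characters() call, a handler using such a helper would presumably need to append each converted chunk rather than overwrite the previous one.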

Can anyone offer any suggestions?



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org