Posted to general@xml.apache.org by Dave Raskin <dr...@rimage.com> on 2002/08/23 15:44:24 UTC
Xerces-C user question
I didn't see a specific mailing list for Xerces-C, so I am sending my
question here. Please help!
I am using Xerces 1.7 (C++ version) on Windows and am having a problem
parsing a Unicode XML string with the SAX parser.
I have no problems doing this when my code is compiled for _MBCS.
When my code is compiled for _UNICODE, however, the parser never calls
resolveEntity(), startElement(), or any other callback except
startDocument() and endDocument().
I stepped through the Xerces code, and it seems that somewhere along the way
extra null bytes are added after each character (which is already 2 bytes
long in Unicode). After this, the tokenizer routines detect a premature EOF
and the code returns without parsing any more of the document.
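One plausible reading of this symptom is that the byte buffer is being interpreted under the wrong encoding. With _UNICODE, _TCHAR is a 2-byte wchar_t on Windows, so ASCII markup such as `<` sits in memory as the bytes 3C 00, and a reader that assumes a single-byte encoding sees a NUL after every character. The sketch below illustrates this using the UTF-8 and UTF-16 rows of the non-normative autodetection table in XML 1.0 Appendix F (the UCS-4 rows are omitted). It assumes nothing about Xerces internals, whose actual detection logic may differ; `toUtf16le` and `guessXmlEncoding` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: lays out ASCII text as UTF-16LE bytes, which is how a
// 2-byte wchar_t string sits in memory on Windows.
std::vector<unsigned char> toUtf16le(const std::string& ascii)
{
    std::vector<unsigned char> out;
    for (unsigned char c : ascii) {
        assert(c < 0x80 && "this sketch handles ASCII input only");
        out.push_back(c);    // low byte: the ASCII code
        out.push_back(0x00); // high byte: always zero for ASCII
    }
    return out;
}

// Hypothetical helper: guesses the encoding family from the first bytes of an
// XML entity, following the UTF-8/UTF-16 rows of XML 1.0 Appendix F.
// The UCS-4 rows are omitted. This is not Xerces-C's own detection code.
std::string guessXmlEncoding(const unsigned char* b, std::size_t len)
{
    // Byte order marks take priority.
    if (len >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return "UTF-8 (BOM)";
    if (len >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return "UTF-16BE (BOM)";
    if (len >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return "UTF-16LE (BOM)";
    // No BOM: look for the "<?" of an XML declaration in each byte layout.
    if (len >= 4) {
        if (b[0] == 0x00 && b[1] == 0x3C && b[2] == 0x00 && b[3] == 0x3F)
            return "UTF-16BE";
        if (b[0] == 0x3C && b[1] == 0x00 && b[2] == 0x3F && b[3] == 0x00)
            return "UTF-16LE";
        if (b[0] == 0x3C && b[1] == 0x3F && b[2] == 0x78 && b[3] == 0x6D)
            return "ASCII-compatible (read the encoding declaration)";
    }
    // Anything else: Appendix F treats it as UTF-8 without a declaration.
    return "UTF-8 (default)";
}
```

For example, `toUtf16le("<?xml version=\"1.0\"?><a/>")` is classified as UTF-16LE, while `toUtf16le("<a/>")`, which has neither a BOM nor an XML declaration, falls through to the UTF-8 default; a reader following that guess would hit a NUL byte right after the first `<`.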
Is this a bug in Xerces, or am I doing something wrong? Here's a sample of
my code (handler is an extension of DefaultHandler):
// Create a SAX2 reader and register the handler objects.
SAX2XMLReader* parser = XMLReaderFactory::createXMLReader();
parser->setContentHandler(handler);
parser->setErrorHandler(ehandler);
parser->setDTDHandler(dHandler);
parser->setEntityResolver(rHandler);
// Turn on validation; the feature name must be the bare URI string.
parser->setFeature(L"http://xml.org/sax/features/validation", true);
try
{
    errorCount = 0;
    // MemBufInputSource takes a length in bytes, not in characters.
    int tcharSize = sizeof(_TCHAR);
    int length = _tcslen(xmlString) * tcharSize;
    MemBufInputSource inStr((XMLByte*)xmlString, length, _T("fakeBufId"));
    parser->parse(inStr);
    errorCount = parser->getErrorCount();
}
catch (const XMLException&)
{
    // A bare "throw;" rethrows the original object; "throw e;" throws a copy.
    throw;
}
catch (const SAXParseException&)
{
    throw;
}
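The length passed to MemBufInputSource above is a byte count, which is why the character count is multiplied by sizeof(_TCHAR). A minimal sketch of building such a buffer with an explicit byte order, assuming the text is held as UTF-16 code units in a std::u16string (used here so the example does not depend on wchar_t being 2 bytes, as it is on Windows). `utf16leBytes` is a hypothetical helper. Prepending the FF FE byte order mark is one way to make the UTF-16LE layout unambiguous to an autodetecting parser; whether it changes Xerces 1.7's behavior in this particular case is not established here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: serializes UTF-16 code units as little-endian bytes,
// optionally preceded by the byte order mark FF FE. The result's size() is
// the byte count a byte-oriented input source expects: two bytes per code
// unit, plus two for the BOM.
std::vector<unsigned char> utf16leBytes(const std::u16string& text, bool withBom)
{
    std::vector<unsigned char> out;
    out.reserve(text.size() * 2 + (withBom ? 2 : 0));
    if (withBom) {
        out.push_back(0xFF);
        out.push_back(0xFE);
    }
    for (char16_t unit : text) {
        out.push_back(static_cast<unsigned char>(unit & 0xFF));        // low byte first
        out.push_back(static_cast<unsigned char>((unit >> 8) & 0xFF)); // then high byte
    }
    return out;
}
```

Passing `out.data()` and `out.size()` keeps the pointer and the byte count consistent by construction, instead of recomputing the length from a character count.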
Thanks; any and all help is appreciated!
Dave Raskin
Rimage Corporation