Posted to general@xml.apache.org by Dave Raskin <dr...@rimage.com> on 2002/08/23 15:44:24 UTC

Xerces-C user question

I didn't see a specific mailing list for Xerces-C, so I am sending my
question here. Please help!
 
I am using Xerces 1.7 (the C++ version) on Windows and am having a problem
parsing a Unicode XML string with the SAX parser.
 
I have no problems doing this when my code is compiled for _MBCS. 
 
When my code is compiled for _UNICODE, the parser never calls
resolveEntity() or startElement(), or anything else except startDocument()
and endDocument().
 
I stepped through the Xerces code, and it seems that somewhere along the way
an extra null byte is added after each character (which is already 2 bytes
long in Unicode).  After this, the tokenizer routines detect a premature EOF
and the code returns without parsing any more of the document.
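
To illustrate the symptom, here is a minimal sketch in standard C++. It is
not Xerces code, and it is only one possible explanation: it assumes the
UTF-16 bytes are being treated as a single-byte encoding, so each byte is
widened to its own 16-bit unit. The helper names (utf16leBytes,
widenEachByte, unitsBeforeNul) are hypothetical, and char16_t stands in for
the 2-byte _TCHAR of a _UNICODE build.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Raw bytes of a UTF-16 string in little-endian order (low byte first),
// which is assumed here to match the layout of a _UNICODE string on Windows.
std::vector<std::uint8_t> utf16leBytes(const std::u16string& text)
{
    std::vector<std::uint8_t> bytes;
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));
        bytes.push_back(static_cast<std::uint8_t>(unit >> 8));
    }
    return bytes;
}

// Hypothetical stand-in for a single-byte transcoder: every input byte
// becomes one 16-bit unit. This is an assumption, not the Xerces transcoder.
std::u16string widenEachByte(const std::vector<std::uint8_t>& bytes)
{
    std::u16string out;
    for (std::uint8_t b : bytes) {
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}

// Number of units a scanner would see if it treated NUL as end of input.
std::size_t unitsBeforeNul(const std::u16string& s)
{
    std::size_t pos = s.find(u'\0');
    return pos == std::u16string::npos ? s.size() : pos;
}
```

Under these assumptions, passing the 4-unit string u"<a/>" through
utf16leBytes and then widenEachByte gives 8 units with a NUL after every
character, and unitsBeforeNul reports that only the first unit is reachable
before the scanner would stop.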
 
Is this a bug in Xerces, or am I doing something wrong?  Here's a sample of
my code (handler is an extension of DefaultHandler):
 
 
 SAX2XMLReader* parser = XMLReaderFactory::createXMLReader();
 
  parser->setContentHandler(handler);
  parser->setErrorHandler(ehandler);
  parser->setDTDHandler(dHandler);
  parser->setEntityResolver(rHandler);
  parser->setFeature(L"http://xml.org/sax/features/validation", true);

 try
 {
      errorCount = 0;
      int tcharSize = sizeof(_TCHAR);
      int length = _tcslen(xmlString) * tcharSize;
      MemBufInputSource inStr((XMLByte*)xmlString, length, _T("fakeBufId"));
      parser->parse(inStr);
      errorCount = parser->getErrorCount();
 }
 catch (const XMLException& e) 
 {
      throw e;
 }
 catch (const SAXParseException& e) 
 {
      throw e;
 }

Thanks. Any and all help is appreciated!
 
Dave Raskin
Rimage Corporation