Posted to c-dev@xerces.apache.org by Steel Bash <st...@chez.com> on 2004/01/02 14:07:58 UTC

Seg fault when receiving cyrillic characters

Hello,

We've developed an application that works well with Latin characters. We now need to make it work with languages such as Russian or Japanese.
Currently, when the parser receives Russian characters, we get a segmentation fault. What do we need to do to make the parser handle Unicode characters?

Here is the XML document that is sent as a test, followed by a snippet of the code we are using:
<?xml version="1.0" encoding="UTF-16"?>
<!DOCTYPE event SYSTEM "event.dtd">
<event id="1">
   <argument name="sArg">???? Russian characters here ???</argument>
</event>

MyEventAccumulator::_doit()
{
    char * evt;
    MyEventSaxHandler handler;
    SAXParser parser;
    SAXParser::ValSchemes valScheme = SAXParser::Val_Always;
    bool doNamespaces = false;
    bool doSchema = false;

    parser.setValidationScheme(valScheme);
    parser.setDoNamespaces(doNamespaces);
    parser.setDoSchema(doSchema);
    parser.setDocumentHandler(&handler);
    parser.setErrorHandler(&handler);
    parser.setEntityResolver(&handler);
    ....
        MemBufInputSource * memBufIS = new MemBufInputSource
        (
            (const XMLByte *) (evt),
            strlen(evt),
            "some event",
            false
        );
        try
        {
            parser.parse(*memBufIS);
        }
        catch (const XMLException & e)
        {
            // Parse error
            // cerr << "Error parsing event" << endl;
        }
    }
    ...
}

The segmentation fault occurs inside the parse() call, before any exception is caught.

The DDD backtrace:

#8 0x0805327a in MyEventAccumulator::_doit() at MyEventAccumulator.cpp:117
#7 0x4018d373 in SAXParser::parse() from libxerces-c1_7_0.so
#6 0x401d339c in XMLScanner::scanDocument() from libxerces-c1_7_0.so
#5 0x401d5a6d in XMLScanner::scanContent() from libxerces-c1_7_0.so
#4 0x401d07d5 in XMLScanner::scanCharData() from libxerces-c1_7_0.so
#3 0x401ce4ea in XMLScanner::sendCharData() from libxerces-c1_7_0.so
#2 0x4018dcf8 in SAXParser::doCharacters() from libxerces-c1_7_0.so
#1 0x0805503f in MyEventSaxHandler::characters() at bastring.h:343
#0 0x0804fc84 in MyEvent::addArgument() at straits.h:125

Any ideas?

Christelle





---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


Re: Seg fault when receiving cyrillic characters

Posted by Alberto Massari <am...@progress.com>.
As the stack trace shows, the crash happens inside your own code
(MyEvent::addArgument() at straits.h:125). Unless you post that code, we
cannot help you.
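
MyEvent::addArgument() is not shown, so the real cause cannot be confirmed. Frames #1 and #0 are reported in bastring.h and straits.h, which appear to be the std::string headers of the old GNU libstdc++, so one plausible but unverified explanation is a std::string being built from a null or invalid char pointer, for example the result of a failed conversion of the Cyrillic text to the local code page. Whatever the cause, a handler can avoid locale-dependent conversion by turning the UTF-16 data it receives into UTF-8 itself. The sketch below assumes XMLCh is a 16-bit UTF-16 code unit (taken here as std::uint16_t) and that the callback supplies a pointer plus a length; utf16ToUtf8 is a hypothetical helper, not part of Xerces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: convert `length` UTF-16 code units into a UTF-8
// std::string without going through the process locale.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const std::uint16_t* chars, std::size_t length)
{
    // Guard against the null-pointer case instead of crashing later.
    assert(chars != nullptr || length == 0);

    std::string out;
    out.reserve(length * 3);
    for (std::size_t i = 0; i < length; ++i) {
        std::uint32_t cp = chars[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine it with a following low surrogate.
            if (i + 1 < length && chars[i + 1] >= 0xDC00 && chars[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (chars[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // Low surrogate without a preceding high surrogate.
        }

        // Emit the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Inside MyEventSaxHandler::characters(chars, length) this could be used as, say, `value += utf16ToUtf8(reinterpret_cast<const std::uint16_t*>(chars), length);`, where value is a hypothetical std::string member collecting the argument text. Relying on length rather than on a terminating zero also keeps the handler correct if the buffer is not null-terminated.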
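
Also, I see that you are creating a MemBufInputSource object to hold the
XML document. The document appears to be UTF-16, yet you are using strlen
to compute its size. You should pass the actual size of the buffer in bytes
instead of relying on strlen, which stops at the first zero byte and
therefore only works for encodings without embedded NUL bytes, such as
ASCII or UTF-8. UTF-16 text contains a zero byte for every ASCII character.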
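
To illustrate the strlen problem: in UTF-16 every ASCII character, including the leading `<`, is stored with a zero byte, so strlen(evt) on a UTF-16 buffer reports only a handful of bytes. A minimal sketch, assuming the document is available as a std::u16string; encodeUtf16le is a hypothetical helper standing in for however evt is really filled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: encode UTF-16 text as little-endian bytes with a
// leading byte order mark (FF FE), as a UTF-16 document might sit in `evt`.
std::vector<char> encodeUtf16le(const std::u16string& text)
{
    std::vector<char> bytes;
    bytes.reserve(2 + 2 * text.size());
    bytes.push_back(static_cast<char>(0xFF));
    bytes.push_back(static_cast<char>(0xFE));
    for (char16_t unit : text) {
        bytes.push_back(static_cast<char>(unit & 0xFF));         // low byte first
        bytes.push_back(static_cast<char>((unit >> 8) & 0xFF));  // then high byte
    }
    // The byte count to hand to the parser is the full size of this buffer:
    // 2 bytes of BOM plus 2 bytes per UTF-16 code unit.
    assert(bytes.size() == 2 + 2 * text.size());
    return bytes;
}
```

For a document starting with `<`, the first zero byte sits at offset 3 (after FF FE and the 3C of `<`), so strlen would report 3, while bytes.size() covers the whole document. That size is what belongs in MemBufInputSource's byte-count argument, for example `new MemBufInputSource((const XMLByte*) bytes.data(), bytes.size(), "some event", false)`; with adoptBuffer set to false, the vector must outlive the input source.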

Alberto
