Posted to c-users@xerces.apache.org by Umesh Chandak <um...@gslab.com> on 2008/11/05 15:50:58 UTC

XMLString::transcode() not working properly

Hi All,
While parsing an XSD, I am using the XMLString::transcode() function,
but it does not work properly for some UTF-8 characters.

I have a node like
<xsd:attribute name="str00Aٰ" type="xsd:string"/>
Viewed in a hex editor, this is <xsd:attribute name="str00A\xd9\xb0" 
type="xsd:string"/>.

When transcoding the value of the name attribute, I always get an empty 
string instead of "str00Aٰ" (the actual byte sequence is 
"str00A\xd9\xb0").
This happens on Linux; on FreeBSD my application hangs.

I have attached the sample code, the XSD being parsed, and the sample 
output of my program.

Can anyone tell me whether the problem is in my code or schema, or 
whether this is a bug in Xerces?
I am using xerces-2.7.0.

Thanks in advance.

Thanks.
Regards,
Umesh




Re: XMLString::transcode() not working properly

Posted by David Bertoni <db...@apache.org>.
Umesh Chandak wrote:
> Hi All,
> While parsing an XSD, I am using the XMLString::transcode() function,
> but it does not work properly for some UTF-8 characters.
It certainly won't, unless the local code page is UTF-8.
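
To see what the local code page actually is on a POSIX system, something 
like the sketch below can help. It assumes that the local code page 
transcoder follows the C library locale, which I believe is the case for 
the default Linux build but is worth confirming for your configuration. 
The helper names currentCodeset, isUtf8Codeset and checkCodesetNames are 
made up for illustration and are not part of Xerces.

```cpp
#include <cassert>
#include <cctype>
#include <clocale>
#include <string>
#include <langinfo.h>  // POSIX: nl_langinfo, CODESET

// Hypothetical helper: name of the character encoding used by the
// current C locale (for example "UTF-8" or "ISO-8859-1").
std::string currentCodeset() {
    const char* name = nl_langinfo(CODESET);
    return name ? std::string(name) : std::string();
}

// Hypothetical helper: true if the codeset name denotes UTF-8.
// Case and hyphens are ignored, so "UTF-8", "utf8" and "Utf-8" all match.
bool isUtf8Codeset(const std::string& name) {
    std::string folded;
    for (unsigned char c : name) {
        if (c != '-') folded += static_cast<char>(std::toupper(c));
    }
    return folded == "UTF8";
}

// Small self-check of the name comparison.
void checkCodesetNames() {
    assert(isUtf8Codeset("UTF-8"));
    assert(!isUtf8Codeset("ISO-8859-1"));
}
```

A program that never calls std::setlocale(LC_ALL, "") runs in the "C" 
locale regardless of the LANG or LC_* environment variables, so call it 
first and then check isUtf8Codeset(currentCodeset()).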

> 
> I have a node like
> <xsd:attribute name="str00Aٰ" type="xsd:string"/>
> Viewed in a hex editor, this is <xsd:attribute name="str00A\xd9\xb0" 
> type="xsd:string"/>.
> 
> When transcoding the value of the name attribute, I always get an empty 
> string instead of "str00Aٰ" (the actual byte sequence is 
> "str00A\xd9\xb0").
> This happens on Linux; on FreeBSD my application hangs.
> 
> I have attached the sample code, the XSD being parsed, and the sample 
> output of my program.
> 
> Can anyone tell me whether the problem is in my code or schema, or 
> whether this is a bug in Xerces?
No, it's a bug in your understanding of what the local code page is. 
The fix is to use a UTF-8 transcoder, instead of the local code page 
transcoder.
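
To illustrate what a UTF-8 transcoder does with the attribute name from 
the post, here is a minimal standalone sketch. Xerces keeps strings 
internally as UTF-16 (XMLCh), so the job is to turn UTF-16 code units 
into UTF-8 bytes. In Xerces itself the usual route, as far as I know, is 
to ask the transcoding service for a "UTF-8" transcoder 
(XMLTransService::makeNewTranscoderFor) and call its transcodeTo method; 
the exact signatures differ between releases, so the code below 
deliberately avoids the Xerces API. The names utf16ToUtf8 and 
checkPostExample are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces): encode UTF-16 as UTF-8.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            // Unpaired surrogate: substitute U+FFFD.
            cp = 0xFFFD;
        }
        if (cp < 0x80) {                       // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// The attribute name from the post ends in U+0670, which encodes to the
// two bytes D9 B0 that the hex editor shows.
void checkPostExample() {
    assert(utf16ToUtf8(u"str00A\u0670") == "str00A\xd9\xb0");
}
```

The result does not depend on the locale, which is the point: the local 
code page transcoder can only represent characters that exist in the 
current code page, while a UTF-8 transcoder can represent any of them.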

For more details, search the archives of the group, since this question 
comes up all the time, and was recently answered.

Dave