Posted to c-dev@xerces.apache.org by "Lele, Parag" <le...@ugs.com> on 2007/09/06 14:17:49 UTC
problem with XMLString::transcode() for special characters
Hello,
I am facing a problem with the XMLString::transcode function for special
characters on a Unix machine.
After I use this function, it corrupts the special characters and changes
their hex values.
I have also checked that before using this function, the hex values of
all the characters are the same on Windows and Unix.
But after using this function, the hex value changes.
A node with the text content “Ale’s” gave the following hex output:

Character   Windows     Unix
A           41          41
l           6c          6c
e           65          65
’           ffffff92    1a
s           73          73
These special characters can be typed only in Word. Such characters are
also present in my XML file, which I am parsing using the Xerces DOM parser.
Is there a way I can preserve the hex value of that character on a Unix
machine?
I tried using setencoding("UTF-8") on the parser and also tried using
setlocale( LC_ALL, "C" );
but it did not help.
Regards
Parag
Re: problem with XMLString::transcode() for special characters
Posted by Gareth Reakes <ga...@we7.com>.
Hi,
The problem you are having is that the special characters you see in
Word are not in UTF-8. There is an article here that explains the
problem and how to export text from Word in UTF-8:
http://ljmu.ac.uk/cis/webpublishing/81434.htm
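The reported values can be illustrated with a minimal standalone C++
sketch (not Xerces code). It assumes the file was saved by Word in
Windows-1252, where byte 0x92 is the right single quotation mark
(U+2019); the function names are hypothetical. The value ffffff92 looks
like that same byte printed through a signed char, and 1a is the ASCII
SUB control character, which would be consistent with a converter
substituting a character that the target encoding cannot represent.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hex of a byte as it appears when a signed char is widened first:
// 0x92 becomes -110, and converting that to unsigned sign-extends it.
std::string hexSignExtended(char c) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "%x",
                  static_cast<unsigned>(static_cast<signed char>(c)));
    return buf;
}

// Hex of the same byte with only its low 8 bits kept.
std::string hexOfByte(char c) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "%x",
                  static_cast<unsigned>(static_cast<unsigned char>(c)));
    return buf;
}

// Partial Windows-1252 decoder (assumption: only the four "smart quote"
// bytes are mapped here; other bytes in 0x80-0x9F yield U+FFFD).
char32_t cp1252ToCodePoint(unsigned char b) {
    switch (b) {
        case 0x91: return 0x2018;  // left single quotation mark
        case 0x92: return 0x2019;  // right single quotation mark
        case 0x93: return 0x201C;  // left double quotation mark
        case 0x94: return 0x201D;  // right double quotation mark
        default:
            // ASCII and 0xA0-0xFF have the same values in Unicode.
            return (b < 0x80 || b >= 0xA0) ? b : 0xFFFD;
    }
}
```

With these helpers, hexOfByte('\x92') gives "92" where the naive
conversion gives "ffffff92" on a platform with a 32-bit unsigned int,
and cp1252ToCodePoint(0x92) gives U+2019, which no pure ASCII ("C")
locale can represent.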
Cheers,
Gareth
--
Gareth Reakes, CTO WE7
+44-20-7117-0809 http://www.we7.com
---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org
Re: problem with XMLString::transcode() for special characters
Posted by Alberto Massari <am...@datadirect.com>.
You should not rely on XMLString::transcode to convert data, as it
uses the current locale's encoding as the source/target encoding, and
that's a good choice only when the data you need to convert is coming
from stdin or going to stdout. In all other cases, create a transcoder
for the correct encoding and use that.
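To show what converting with an explicitly chosen encoding means, here
is a minimal standalone sketch that encodes UTF-16 code units (the form
in which Xerces-C++ holds XMLCh strings) as UTF-8 without consulting the
locale. The function name utf16ToUtf8 is hypothetical and the code is
hand-written for illustration; in Xerces-C++ itself one would more
likely obtain a transcoder from the transcoding service (for example
through XMLTransService::makeNewTranscoderFor) than write this by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encode UTF-16 code units as UTF-8 bytes. The target encoding is fixed
// by this function, so the result does not depend on setlocale().
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        const bool isLow = cp >= 0xDC00 && cp <= 0xDFFF;
        if (isHigh && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (isHigh || isLow) {
            cp = 0xFFFD;  // unpaired surrogate: use the replacement character
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For the text from the report, utf16ToUtf8(u"Ale\u2019s") yields the
bytes 41 6c 65 e2 80 99 73 on any platform, whatever the current locale.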
Alberto