Posted to c-dev@xerces.apache.org by "Lele, Parag" <le...@ugs.com> on 2007/09/06 14:17:49 UTC

problem with XMLString::transcode() for special characters

Hello

I am facing a problem with the XMLString::transcode function for special
characters on a Unix machine.

After I call this function, the special characters are corrupted and their
hex values change.

I have checked that before calling this function, the hex values of all the
characters are the same on Windows and Unix.

But after calling this function, the hex value changes.

A node with the text content "Ale’s" gave the following hex output:

Character   Windows    Unix
A           41         41
l           6c         6c
e           65         65
’           ffffff92   1a
s           73         73

These special characters can be typed only in Word. Such characters are
also present in the XML file that I am parsing with the Xerces DOM parser.

Is there a way to preserve the hex value of that character on the Unix
machine?

I tried calling setencoding("UTF-8") on the parser and also tried
setlocale( LC_ALL, "C" );

But it did not help.

Regards

Parag


Re: problem with XMLString::transcode() for special characters

Posted by Gareth Reakes <ga...@we7.com>.
Hi,

	The problem you are having is that the special characters you see in 
Word are not in UTF-8. There is an article here that explains the 
problem and how to export text from Word in UTF-8:

http://ljmu.ac.uk/cis/webpublishing/81434.htm

Cheers,

Gareth

-- 
Gareth Reakes, CTO                                 WE7
+44-20-7117-0809                    http://www.we7.com

---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org


Re: problem with XMLString::transcode() for special characters

Posted by Alberto Massari <am...@datadirect.com>.
You should not rely on XMLString::transcode to convert data, as it 
uses the current locale as the source/target encoding, and that's a 
good choice only when the data you need to convert is coming from 
stdin or going to stdout. In all other cases, create a transcoder 
for the correct encoding and use that.

Alberto

At 17.47 06/09/2007 +0530, Lele, Parag wrote:

