Posted to c-dev@xerces.apache.org by Hans Stoessel <hs...@pm-medici.ch> on 2005/10/12 17:15:41 UTC

Transcoding on Mac OS X

Hi

I parse a UTF-8 XML file on Mac OS X. In my C++ application I use a 
standard string (std::string) to save the content of the tags. Now I have 
problems with characters > 127. If I use XMLString::transcode there are two 
bytes for such a character instead of one byte. But the std::string uses 
only chars (1 byte) for storing the data.

How can I transcode the contents from XMLCh (2 bytes) into the right format 
for my std::strings?

Thanks for any help.

Regards
Hans 




---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org


Re: Transcoding on Mac OS X

Posted by da...@us.ibm.com.
> I parse a UTF-8 XML file on Mac OS X. In my C++ application I use a
> standard string (std::string) to save the content of the tags. Now I have
> problems with characters > 127. If I use XMLString::transcode there are two
> bytes for such a character instead of one byte. But the std::string uses
> only chars (1 byte) for storing the data.

But UTF-8 uses two, three, or four bytes to represent Unicode code points 
above 127, so the behavior you're seeing is expected.  If you require each 
character to occupy exactly one code unit (one char), you cannot use 
std::string to hold UTF-8, or any other multi-byte encoding, for that 
matter.  However, I'm not sure why this is a problem.
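
To illustrate the point about code units, here is a minimal sketch in plain 
C++ (no Xerces calls). The helper name countCodePoints is made up for this 
example, and it assumes the input is already valid UTF-8. It shows that a 
std::string holds UTF-8 without trouble; only size() reports bytes rather 
than characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration: counts Unicode code points in a
// UTF-8 encoded std::string. Every code point starts with exactly one
// byte that is NOT a continuation byte (continuation bytes look like
// 10xxxxxx), so counting the non-continuation bytes gives the count.
// Assumes the input is valid UTF-8; no validation is performed.
std::size_t countCodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string "caf\xC3\xA9" (the word "café", where U+00E9 is encoded as 
the two bytes 0xC3 0xA9), size() is 5 while countCodePoints() is 4.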

> How can I transcode the contents from XMLCh (2 bytes) into the right format
> for my std::strings?

It's not possible, of course, unless you want to use a single-byte 
encoding, like ISO-8859-1, and you're sure it can represent all of the 
characters in the XML document.  If that's what you want, you'll need to 
create a transcoder for that encoding, and use it instead of 
XMLString::transcode(), which transcodes only to the local code page.
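
As a rough illustration of what a single-byte conversion involves, the 
sketch below narrows 16-bit code units to ISO-8859-1 by hand. It is not the 
Xerces transcoder API: toLatin1 is a hypothetical helper, and 
std::u16string stands in for a buffer of XMLCh, on the assumption that the 
data is UTF-16. It relies on ISO-8859-1 mapping its 256 byte values 
directly onto U+0000 through U+00FF, and it refuses anything outside that 
range instead of silently losing data.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of Xerces: converts UTF-16 code units to
// an ISO-8859-1 encoded std::string. Each code unit up to 0xFF becomes
// the byte with the same value. Anything larger (including surrogate
// halves, which encode code points above U+FFFF) has no ISO-8859-1
// representation, so the function throws rather than substituting.
std::string toLatin1(const std::u16string& utf16)
{
    std::string out;
    out.reserve(utf16.size());
    for (char16_t unit : utf16) {
        if (unit > 0xFF) {
            throw std::range_error(
                "character not representable in ISO-8859-1");
        }
        out.push_back(static_cast<char>(static_cast<unsigned char>(unit)));
    }
    return out;
}
```

With this, u"caf\u00E9" becomes a 4-byte std::string ending in 0xE9, while 
a string containing U+20AC (the euro sign) throws, which is the "can it 
represent all of the characters" condition in practice.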

Dave
