Posted to c-dev@xerces.apache.org by Jeremy Sheeley <je...@sourcegear.com> on 2001/11/30 21:56:28 UTC

Transcoding ISO-8859-1 (Latin1) help needed

I have an XML document (WML, actually) that has this string in it:

Français

Note that the c is the c-cedilla (ç), which is not in ASCII. It is in Latin1,
where it is represented by the byte E7 (hex). I know this because I ran a
hexdump on the file, and that was the byte at the character's position.

When I parse this with UnRep_Throw set, I get this exception.

Fatal Error: An exception occured! Type:TranscodingException,
Message:Unicode char 0x6829 is not representable in encoding

When I parse it with UnRep_RepChar, I get back "Frans". I know that it's
just eating the two characters after the strange one, because I tried
"Fran?ais" and got "Franais".
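One plausible reading of the "eats two characters" symptom, assuming the document has no encoding declaration (and no byte order mark), in which case an XML parser must treat it as UTF-8: in UTF-8 a byte of the form 1110xxxx, such as 0xE7, announces a three-byte sequence, so a decoder that trusts the lead byte swallows the following "a" and "i". The sketch below is hypothetical (splitLenient and keepAscii are illustrative names, not Xerces functions), and it does not try to reproduce the exact 0x6829 value in the error message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Number of bytes a UTF-8 sequence occupies, judged only by its lead byte:
// 0xxxxxxx -> 1, 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4.
// Anything else (stray continuation byte, invalid lead) is treated as 1 here.
std::size_t utf8SequenceLength(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

// Hypothetical lenient splitter: groups bytes by lead-byte length WITHOUT
// checking that the following bytes are real continuation bytes (10xxxxxx).
// A strict UTF-8 decoder would reject "\xE7" "ai" as malformed instead.
std::vector<std::string> splitLenient(const std::string& bytes) {
    std::vector<std::string> units;
    std::size_t i = 0;
    while (i < bytes.size()) {
        std::size_t len = utf8SequenceLength(static_cast<unsigned char>(bytes[i]));
        if (i + len > bytes.size()) len = bytes.size() - i;  // truncated tail
        assert(len >= 1);
        units.push_back(bytes.substr(i, len));
        i += len;
    }
    return units;
}

// Keeps only single-byte ASCII units, mimicking a transcoder that drops
// the characters it cannot represent.
std::string keepAscii(const std::vector<std::string>& units) {
    std::string out;
    for (const std::string& u : units) {
        if (u.size() == 1 && static_cast<unsigned char>(u[0]) < 0x80) out += u;
    }
    return out;
}
```

Fed the Latin1 bytes of "Français", splitLenient produces F, r, a, n, one three-byte unit (E7 61 69) and s; dropping the multi-byte unit leaves "Frans", which matches the output reported above.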

Since it's WML, I tried specifying the character as "Fran&#xE7;ais", which
worked great. But I can't guarantee that content providers won't put Latin1
characters in their content, so what can I do to make sure that the
transcoder can represent the accented character?
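A numeric character reference names the code point itself, written in ASCII, so it reaches the parser as U+00E7 no matter how the file's bytes are decoded; that is why the escaped form survives while the raw E7 byte does not. A minimal sketch of the resolution step; resolveCharRef is a hypothetical name, not a Xerces API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: resolves a numeric character reference such as
// "&#xE7;" (hex) or "&#231;" (decimal) to its Unicode code point.
// Returns -1 if the text is not a well-formed numeric reference.
long resolveCharRef(const std::string& ref) {
    if (ref.size() < 4 || ref.compare(0, 2, "&#") != 0 || ref.back() != ';') return -1;
    bool hex = (ref[2] == 'x');            // XML only allows a lowercase 'x'
    std::size_t start = hex ? 3 : 2;
    std::size_t end = ref.size() - 1;      // position of the ';'
    if (start >= end) return -1;           // no digits at all
    long value = 0;
    for (std::size_t i = start; i < end; ++i) {
        char c = ref[i];
        int digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (hex && c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (hex && c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return -1;
        value = value * (hex ? 16 : 10) + digit;
        if (value > 0x10FFFF) return -1;   // beyond the Unicode range
    }
    return value;
}
```

Both "&#xE7;" and "&#231;" resolve to 0xE7, the same code point the Latin1 byte E7 stands for.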

I'm creating my transcoder like this:
  transcoder = XMLPlatformUtils::fgTransService->makeNewTranscoderFor(
      "ISO-8859-1", resCode, 8192);

and calling the transcodeTo method like this:

  transcoder->transcodeTo(toTranscode,
      (unsigned int)XMLString::stringLen(toTranscode),
      bufToFill, 8192, bytesEaten, unRepOpts);
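The two unrepresentable-character options behave roughly like the hypothetical toLatin1 below. It is a sketch, not Xerces' actual transcoder: UnRepMode, toLatin1 and the '?' default replacement are assumptions for illustration. The fact it encodes is that ISO-8859-1 only covers U+0000 to U+00FF, so U+00E7 always fits while a code point such as 0x6829 from the error message never can.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the UnRep_Throw / UnRep_RepChar options.
enum class UnRepMode { Throw, RepChar };

// Sketch of a UTF-32 -> ISO-8859-1 transcoder. Only U+0000..U+00FF exist in
// Latin-1, so anything above 0xFF is "not representable in encoding".
std::string toLatin1(const std::u32string& text, UnRepMode mode,
                     char replacement = '?') {
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        if (cp <= 0xFF) {
            out.push_back(static_cast<char>(cp));  // 1:1 mapping, U+00E7 -> 0xE7
        } else if (mode == UnRepMode::Throw) {
            throw std::runtime_error("code point not representable in ISO-8859-1");
        } else {
            out.push_back(replacement);            // substitute and keep going
        }
    }
    return out;
}
```

Note that ç itself (U+00E7) passes through unchanged in either mode. If the transcoder reports 0x6829, the damage happened earlier, when the file's bytes were decoded into Unicode, so switching between the two options cannot bring the ç back. Which replacement character Xerces itself substitutes is not modeled here.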

Thanks for any help you can give me.

