Posted to c-dev@xerces.apache.org by Brad Settlemyer <bw...@strongholdtech.com> on 2002/10/28 21:11:59 UTC

Character encodings and utf-8, XMLString::transpose()

Hello,

  I apologize for being woefully uninformed on this topic, but I'm confused
by how the SAX2 character function and transpose are intermingling.

I interact with a system that returned me a document with the following
element:

<TradeInModel>3.2TL&#xfd;MDX</TradeInModel>

Apparently, the characters() function is called with the following inputs:
1)  3.2TL
2)  &#xfd;
3)  MDX

No real surprises; however, whenever I transpose &#xfd; I receive the null
character (I think; I'm still validating all of this and trying to figure
out exactly what is going on).  The 254th (0xfd) character in ISO 8859-1 is
a y with an accent mark over the top, so this really surprised me.  Why am I
receiving a null character from transpose while parsing the above element?

Thanks,

--
Brad Settlemyer
bws@strongholdtech.com
(703) 547-0142


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


RE: Character encodings and utf-8, XMLString::transpose()

Posted by Juergen Hermann <jh...@web.de>.
On Mon, 28 Oct 2002 15:43:41 -0500, Brad Settlemyer wrote:

>On linux, my locale is set to en_US, which should correspond to iso 8859-1.

Make that explicit by using   LANG=en_US.ISO-8859-1



Ciao, Jürgen

--
Jürgen Hermann, Developer
WEB.DE AG, http://webde-ag.de/





RE: Character encodings and utf-8, XMLString::transpose()

Posted by Brad Settlemyer <bw...@strongholdtech.com>.
Yes, transcode; sorry about that, I got transpose on the brain somehow.

On Linux, my locale is set to en_US, which should correspond to ISO 8859-1.
So I expected to receive the 0xFD'th character of ISO 8859-1 (it's kind of
like a y' according to my documentation, but trying to find clear
documentation without actually reading the ISO standards has been difficult
thus far [I will read the standards soon, but I'm going to have to hack this
first attempt, I'm afraid :( ]).

According to what you're saying, that should actually work with the standard
XMLString::transcode(), I would think.  Certainly I will sit down and try to
get the TransService and TransCoder figured out for ISO-8859-1, but still I
find this behavior surprising.  Why would the null string be returned as
well?  The documentation I have for 1.7.0 makes no mention of null for
XMLString::transcode; is that an indication that my locale is screwed up?

Thanks for your fast response earlier; I really have gotten myself up
against it here,
Brad



RE: Character encodings and utf-8, XMLString::transpose()

Posted by Dean Roddey <dr...@charmedquark.com>.
Do you mean transcode()? Transcode() works in the local code page, so
what is your local code page? Is it 8859-1? Even if it is, if you want
this to work on other machines, you should get an explicit transcoder for
the encoding you want to get the text into. Otherwise, you will get
different results on different machines. Transcode() was mainly intended
for display purposes, and for taking input from the user, which needs to
be done in the local code page, and if something cannot be represented
then it just can't be represented. For anything where you want to get it
to a particular encoding, you have to do that through a transcoder for
the encoding you want. 

-------------------------------------
Dean Roddey
The Charmed Quark Controller
droddey@charmedquark.com
www.charmedquark.com
 
