Posted to c-dev@xerces.apache.org by Matthias Niggemeier <M...@thias.de> on 2004/12/08 22:13:31 UTC
Multibyte characters
Hi there!
I am writing xml files which contain physical units.
Some units contain the greek mucro.
Xerces writes it as two characters, I remember thats
because it cannot be displayed as UTF-8.
My customer claims that this is wrong. I remember a
webpage dealing with this topic, but I cannot find it.
Can somebody give me a hint?
Regards
Matthias
Re: Multibyte characters
Posted by Alberto Massari <am...@progress.com>.
Hi Matthias,
It's difficult to tell who is wrong without seeing the XML file.
Could you post it?
Alberto
---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org
RE: Multibyte characters
Posted by Dean Roddey <dr...@charmedquark.com>.
"However, using UTF-8 is a much better option, and your customer can simply
get an editor that can display text encoded in UTF-8."
Or, more likely, just import it into a local code page that supports those
characters.
-------------------------------------
Dean Roddey
Chairman/CTO, Charmed Quark Systems
droddey@charmedquark.com
www.charmedquark.com
RE: Multibyte characters
Posted by Matthias Niggemeier <M...@thias.de>.
Hi David,
the character in question is B5 (my "mucro" should have been "micro").
This character is written as C2 B5, which is correct in my opinion.
How can I change the encoding to ISO-8859-1?
Thanks in advance
Matthias
> -----Original Message-----
> From: david_n_bertoni@us.ibm.com [mailto:david_n_bertoni@us.ibm.com]
> Sent: Wednesday, December 08, 2004 10:44 PM
> To: xerces-c-dev@xml.apache.org
> Subject: Re: Multibyte characters
>
> > I am writing xml files which contain physical units.
> > Some units contain the greek mucro.
>
> Do you mean U+03BC which is "GREEK SMALL LETTER MU" or U+00B5, which is
> "MICRO SIGN"? There is no Greek "mucro" character.
>
> > Xerces writes it as two characters, I remember thats
> > because it cannot be displayed as UTF-8.
>
> Well, if you are using UTF-8 as the encoding, then yes, that character
> requires two bytes (not characters). I'm not sure what you mean by "thats
> because it cannot be displayed as UTF-8."
>
> > My customer claims that this is wrong.
>
> Perhaps your customer is expecting you to use an encoding that represents
> that character in one byte. If the character in question is U+00B5, you
> can use ISO-8859-1. If it's U+03BC, you can use ISO-8859-7. However,
> using UTF-8 is a much better option, and your customer can simply get an
> editor that can display text encoded in UTF-8.
>
> Dave
Re: Multibyte characters
Posted by da...@us.ibm.com.
> I am writing xml files which contain physical units.
> Some units contain the greek mucro.
Do you mean U+03BC which is "GREEK SMALL LETTER MU" or U+00B5, which is
"MICRO SIGN"? There is no Greek "mucro" character.
> Xerces writes it as two characters, I remember thats
> because it cannot be displayed as UTF-8.
Well, if you are using UTF-8 as the encoding, then yes, that character
requires two bytes (not characters). I'm not sure what you mean by "thats
because it cannot be displayed as UTF-8."
> My customer claims that this is wrong.
Perhaps your customer is expecting you to use an encoding that represents
that character in one byte. If the character in question is U+00B5, you
can use ISO-8859-1. If it's U+03BC, you can use ISO-8859-7. However,
using UTF-8 is a much better option, and your customer can simply get an
editor that can display text encoded in UTF-8.
Dave
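
The byte counts discussed in this thread can be checked with a small
sketch. It is plain C++ and does not use Xerces-C; the helper names
encodeUtf8, toHex and encodeLatin1 are hypothetical and exist only for
this illustration. It assumes the input is a valid Unicode code point
and performs no surrogate or range validation.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Encode a single Unicode code point as UTF-8.
// Assumption: cp is a valid scalar value (no validation is done here).
std::vector<std::uint8_t> encodeUtf8(std::uint32_t cp) {
    std::vector<std::uint8_t> out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out.push_back(static_cast<std::uint8_t>(cp));
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx (U+00B5 and U+03BC land here)
        out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Format bytes as upper-case hex separated by spaces, e.g. "C2 B5".
std::string toHex(const std::vector<std::uint8_t>& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string s;
    for (std::uint8_t b : bytes) {
        if (!s.empty()) s += ' ';
        s += digits[b >> 4];
        s += digits[b & 0x0F];
    }
    return s;
}

// ISO-8859-1 maps code points U+0000..U+00FF to the byte of the same value.
// Returns false when the code point has no ISO-8859-1 representation.
bool encodeLatin1(std::uint32_t cp, std::uint8_t& out) {
    if (cp > 0xFF) return false;
    out = static_cast<std::uint8_t>(cp);
    return true;
}
```

With these helpers, toHex(encodeUtf8(0x00B5)) gives "C2 B5" and
toHex(encodeUtf8(0x03BC)) gives "CE BC", so both candidate characters
take two bytes in UTF-8. encodeLatin1 succeeds for U+00B5 (byte B5) and
fails for U+03BC, which is why ISO-8859-7 was suggested for the Greek
letter.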