Posted to c-dev@xerces.apache.org by Matthias Niggemeier <M...@thias.de> on 2004/12/08 22:13:31 UTC

Multibyte characters

Hi there!
I am writing XML files which contain physical units.
Some units contain the Greek mucro.
Xerces writes it as two characters. As I remember, that's
because it cannot be displayed as UTF-8.

My customer claims that this is wrong. I remember a
webpage dealing with this topic, but I cannot find it.
Can somebody give me a hint?

Regards

Matthias

Re: Multibyte characters

Posted by Alberto Massari <am...@progress.com>.
Hi Matthias,
it's difficult to tell who is wrong without seeing the XML file.
Could you post it?

Alberto

At 22.13 08/12/2004 +0100, Matthias Niggemeier wrote:
>Hi there!
>I am writing XML files which contain physical units.
>Some units contain the Greek mucro.
>Xerces writes it as two characters. As I remember, that's
>because it cannot be displayed as UTF-8.
>
>My customer claims that this is wrong. I remember a
>webpage dealing with this topic, but I cannot find it.
>Can somebody give me a hint?
>
>Regards
>
>Matthias



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


RE: Multibyte characters

Posted by Dean Roddey <dr...@charmedquark.com>.
"However, using UTF-8 is a much better option, and your customer can simply
get an 
editor that can display text encoded in UTF-8."

Or, more likely, just import it into a local code page that supports those
characters.

-------------------------------------
Dean Roddey
Chairman/CTO, Charmed Quark Systems
droddey@charmedquark.com
www.charmedquark.com
 





RE: Multibyte characters

Posted by Matthias Niggemeier <M...@thias.de>.
Hi David,
the character in question is B5; "mucro" should have been "micro".
This character is written as C2B5, which is correct in my opinion.
How can I change the encoding to ISO-8859-1?

Thanks in advance

Matthias


> -----Original Message-----
> From: david_n_bertoni@us.ibm.com [mailto:david_n_bertoni@us.ibm.com] 
> Sent: Wednesday, December 08, 2004 10:44 PM
> To: xerces-c-dev@xml.apache.org
> Subject: Re: Multibyte characters
> 
> 
> > I am writing XML files which contain physical units.
> > Some units contain the Greek mucro.
> 
> Do you mean U+03BC, which is "GREEK SMALL LETTER MU", or U+00B5,
> which is "MICRO SIGN"?  There is no Greek "mucro" character.
> 
> > Xerces writes it as two characters. As I remember, that's
> > because it cannot be displayed as UTF-8.
> 
> Well, if you are using UTF-8 as the encoding, then yes, that character
> requires two bytes (not characters).  I'm not sure what you mean by
> "that's because it cannot be displayed as UTF-8."
> 
> > My customer claims that this is wrong.
> 
> Perhaps your customer is expecting you to use an encoding that
> represents that character in one byte.  If the character in question
> is U+00B5, you can use ISO-8859-1.  If it's U+03BC, you can use
> ISO-8859-7.  However, using UTF-8 is a much better option, and your
> customer can simply get an editor that can display text encoded in
> UTF-8.
> 
> Dave
> 

Re: Multibyte characters

Posted by da...@us.ibm.com.
> I am writing XML files which contain physical units.
> Some units contain the Greek mucro.

Do you mean U+03BC, which is "GREEK SMALL LETTER MU", or U+00B5, which is 
"MICRO SIGN"?  There is no Greek "mucro" character.

> Xerces writes it as two characters. As I remember, that's
> because it cannot be displayed as UTF-8.

Well, if you are using UTF-8 as the encoding, then yes, that character 
requires two bytes (not characters).  I'm not sure what you mean by "that's 
because it cannot be displayed as UTF-8."

> My customer claims that this is wrong.

Perhaps your customer is expecting you to use an encoding that represents 
that character in one byte.  If the character in question is U+00B5, you 
can use ISO-8859-1.  If it's U+03BC, you can use ISO-8859-7.  However, 
using UTF-8 is a much better option, and your customer can simply get an 
editor that can display text encoded in UTF-8.

Dave
