Posted to j-users@xerces.apache.org by "Amthauer, Heiner" <He...@t-systems.com> on 2002/12/04 08:30:01 UTC

FAQ, encodingtool and demo required!?

Hi Folks!

Approx. 80% of the last 50 questions (just a guess, not mathematically
correct) have been about encoding and the errors around it. Shouldn't there
be a FAQ and demosources, as well as a tool for converting text into some
encoding (e.g. using escape codes)? I would write one if I knew anything
about it.

greetings


---------------------------------------------------------------
Dipl. Ing. Heiner Amthauer

T-Systems GEI GmbH

Hausanschrift: Magirusstr. 39/1, 89077 Ulm
Postanschrift: Postfach 20 64, 89010 Ulm
Telefon: +49 ( 731) 9344-4422
Telefax: +49 (731) 9344-4409
Mobil: +49 (1 78) 4269335
E-Mail: heiner.amthauer@t-systems.com
Internet: http://www.t-systems.com



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: FAQ, encodingtool and demo required!?

Posted by Joseph Kesselman <ke...@us.ibm.com>.
On Wednesday, 12/04/2002 at 08:30 CET, "Amthauer, Heiner" 
<He...@t-systems.com> wrote:
> a FAQ and demosources

There probably should be, somewhere. But encoding is a general Unicode 
question, and escaping characters not supported by the encoding is a 
general XML question, so there may be quibbles about who should be 
responsible for writing that FAQ and where it should live...
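
For illustration, here is a minimal sketch of the escaping step, assuming a 
plain Java runtime. The class and method names (XmlEscapeSketch, 
escapeUnsupported) are made up for this example and are not part of Xerces. 
It replaces each character the target encoding cannot represent with an XML 
numeric character reference, and it deliberately does not escape markup 
characters such as < and &.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class XmlEscapeSketch {
    // Hypothetical helper: replace every code point that the target
    // encoding cannot represent with a numeric character reference
    // of the form &#xHHHH;. Assumes the input is well-formed UTF-16.
    static String escapeUnsupported(String text, String encodingName) {
        CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // e-acute (U+00E9) and the euro sign (U+20AC) are not in US-ASCII.
        System.out.println(escapeUnsupported("caf\u00e9 \u20ac5", "US-ASCII"));
        // prints: caf&#xE9; &#x20AC;5
    }
}
```

With "ISO-8859-1" as the target, the e-acute would pass through unchanged 
and only the euro sign would be escaped.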

> as well as a tool for converting text into some encoding (e.g. using 
> escape codes)

Any decent Unicode support library can do this (read using one encoding, 
write using another) -- but before you can convert text to a specific 
encoding, you need to know what encoding you're converting it from, which 
takes us back to the FAQ.
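
As a rough sketch of "read using one encoding, write using another" with the 
standard java.io classes: the class and method names here (TranscodeSketch, 
transcode) are invented for the example, and the caller is assumed to already 
know the source encoding. Note that OutputStreamWriter substitutes a 
replacement (typically '?') for characters the target encoding cannot 
represent, and that an XML declaration inside the data is not updated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

public class TranscodeSketch {
    // Hypothetical helper: decode bytes from one encoding into Java's
    // internal Unicode chars, then encode those chars into another.
    static void transcode(InputStream in, String fromEncoding,
                          OutputStream out, String toEncoding) throws IOException {
        Reader reader = new InputStreamReader(in, fromEncoding);
        Writer writer = new OutputStreamWriter(out, toEncoding);
        char[] buffer = new char[4096];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, count);
        }
        // Flush so that all encoded bytes reach the underlying stream.
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // 0xE9 is e-acute in ISO-8859-1.
        byte[] latin1 = { 'c', 'a', 'f', (byte) 0xE9 };
        ByteArrayOutputStream utf8 = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(latin1), "ISO-8859-1", utf8, "UTF-8");
        // The single Latin-1 byte becomes two bytes (0xC3 0xA9) in UTF-8.
        System.out.println(utf8.size() + " bytes"); // prints: 5 bytes
    }
}
```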


Conceptually, there really isn't much to encodings. Unicode 
can represent a HUGE set of characters, basically covering every natural 
human language's character set and with space reserved to support 
"unnatural" ones (Klingon, anyone?). All the encoding does is say how 
those characters are mapped to and from what actually appears in your data 
stream -- which may involve a varying number of bytes per character, 
and/or may involve mapping a Unicode character number to a different 
character number in that specific encoding.
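
To make that concrete, a small sketch (the names EncodingBytesSketch and 
hexBytes are invented for this example) that prints the bytes one character 
turns into under a few encodings. UTF-8, UTF-16BE and ISO-8859-1 are required 
of every Java runtime; the availability of windows-1252 is an assumption 
about the particular JRE.

```java
import java.nio.charset.Charset;

public class EncodingBytesSketch {
    // Hypothetical helper: encode the text and render the resulting
    // bytes as space-separated hex pairs.
    static String hexBytes(String text, String encodingName) {
        byte[] bytes = text.getBytes(Charset.forName(encodingName));
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20ac"; // the euro sign, Unicode character U+20AC

        // Varying number of bytes per character:
        System.out.println(hexBytes(euro, "UTF-8"));    // prints: E2 82 AC
        System.out.println(hexBytes(euro, "UTF-16BE")); // prints: 20 AC

        // A different character number in that specific encoding
        // (assumes this JRE ships the windows-1252 charset, where the
        // euro sign is the single byte 0x80):
        System.out.println(hexBytes(euro, "windows-1252"));
    }
}
```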

______________________________________
Joe Kesselman  / IBM Research
