Posted to jaxme-dev@ws.apache.org by Jochen Wiedmann <jo...@gmail.com> on 2005/10/04 23:18:20 UTC

Re: encoding: how does this work

Hi, Dean,

Sorry for replying late to your various mails. I was on vacation 
until today and am now working through my old mail. Expect me to reply 
to everything by tomorrow.

Dean Hiller wrote:

> So what is the point of the XML spec having the encoding attribute if 
> you have to figure out the encoding before you even get to the specified 
> encoding?

Detecting the encoding from the first bytes is not always possible. 
There are a great many encodings that are upward compatible with ASCII 
in the range 0..127, for example US-ASCII itself (obviously), 
ISO-8859-1, and UTF-8.

In the above cases, the first bytes look identical, so the encoding 
attribute is required to tell them apart. (Without it, an XML parser 
assumes UTF-8.)
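
A small Python sketch may make this concrete. It is an illustration only, not taken from any XML parser; the helper name encoded_prefixes is made up, and cp037 is just one EBCDIC code page picked as an example. It encodes the start of an XML declaration in several encodings and compares the resulting bytes:

```python
# Encode the start of an XML declaration in several encodings and compare.
PREFIX = "<?xml"


def encoded_prefixes(encodings):
    """Return a dict mapping each encoding name to the bytes of PREFIX."""
    return {name: PREFIX.encode(name) for name in encodings}


# Encodings that agree with ASCII in the range 0..127.
ascii_family = encoded_prefixes(["us-ascii", "iso-8859-1", "utf-8"])

# Other families; cp037 is an assumed example of an EBCDIC code page.
others = encoded_prefixes(["utf-16-be", "utf-16-le", "cp037"])

# The ASCII-compatible encodings all produce the same bytes, so the
# first bytes alone cannot say which of them is in use.
print("ASCII family identical:", len(set(ascii_family.values())) == 1)

# The other families produce different byte patterns.
for name, data in others.items():
    print(name, data.hex())
```

Only the declared encoding can separate the members of the first group; the second group can be told apart from the bytes alone.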


Jochen


---------------------------------------------------------------------
To unsubscribe, e-mail: jaxme-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: jaxme-dev-help@ws.apache.org


Re: encoding: how does this work

Posted by Jochen Wiedmann <jo...@gmail.com>.
Dean Hiller wrote:

> "In the above cases"... but isn't that a catch-22? How can you read 
> what encoding it is if you don't know what encoding it is?

I have never written an XML parser, but I would assume that the basic 
rules are: read the first four or so bytes to identify an encoding family 
(upward compatible with US-ASCII, EBCDIC, UTF-16, ...). These first bytes 
are also sufficient to detect whether an XML declaration is present. 
(For EBCDIC one must always be present; for UTF-16 a byte order mark can 
take its place.)

If an XML declaration is present, continue to read the declaration, 
including the optional "encoding" attribute, which specifies the family 
member.
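
The two steps above could be sketched roughly as follows in Python. This is a simplified illustration under stated assumptions, not how JaxMe or any real parser does it: the byte patterns follow the non-normative Appendix F of the XML 1.0 recommendation, UTF-32 and other families are left out, cp037 stands in for "EBCDIC", and the names detect_encoding, BOMS and DECLARATION_STARTS are made up for this example.

```python
import re

# Byte order marks and the codec family they indicate (UTF-32 omitted).
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# How "<?xm" (or "<?") looks in each family when there is no BOM.
DECLARATION_STARTS = [
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
    (b"<?xm", "ascii"),              # any ASCII-compatible encoding
    (b"\x4c\x6f\xa7\x94", "cp037"),  # EBCDIC; the code page is an assumption
]

ENCODING_ATTR = re.compile(
    r"""encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def detect_encoding(data):
    """Guess the encoding of an XML document given as bytes."""
    # Step 1: use the first bytes to pick an encoding family.
    for bom, codec in BOMS:
        if data.startswith(bom):
            family, start = codec, len(bom)
            break
    else:
        for magic, codec in DECLARATION_STARTS:
            if data.startswith(magic):
                family, start = codec, 0
                break
        else:
            return "utf-8"  # no BOM and no declaration: XML's default

    # If the declaration names no encoding, fall back to the family,
    # or to UTF-8 for the ASCII-compatible family.
    default = "utf-8" if family == "ascii" else family

    # Step 2: the family is enough to decode the declaration itself.
    head = data[start:start + 200].decode(family, errors="replace")
    end = head.find("?>")
    if not head.startswith("<?xml") or end == -1:
        return default
    match = ENCODING_ATTR.search(head[:end])
    return match.group(1) if match else default


print(detect_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
print(detect_encoding('<?xml version="1.0"?><a/>'.encode("utf-16-le")))
```

This is why it is not a catch-22: the declaration only uses characters that every member of a family encodes the same way, so knowing the family is enough to read the "encoding" attribute.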


Jochen



Re: encoding: how does this work

Posted by Dean Hiller <de...@xsoftware.biz>.
On vacation... no problem.

"In the above cases"... but isn't that a catch-22? How can you read 
what encoding it is if you don't know what encoding it is?
thanks,
dean

Jochen Wiedmann wrote:

>
> Hi, Dean,
>
> Sorry for replying late to your various mails. I was on vacation 
> until today and am now working through my old mail. Expect me to reply 
> to everything by tomorrow.
>
> Dean Hiller wrote:
>
>> So what is the point of the XML spec having the encoding attribute if 
>> you have to figure out the encoding before you even get to the 
>> specified encoding?
>
>
> Detecting the encoding from the first bytes is not always possible. 
> There are a great many encodings that are upward compatible with 
> ASCII in the range 0..127, for example US-ASCII itself (obviously), 
> ISO-8859-1, and UTF-8.
>
> In the above cases, the first bytes look identical, so the encoding 
> attribute is required to tell them apart. (Without it, an XML parser 
> assumes UTF-8.)
>
>
> Jochen
>

