Posted to jaxme-dev@ws.apache.org by de...@xsoftware.biz on 2005/09/20 21:04:45 UTC

encoding: how does this work

I am wondering how encoding works in JaxMe (and in general).

Say I have the header bytes.  Now, I happen to know those bytes are encoded
in UTF-16, so I can parse the header to the following:

<?xml version="1.0" encoding="utf-16"?>

1. Isn't this a catch-22?  I can only read this into a string if I know
ahead of time that it is UTF-16 (in which case, specifying the encoding
attribute is useless).
2. Or is the header always in ASCII, and then when I get to the encoding, I
know to read in everything else as UTF-16?
3. If this is the case, though, what happens with XML documents that are
in files?  They all have this catch-22 problem.  I mean, I can open the
file in ASCII and then save it as UTF-8 or UTF-16, but I can't save the
header as ASCII and the rest as UTF-16.

How does encoding work with JaxMe?
thanks,
dean


---------------------------------------------------------------------
To unsubscribe, e-mail: jaxme-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: jaxme-dev-help@ws.apache.org


Re: encoding: how does this work

Posted by Jochen Wiedmann <jo...@gmail.com>.
Dean Hiller wrote:

> "In the above cases"... but isn't that a catch-22?  How can you read
> what the encoding is if you don't know what the encoding is?

I have never written an XML parser, but I would assume that the basic
rules are: Read the first four or so bytes to identify an encoding family
(upward compatible with US-ASCII, EBCDIC, UTF-16, ...). These first bytes
are also sufficient to detect whether an XML declaration is present.
(A declaration is always needed for families such as EBCDIC; a UTF-16
document may instead be recognized by its byte order mark.)

If an XML declaration is present, continue to read the declaration, 
including the optional "encoding" attribute, which specifies the family 
member.
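
As a rough illustration of the first step described above, here is a
minimal Java sketch that guesses the encoding family from the first four
bytes. It is not taken from JaxMe or from any real parser; the class name
EncodingSniffer and the method guessFamily are hypothetical, and the byte
patterns follow the autodetection table in the XML 1.0 specification as
far as I remember it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not part of JaxMe).
class EncodingSniffer {

    // Guesses an encoding family from the first four bytes of a document.
    static String guessFamily(byte[] head) {
        if (head.length < 4) {
            return "unknown";
        }
        int b0 = head[0] & 0xFF, b1 = head[1] & 0xFF;
        int b2 = head[2] & 0xFF, b3 = head[3] & 0xFF;

        // Byte order marks come first; UTF-32 must be tested before UTF-16,
        // because FF FE 00 00 also starts with the UTF-16LE mark FF FE.
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0xFE && b3 == 0xFF) return "UTF-32BE (BOM)";
        if (b0 == 0xFF && b1 == 0xFE && b2 == 0x00 && b3 == 0x00) return "UTF-32LE (BOM)";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE (BOM)";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE (BOM)";
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8 (BOM)";

        // No byte order mark: look at how "<?" (and "<?xm") is laid out.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";
        if (b0 == 0x3C && b1 == 0x3F && b2 == 0x78 && b3 == 0x6D) return "ASCII-compatible";
        if (b0 == 0x4C && b1 == 0x6F && b2 == 0xA7 && b3 == 0x94) return "EBCDIC";

        return "unknown";
    }

    public static void main(String[] args) {
        String decl = "<?xml version=\"1.0\" encoding=\"utf-16\"?>";
        System.out.println(guessFamily(decl.getBytes(StandardCharsets.UTF_16LE)));
        System.out.println(guessFamily(decl.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Once the family is known, the declaration itself can be decoded with any
member of that family, and the "encoding" attribute then selects the
exact member.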


Jochen



Re: encoding: how does this work

Posted by Dean Hiller <de...@xsoftware.biz>.
On vacation... no problem.

"In the above cases"... but isn't that a catch-22?  How can you read
what the encoding is if you don't know what the encoding is?
thanks,
dean

Jochen Wiedmann wrote:

>
> Hi, Dean,
>
> sorry for replying late to your various mails. I've been on vacation
> until today. Trying to work through my old mails. Expect me to reply
> to everything by tomorrow.
>
> Dean Hiller wrote:
>
>> So what is the point of the XML spec having the encoding attribute if
>> you have to figure out the encoding before you even get to the
>> specified encoding?
>
>
> Detecting the encoding from the first bytes is not always possible.
> There are a great many encodings that are upward compatible with
> ASCII in the range 0..127, for example US-ASCII itself (obviously),
> ISO-8859-1, and UTF-8.
>
> In the above cases, the encoding is required.
>
>
> Jochen
>




Re: encoding: how does this work

Posted by Jochen Wiedmann <jo...@gmail.com>.
Hi, Dean,

sorry for replying late to your various mails. I've been on vacation
until today. Trying to work through my old mails. Expect me to reply to
everything by tomorrow.

Dean Hiller wrote:

> so what is the point of the xml spec having the encoding attribute if 
> you have to figure out the encoding before you even get to the specified 
> encoding.

Detecting the encoding from the first bytes is not always possible.
There are a great many encodings that are upward compatible with ASCII
in the range 0..127, for example US-ASCII itself (obviously),
ISO-8859-1, and UTF-8.

In the above cases, the encoding is required.
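
To make this concrete, here is a small Java sketch. The class name
AsciiCompatible and the method sameBytes are made up for illustration. It
checks whether a string produces identical bytes in US-ASCII, ISO-8859-1,
and UTF-8. The XML declaration contains only characters in the range
0..127, so its bytes are the same in all three and cannot reveal which
encoding is in use; a character outside that range is where they differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
class AsciiCompatible {

    // True if the string encodes to the same bytes in all three encodings.
    static boolean sameBytes(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(ascii, s.getBytes(StandardCharsets.ISO_8859_1))
            && Arrays.equals(ascii, s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Only characters 0..127: indistinguishable from the bytes alone.
        System.out.println(sameBytes("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>")); // true

        // U+00E9 is one byte (E9) in ISO-8859-1, two bytes (C3 A9) in UTF-8,
        // and not representable in US-ASCII at all.
        System.out.println(sameBytes("caf\u00E9")); // false
    }
}
```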


Jochen




Re: encoding: how does this work

Posted by Dean Hiller <de...@xsoftware.biz>.
So what is the point of the XML spec having the encoding attribute if
you have to figure out the encoding before you even get to the specified
encoding?  I mean, there are hundreds of encodings in use, aren't
there?  GB2312 (I use this for Chinese), Big5, etc.  The list goes
on and on.  I know Java supports many of those encodings, but can they
all be told apart just by the "<" character, or do some encodings
represent that character in exactly the same way (I suspect a few might)?
Who would know why this encoding attribute exists in the header?
thanks,
dean

Jochen Wiedmann wrote:

> dean@xsoftware.biz wrote:
>
>> I am wondering how encoding works in JaxMe (and in general).
>>
>> Say I have the header bytes.  Now, I happen to know those bytes are
>> encoded in UTF-16, so I can parse the header to the following:
>>
>> <?xml version="1.0" encoding="utf-16"?>
>>
>> 1. Isn't this a catch-22?  I can only read this into a string if I know
>> ahead of time that it is UTF-16 (in which case, specifying the encoding
>> attribute is useless).
>
>
> If it is indeed UTF-16, then the "<" is encoded as two bytes, which
> can be clearly distinguished from the two bytes "<?" that you would
> see in UTF-8 or ASCII.
>
>
>> 2. Or is the header always in ASCII, and then when I get to the
>> encoding, I know to read in everything else as UTF-16?
>
>
> No, the header is in the same encoding.
>
>
>> How does encoding work with JaxMe?
>
>
> In just the same way as for any other XML parser/creator.
>
>
> Jochen





Re: encoding: how does this work

Posted by Jochen Wiedmann <jo...@gmail.com>.
dean@xsoftware.biz wrote:
> I am wondering how encoding works in JaxMe (and in general).
> 
> Say I have the header bytes.  Now, I happen to know those bytes are encoded
> in UTF-16, so I can parse the header to the following:
> 
> <?xml version="1.0" encoding="utf-16"?>
> 
> 1. Isn't this a catch-22?  I can only read this into a string if I know
> ahead of time that it is UTF-16 (in which case, specifying the encoding
> attribute is useless).

If it is indeed UTF-16, then the "<" is encoded as two bytes, which can
be clearly distinguished from the two bytes "<?" that you would see in
UTF-8 or ASCII.


> 2. Or is the header always in ASCII, and then when I get to the encoding, I
> know to read in everything else as UTF-16?

No, the header is in the same encoding.
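
A small Java sketch of the byte patterns in question (the class name
AngleBracketBytes and the helper hex are made up for this example). It
prints the bytes that "<?" becomes in each encoding, using only
java.nio.charset.StandardCharsets from the JDK:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class AngleBracketBytes {

    // Formats a byte array as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String start = "<?";
        System.out.println(hex(start.getBytes(StandardCharsets.US_ASCII))); // 3C 3F
        System.out.println(hex(start.getBytes(StandardCharsets.UTF_8)));    // 3C 3F
        System.out.println(hex(start.getBytes(StandardCharsets.UTF_16BE))); // 00 3C 00 3F
        System.out.println(hex(start.getBytes(StandardCharsets.UTF_16LE))); // 3C 00 3F 00
    }
}
```

The zero bytes that accompany each character in UTF-16 are what give the
encoding away before a single character of the header has been decoded.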


> How does encoding work with JaxME?

In just the same way as for any other XML parser/creator.
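
In practice this means handing the parser raw bytes rather than an
already-decoded string, so that it can work out the encoding itself. The
sketch below uses the JAXP parser that ships with the JDK, not JaxMe;
the class name ParseFromBytes and the method rootText are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, for illustration only (uses JAXP, not JaxMe).
class ParseFromBytes {

    // Parses raw bytes and returns the text content of the root element.
    // No encoding is passed in: the parser determines it from the bytes.
    static String rootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String utf16 = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><g>gr\u00FC\u00DF</g>";
        String latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><g>gr\u00FC\u00DF</g>";

        // Same text, different bytes on disk; the parser recovers both.
        System.out.println(rootText(utf16.getBytes(StandardCharsets.UTF_16)));
        System.out.println(rootText(latin1.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```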


Jochen
