Posted to jaxme-dev@ws.apache.org by de...@xsoftware.biz on 2005/09/20 21:04:45 UTC
encoding: how does this work
I am wondering how encoding works in JaxME (and in general).
Say I have some header bytes. Now, I happen to know those bytes are encoded
in UTF-16, so I can parse the header into the following:
<?xml version="1.0" encoding="utf-16"?>
1. Isn't this a catch-22? I can only read this into a string if I know
ahead of time that it is UTF-16 (in which case specifying the encoding
attribute is useless).
2. Or is the header always in ASCII, and then when I get to the encoding, I
know to read in everything else as UTF-16?
3. If that is the case, though, what happens with XML documents that are
in files? They all have this catch-22 problem. I mean, I can open the
file in ASCII and then save it as UTF-8 or UTF-16, but I can't save the
header as ASCII and the rest as UTF-16.
How does encoding work with JaxME?
thanks,
dean
---------------------------------------------------------------------
To unsubscribe, e-mail: jaxme-dev-unsubscribe@ws.apache.org
For additional commands, e-mail: jaxme-dev-help@ws.apache.org
Re: encoding: how does this work
Posted by Jochen Wiedmann <jo...@gmail.com>.
Dean Hiller wrote:
> "in the above cases".....but isn't that a catch 22. How can you read
> what encoding it is, if you don't know what encoding it is?
I never wrote an XML parser, but I would assume that the basic rules
are: Read the first four or so bytes to identify an encoding family
(ASCII-compatible, EBCDIC, UTF-16, ...). These first bytes
are also sufficient to detect whether an XML declaration is present.
(Always the case for EBCDIC, UTF-16, ...)
If an XML declaration is present, continue to read the declaration,
including the optional "encoding" attribute, which specifies the family
member.
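
The two steps described above can be sketched roughly as follows. This is only an illustration, not how JaxMe or any particular parser does it: the class and method names (EncodingSniffer, guessFamily, declaredEncoding) are hypothetical, the byte patterns follow the non-normative autodetection appendix of the XML 1.0 specification, and UTF-32 and the actual decoding of EBCDIC are left out for brevity.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class EncodingSniffer {
    // Matches the encoding pseudo-attribute of an XML declaration.
    private static final Pattern DECL = Pattern.compile(
        "^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Step 1: guess the encoding family from the first (up to four) bytes. */
    static String guessFamily(byte[] b) {
        int b0 = at(b, 0), b1 = at(b, 1), b2 = at(b, 2), b3 = at(b, 3);
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE"; // byte order mark
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE"; // byte order mark
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE"; // "<?"
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE"; // "<?"
        if (b0 == 0x4C && b1 == 0x6F && b2 == 0xA7 && b3 == 0x94) return "EBCDIC";   // "<?xm"
        if (b0 == 0x3C && b1 == 0x3F && b2 == 0x78 && b3 == 0x6D) return "ASCII-compatible"; // "<?xm"
        return "UTF-8"; // neither a declaration nor a byte order mark was found
    }

    /** Step 2: decode the head with a family-level charset and read the declared name. */
    static String declaredEncoding(byte[] head, String family) {
        Charset cs;
        switch (family) {
            case "UTF-16BE": cs = StandardCharsets.UTF_16BE; break;
            case "UTF-16LE": cs = StandardCharsets.UTF_16LE; break;
            case "ASCII-compatible":
            case "UTF-8": cs = StandardCharsets.ISO_8859_1; break; // ASCII range is identical
            default: return null; // EBCDIC is not handled in this sketch
        }
        String text = new String(head, cs);
        if (text.startsWith("\uFEFF")) text = text.substring(1); // drop a byte order mark
        Matcher m = DECL.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Returns the unsigned byte at index i, or -1 if the array is too short.
    private static int at(byte[] b, int i) {
        return i < b.length ? b[i] & 0xFF : -1;
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"utf-16\"?><a/>"
            .getBytes(StandardCharsets.UTF_16LE);
        String family = guessFamily(doc);
        System.out.println(family + " / " + declaredEncoding(doc, family));
    }
}
```

The point of the sketch is that the declaration only has to be readable at the family level: once the family is known, the characters of the declaration itself decode unambiguously, and the declared name then selects the exact family member.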
Jochen
Re: encoding: how does this work
Posted by Dean Hiller <de...@xsoftware.biz>.
on vacation...no problem
"in the above cases".....but isn't that a catch 22. How can you read
what encoding it is, if you don't know what encoding it is?
thanks,
dean
Jochen Wiedmann wrote:
>
> Hi, Dean,
>
> Sorry for replying late to your various mails. I was on vacation
> until today and am working through my old mails. Expect me to reply to
> everything by tomorrow.
>
> Dean Hiller wrote:
>
>> So what is the point of the XML spec having the encoding attribute if
>> you have to figure out the encoding before you even get to the
>> specified encoding?
>
>
> Detecting the encoding from the first bytes is not always possible.
> A great many encodings are upward compatible with ASCII in the range
> 0..127, for example US-ASCII itself (obviously), ISO-8859-1, and UTF-8.
>
> In the above cases, the encoding is required.
>
>
> Jochen
>
Re: encoding: how does this work
Posted by Jochen Wiedmann <jo...@gmail.com>.
Hi, Dean,
Sorry for replying late to your various mails. I was on vacation
until today and am working through my old mails. Expect me to reply to
everything by tomorrow.
Dean Hiller wrote:
> So what is the point of the XML spec having the encoding attribute if
> you have to figure out the encoding before you even get to the specified
> encoding?
Detecting the encoding from the first bytes is not always possible.
A great many encodings are upward compatible with ASCII in the range
0..127, for example US-ASCII itself (obviously), ISO-8859-1, and UTF-8.
In the above cases, the encoding is required.
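
To make this concrete, here is a small sketch showing that an XML declaration produces exactly the same bytes in US-ASCII, ISO-8859-1, and UTF-8, while a character outside the 0..127 range does not. The class and method names (AsciiCompatibleDemo, sameDeclarationBytes, encodedLength) are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
class AsciiCompatibleDemo {
    static final String DECL = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>";

    /** True if the declaration encodes to identical bytes in all three charsets. */
    static boolean sameDeclarationBytes() {
        byte[] ascii = DECL.getBytes(StandardCharsets.US_ASCII);
        byte[] latin1 = DECL.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = DECL.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, latin1) && Arrays.equals(latin1, utf8);
    }

    /** Number of bytes the given text occupies in the given charset. */
    static int encodedLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        System.out.println("declaration bytes identical: " + sameDeclarationBytes());
        // U+00E9 (e with acute accent) lies outside the ASCII range.
        System.out.println("ISO-8859-1 length: "
            + encodedLength("\u00E9", StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 length: "
            + encodedLength("\u00E9", StandardCharsets.UTF_8));
    }
}
```

Since the first bytes cannot tell these encodings apart, the declared name is the only in-document hint a parser has for decoding the bytes above 127.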
Jochen
Re: encoding: how does this work
Posted by Dean Hiller <de...@xsoftware.biz>.
So what is the point of the XML spec having the encoding attribute if
you have to figure out the encoding before you even get to the specified
encoding? I mean, there are hundreds of encodings in use, aren't
there? GB2312 (I use this for Chinese), Big5, etc. The list goes
on and on. I know Java supports many of those encodings, but can they
all be told apart just by the "<" character, or do some encodings
represent that character exactly the same way (I would suspect a few might)?
Who would know why this encoding attribute exists in the header?
thanks,
dean
Jochen Wiedmann wrote:
> dean@xsoftware.biz wrote:
>
>> I am wondering how encoding works in JaxME (and in general).
>>
>> Say I have some header bytes. Now, I happen to know those bytes are
>> encoded in UTF-16, so I can parse the header into the following:
>>
>> <?xml version="1.0" encoding="utf-16"?>
>>
>> 1. Isn't this a catch-22? I can only read this into a string if I know
>> ahead of time that it is UTF-16 (in which case specifying the encoding
>> attribute is useless).
>
>
> If it is indeed UTF-16, then the "<" is encoded as two bytes, which
> can be clearly distinguished from the two bytes "<?" that you
> would see in UTF-8 or ASCII.
>
>
>> 2. Or is the header always in ASCII, and then when I get to the
>> encoding, I know to read in everything else as UTF-16?
>
>
> No, the header is in the same encoding.
>
>
>> How does encoding work with JaxME?
>
>
> Just the same way as for any other XML parser/creator.
>
>
> Jochen
Re: encoding: how does this work
Posted by Jochen Wiedmann <jo...@gmail.com>.
dean@xsoftware.biz wrote:
> I am wondering how encoding works in JaxME (and in general).
>
> Say I have some header bytes. Now, I happen to know those bytes are encoded
> in UTF-16, so I can parse the header into the following:
>
> <?xml version="1.0" encoding="utf-16"?>
>
> 1. Isn't this a catch-22? I can only read this into a string if I know
> ahead of time that it is UTF-16 (in which case specifying the encoding
> attribute is useless).
If it is indeed UTF-16, then the "<" is encoded as two bytes, which can
be clearly distinguished from the two bytes "<?" that you would see
in UTF-8 or ASCII.
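
A quick way to see the difference is to dump the bytes of "<?" in each encoding. The sketch below is only an illustration; the class name PrologBytes and the helper hex are invented for it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class PrologBytes {
    /** Encodes the text in the given charset and returns the bytes as spaced hex. */
    static String hex(String text, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:    " + hex("<?", StandardCharsets.UTF_8));
        System.out.println("UTF-16BE: " + hex("<?", StandardCharsets.UTF_16BE));
        System.out.println("UTF-16LE: " + hex("<?", StandardCharsets.UTF_16LE));
        // Java's plain "UTF-16" encoder prepends a big-endian byte order mark.
        System.out.println("UTF-16:   " + hex("<?", StandardCharsets.UTF_16));
    }
}
```

In UTF-8 or ASCII the first two bytes are 3C 3F, whereas in UTF-16 every second byte of the ASCII-range characters is zero, so the two cases cannot be confused.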
> 2. Or is the header always in ASCII, and then when I get to the encoding, I
> know to read in everything else as UTF-16?
No, the header is in the same encoding.
> How does encoding work with JaxME?
Just the same way as for any other XML parser/creator.
Jochen