Posted to dev@qpid.apache.org by Kim van der Riet <ki...@redhat.com> on 2006/12/20 16:40:55 UTC

XML longstr mapping

I am working on integrating the new code generator into the Java
implementation.

I notice that in the old XSL-based generator, longstr is mapped to java
type byte[] while shortstr is mapped to String. In the new generator,
both shortstr and longstr are mapped to String. I also notice that in
the 0-8 XML, it is security-related challenge/response fields and
connection.start mechanisms/locales that use the longstr type.
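
For reference, the two mappings described above can be written down as a
small lookup table. This is only an illustration: the class and method
names are made up for this example and are not taken from either
generator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the two type mappings discussed in this mail.
class DomainTypeMapping {
    // Old XSL-based generator: longstr is a byte[], shortstr is a String.
    static Map<String, Class<?>> oldGenerator() {
        Map<String, Class<?>> m = new LinkedHashMap<>();
        m.put("shortstr", String.class);
        m.put("longstr", byte[].class);
        return m;
    }

    // New generator (as it currently stands): both map to String.
    static Map<String, Class<?>> newGenerator() {
        Map<String, Class<?>> m = new LinkedHashMap<>();
        m.put("shortstr", String.class);
        m.put("longstr", String.class);
        return m;
    }

    public static void main(String[] args) {
        for (String type : oldGenerator().keySet()) {
            System.out.println(type + ": old=" + oldGenerator().get(type).getSimpleName()
                    + ", new=" + newGenerator().get(type).getSimpleName());
        }
    }
}
```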

Is it correct to keep the mapping in the new generator of longstr to
String, or should it be kept as byte[]? I had anticipated that longstr
may find wider usage besides security tokens.

I had thought we had discussed this and agreed on the mapping, but I
want to make sure.

Kim


Re: XML longstr mapping

Posted by Martin Ritchie <ri...@apache.org>.
On 20/12/06, Kim van der Riet <ki...@redhat.com> wrote:
> On Wed, 2006-12-20 at 15:48 +0000, Martin Ritchie wrote:
> > A longstr needs to be capable of handling 2-byte characters while the
> > shortstr only deals with ASCII values. I thought String was an ASCII-only
> > string; if that is the case, then longstr will need to stay as a
> > byte[].
> I had thought that String does inherently handle 2-byte characters - it
> depends on the codeset/locale used. Fundamentally, String is composed of
> 2-byte char elements, is it not?

Sorry Kim I was thinking of the encoding on the wire rather than the
other way around.

> From String's javadoc:
> A String represents a string in the UTF-16 format in which supplementary
> characters are represented by surrogate pairs (see the section Unicode
> Character Representations in the Character class for more information).
> Index values refer to char code units, so a supplementary character uses
> two positions in a String.
>
> Kim


-- 
Martin Ritchie

Re: XML longstr mapping

Posted by Kim van der Riet <ki...@redhat.com>.
On Wed, 2006-12-20 at 15:48 +0000, Martin Ritchie wrote:
> A longstr needs to be capable of handling 2-byte characters while the
> shortstr only deals with ASCII values. I thought String was an ASCII-only
> string; if that is the case, then longstr will need to stay as a
> byte[].
I had thought that String does inherently handle 2-byte characters - it
depends on the codeset/locale used. Fundamentally, String is composed of
2-byte char elements, is it not?

From String's javadoc:
A String represents a string in the UTF-16 format in which supplementary
characters are represented by surrogate pairs (see the section Unicode
Character Representations in the Character class for more information).
Index values refer to char code units, so a supplementary character uses
two positions in a String.
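
A minimal sketch of what the javadoc passage means in practice, assuming
a modern JDK (java.nio.charset.StandardCharsets did not exist in 2006).
The class and method names are invented for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class StringCodeUnits {
    // Number of 16-bit char code units; this is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of octets the string occupies when encoded as UTF-8.
    static int utf8Octets(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String accented = "\u00e9";                           // Latin small e with acute
        String clef = new String(Character.toChars(0x1D11E)); // supplementary character
        for (String s : new String[] {"A", accented, clef}) {
            System.out.println(codeUnits(s) + " code unit(s), "
                    + codePoints(s) + " code point(s), "
                    + utf8Octets(s) + " UTF-8 octet(s)");
        }
    }
}
```

The supplementary character takes two char positions in the String but
is a single code point, and none of these counts matches the number of
octets on the wire unless the encoding is fixed explicitly.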

Kim


Re: XML longstr mapping

Posted by Martin Ritchie <ri...@apache.org>.
A longstr needs to be capable of handling 2-byte characters while the
shortstr only deals with ASCII values. I thought String was an ASCII-only
string; if that is the case, then longstr will need to stay as a
byte[].

On 20/12/06, Kim van der Riet <ki...@redhat.com> wrote:
> I am working on integrating the new code generator into the Java
> implementation.
>
> I notice that in the old XSL-based generator, longstr is mapped to java
> type byte[] while shortstr is mapped to String. In the new generator,
> both shortstr and longstr are mapped to String. I also notice that in
> the 0-8 XML, it is security-related challenge/response fields and
> connection.start mechanisms/locales that use the longstr type.
>
> Is it correct to keep the mapping in the new generator of longstr to
> String, or should it be kept as byte[]? I had anticipated that longstr
> may find wider usage besides security tokens.
>
> I had thought we had discussed this and agreed on the mapping, but I
> want to make sure.
>
> Kim
>
>


-- 
Martin Ritchie

Re: XML longstr mapping

Posted by Carl Trieloff <cc...@redhat.com>.
Alan Conway wrote:
> On Wed, 2006-12-20 at 19:14 +0000, Robert Greig wrote:
>> On 20/12/06, Kim van der Riet <ki...@redhat.com> wrote:
>>> Ok, will do - byte[] it is.
>>>
>>> Perhaps we should change the term "longstr" in the spec to "binary" or
>>> something similar. It would be less confusing.
>>
>> I fully agree. I tried arguing this point in the past without any
>> success. I think the argument was "C programmers think of strings as
>> byte arrays"...
>
> +1, longstr is misleading and it is entirely unfair to blame C
> programmers! The type in question is a length-prefixed byte array. There
> are no guarantees in the spec about being able to treat it as any type
> of string.
>
> I did a quick search and couldn't find a formal definition for longstr
> in the spec - I must be blind, where is it?
>
> Cheers,
> Alan.


Just checked, there is no doc - so we should take it to the working group
to fix. I also agree we should rename it. If we are in agreement, we
should cross-post it to the AMQP working group list.


RE: XML longstr mapping

Posted by Alan Conway <ac...@redhat.com>.
On Wed, 2007-01-03 at 17:42 -0500, Tomas Restrepo wrote:
> I'm guessing it's Section 4.2.5.3 "Strings". I also agree the term longstr
> is misleading, but the spec does talk about "short and long strings".
> 
> Actually, the spec is far more misleading, because it explicitly says that
> "short strings" are UTF-8 encoded (i.e. text), while saying that "Long
> Strings" are just a length-prefixed array of octets with no requirements at
> all about the content (so they can carry arbitrary binary data, I guess).

I think the intent and the actual implementations are that:
 - shortstr is a short (<256 octets) UTF-8 string.
 - longstr is a binary blob.
 
There is no other string type (e.g. UTF-16) although clearly you can
store anything in a longstr, including strings encoded any way you like.
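
As a rough sketch of the two types described above: the code below
assumes the wire layout as I read the 0-8 spec, namely a 1-octet length
prefix for shortstr and a 4-octet big-endian length prefix for longstr.
The class and method names are hypothetical and this is not the Qpid
encoder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical encoder sketch; the length-prefix widths are an assumption
// based on my reading of the AMQP 0-8 spec.
class AmqpStrings {
    // shortstr: text, UTF-8 encoded, at most 255 octets, 1-octet length prefix.
    static byte[] encodeShortstr(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 255) {
            throw new IllegalArgumentException("shortstr limited to 255 octets");
        }
        ByteBuffer buf = ByteBuffer.allocate(1 + utf8.length);
        buf.put((byte) utf8.length);
        buf.put(utf8);
        return buf.array();
    }

    // longstr: opaque octets, 4-octet length prefix, content not interpreted.
    static byte[] encodeLongstr(byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(4 + data.length); // big-endian by default
        buf.putInt(data.length);
        buf.put(data);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] mechanism = encodeShortstr("PLAIN");
        byte[] response = encodeLongstr(new byte[] {0x00, (byte) 0xff});
        System.out.println("shortstr octets: " + mechanism.length);
        System.out.println("longstr octets:  " + response.length);
    }
}
```

Note that the longstr method takes a byte[] and never goes through a
String, so no character encoding is involved at any point.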

So I think we need to:
 - rename longstr as "binary"
 - change all spec wording (including section 4.2.5.3) so that the
binary type is not referred to as a "string"

Cheers,
Alan.


RE: XML longstr mapping

Posted by Tomas Restrepo <to...@devdeo.com>.
Alan,
 
> +1, longstr is misleading and it is entirely unfair to blame C
> programmers! The type in question is a length-prefixed byte array. There
> are no guarantees in the spec about being able to treat it as any type
> of string.
> 
> I did a quick search and couldn't find a formal definition for longstr
> in the spec - I must be blind, where is it?

I'm guessing it's Section 4.2.5.3 "Strings". I also agree the term longstr
is misleading, but the spec does talk about "short and long strings".

Actually, the spec is far more misleading, because it explicitly says that
"short strings" are UTF-8 encoded (i.e. text), while saying that "Long
Strings" are just a length-prefixed array of octets with no requirements at
all about the content (so they can carry arbitrary binary data, I guess).


Tomas Restrepo
tomas.restrepo@devdeo.com
http://www.winterdom.com/weblog/





Re: XML longstr mapping

Posted by Alan Conway <ac...@redhat.com>.
On Wed, 2006-12-20 at 19:14 +0000, Robert Greig wrote:
> On 20/12/06, Kim van der Riet <ki...@redhat.com> wrote:
> > Ok, will do - byte[] it is.
> >
> > Perhaps we should change the term "longstr" in the spec to "binary" or
> > something similar. It would be less confusing.
> 
> I fully agree. I tried arguing this point in the past without any
> success. I think the argument was "C programmers think of strings as
> byte arrays"...


+1, longstr is misleading and it is entirely unfair to blame C
programmers! The type in question is a length-prefixed byte array. There
are no guarantees in the spec about being able to treat it as any type
of string. 

I did a quick search and couldn't find a formal definition for longstr
in the spec - I must be blind, where is it?

Cheers,
Alan.


Re: XML longstr mapping

Posted by Robert Greig <ro...@gmail.com>.
On 20/12/06, Kim van der Riet <ki...@redhat.com> wrote:
> Ok, will do - byte[] it is.
>
> Perhaps we should change the term "longstr" in the spec to "binary" or
> something similar. It would be less confusing.

I fully agree. I tried arguing this point in the past without any
success. I think the argument was "C programmers think of strings as
byte arrays"...

RG

Re: XML longstr mapping

Posted by Kim van der Riet <ki...@redhat.com>.
Ok, will do - byte[] it is.

Perhaps we should change the term "longstr" in the spec to "binary" or
something similar. It would be less confusing.

Kim

On Wed, 2006-12-20 at 19:08 +0000, Robert Greig wrote:
> On 20/12/06, Kim van der Riet <ki...@redhat.com> wrote:
> 
> > If we keep String, then
> > String.getBytes() produces byte[], and
> > new String(byte[]) gets a String.
> >
> > Will this work for security tokens? I am uncertain of the integrity of
> > this conversion (but a test will soon prove it).
> 
> String.getBytes() encodes using the platform default encoding, which
> will probably work for most platforms but would almost certainly break
> if there is a platform with UTF-8 as the default encoding (although I
> am not aware of any).
> 
> > Keeping String will open up general long strings > 256 chars as type
> > String, or *must* we keep this byte[]? Your call. I *thought* we had
> > gone over these types early in the project... but I can't find it.
> 
> Do we have any cases where we need to send a true "long string"? I
> think having it as a byte array makes sense for the current cases and
> I think it was intended to be a byte array, despite the spec's odd
> choice of name.
> 
> RG


Re: XML longstr mapping

Posted by Robert Greig <ro...@gmail.com>.
On 20/12/06, Kim van der Riet <ki...@redhat.com> wrote:

> If we keep String, then
> String.getBytes() produces byte[], and
> new String(byte[]) gets a String.
>
> Will this work for security tokens? I am uncertain of the integrity of
> this conversion (but a test will soon prove it).

String.getBytes() encodes using the platform default encoding, which
will probably work for most platforms but would almost certainly break
if there is a platform with UTF-8 as the default encoding (although I
am not aware of any).
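
A minimal sketch of the kind of test Kim mentions, assuming a modern JDK
and passing the charset explicitly instead of relying on the platform
default. The class name, method name and sample token are invented for
this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical round-trip check: byte[] -> String -> byte[].
class TokenRoundTrip {
    // True if the token comes back unchanged after passing through a String.
    static boolean survives(byte[] token, Charset charset) {
        byte[] back = new String(token, charset).getBytes(charset);
        return Arrays.equals(token, back);
    }

    public static void main(String[] args) {
        // Arbitrary binary data, including octets that are not valid UTF-8.
        byte[] token = {0x00, 0x7f, (byte) 0x80, (byte) 0xc3, 0x28, (byte) 0xff};
        System.out.println("UTF-8:      " + survives(token, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + survives(token, StandardCharsets.ISO_8859_1));
        System.out.println("US-ASCII:   " + survives(token, StandardCharsets.US_ASCII));
    }
}
```

With UTF-8 or US-ASCII the invalid octets are replaced during decoding,
so the token does not survive; ISO-8859-1 maps every octet to a char and
back, so it does. That dependence on the charset is a good argument for
keeping security tokens as byte[].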

> Keeping String will open up general long strings > 256 chars as type
> String, or *must* we keep this byte[]? Your call. I *thought* we had
> gone over these types early in the project... but I can't find it.

Do we have any cases where we need to send a true "long string"? I
think having it as a byte array makes sense for the current cases and
I think it was intended to be a byte array, despite the spec's odd
choice of name.

RG

Re: XML longstr mapping

Posted by Kim van der Riet <ki...@redhat.com>.
On Wed, 2006-12-20 at 18:19 +0000, Robert Greig wrote:
> On 20/12/06, Kim van der Riet <ki...@redhat.com> wrote:
> 
> > Is it correct to keep the mapping in the new generator of longstr to
> > String, or should it be kept as byte[]? I had anticipated that longstr
> > may find wider usage besides security tokens.
> 
> We need to be able to transfer a byte[] for the security negotiation,
> i.e. it is not a String that is being sent.

If we keep String, then
String.getBytes() produces byte[], and
new String(byte[]) gets a String.

Will this work for security tokens? I am uncertain of the integrity of
this conversion (but a test will soon prove it).

Keeping String will open up general long strings > 256 chars as type
String, or *must* we keep this byte[]? Your call. I *thought* we had
gone over these types early in the project... but I can't find it.

I see the spec says: "Long strings, used to hold chunks of binary data".

> 
> FWIW I think the term "longstr" used in the protocol spec is a poor one.
> 
> RG

Thanks,
Kim


Re: XML longstr mapping

Posted by Robert Greig <ro...@gmail.com>.
On 20/12/06, Kim van der Riet <ki...@redhat.com> wrote:

> Is it correct to keep the mapping in the new generator of longstr to
> String, or should it be kept as byte[]? I had anticipated that longstr
> may find wider usage besides security tokens.

We need to be able to transfer a byte[] for the security negotiation,
i.e. it is not a String that is being sent.

FWIW I think the term "longstr" used in the protocol spec is a poor one.

RG