You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@qpid.apache.org by Rajith Attapattu <ra...@gmail.com> on 2007/09/13 19:11:37 UTC

Use of AMQShortString in client side code

I am wondering why we are using AMQShortString indiscriminately all over the
client side code?
There is no performance benefit of using AMQShortString (based on the way it
is used) on the client side and is purely used for encoding.

It makes sense to use it on Broker side as you deal at bytes level and I can
understand the performance benefit of not having convert back and forth into
a String.
On the client side we just merely wrap/unwrap a String using AMQShortString.
Why can't we do that at the encoding/decoding level for the client side ?

I really hate typing more than I need to :)

Regards,

Rajith

Re: Use of AMQShortString in client side code

Posted by Rupert Smith <ru...@googlemail.com>.

Yes, but if you have a new string, that may match an existing token you
still have to match it?

e.g.

Queue queue1 = session.createQueue("test");
Queue queue2 = session.createQueue("test");

need to match "test" against the token set, to see if it has already been
tokenized. (ok in this case the compiler has interned "test", so can look it
up in a hash map, but if "test" were a dynamically generated name it would
be different).

Rupert

On 19/09/2007, Robert Godfrey <ro...@gmail.com> wrote:
>
> The idea of tokenization is to not do String matching.
>
> It is something we may try to do at AMQP...
>
> e.g.
>
> client says
>
> "From now on we should refer to the Short String
> 'very-very-very-very-long-in-fact-ridiculously-long-routing-key-name' as
> Token 1"
>
> Then, whenever it publishes a message with this string as a routing key,
> it
> simply uses the token instead.  This saves both on bytes over the wire,
> but
> can also make routing more efficient inside the broker.
>
> Hope this helps,
> Rob
>
> On 19/09/2007, Rupert Smith <ru...@googlemail.com> wrote:
> >
> > Just for the record, when short string was introduced, there was a
> > performance boost.
> >
> > I like the idea of a string mapping down onto a byte buffer, its more
> like
> > how things would be done in C. That is the incoming data would not be
> > lifted
> > out of its frame, the string would just be a pointer into the frame.
> >
> > Wrt, tokenization. Doing this fast may require some neat string matching
> > algorithms. With Java String, you get what the String class provides you
> > (you could start iterating over the characters but thats not going to be
> > fast). With a string as a byte buffer, there exists the possibility of
> > handing off the cache string matching algorithm to a carefully chosen
> > native
> > string matching algorithm.
> >
> > Quick example. You have a token cache. You want to tokenize a
> potentially
> > new string. So you need to match a single string against a large set of
> > possible candidates. The string is not likely to be very long (its a
> short
> > string). So multi-match a shortish string. The most efficient algorithm
> > for
> > this scenario might involve using bit sets in machine registers, I'm
> > guessing. I'm not sure that we will really need to do this super fast
> > though?
> >
> > On 19/09/2007, John O'Hara <jo...@gmail.com> wrote:
> > >
> > > Its also a space optimisation on the wire for when we cared about
> > that...
> > > for high volume messaging esp with TCP/IP and serial unicast those
> bytes
> > > start to matter.
> > >
> > > As for Tokenising, to support the notion of trivial clients the idea
> was
> > > to
> > > let the client assert which short strings were tokenized; usually
> > relating
> > > to routing keys which could repeat a lot.
> > >
> > > The original argument you made was to do with typing.
> > > You point out that AMQPShortString has benefits for the broker.
> > > It would also make sense to have that symetry in the client.
> > >
> > > There seems to be no compelling reason to change this.
> > > It's late and I'm tired so I won't go on more
> > >
> > > G'd night
> > > John
> > >
> > >
> > > On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> > > >
> > > > John O'Hara wrote:
> > > > > I agree with Rob that the lower levels of the stack should be
> > > > implemented in
> > > > > AMQPShortString *where it occurs in the protocol* for the
> following
> > > > reasons:
> > > > >
> > > > > 1) It provides the opportunity to validate the semantics; just
> > because
> > > > we're
> > > > > not checking length today doesn't mean we shouldn't
> > > >
> > > > AMQShortString really isn't the appropriate place to validate domain
> > > > level semantics. Different uses of shortstr have different domain
> > level
> > > > constraints. Also, any validation we put in AMQShortString is forced
> > to
> > > > run for every single shortstr field that passes through a broker.
> This
> > > > isn't particularly useful because when decoding fields off the wire,
> > > > such validation is unnecessary as it is already performed by the
> codec
> > > > in a more efficient manner that is specialized to the wire format.
> > > >
> > > > > 2) We may introduce AMQPShortStrong Tokenisation in the protocol
> in
> > > the
> > > > > future (has been discussed often, I think it's quite
> likely).  Doing
> > > > this we
> > > > > can collapse a shortstring to 2 bytes and reduce garbage.
> > > >
> > > > I presume you're referring to some scheme for caching commonly used
> > > > strings? If so this is a decoding optimization that would equally
> well
> > > > apply when decoding directly to Strings, or any other type for that
> > > > matter. In fact such an optimization would likely nullify any
> > > > performance advantage rendered by AMQShortString since
> > decoding/encoding
> > > > of anything would only be necessary when there is a cache miss.
> > > >
> > > > > 3) I'm unsure of the memory ownership semantics but I believe the
> > JMS
> > > > spec
> > > > > explicitly requires a copy of the message to be take to prevent
> grim
> > > > race
> > > > > conditions on message reuse.  Some products have the option to
> turn
> > > this
> > > > > off, but that's not the spec.  It's like not DMA'ing from
> userspace
> > > > without
> > > > > extreme care.
> > > >
> > > > I'm unsure how this relates to the use of AMQShortString. Any such
> > > > copying would happen well past the point where raw types are decoded
> > off
> > > > the wire.
> > > >
> > > > > Also, Rob has said it has been proven to be faster in the past.
> > > > > In the absence of a measured, demonstrable issue why change this
> > > > arguably
> > > > > more correct implementation?
> > > >
> > > > As it stands today AMQShortString is really just an optimization for
> > the
> > > > broker, and one that comes at a pretty high cost to the client. So
> if
> > > > there is a better way to solve the performance issue for the broker
> > > > without encumbering the client, it's certainly worth investigating.
> > > >
> > > > That's why I asked about the original problem being solved. For
> > example
> > > > I'd guess that in the critical path the broker really never needs to
> > > > decode much more than the exchange name and routing key in order to
> > > > deliver a message, so it might be possible to limit the use of
> > > > AMQShortString to just those fields (or decode to specific Exchange
> > and
> > > > RoutingKey classes) and get the necessary performance benefit in the
> > > > broker, with much less impact on the client.
> > > >
> > > > --Rafael
> > > >
> > > > > Cheers
> > > > > John
> > > > >
> > > > >
> > > > > On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> > > > >> Robert Godfrey wrote:
> > > > >>> On 13/09/2007, Rajith Attapattu <ra...@gmail.com> wrote:
> > > > >>>> I am wondering why we are using AMQShortString indiscriminately
> > all
> > > > >> over
> > > > >>>> the
> > > > >>>> client side code?
> > > > >>>> There is no performance benefit of using AMQShortString (based
> on
> > > the
> > > > >> way
> > > > >>>> it
> > > > >>>> is used) on the client side and is purely used for encoding.
> > > > >>>
> > > > >>>
> > > > >>> Rajith,
> > > > >>>
> > > > >>> as we have discussed before - there *is* a significant
> performance
> > > > >> benefit
> > > > >>> which we have tested and proved previously.
> > > > >> Can you point me to the previous discussion? I'd like to learn
> more
> > > > >> about the original issue.
> > > > >>
> > > > >>    Many short strings are re-used
> > > > >>> frequently within the client library, and by using our own type
> we
> > > can
> > > > >>> exploit this.
> > > > >> Unless we're excessively copying them I don't see how this
> matters.
> > > For
> > > > >> both an AMQShortString and a String we should just be passing
> > around
> > > > >> pointers when they are reused.
> > > > >>
> > > > >>    Further, the domain for many parameters in AMQP is *not* a
> > > > >>> unicode string, but is tightly defined as upto 255 bytes of data
> > > with
> > > > a
> > > > >>> particular encoding.  Java Strings are not the appropriate type
> to
> > > use
> > > > >> for
> > > > >>> this.  Encoding and decoding Java Strings is expensive, and also
> > > prone
> > > > >> to
> > > > >>> error (i.e. you need to make sure that you *always* use the
> > correct
> > > > >> explicit
> > > > >>> encoding).
> > > > >> Despite the name AMQShortString, I don't think the AMQShortString
> > > class
> > > > >> actually represents the AMQP type short-string, for example there
> > is
> > > no
> > > > >> length limit for an AMQShortString. It's really just a generic
> > > > >> implementation of CharSequence that is optimized specifically for
> > > rapid
> > > > >> decoding from a ByteBuffer. From a domain restriction
> perspective,
> > > > using
> > > > >> an ordinary String is just as correct.
> > > > >>
> > > > >>> It makes sense to use it on Broker side as you deal at bytes
> level
> > > and
> > > > I
> > > > >> can
> > > > >>>> understand the performance benefit of not having convert back
> and
> > > > forth
> > > > >>>> into
> > > > >>>> a String.
> > > > >>>
> > > > >>> The low level API should be using correct AMQ domains.  High
> level
> > > > APIs
> > > > >>> (such as JMS) will obviously want to present these parameters as
> > > java
> > > > >>> Strings.
> > > > >>>
> > > > >>>
> > > > >>> On the client side we just merely wrap/unwrap a String using
> > > > >> AMQShortString.
> > > > >>>> Why can't we do that at the encoding/decoding level for the
> > client
> > > > side
> > > > >> ?
> > > > >>>
> > > > >>> In some cases this may be true, but in others certainly
> not.  When
> > > > >>> converting into JMS Destinations on receipt of a message, for
> > > > instance,
> > > > >> one
> > > > >>> never needs to convert to a String... it is *much* faster to
> > simply
> > > > use
> > > > >> the
> > > > >>> correct type of AMQShortString/
> > > > >> Unfortunately using AMQShortString imposes additional overhead
> > > whenever
> > > > >> we need to en/decode to/from an ordinary String. It basically
> > > requires
> > > > >> an additional copy when compared with directly encoding/decoding
> > > > to/from
> > > > >>   a String. As the common case on the client side is dealing with
> > > > >> Strings, I'm not at all convinced that ubiquitous use of
> > > AMQShortString
> > > > >> is a net win for the client.
> > > > >>
> > > > >> I believe what would be optimal is to use the CharSequence
> > interface
> > > > >> everywhere. This way String values passed to us by an application
> > > could
> > > > >> be directly passed all the way down the stack and encoded
> directly
> > > onto
> > > > >> the wire without an additional copy, and incoming data could be
> > > > >> efficiently decoded into a private impl of CharSequence that
> could
> > be
> > > > >> converted to a String on demand.
> > > > >>
> > > > >> --Rafael
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: Use of AMQShortString in client side code

Posted by Robert Godfrey <ro...@gmail.com>.

The idea of tokenization is to not do String matching.

It is something we may try to do at AMQP...

e.g.

client says

"From now on we should refer to the Short String
'very-very-very-very-long-in-fact-ridiculously-long-routing-key-name' as
Token 1"

Then, whenever it publishes a message with this string as a routing key, it
simply uses the token instead.  This saves both on bytes over the wire, but
can also make routing more efficient inside the broker.

Hope this helps,
Rob

On 19/09/2007, Rupert Smith <ru...@googlemail.com> wrote:
>
> Just for the record, when short string was introduced, there was a
> performance boost.
>
> I like the idea of a string mapping down onto a byte buffer, its more like
> how things would be done in C. That is the incoming data would not be
> lifted
> out of its frame, the string would just be a pointer into the frame.
>
> Wrt, tokenization. Doing this fast may require some neat string matching
> algorithms. With Java String, you get what the String class provides you
> (you could start iterating over the characters but thats not going to be
> fast). With a string as a byte buffer, there exists the possibility of
> handing off the cache string matching algorithm to a carefully chosen
> native
> string matching algorithm.
>
> Quick example. You have a token cache. You want to tokenize a potentially
> new string. So you need to match a single string against a large set of
> possible candidates. The string is not likely to be very long (its a short
> string). So multi-match a shortish string. The most efficient algorithm
> for
> this scenario might involve using bit sets in machine registers, I'm
> guessing. I'm not sure that we will really need to do this super fast
> though?
>
> On 19/09/2007, John O'Hara <jo...@gmail.com> wrote:
> >
> > Its also a space optimisation on the wire for when we cared about
> that...
> > for high volume messaging esp with TCP/IP and serial unicast those bytes
> > start to matter.
> >
> > As for Tokenising, to support the notion of trivial clients the idea was
> > to
> > let the client assert which short strings were tokenized; usually
> relating
> > to routing keys which could repeat a lot.
> >
> > The original argument you made was to do with typing.
> > You point out that AMQPShortString has benefits for the broker.
> > It would also make sense to have that symetry in the client.
> >
> > There seems to be no compelling reason to change this.
> > It's late and I'm tired so I won't go on more
> >
> > G'd night
> > John
> >
> >
> > On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> > >
> > > John O'Hara wrote:
> > > > I agree with Rob that the lower levels of the stack should be
> > > implemented in
> > > > AMQPShortString *where it occurs in the protocol* for the following
> > > reasons:
> > > >
> > > > 1) It provides the opportunity to validate the semantics; just
> because
> > > we're
> > > > not checking length today doesn't mean we shouldn't
> > >
> > > AMQShortString really isn't the appropriate place to validate domain
> > > level semantics. Different uses of shortstr have different domain
> level
> > > constraints. Also, any validation we put in AMQShortString is forced
> to
> > > run for every single shortstr field that passes through a broker. This
> > > isn't particularly useful because when decoding fields off the wire,
> > > such validation is unnecessary as it is already performed by the codec
> > > in a more efficient manner that is specialized to the wire format.
> > >
> > > > 2) We may introduce AMQPShortStrong Tokenisation in the protocol in
> > the
> > > > future (has been discussed often, I think it's quite likely).  Doing
> > > this we
> > > > can collapse a shortstring to 2 bytes and reduce garbage.
> > >
> > > I presume you're referring to some scheme for caching commonly used
> > > strings? If so this is a decoding optimization that would equally well
> > > apply when decoding directly to Strings, or any other type for that
> > > matter. In fact such an optimization would likely nullify any
> > > performance advantage rendered by AMQShortString since
> decoding/encoding
> > > of anything would only be necessary when there is a cache miss.
> > >
> > > > 3) I'm unsure of the memory ownership semantics but I believe the
> JMS
> > > spec
> > > > explicitly requires a copy of the message to be take to prevent grim
> > > race
> > > > conditions on message reuse.  Some products have the option to turn
> > this
> > > > off, but that's not the spec.  It's like not DMA'ing from userspace
> > > without
> > > > extreme care.
> > >
> > > I'm unsure how this relates to the use of AMQShortString. Any such
> > > copying would happen well past the point where raw types are decoded
> off
> > > the wire.
> > >
> > > > Also, Rob has said it has been proven to be faster in the past.
> > > > In the absence of a measured, demonstrable issue why change this
> > > arguably
> > > > more correct implementation?
> > >
> > > As it stands today AMQShortString is really just an optimization for
> the
> > > broker, and one that comes at a pretty high cost to the client. So if
> > > there is a better way to solve the performance issue for the broker
> > > without encumbering the client, it's certainly worth investigating.
> > >
> > > That's why I asked about the original problem being solved. For
> example
> > > I'd guess that in the critical path the broker really never needs to
> > > decode much more than the exchange name and routing key in order to
> > > deliver a message, so it might be possible to limit the use of
> > > AMQShortString to just those fields (or decode to specific Exchange
> and
> > > RoutingKey classes) and get the necessary performance benefit in the
> > > broker, with much less impact on the client.
> > >
> > > --Rafael
> > >
> > > > Cheers
> > > > John
> > > >
> > > >
> > > > On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> > > >> Robert Godfrey wrote:
> > > >>> On 13/09/2007, Rajith Attapattu <ra...@gmail.com> wrote:
> > > >>>> I am wondering why we are using AMQShortString indiscriminately
> all
> > > >> over
> > > >>>> the
> > > >>>> client side code?
> > > >>>> There is no performance benefit of using AMQShortString (based on
> > the
> > > >> way
> > > >>>> it
> > > >>>> is used) on the client side and is purely used for encoding.
> > > >>>
> > > >>>
> > > >>> Rajith,
> > > >>>
> > > >>> as we have discussed before - there *is* a significant performance
> > > >> benefit
> > > >>> which we have tested and proved previously.
> > > >> Can you point me to the previous discussion? I'd like to learn more
> > > >> about the original issue.
> > > >>
> > > >>    Many short strings are re-used
> > > >>> frequently within the client library, and by using our own type we
> > can
> > > >>> exploit this.
> > > >> Unless we're excessively copying them I don't see how this matters.
> > For
> > > >> both an AMQShortString and a String we should just be passing
> around
> > > >> pointers when they are reused.
> > > >>
> > > >>    Further, the domain for many parameters in AMQP is *not* a
> > > >>> unicode string, but is tightly defined as upto 255 bytes of data
> > with
> > > a
> > > >>> particular encoding.  Java Strings are not the appropriate type to
> > use
> > > >> for
> > > >>> this.  Encoding and decoding Java Strings is expensive, and also
> > prone
> > > >> to
> > > >>> error (i.e. you need to make sure that you *always* use the
> correct
> > > >> explicit
> > > >>> encoding).
> > > >> Despite the name AMQShortString, I don't think the AMQShortString
> > class
> > > >> actually represents the AMQP type short-string, for example there
> is
> > no
> > > >> length limit for an AMQShortString. It's really just a generic
> > > >> implementation of CharSequence that is optimized specifically for
> > rapid
> > > >> decoding from a ByteBuffer. From a domain restriction perspective,
> > > using
> > > >> an ordinary String is just as correct.
> > > >>
> > > >>> It makes sense to use it on Broker side as you deal at bytes level
> > and
> > > I
> > > >> can
> > > >>>> understand the performance benefit of not having convert back and
> > > forth
> > > >>>> into
> > > >>>> a String.
> > > >>>
> > > >>> The low level API should be using correct AMQ domains.  High level
> > > APIs
> > > >>> (such as JMS) will obviously want to present these parameters as
> > java
> > > >>> Strings.
> > > >>>
> > > >>>
> > > >>> On the client side we just merely wrap/unwrap a String using
> > > >> AMQShortString.
> > > >>>> Why can't we do that at the encoding/decoding level for the
> client
> > > side
> > > >> ?
> > > >>>
> > > >>> In some cases this may be true, but in others certainly not.  When
> > > >>> converting into JMS Destinations on receipt of a message, for
> > > instance,
> > > >> one
> > > >>> never needs to convert to a String... it is *much* faster to
> simply
> > > use
> > > >> the
> > > >>> correct type of AMQShortString/
> > > >> Unfortunately using AMQShortString imposes additional overhead
> > whenever
> > > >> we need to en/decode to/from an ordinary String. It basically
> > requires
> > > >> an additional copy when compared with directly encoding/decoding
> > > to/from
> > > >>   a String. As the common case on the client side is dealing with
> > > >> Strings, I'm not at all convinced that ubiquitous use of
> > AMQShortString
> > > >> is a net win for the client.
> > > >>
> > > >> I believe what would be optimal is to use the CharSequence
> interface
> > > >> everywhere. This way String values passed to us by an application
> > could
> > > >> be directly passed all the way down the stack and encoded directly
> > onto
> > > >> the wire without an additional copy, and incoming data could be
> > > >> efficiently decoded into a private impl of CharSequence that could
> be
> > > >> converted to a String on demand.
> > > >>
> > > >> --Rafael
> > > >>
> > > >
> > >
> >
>

Re: Use of AMQShortString in client side code

Posted by Rupert Smith <ru...@googlemail.com>.

Just for the record, when short string was introduced, there was a
performance boost.

I like the idea of a string mapping down onto a byte buffer, its more like
how things would be done in C. That is the incoming data would not be lifted
out of its frame, the string would just be a pointer into the frame.

Wrt, tokenization. Doing this fast may require some neat string matching
algorithms. With Java String, you get what the String class provides you
(you could start iterating over the characters but thats not going to be
fast). With a string as a byte buffer, there exists the possibility of
handing off the cache string matching algorithm to a carefully chosen native
string matching algorithm.

Quick example. You have a token cache. You want to tokenize a potentially
new string. So you need to match a single string against a large set of
possible candidates. The string is not likely to be very long (its a short
string). So multi-match a shortish string. The most efficient algorithm for
this scenario might involve using bit sets in machine registers, I'm
guessing. I'm not sure that we will really need to do this super fast
though?

On 19/09/2007, John O'Hara <jo...@gmail.com> wrote:
>
> Its also a space optimisation on the wire for when we cared about that...
> for high volume messaging esp with TCP/IP and serial unicast those bytes
> start to matter.
>
> As for Tokenising, to support the notion of trivial clients the idea was
> to
> let the client assert which short strings were tokenized; usually relating
> to routing keys which could repeat a lot.
>
> The original argument you made was to do with typing.
> You point out that AMQPShortString has benefits for the broker.
> It would also make sense to have that symetry in the client.
>
> There seems to be no compelling reason to change this.
> It's late and I'm tired so I won't go on more
>
> G'd night
> John
>
>
> On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> >
> > John O'Hara wrote:
> > > I agree with Rob that the lower levels of the stack should be
> > implemented in
> > > AMQPShortString *where it occurs in the protocol* for the following
> > reasons:
> > >
> > > 1) It provides the opportunity to validate the semantics; just because
> > we're
> > > not checking length today doesn't mean we shouldn't
> >
> > AMQShortString really isn't the appropriate place to validate domain
> > level semantics. Different uses of shortstr have different domain level
> > constraints. Also, any validation we put in AMQShortString is forced to
> > run for every single shortstr field that passes through a broker. This
> > isn't particularly useful because when decoding fields off the wire,
> > such validation is unnecessary as it is already performed by the codec
> > in a more efficient manner that is specialized to the wire format.
> >
> > > 2) We may introduce AMQPShortStrong Tokenisation in the protocol in
> the
> > > future (has been discussed often, I think it's quite likely).  Doing
> > this we
> > > can collapse a shortstring to 2 bytes and reduce garbage.
> >
> > I presume you're referring to some scheme for caching commonly used
> > strings? If so this is a decoding optimization that would equally well
> > apply when decoding directly to Strings, or any other type for that
> > matter. In fact such an optimization would likely nullify any
> > performance advantage rendered by AMQShortString since decoding/encoding
> > of anything would only be necessary when there is a cache miss.
> >
> > > 3) I'm unsure of the memory ownership semantics but I believe the JMS
> > spec
> > > explicitly requires a copy of the message to be take to prevent grim
> > race
> > > conditions on message reuse.  Some products have the option to turn
> this
> > > off, but that's not the spec.  It's like not DMA'ing from userspace
> > without
> > > extreme care.
> >
> > I'm unsure how this relates to the use of AMQShortString. Any such
> > copying would happen well past the point where raw types are decoded off
> > the wire.
> >
> > > Also, Rob has said it has been proven to be faster in the past.
> > > In the absence of a measured, demonstrable issue why change this
> > arguably
> > > more correct implementation?
> >
> > As it stands today AMQShortString is really just an optimization for the
> > broker, and one that comes at a pretty high cost to the client. So if
> > there is a better way to solve the performance issue for the broker
> > without encumbering the client, it's certainly worth investigating.
> >
> > That's why I asked about the original problem being solved. For example
> > I'd guess that in the critical path the broker really never needs to
> > decode much more than the exchange name and routing key in order to
> > deliver a message, so it might be possible to limit the use of
> > AMQShortString to just those fields (or decode to specific Exchange and
> > RoutingKey classes) and get the necessary performance benefit in the
> > broker, with much less impact on the client.
> >
> > --Rafael
> >
> > > Cheers
> > > John
> > >
> > >
> > > On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> > >> Robert Godfrey wrote:
> > >>> On 13/09/2007, Rajith Attapattu <ra...@gmail.com> wrote:
> > >>>> I am wondering why we are using AMQShortString indiscriminately all
> > >> over
> > >>>> the
> > >>>> client side code?
> > >>>> There is no performance benefit of using AMQShortString (based on
> the
> > >> way
> > >>>> it
> > >>>> is used) on the client side and is purely used for encoding.
> > >>>
> > >>>
> > >>> Rajith,
> > >>>
> > >>> as we have discussed before - there *is* a significant performance
> > >> benefit
> > >>> which we have tested and proved previously.
> > >> Can you point me to the previous discussion? I'd like to learn more
> > >> about the original issue.
> > >>
> > >>    Many short strings are re-used
> > >>> frequently within the client library, and by using our own type we
> can
> > >>> exploit this.
> > >> Unless we're excessively copying them I don't see how this matters.
> For
> > >> both an AMQShortString and a String we should just be passing around
> > >> pointers when they are reused.
> > >>
> > >>    Further, the domain for many parameters in AMQP is *not* a
> > >>> unicode string, but is tightly defined as upto 255 bytes of data
> with
> > a
> > >>> particular encoding.  Java Strings are not the appropriate type to
> use
> > >> for
> > >>> this.  Encoding and decoding Java Strings is expensive, and also
> prone
> > >> to
> > >>> error (i.e. you need to make sure that you *always* use the correct
> > >> explicit
> > >>> encoding).
> > >> Despite the name AMQShortString, I don't think the AMQShortString
> class
> > >> actually represents the AMQP type short-string, for example there is
> no
> > >> length limit for an AMQShortString. It's really just a generic
> > >> implementation of CharSequence that is optimized specifically for
> rapid
> > >> decoding from a ByteBuffer. From a domain restriction perspective,
> > using
> > >> an ordinary String is just as correct.
> > >>
> > >>> It makes sense to use it on Broker side as you deal at bytes level
> and
> > I
> > >> can
> > >>>> understand the performance benefit of not having convert back and
> > forth
> > >>>> into
> > >>>> a String.
> > >>>
> > >>> The low level API should be using correct AMQ domains.  High level
> > APIs
> > >>> (such as JMS) will obviously want to present these parameters as
> java
> > >>> Strings.
> > >>>
> > >>>
> > >>> On the client side we just merely wrap/unwrap a String using
> > >> AMQShortString.
> > >>>> Why can't we do that at the encoding/decoding level for the client
> > side
> > >> ?
> > >>>
> > >>> In some cases this may be true, but in others certainly not.  When
> > >>> converting into JMS Destinations on receipt of a message, for
> > instance,
> > >> one
> > >>> never needs to convert to a String... it is *much* faster to simply
> > use
> > >> the
> > >>> correct type of AMQShortString/
> > >> Unfortunately using AMQShortString imposes additional overhead
> whenever
> > >> we need to en/decode to/from an ordinary String. It basically
> requires
> > >> an additional copy when compared with directly encoding/decoding
> > to/from
> > >>   a String. As the common case on the client side is dealing with
> > >> Strings, I'm not at all convinced that ubiquitous use of
> AMQShortString
> > >> is a net win for the client.
> > >>
> > >> I believe what would be optimal is to use the CharSequence interface
> > >> everywhere. This way String values passed to us by an application
> could
> > >> be directly passed all the way down the stack and encoded directly
> onto
> > >> the wire without an additional copy, and incoming data could be
> > >> efficiently decoded into a private impl of CharSequence that could be
> > >> converted to a String on demand.
> > >>
> > >> --Rafael
> > >>
> > >
> >
>

Re: Use of AMQShortString in client side code

Posted by Martin Ritchie <ri...@apache.org>.

On 25/09/2007, Rupert Smith <ru...@googlemail.com> wrote:
> On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> >
> > This analysis gets a little more complicated in 0-10. In 0-10 a frame
> > boundary can appear anywhere, so it isn't possible to simply wrap the
> > bytes read in off the wire in an AMQShortString object and pass that up.
> > You either need to copy the bytes into a newly allocated ByteBuffer, or
> > have a more sophisticated AMQShortString implementation that operates on
> > a list of fragments. Obviously it is impossible to do the latter if
> > AMQShortString is used directly throughout the code.
> >
>
> I've been wondering about this, and some other issues relating to memory
> management, because there are quite a lot of different factors to consider,
> besides short string decoding.
>
> One problem with the current M2 Java broker, is that it can be extremely
> wasteful of memory. It uses (by default) 32K buffers. So if I send small
> messages, I can waste up to 32K per message, because I will never fill the
> buffers. I think this is a major factor in causing OOMEs on the broker.
>
> The fact that an AMQShortString might have to be decoded across multiple
> buffer fragments, and that we need to pre-determine the buffer size, both
> stem from the fact that we cannot know the size of an arriving message in
> advance.
>
> I may be wrong, but are we using heap buffers on M2 at the moment? Because
> if we are, the JVM will be pulling data into a temporary direct buffer, then
> copying it into the heap. As we have previously discussed, decoding directly
> on the underlying byte array, from a heap buffer, is much faster than using
> the indexed access or iterator methods. Also, presumably the copy operation
> is pretty quick, as I imagine its done in a tight instruction loop on the
> CPU.
>
> Solution 1:
> There doesn't seem to be much point in using heap buffers of a
> pre-determined size. If we can pull the data into direct buffers, of
> pre-determined size, work out what the size is, then copy the data into a
> heap buffer of the correct size (or indeed another direct buffer, or even
> memory mapped buffer for persistent messaging, just to keep these options
> open), then this can be no worse than the current situation of using fixed
> size heap buffers.
>
> This solution will have two advantages. The pre-sized direct buffers will be
> released after the copy, for re-use, and we will not waste valuable heap
> space. The decoding will be able to access byte arrays, and will never need
> to be done over multiple fragments, as the data has now been copied to be
> contiguous.

We can currently do this in the broker for queued messages. Setting
compressBufferOnQueue to true in the configuration file will create a
new buffer to store the message and copy the contents to it.. While
perhaps not the most efficient method, it does prevent the waste of
large buffers with small messages.

> I can think of a couple of other candidate solutions.
>
> Solution 2:
> Use direct buffers of pre-determined size. If data overflows a buffer, add
> on another one, so data can span multiple segments. Maintain statistics
> about the distribution of message sizes. Start with a buffer that 60% (or a
> different %) of messages will fit. If this overflows, add on another big
> enough to hold 60% of messages larger than the original 60%. And so on, at
> some threshold, just add fixed size chunks.
>
> Decoding still has to be done over multiple fragments.
> If the distribution is 1/f, will not be wasteful on small messages. If an
> app leans more towards large messages, the waste on an occasional small one,
> will be less significant anyway. Mileage will vary with the actual
> distribution of sizes though.
>
> Solution 3.
> Pack multiple messages into single buffer. I don't think will be very
> workable though, as the buffer will not be able to be reclaimed until all
> messages in it are ready for garbage collection (or copying out the live
> set, once a % become garbage will be needed).
>
> Rupert
>


-- 
Martin Ritchie

Re: Use of AMQShortString in client side code

Posted by Rupert Smith <ru...@googlemail.com>.

I think I slipped a 0 somewhere, and it is 1.2G/s per second for buffer
access, and 2.5G/s for byte[] access.

On 26/09/2007, Rupert Smith <ru...@googlemail.com> wrote:
>
> Hmm, actually those numbers don't seem right..
>
> 12GBytes/s = 3G 32 bit words/s = 750 M 32 bit words/RAM cycle (DDR ram)
>
> My laptop should only be able to do about around 2.5 to 3G/s..
>
> On 26/09/2007, Rupert Smith <ru...@googlemail.com> wrote:
> >
> > Interestingly, I re-ran these tests, on Linux (2.6 kernel) on a 1.5 and
> > 1.6 jvm, on my 800Mhz laptop.
> >
> > On 1.5, the indexed, and relative put(byte b), methods for reading and
> > writing to byte buffers were maybe 4 times as slow as byte[] access.
> > Reading/writing to the array directly (or in the case of direct buffers to a
> > temp byte[] and copying the entire array in/out of the buffer) was quicker;
> > a lot in some cases, less so in others.
> >
> > On 1.6, everything seems to have been optimized. Even byte[] access
> > itself was 2x as fast. The indexed and relative methods, were approx
> > 1.5-2x as slow as byte[] access. Array access to buffers, is marginally
> > slower. But whatever access method you use, they are all around about the
> > same.
> >
> > Significantly, on 1.5 heap buffers were generally faster, on 1.6 direct
> > buffers take the lead, except for array access to the buffer, where heap
> > buffers are still ahead.
> >
> > Bear in mind that this is only half of the story. All these tests, test
> > the in-memory speed of accessing the buffers, and say nothing about the
> > performance of filling/draining buffers to real devices through the OS.
> >
> > Also worth noting that I am doing 100 * 1000000 reads/writes per test,
> > and this might take 8 milliseconds, for a total throughput of
> > 12GBytes/s = 3G 32 bit words/s = 750 M 32 bit words/RAM cycle (DDR ram)
> > = about the speed of my laptop that I ran the tests on. Way above the device
> > speed for a 1Gbit network, so at least on a jvm 1.6, the choice of
> > buffer access method may be quite irrelevant to overall performance.
> >
> > I shall create a Java micro-benchmarking project on Sourceforge, so
> > anyone interested can run them for themsleves.
> >
> > Rupert
> >
> > On 25/09/2007, Rupert Smith < rupertlssmith@googlemail.com> wrote:
> > >
> > > I am planning on doing some micro-benchmarking. Its my current project
> > > for the train ride to work ;)
> > >
> > > I did do a little bit of benchmarking on this before and found that
> > > direct buffers are slower, when you call them with the indexed access
> > > methods:
> > >
> > > *put<http://java.sun.com/j2se/1.5.0/docs/api/java/nio/ByteBuffer.html#put%28int,%20byte%29>
> > > *(int index, byte b)
> > > *get<http://java.sun.com/j2se/1.5.0/docs/api/java/nio/ByteBuffer.html#get%28int%29>
> > > *(int index)
> > >
> > > compared with heap buffers. Due to overhead on the native calls?
> > > Generally, copying the data out of the buffer into an array, and then
> > > working with the array, is faster for direct buffers. I imagine that the
> > > copy operation is fast, and working with an array eliminates a lot of method
> > > calls. I haven't run these tests for a while mind you, so I should probably
> > > do so again to remind myself. Also, I probably just ran them on windows, so
> > > should run them on linux too.
> > >
> > > On 25/09/2007, Robert Greig < robert.j.greig@gmail.com> wrote:
> > > >
> > > > On 25/09/2007, Rupert Smith <ru...@googlemail.com> wrote:
> > > >
> > > > > Current situation.
> > > > >
> > > > > Messages -> heap buffer.
> > > > > JVM takes data into a temp direct buffer (behind the scenes, this
> > > > is how it
> > > > > does it more or less), copies into heap buffer (of fixed size).
> > > > >
> > > > > Solution 1:
> > > > > Messages -> direct buffer.
> > > > > Work out what the size is.
> > > > > Explicitly copy to heap buffer of the correct size.
> > > > >
> > > > > Presumably in the current situation, if compress is used, its an
> > > > extra copy,
> > > > > heap buffer -> heap buffer. In solution 1, you get the compress
> > > > for free.
> > > >
> > > > OK I understand your logic now.
> > > >
> > > > The question I have is why in our testing direct buffers are so much
> > > > slower than heap? Speaking to other groups (e.g. the Sun grizzly
> > > > project) they have also found this.
> > > >
> > > > Have you done any microbenchmarking to arrive at these approaches?
> > > >
> > > > RG
> > > >
> > >
> > >
> >
>

Re: Use of AMQShortString in client side code

Posted by Rupert Smith <ru...@googlemail.com>.

Hmm, actually those numbers don't seem right..

12GBytes/s = 3G 32 bit words/s = 750 M 32 bit words/RAM cycle (DDR ram)

My laptop should only be able to do about around 2.5 to 3G/s..

On 26/09/2007, Rupert Smith <ru...@googlemail.com> wrote:
>
> Interestingly, I re-ran these tests, on Linux (2.6 kernel) on a 1.5 and
> 1.6 jvm, on my 800Mhz laptop.
>
> On 1.5, the indexed, and relative put(byte b), methods for reading and
> writing to byte buffers were maybe 4 times as slow as byte[] access.
> Reading/writing to the array directly (or in the case of direct buffers to a
> temp byte[] and copying the entire array in/out of the buffer) was quicker;
> a lot in some cases, less so in others.
>
> On 1.6, everything seems to have been optimized. Even byte[] access itself
> was 2x as fast. The indexed and relative methods, were approx 1.5-2x as
> slow as byte[] access. Array access to buffers, is marginally slower. But
> whatever access method you use, they are all around about the same.
>
> Significantly, on 1.5 heap buffers were generally faster, on 1.6 direct
> buffers take the lead, except for array access to the buffer, where heap
> buffers are still ahead.
>
> Bear in mind that this is only half of the story. All these tests, test
> the in-memory speed of accessing the buffers, and say nothing about the
> performance of filling/draining buffers to real devices through the OS.
>
> Also worth noting that I am doing 100 * 1000000 reads/writes per test, and
> this might take 8 milliseconds, for a total throughput of
> 12GBytes/s = 3G 32 bit words/s = 750 M 32 bit words/RAM cycle (DDR ram) =
> about the speed of my laptop that I ran the tests on. Way above the device
> speed for a 1Gbit network, so at least on a jvm 1.6, the choice of buffer
> access method may be quite irrelevant to overall performance.
>
> I shall create a Java micro-benchmarking project on Sourceforge, so anyone
> interested can run them for themsleves.
>
> Rupert
>
> On 25/09/2007, Rupert Smith <ru...@googlemail.com> wrote:
> >
> > I am planning on doing some micro-benchmarking. Its my current project
> > for the train ride to work ;)
> >
> > I did do a little bit of benchmarking on this before and found that
> > direct buffers are slower, when you call them with the indexed access
> > methods:
> >
> > *put<http://java.sun.com/j2se/1.5.0/docs/api/java/nio/ByteBuffer.html#put%28int,%20byte%29>
> > *(int index, byte b)
> > *get<http://java.sun.com/j2se/1.5.0/docs/api/java/nio/ByteBuffer.html#get%28int%29>
> > *(int index)
> >
> > compared with heap buffers. Due to overhead on the native calls?
> > Generally, copying the data out of the buffer into an array, and then
> > working with the array, is faster for direct buffers. I imagine that the
> > copy operation is fast, and working with an array eliminates a lot of method
> > calls. I haven't run these tests for a while mind you, so I should probably
> > do so again to remind myself. Also, I probably just ran them on windows, so
> > should run them on linux too.
> >
> > On 25/09/2007, Robert Greig < robert.j.greig@gmail.com> wrote:
> > >
> > > On 25/09/2007, Rupert Smith <ru...@googlemail.com> wrote:
> > >
> > > > Current situation.
> > > >
> > > > Messages -> heap buffer.
> > > > JVM takes data into a temp direct buffer (behind the scenes, this is
> > > how it
> > > > does it more or less), copies into heap buffer (of fixed size).
> > > >
> > > > Solution 1:
> > > > Messages -> direct buffer.
> > > > Work out what the size is.
> > > > Explicitly copy to heap buffer of the correct size.
> > > >
> > > > Presumably in the current situation, if compress is used, its an
> > > extra copy,
> > > > heap buffer -> heap buffer. In solution 1, you get the compress for
> > > free.
> > >
> > > OK I understand your logic now.
> > >
> > > The question I have is why in our testing direct buffers are so much
> > > slower than heap? Speaking to other groups (e.g. the Sun grizzly
> > > project) they have also found this.
> > >
> > > Have you done any microbenchmarking to arrive at these approaches?
> > >
> > > RG
> > >
> >
> >
>

Re: Use of AMQShortString in client side code

Posted by Rupert Smith <ru...@googlemail.com>.

Interestingly, I re-ran these tests, on Linux (2.6 kernel) on a 1.5
and 1.6jvm, on my 800Mhz laptop.

On 1.5, the indexed, and relative put(byte b), methods for reading and
writing to byte buffers were maybe 4 times as slow as byte[] access.
Reading/writing to the array directly (or in the case of direct buffers to a
temp byte[] and copying the entire array in/out of the buffer) was quicker;
a lot in some cases, less so in others.

On 1.6, everything seems to have been optimized. Even byte[] access itself
was 2x as fast. The indexed and relative methods, were approx 1.5-2x as slow
as byte[] access. Array access to buffers, is marginally slower. But
whatever access method you use, they are all around about the same.

Significantly, on 1.5 heap buffers were generally faster, on 1.6 direct
buffers take the lead, except for array access to the buffer, where heap
buffers are still ahead.

Bear in mind that this is only half of the story. All these tests, test the
in-memory speed of accessing the buffers, and say nothing about the
performance of filling/draining buffers to real devices through the OS.

Also worth noting that I am doing 100 * 1000000 reads/writes per test, and
this might take 8 milliseconds, for a total throughput of
12GBytes/s = 3G 32 bit words/s = 750 M 32 bit words/RAM cycle (DDR ram) =
about the speed of my laptop that I ran the tests on. Way above the device
speed for a 1Gbit network, so at least on a jvm 1.6, the choice of buffer
access method may be quite irrelevant to overall performance.

I shall create a Java micro-benchmarking project on Sourceforge, so anyone
interested can run them for themsleves.

Rupert

On 25/09/2007, Rupert Smith <ru...@googlemail.com> wrote:
>
> I am planning on doing some micro-benchmarking. Its my current project for
> the train ride to work ;)
>
> I did do a little bit of benchmarking on this before and found that direct
> buffers are slower, when you call them with the indexed access methods:
>
> *put<http://java.sun.com/j2se/1.5.0/docs/api/java/nio/ByteBuffer.html#put%28int,%20byte%29>
> *(int index, byte b)
> *get<http://java.sun.com/j2se/1.5.0/docs/api/java/nio/ByteBuffer.html#get%28int%29>
> *(int index)
>
> compared with heap buffers. Due to overhead on the native calls?
> Generally, copying the data out of the buffer into an array, and then
> working with the array, is faster for direct buffers. I imagine that the
> copy operation is fast, and working with an array eliminates a lot of method
> calls. I haven't run these tests for a while mind you, so I should probably
> do so again to remind myself. Also, I probably just ran them on windows, so
> should run them on linux too.
>
> On 25/09/2007, Robert Greig <ro...@gmail.com> wrote:
> >
> > On 25/09/2007, Rupert Smith <ru...@googlemail.com> wrote:
> >
> > > Current situation.
> > >
> > > Messages -> heap buffer.
> > > JVM takes data into a temp direct buffer (behind the scenes, this is
> > how it
> > > does it more or less), copies into heap buffer (of fixed size).
> > >
> > > Solution 1:
> > > Messages -> direct buffer.
> > > Work out what the size is.
> > > Explicitly copy to heap buffer of the correct size.
> > >
> > > Presumably in the current situation, if compress is used, its an extra
> > copy,
> > > heap buffer -> heap buffer. In solution 1, you get the compress for
> > free.
> >
> > OK I understand your logic now.
> >
> > The question I have is why in our testing direct buffers are so much
> > slower than heap? Speaking to other groups (e.g. the Sun grizzly
> > project) they have also found this.
> >
> > Have you done any microbenchmarking to arrive at these approaches?
> >
> > RG
> >
>
>

Re: Use of AMQShortString in client side code

Posted by Rupert Smith <ru...@googlemail.com>.

I am planning on doing some micro-benchmarking. Its my current project for
the train ride to work ;)

I did do a little bit of benchmarking on this before and found that direct
buffers are slower, when you call them with the indexed access methods:

*put<http://java.sun.com/j2se/1.5.0/docs/api/java/nio/ByteBuffer.html#put%28int,%20byte%29>
*(int index, byte b)
*get<http://java.sun.com/j2se/1.5.0/docs/api/java/nio/ByteBuffer.html#get%28int%29>
*(int index)

compared with heap buffers. Due to overhead on the native calls? Generally,
copying the data out of the buffer into an array, and then working with the
array, is faster for direct buffers. I imagine that the copy operation is
fast, and working with an array eliminates a lot of method calls. I haven't
run these tests for a while mind you, so I should probably do so again to
remind myself. Also, I probably just ran them on windows, so should run them
on linux too.

On 25/09/2007, Robert Greig <ro...@gmail.com> wrote:
>
> On 25/09/2007, Rupert Smith <ru...@googlemail.com> wrote:
>
> > Current situation.
> >
> > Messages -> heap buffer.
> > JVM takes data into a temp direct buffer (behind the scenes, this is how
> it
> > does it more or less), copies into heap buffer (of fixed size).
> >
> > Solution 1:
> > Messages -> direct buffer.
> > Work out what the size is.
> > Explicitly copy to heap buffer of the correct size.
> >
> > Presumably in the current situation, if compress is used, its an extra
> copy,
> > heap buffer -> heap buffer. In solution 1, you get the compress for
> free.
>
> OK I understand your logic now.
>
> The question I have is why in our testing direct buffers are so much
> slower than heap? Speaking to other groups (e.g. the Sun grizzly
> project) they have also found this.
>
> Have you done any microbenchmarking to arrive at these approaches?
>
> RG
>

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

On 25/09/2007, Rupert Smith <ru...@googlemail.com> wrote:

> Current situation.
>
> Messages -> heap buffer.
> JVM takes data into a temp direct buffer (behind the scenes, this is how it
> does it more or less), copies into heap buffer (of fixed size).
>
> Solution 1:
> Messages -> direct buffer.
> Work out what the size is.
> Explicitly copy to heap buffer of the correct size.
>
> Presumably in the current situation, if compress is used, its an extra copy,
> heap buffer -> heap buffer. In solution 1, you get the compress for free.

OK I understand your logic now.

The question I have is why in our testing direct buffers are so much
slower than heap? Speaking to other groups (e.g. the Sun grizzly
project) they have also found this.

Have you done any microbenchmarking to arrive at these approaches?

RG

Re: Use of AMQShortString in client side code

Posted by Rupert Smith <ru...@googlemail.com>.

On 25/09/2007, Robert Greig <ro...@gmail.com> wrote:
>
> Sorry I don't follow. Why is this useful? Surely this introduces extra
> copying in some cases?
>

Current situation.

Messages -> heap buffer.
JVM takes data into a temp direct buffer (behind the scenes, this is how it
does it more or less), copies into heap buffer (of fixed size).

Solution 1:
Messages -> direct buffer.
Work out what the size is.
Explicitly copy to heap buffer of the correct size.

Presumably in the current situation, if compress is used, its an extra copy,
heap buffer -> heap buffer. In solution 1, you get the compress for free.

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

On 25/09/2007, Rupert Smith <ru...@googlemail.com> wrote:

> One problem with the current M2 Java broker, is that it can be extremely
> wasteful of memory. It uses (by default) 32K buffers. So if I send small
> messages, I can waste up to 32K per message, because I will never fill the
> buffers. I think this is a major factor in causing OOMEs on the broker.

Are we not calling compress when the message is being stored?

The idea was that cases where the message is queued would be
"compressed"  into a buffer of the appropriate size but when the data
is going straight out there is no need to copy.

> I may be wrong, but are we using heap buffers on M2 at the moment? Because
> if we are, the JVM will be pulling data into a temporary direct buffer, then
> copying it into the heap. As we have previously discussed, decoding directly
> on the underlying byte array, from a heap buffer, is much faster than using
> the indexed access or iterator methods. Also, presumably the copy operation
> is pretty quick, as I imagine its done in a tight instruction loop on the
> CPU.

Yes we are using heap buffers. You can easily switch to direct buffers
but tests have shown them to be much slower, at least for how we
currently use buffers.

> Solution 1:
> There doesn't seem to be much point in using heap buffers of a
> pre-determined size. If we can pull the data into direct buffers, of
> pre-determined size, work out what the size is, then copy the data into a
> heap buffer of the correct size (or indeed another direct buffer, or even
> memory mapped buffer for persistent messaging, just to keep these options
> open), then this can be no worse than the current situation of using fixed
> size heap buffers.

Sorry I don't follow. Why is this useful? Surely this introduces extra
copying in some cases?

I think it may be worth investigating whether using slices of a
preallocated direct buffer is better than smaller heap buffers, e.g.
allocate 512MB direct buffer and take 32k slices of it. This was just
something I discussed in conversation with someone at Sun.

RG

Re: Use of AMQShortString in client side code

Posted by Rupert Smith <ru...@googlemail.com>.

On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>
> This analysis gets a little more complicated in 0-10. In 0-10 a frame
> boundary can appear anywhere, so it isn't possible to simply wrap the
> bytes read in off the wire in an AMQShortString object and pass that up.
> You either need to copy the bytes into a newly allocated ByteBuffer, or
> have a more sophisticated AMQShortString implementation that operates on
> a list of fragments. Obviously it is impossible to do the latter if
> AMQShortString is used directly throughout the code.
>

I've been wondering about this, and some other issues relating to memory
management, because there are quite a lot of different factors to consider,
besides short string decoding.

One problem with the current M2 Java broker, is that it can be extremely
wasteful of memory. It uses (by default) 32K buffers. So if I send small
messages, I can waste up to 32K per message, because I will never fill the
buffers. I think this is a major factor in causing OOMEs on the broker.

The fact that an AMQShortString might have to be decoded across multiple
buffer fragments, and that we need to pre-determine the buffer size, both
stem from the fact that we cannot know the size of an arriving message in
advance.

I may be wrong, but are we using heap buffers on M2 at the moment? Because
if we are, the JVM will be pulling data into a temporary direct buffer, then
copying it into the heap. As we have previously discussed, decoding directly
on the underlying byte array, from a heap buffer, is much faster than using
the indexed access or iterator methods. Also, presumably the copy operation
is pretty quick, as I imagine its done in a tight instruction loop on the
CPU.

Solution 1:
There doesn't seem to be much point in using heap buffers of a
pre-determined size. If we can pull the data into direct buffers, of
pre-determined size, work out what the size is, then copy the data into a
heap buffer of the correct size (or indeed another direct buffer, or even
memory mapped buffer for persistent messaging, just to keep these options
open), then this can be no worse than the current situation of using fixed
size heap buffers.

This solution will have two advantages. The pre-sized direct buffers will be
released after the copy, for re-use, and we will not waste valuable heap
space. The decoding will be able to access byte arrays, and will never need
to be done over multiple fragments, as the data has now been copied to be
contiguous.

I can think of a couple of other candidate solutions.

Solution 2:
Use direct buffers of pre-determined size. If data overflows a buffer, add
on another one, so data can span multiple segments. Maintain statistics
about the distribution of message sizes. Start with a buffer that 60% (or a
different %) of messages will fit. If this overflows, add on another big
enough to hold 60% of messages larger than the original 60%. And so on, at
some threshold, just add fixed size chunks.

Decoding still has to be done over multiple fragments.
If the distribution is 1/f, will not be wasteful on small messages. If an
app leans more towards large messages, the waste on an occasional small one,
will be less significant anyway. Mileage will vary with the actual
distribution of sizes though.

Solution 3.
Pack multiple messages into single buffer. I don't think will be very
workable though, as the buffer will not be able to be reclaimed until all
messages in it are ready for garbage collection (or copying out the live
set, once a % become garbage will be needed).

Rupert

Re: Use of AMQShortString in client side code

Posted by Rupert Smith <ru...@googlemail.com>.

Rafael,

Thanks for those links. I see your point.

On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>
> Rafael Schloming wrote:
> > Rupert Smith wrote:
> >> On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> >>> So if we were to fix AMQShortString to just use bulk put/get
> operations
> >>> all the time then the only issue is whether it is possible to
> implement
> >>> an aggregating ByteBuffer with efficient bulk put/get operations, and
> >>> the answer to this question depends entirely on the details of the
> >>> internal implementation of the various nio Buffer classes, which
> likely
> >>> means it is impossible to do without tying yourself to a specific JDK
> >>> implementation.
> >>>
> >>
> >> Its an aggregator, so bulk operations just map down onto the bulk
> >> operations
> >> of the aggregated buffers underneath it. So if a bulk get overlaps two
> >> buffers underneath, the get it split into two calls, one to fetch the
> >> relevant bit from each of the underlying buffers.
> >
> > Yes, this is exactly what FragmentDecoder does.
> >
> > I don't follow where the
> >> JDK specific part comes from?
> >
> > The JDK specific part comes in trying to do this as a subclass of
> > ByteBuffer. The various buffer implementations in the java.nio package
> > share internal implementations via package protected members and code.
> > In order to be able to create a subclass of ByteBuffer you would need to
> > obey all the various package protected contracts. These contracts are
> > implementation details of the JDK and may change between releases.
> >
> > A quick google on "extending ByteBuffer" also reveals the issues
> > mentioned here:
> >
> > http://forum.java.sun.com/thread.jspa?threadID=693259&messageID=4028590
> >
> > Basically ByteBuffer, although not final, has entirely private
> > constructors. This prevents it from being extend from outside the
> > java.nio package.
>
> This link would also seem to indicate that there is no intention to ever
> make ByteBuffer extensible:
>
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4712383
>
> --Rafael
>

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Rafael Schloming wrote:
> Rupert Smith wrote:
>> On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>>> So if we were to fix AMQShortString to just use bulk put/get operations
>>> all the time then the only issue is whether it is possible to implement
>>> an aggregating ByteBuffer with efficient bulk put/get operations, and
>>> the answer to this question depends entirely on the details of the
>>> internal implementation of the various nio Buffer classes, which likely
>>> means it is impossible to do without tying yourself to a specific JDK
>>> implementation.
>>>
>>
>> Its an aggregator, so bulk operations just map down onto the bulk 
>> operations
>> of the aggregated buffers underneath it. So if a bulk get overlaps two
>> buffers underneath, the get it split into two calls, one to fetch the
>> relevant bit from each of the underlying buffers.
> 
> Yes, this is exactly what FragmentDecoder does.
> 
> I don't follow where the
>> JDK specific part comes from?
> 
> The JDK specific part comes in trying to do this as a subclass of 
> ByteBuffer. The various buffer implementations in the java.nio package 
> share internal implementations via package protected members and code. 
> In order to be able to create a subclass of ByteBuffer you would need to 
> obey all the various package protected contracts. These contracts are 
> implementation details of the JDK and may change between releases.
> 
> A quick google on "extending ByteBuffer" also reveals the issues 
> mentioned here:
> 
> http://forum.java.sun.com/thread.jspa?threadID=693259&messageID=4028590
> 
> Basically ByteBuffer, although not final, has entirely private 
> constructors. This prevents it from being extend from outside the 
> java.nio package.

This link would also seem to indicate that there is no intention to ever 
make ByteBuffer extensible:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4712383

--Rafael

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Rupert Smith wrote:
> On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>> So if we were to fix AMQShortString to just use bulk put/get operations
>> all the time then the only issue is whether it is possible to implement
>> an aggregating ByteBuffer with efficient bulk put/get operations, and
>> the answer to this question depends entirely on the details of the
>> internal implementation of the various nio Buffer classes, which likely
>> means it is impossible to do without tying yourself to a specific JDK
>> implementation.
>>
> 
> Its an aggregator, so bulk operations just map down onto the bulk operations
> of the aggregated buffers underneath it. So if a bulk get overlaps two
> buffers underneath, the get it split into two calls, one to fetch the
> relevant bit from each of the underlying buffers.

Yes, this is exactly what FragmentDecoder does.

I don't follow where the
> JDK specific part comes from?

The JDK specific part comes in trying to do this as a subclass of 
ByteBuffer. The various buffer implementations in the java.nio package 
share internal implementations via package protected members and code. 
In order to be able to create a subclass of ByteBuffer you would need to 
obey all the various package protected contracts. These contracts are 
implementation details of the JDK and may change between releases.

A quick google on "extending ByteBuffer" also reveals the issues 
mentioned here:

http://forum.java.sun.com/thread.jspa?threadID=693259&messageID=4028590

Basically ByteBuffer, although not final, has entirely private 
constructors. This prevents it from being extend from outside the 
java.nio package.

--Rafael

Re: Use of AMQShortString in client side code

Posted by Rupert Smith <ru...@googlemail.com>.

On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>
> So if we were to fix AMQShortString to just use bulk put/get operations
> all the time then the only issue is whether it is possible to implement
> an aggregating ByteBuffer with efficient bulk put/get operations, and
> the answer to this question depends entirely on the details of the
> internal implementation of the various nio Buffer classes, which likely
> means it is impossible to do without tying yourself to a specific JDK
> implementation.
>

Its an aggregator, so bulk operations just map down onto the bulk operations
of the aggregated buffers underneath it. So if a bulk get overlaps two
buffers underneath, the get it split into two calls, one to fetch the
relevant bit from each of the underlying buffers. I don't follow where the
JDK specific part comes from?

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

On 20/09/2007, Rafael Schloming <ra...@redhat.com> wrote:

> Perhaps I didn't understand your suggestion.

OK, now I am confused I think...

> It sounds to me like it is
> the same thing as doing nothing at all.

No, I meant enhance the current implementation of AMQShortString to
correct the weaknesses already discussed for encoding.

> The generated API would use
> String and there would be no copying on encode, however that still
> leaves the overhead of copying and unicode character conversion on
> decode. It is the latter part that using AMQShortString or the
> CharSequence interface can avoid.

But the current implementation of AMQShortString lazily decodes? What
is the copying you are referring to?

RG

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Robert Godfrey wrote:
>> Perhaps I didn't understand your suggestion. It sounds to me like it is
>> the same thing as doing nothing at all. The generated API would use
>> String and there would be no copying on encode, however that still
>> leaves the overhead of copying and unicode character conversion on
>> decode. It is the latter part that using AMQShortString or the
>> CharSequence interface can avoid.
> 
> 
> 
> OK... Firstly, I think the decision in AMQP 0-10 to make all shortstr's be
> a valid UTF-8 string is a mistake, but we can re-visit that over at AMQP :-)
> 
> Secondly, the advantages for me of not using String are
> 
> i) In the decode we don't need to make a copy into a String object with the
> expensive conversion routine it has
> ii) In the encode case the encoding for a common string value needs only to
> happen once.  Obviously at some point there is still likely to be a copy.
> 
> Testing has shown that converting String objects to and from bytes is very
> expensive.  In 0-8/0-9 no multibyte encoding was specified for shortstr's
> and it was essentially assumed that they consisted of single-byte
> characters.

I agree that all of those issues preclude the direct use of String for 
shortstr if there is no special exception made for exchange name and 
routing key. I do think using CharSequence avoids this though.

> The decision that was slipped into 0-10 that shortstr's should be
> representations of a particular multi-byte encoding is (I think) a mistake.
> It would seem to require a degree of validation on shortstr's that I do not
> think is appropriate.  My main issue with shortstr is its name.  It should
> instead have been called short-octet-sequence.  The fact that is can be used
> to represent encoded data that corresponds to human readable alphabets is
> not intrinsic to its use in AMQP I think.

The decision wasn't slipped into 0-10. That text has remained unchanged 
since 0-8.

FWIW I don't think shortstr actually is an opaque sequence of bytes. 
Everywhere it is used it refers to a named entity, so it really is a 
sequence of characters and we do need to specify an encoding in order to 
avoid interop issues. I also think the UTF-8 encoding is probably the 
correct choice for most of its uses, i.e. fields going client --> client.

--Rafael

Re: Use of AMQShortString in client side code

Posted by Robert Godfrey <ro...@gmail.com>.

>
> Perhaps I didn't understand your suggestion. It sounds to me like it is
> the same thing as doing nothing at all. The generated API would use
> String and there would be no copying on encode, however that still
> leaves the overhead of copying and unicode character conversion on
> decode. It is the latter part that using AMQShortString or the
> CharSequence interface can avoid.



OK... Firstly, I think the decision in AMQP 0-10 to make all shortstr's be
a valid UTF-8 string is a mistake, but we can re-visit that over at AMQP :-)

Secondly, the advantages for me of not using String are

i) In the decode we don't need to make a copy into a String object with the
expensive conversion routine it has
ii) In the encode case the encoding for a common string value needs only to
happen once.  Obviously at some point there is still likely to be a copy.

Testing has shown that converting String objects to and from bytes is very
expensive.  In 0-8/0-9 no multibyte encoding was specified for shortstr's
and it was essentially assumed that they consisted of single-byte
characters.

The decision that was slipped into 0-10 that shortstr's should be
representations of a particular multi-byte encoding is (I think) a mistake.
It would seem to require a degree of validation on shortstr's that I do not
think is appropriate.  My main issue with shortstr is its name.  It should
instead have been called short-octet-sequence.  The fact that is can be used
to represent encoded data that corresponds to human readable alphabets is
not intrinsic to its use in AMQP I think.

-- Rob

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Robert Greig wrote:
> On 20/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>> Robert Greig wrote:
>> That depends on the details of how tokenizing would work. It's not 100%
>> clear to me how you'd apply a generalized tokenizing scheme to AMQP as
>> some shortstrs are intended for the broker to interpret, and some
>> shortstrs are intended for the receiving client to interpret.
> 
> I had assumed there would be a type TokenizableShortString in the
> protocol which the broker would always have to interpret and that
> would exist only in places where it made sense (i.e. where you would
> always be parsing it anyway). But thinking about that further now,
> there are clearly some problems with that such as whether you allow
> such things to be squirreled away inside field tables etc.
> 
>> Defining a scheme that is specific to exchange name and routing key is
>> probably just simpler and easier. And of course making it fixed width
>> permits easy access to hardware for monitoring and acceleration.
> 
> What length is being considered? I am just concerned that it may be
> quite long if it's to be useful, or there needs to be a namespace
> associated somewhere else e.g. with a producer (not a concept that
> exists currently) or consumer. For example you might want to create a
> queue called:
> 
> com.megabank.ib.fixedincome.repo.IncomingRepos
> 
> which has length 46.

I don't think we would replace exchange names and routing keys, just 
provide the ability to refer to them in a more efficient manner, e.g. 
perhaps use a two byte destination field that must have been previously 
assigned to a long format destination. But as I said this is 0-11 stuff, 
so the details aren't nailed down yet.

>> I could do this, but my code already uses String in the API and it
>> already lazily encodes from String->ByteBuffer, so I think this would be
>>  functionally equivalent to what I'm doing now.
> 
> So if I understand your comment correctly that sounds like a plan?

Perhaps I didn't understand your suggestion. It sounds to me like it is 
the same thing as doing nothing at all. The generated API would use 
String and there would be no copying on encode, however that still 
leaves the overhead of copying and unicode character conversion on 
decode. It is the latter part that using AMQShortString or the 
CharSequence interface can avoid.

--Rafael

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

On 20/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> Robert Greig wrote:
> That depends on the details of how tokenizing would work. It's not 100%
> clear to me how you'd apply a generalized tokenizing scheme to AMQP as
> some shortstrs are intended for the broker to interpret, and some
> shortstrs are intended for the receiving client to interpret.

I had assumed there would be a type TokenizableShortString in the
protocol which the broker would always have to interpret and that
would exist only in places where it made sense (i.e. where you would
always be parsing it anyway). But thinking about that further now,
there are clearly some problems with that such as whether you allow
such things to be squirreled away inside field tables etc.

> Defining a scheme that is specific to exchange name and routing key is
> probably just simpler and easier. And of course making it fixed width
> permits easy access to hardware for monitoring and acceleration.

What length is being considered? I am just concerned that it may be
quite long if it's to be useful, or there needs to be a namespace
associated somewhere else e.g. with a producer (not a concept that
exists currently) or consumer. For example you might want to create a
queue called:

com.megabank.ib.fixedincome.repo.IncomingRepos

which has length 46.

> I could do this, but my code already uses String in the API and it
> already lazily encodes from String->ByteBuffer, so I think this would be
>  functionally equivalent to what I'm doing now.

So if I understand your comment correctly that sounds like a plan?

RG

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Robert Greig wrote:
> On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> 
>> One of the previously discussed improvements for 0-11 is to define a
>> fixed width destination or address type that can be used to efficiently
>> represent the exchange name + routing-key (i.e. something akin to an IP
>> address). I suspect this would actually make the whole issue somewhat
>> moot since shortstr decoding would be eliminated entirely from the
>> critical path of the broker.
> 
> What advantages would that have over variable width plus tokenizing?

That depends on the details of how tokenizing would work. It's not 100% 
clear to me how you'd apply a generalized tokenizing scheme to AMQP as 
some shortstrs are intended for the broker to interpret, and some 
shortstrs are intended for the receiving client to interpret.

For example the broker would clearly be expected to interpret tokens in 
the exchange name + routing key, but what happens if a token is used in 
the headers field table? Is the broker expected to convert it to a non 
tokenized shortstr? If so then that is a significant burden for the 
broker. If not then how does the recipient know how to interpret the token?

Defining a scheme that is specific to exchange name and routing key is 
probably just simpler and easier. And of course making it fixed width 
permits easy access to hardware for monitoring and acceleration.

>> To be clear I'm not against using CharSequence. Right now I'm just
>> trying to nail down whether AMQShortString is a necessary API choice or
>> simply an optimization. If it is the former there isn't much wiggle
>> room. If it is the latter I think there are a number of reasonable
>> alternatives, and using CharSequence would definitely be high on my list
>> to try.
> 
> Thinking about this further, can your generated API not present
> whatever it wants to users (e.g String just like JMS). We can optimise
> AMQShortString so that it lazily encodes String->ByteBuffer. That
> being the case, your API can create AMQShortString on behalf of the
> user to make is "easy" to use, there is no performance hit and we
> don't limit ourselves to CharSequence and avoid Rob's concern about
> using instanceof or casts.

I could do this, but my code already uses String in the API and it 
already lazily encodes from String->ByteBuffer, so I think this would be 
  functionally equivalent to what I'm doing now.

--Rafael

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:

> One of the previously discussed improvements for 0-11 is to define a
> fixed width destination or address type that can be used to efficiently
> represent the exchange name + routing-key (i.e. something akin to an IP
> address). I suspect this would actually make the whole issue somewhat
> moot since shortstr decoding would be eliminated entirely from the
> critical path of the broker.

What advantages would that have over variable width plus tokenizing?

> To be clear I'm not against using CharSequence. Right now I'm just
> trying to nail down whether AMQShortString is a necessary API choice or
> simply an optimization. If it is the former there isn't much wiggle
> room. If it is the latter I think there are a number of reasonable
> alternatives, and using CharSequence would definitely be high on my list
> to try.

Thinking about this further, can your generated API not present
whatever it wants to users (e.g String just like JMS). We can optimise
AMQShortString so that it lazily encodes String->ByteBuffer. That
being the case, your API can create AMQShortString on behalf of the
user to make is "easy" to use, there is no performance hit and we
don't limit ourselves to CharSequence and avoid Rob's concern about
using instanceof or casts.

RG

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Robert Greig wrote:
> On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>> This is what is in the spec. I think it's the same as 0-8:
>>
>>       Short strings, stored as an 8-bit unsigned integer length
>>       followed by zero or more octets of data. Short strings can
>>       carry up to 255 octets of UTF-8 data, but may not contain
>>       binary zero octets.
>>
>> So strictly speaking the current use of AMQShortString is actually a
>> spec violation since it doesn't do any character conversion and so will
>> only behave correctly if all the bytes happen to be less than 128.
> 
> I think this was something that was wrong with the 0-8 spec. Some
> strings may need to be UTF-8 but many do not. I think a while ago I
> did propose two types of string - a wide string and an ASCII string -
> to sort this. Not sure if that made any headway but if not I would
> suggest bringing it up for 0-10.

One of the previously discussed improvements for 0-11 is to define a 
fixed width destination or address type that can be used to efficiently 
represent the exchange name + routing-key (i.e. something akin to an IP 
address). I suspect this would actually make the whole issue somewhat 
moot since shortstr decoding would be eliminated entirely from the 
critical path of the broker.

Given this type I'm not sure it would be necessary to define an ascii 
version of shortstr. Personally I would even be tempted to up the length 
limit to two bytes at that point since other usages are less performance 
critical.

>> Alternatively I think it would also be reasonable to stick with String
>> and pursue other optimization strategies, e.g. treat exchange-name and
>> routing-key specially, or use tokenization.
> 
> If CharSequence is out, I would like to see full details and
> performance comparisons of any String-based approached. We could
> optimise the AMQShortString to avoid the duplicate copy so I would not
> support ditching that because of some generated alternative API unless
> it could be guaranteed not to reduce performance.

To be clear I'm not against using CharSequence. Right now I'm just 
trying to nail down whether AMQShortString is a necessary API choice or 
simply an optimization. If it is the former there isn't much wiggle 
room. If it is the latter I think there are a number of reasonable 
alternatives, and using CharSequence would definitely be high on my list 
to try.

--Rafael

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> This is what is in the spec. I think it's the same as 0-8:
>
>       Short strings, stored as an 8-bit unsigned integer length
>       followed by zero or more octets of data. Short strings can
>       carry up to 255 octets of UTF-8 data, but may not contain
>       binary zero octets.
>
> So strictly speaking the current use of AMQShortString is actually a
> spec violation since it doesn't do any character conversion and so will
> only behave correctly if all the bytes happen to be less than 128.

I think this was something that was wrong with the 0-8 spec. Some
strings may need to be UTF-8 but many do not. I think a while ago I
did propose two types of string - a wide string and an ASCII string -
to sort this. Not sure if that made any headway but if not I would
suggest bringing it up for 0-10.

> Alternatively I think it would also be reasonable to stick with String
> and pursue other optimization strategies, e.g. treat exchange-name and
> routing-key specially, or use tokenization.

If CharSequence is out, I would like to see full details and
performance comparisons of any String-based approached. We could
optimise the AMQShortString to avoid the duplicate copy so I would not
support ditching that because of some generated alternative API unless
it could be guaranteed not to reduce performance.

RG

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

On 24/09/2007, Robert Greig <ro...@gmail.com> wrote:

> the routing key. For the headers exchange the encoding is relevant for
> example.

Sorry I'm talking nonsense, the headers exchange doesn't care about
the encoding.

RG

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:

> > Surely we should be pushing to modify the protocol?
>
> I'm not yet convinced there is sufficient reason to modify the protocol.
>  Nothing in the protocol forces a *broker* to perform unicode character
> conversion on the exchange name or routing key. The broker can simply
> treat the value as an opaque sequence of bytes and pass it along
> unmodified to the final recipient.

This implies that it knows how an exchange implementation deals with
the routing key. For the headers exchange the encoding is relevant for
example. Even the topic exchange needs to know in order to be able to
do wildcard matching.

> This would be reasonable. We may also be able to decide at runtime which
> CharSequence impl to use, since everything the AMQShortString impl does
> is correct *if* all the byte values are in the ASCII range, and we do
> need to iterate over the values at least once to compute the hashCode().
> If we did this up front and checked the byte range we would know if it
> was safe to use AMQShortString or not.

Yes that's true.

> The same thing using Map instead of HashMap does. We can easily swap in
> alternative implementations. Specifically in this case the client (or
> broker) could swap in an alternative implementation that does correct
> unicode character conversion.

OK that's fine. I had thought you were saying there was some
performance issue with using AMQShortString rather than CharSequence.

RG

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Robert Greig wrote:
> On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> 
>> Seriously though, if we actually need to operate on chars then there is
>> no point in casting. Short of modifying the protocol to redefine
>> shortstr as ASCII only, that will not result in the character that the
>> client intended to send.
> 
> Surely we should be pushing to modify the protocol?

I'm not yet convinced there is sufficient reason to modify the protocol. 
  Nothing in the protocol forces a *broker* to perform unicode character 
conversion on the exchange name or routing key. The broker can simply 
treat the value as an opaque sequence of bytes and pass it along 
unmodified to the final recipient.

> At the very least a -D option to allow users to improve dramatically
> their performance in certain "controlled" circumstances would be
> desirable to me.

This would be reasonable. We may also be able to decide at runtime which 
CharSequence impl to use, since everything the AMQShortString impl does 
is correct *if* all the byte values are in the ASCII range, and we do 
need to iterate over the values at least once to compute the hashCode(). 
If we did this up front and checked the byte range we would know if it 
was safe to use AMQShortString or not.

>> The hashCode() and equals() method in AMQShortString are currently
>> computed based solely on the raw shortstr bytes. They will function
>> correctly when hashing against and comparing to other AMQShortString
>> instances.
> 
> OK, but that is completely independent of whether we use the
> CharSequence interface everywhere?

Yes. When I said "a CharSequence impl" in my prior email I was simply 
distinguishing the CharSequence impl I was describing from the 
particular CharSequence impl called AMQShortString.

> So going back to my original question, I still don't see what
> CharSequence buys us? What does changing the usages to CharSequence
> rather than AMQShortString allow us to do that we can't with
> AMQShortString?

The same thing using Map instead of HashMap does. We can easily swap in 
alternative implementations. Specifically in this case the client (or 
broker) could swap in an alternative implementation that does correct 
unicode character conversion.

--Rafael

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:

> Seriously though, if we actually need to operate on chars then there is
> no point in casting. Short of modifying the protocol to redefine
> shortstr as ASCII only, that will not result in the character that the
> client intended to send.

Surely we should be pushing to modify the protocol?

At the very least a -D option to allow users to improve dramatically
their performance in certain "controlled" circumstances would be
desirable to me.

> The hashCode() and equals() method in AMQShortString are currently
> computed based solely on the raw shortstr bytes. They will function
> correctly when hashing against and comparing to other AMQShortString
> instances.

OK, but that is completely independent of whether we use the
CharSequence interface everywhere?

So going back to my original question, I still don't see what
CharSequence buys us? What does changing the usages to CharSequence
rather than AMQShortString allow us to do that we can't with
AMQShortString?

RG

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Robert Greig wrote:
> On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> 
>> I do mean lazily decode, however not like the AMQShortString
>> implementation does. It currently just casts from byte to char. This
>> only works if the string consists entirely of non extended ASCII. I was
>> thinking more along the lines of internally converting to a char[] using
>> the proper charset decoder the first time the charAt(idx) method from
>> the CharSequence interface is used.
> 
> OK but using the charset decoder is 5x slower than casting from byte to char.

It is also approximately 5x more correct. ;)

Seriously though, if we actually need to operate on chars then there is 
no point in casting. Short of modifying the protocol to redefine 
shortstr as ASCII only, that will not result in the character that the 
client intended to send.

>> This would provide correct unicode
>> character conversion for clients, and cause no overhead to the broker
>> since the broker just needs to do hash lookups based on the shortstr
>> values, and hashCode() and equals() can be implemented without doing
>> full character conversion.
> 
> Maybe I'm being particularly dim today but I don't follow. Let's say
> the broker gets a Basic.Publish, it has to decode the routing key,
> right? What hash lookups can it do without converting that buffer to a
> String (irrespective of whether it does so using a slow charset
> decoder or a quick cast)?

The hashCode() and equals() method in AMQShortString are currently 
computed based solely on the raw shortstr bytes. They will function 
correctly when hashing against and comparing to other AMQShortString 
instances.

--Rafael

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:

> I do mean lazily decode, however not like the AMQShortString
> implementation does. It currently just casts from byte to char. This
> only works if the string consists entirely of non extended ASCII. I was
> thinking more along the lines of internally converting to a char[] using
> the proper charset decoder the first time the charAt(idx) method from
> the CharSequence interface is used.

OK but using the charset decoder is 5x slower than casting from byte to char.

> This would provide correct unicode
> character conversion for clients, and cause no overhead to the broker
> since the broker just needs to do hash lookups based on the shortstr
> values, and hashCode() and equals() can be implemented without doing
> full character conversion.

Maybe I'm being particularly dim today but I don't follow. Let's say
the broker gets a Basic.Publish, it has to decode the routing key,
right? What hash lookups can it do without converting that buffer to a
String (irrespective of whether it does so using a slow charset
decoder or a quick cast)?

RG

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Robert Greig wrote:
> On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> 
>> My understanding is that the two proven factors in decoding shortstrs
>> are (1) unicode character conversion, and (2) copy via iteration rather
>> than bulk put/get. And as far as I can see the only correct solution to
>> (1) is a CharSequence implementation that does the conversion on demand
>> rather than up front.
> 
> Sorry I don't know what you mean here. What do you mean by "do the
> conversion on demand"? Do you just mean lazily decode like the current
> AMQShortString implementation does? If so where does CharSequence come
> into play?

I do mean lazily decode, however not like the AMQShortString 
implementation does. It currently just casts from byte to char. This 
only works if the string consists entirely of non extended ASCII. I was 
thinking more along the lines of internally converting to a char[] using 
the proper charset decoder the first time the charAt(idx) method from 
the CharSequence interface is used. This would provide correct unicode 
character conversion for clients, and cause no overhead to the broker 
since the broker just needs to do hash lookups based on the shortstr 
values, and hashCode() and equals() can be implemented without doing 
full character conversion.

--Rafael

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

On 24/09/2007, Rafael Schloming <ra...@redhat.com> wrote:

> My understanding is that the two proven factors in decoding shortstrs
> are (1) unicode character conversion, and (2) copy via iteration rather
> than bulk put/get. And as far as I can see the only correct solution to
> (1) is a CharSequence implementation that does the conversion on demand
> rather than up front.

Sorry I don't know what you mean here. What do you mean by "do the
conversion on demand"? Do you just mean lazily decode like the current
AMQShortString implementation does? If so where does CharSequence come
into play?

RG

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Rupert Smith wrote:
> Rafael,
> 
> Thanks for those thoughts on the subject.
> 
> On 20/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>> Also, performance-wise an aggregating ByteBuffer probably wouldn't do so
>> well since you'd have to return false/null from hasArray()/array(), and
>> so you'd nullify the benefit from bulk get/put operations.
>>
> 
> Won't this be the case with direct buffers anyway? I'm not yet sure what all
> the factors are that determine whether direct or heap buffers are faster,
> but a decoder that requires hasArray == true, can only work with heap
> buffers. The bulk array copy operations will still work, they will just have
> to do multiple copy operations.

My code doesn't require hasArray to be true, it simply uses the bulk 
get/put methods which are efficiently implemented on both heap and 
direct ByteBuffers. My comment was actually about the code in 
AMQShortString which calls hasArray() and falls back to iterating when 
it is false.

So if we were to fix AMQShortString to just use bulk put/get operations 
all the time then the only issue is whether it is possible to implement 
an aggregating ByteBuffer with efficient bulk put/get operations, and 
the answer to this question depends entirely on the details of the 
internal implementation of the various nio Buffer classes, which likely 
means it is impossible to do without tying yourself to a specific JDK 
implementation.

> I did do an experiment on byte buffers previously, and found doing an array
> copy from a direct buffer and working on the array is much faster than using
> the indexed access methods (assuming you want to process the entire array,
> that is).

Yes, I've found this to be the case as well.

> All this copying sounds like a pain, but given that a heap buffer will
> already have undergone a copy operation on its entire array to bring it in
> heap, and that copying may be done on demand with a direct buffer, there may
> be an advantage. Its really hard to say without doing some empirical
> testing.

Yes, although from what I understand the actual overhead of copying 
shortstrs on decode isn't necessarily a factor. In fact as Rob Godfrey 
pointed out to me using a slice may in fact be a net loss since the 
slice will contain a pointer to the full frame that was read off of the 
network which may be a much larger byte[] than what is required just to 
store the contents of the shortstr.

My understanding is that the two proven factors in decoding shortstrs 
are (1) unicode character conversion, and (2) copy via iteration rather 
than bulk put/get. And as far as I can see the only correct solution to 
(1) is a CharSequence implementation that does the conversion on demand 
rather than up front.

--Rafael

Re: Use of AMQShortString in client side code

Posted by Rupert Smith <ru...@googlemail.com>.

Rafael,

Thanks for those thoughts on the subject.

On 20/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>
> Also, performance-wise an aggregating ByteBuffer probably wouldn't do so
> well since you'd have to return false/null from hasArray()/array(), and
> so you'd nullify the benefit from bulk get/put operations.
>

Won't this be the case with direct buffers anyway? I'm not yet sure what all
the factors are that determine whether direct or heap buffers are faster,
but a decoder that requires hasArray == true, can only work with heap
buffers. The bulk array copy operations will still work, they will just have
to do multiple copy operations.

I did do an experiment on byte buffers previously, and found doing an array
copy from a direct buffer and working on the array is much faster than using
the indexed access methods (assuming you want to process the entire array,
that is).

All this copying sounds like a pain, but given that a heap buffer will
already have undergone a copy operation on its entire array to bring it in
heap, and that copying may be done on demand with a direct buffer, there may
be an advantage. Its really hard to say without doing some empirical
testing.

Rupert

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Robert Godfrey wrote:
> On 19/09/2007, Robert Greig <ro...@gmail.com> wrote:
>> On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>>
>>> Yes, except 0-10 adds another level of this since frames are the unit
>>> you read off of the wire, and those can't be directly parsed without
>>> aggregating them.
>> Presumably there is a size so you can pull in all the chunks before
>> starting parsing?
>>
>>> So for the 0-10 transport code you either end up doing
>>> two full copies if you want to provide contiguous byte buffers to the
>>> codec, or you make the codec deal with non contiguous byte buffers. I
>>> chose the latter, and also stopped using the CumulativeProtocolDecoder
>>> as there was no need anymore.
>> That makes sense.
>>
>>> I have something similar to this. It's not an implementation of
>>> ByteBuffer, it's just a Decoder object that knows how to read primitive
>>> AMQP types out of a list of multiple ByteBuffers. It's a bit simpler
>>> than a full ByteBuffer impl, but it has much the same effect.
>> Fair enough, I think a full ByteBuffer like the above would be useful
>> for MINA though if you are feeling community-spirited.
>>
>>> You're right, in theory I could modify or subclass the MINA ByteBuffer
>>> to provide this functionality. That is something I would rather avoid
>>> doing though as to date I've managed to keep MINA dependencies quite
>>> isolated.
>> I think MINA should be able to move away from their ByteBuffer
>> wrapper, or at least make their ByteBuffer extend java.nio.ByteBuffer.
>> Maybe that is something we should modify and suggest to them since I
>> agree it is a pain to have MINA bytebuffers scattered around the
>> place.
>>
>>> I actually don't want to bother going through AMQShortString at all. If
>>> the user passes me a String the most efficient thing for me to do is
>>> encode that directly onto the wire.
>> OK, I buy that.
>>
>>> The other issue is that the generated API is at this point quite usable
>>> on its own, however usage of AMQShortString would make it unsuitable as
>>> a public API.
>> Of course (without having seen your API admittedly) only people who
>> have a burning desire to couple themselves to the protocol would want
>> to use such an API and presumably they would be happy to couple
>> themselves to AMQShortString too. If they weren't they could use the
>> One True API viz. "Extended AMQP JMS".
>>
>> :-)
>>
>>> So I'm forced to make something of a choice here, either
>>> generate code that is unusable as a public API, or generate code that is
>>> unusable by the broker because it is too slow.
>> To me, CharSequence is fine unless someone comes up with a method in
>> AMQShortString that needs to be there but isn't in CharSequence.
>>
>> RG
>>
> 
> 
> 
> What encoding has been defined for shortstrs in 0-10?  In particular do we
> know precisely how to encode a sequence of unicode characters (which is what
> a CharSequence is)?

This is what is in the spec. I think it's the same as 0-8:

       Short strings, stored as an 8-bit unsigned integer length
       followed by zero or more octets of data. Short strings can
       carry up to 255 octets of UTF-8 data, but may not contain
       binary zero octets.

So strictly speaking the current use of AMQShortString is actually a 
spec violation since it doesn't do any character conversion and so will 
only behave correctly if all the bytes happen to be less than 128.

> BTW I agree that we don't want to be going via a ByteBuffer if not
> necessary... One of the reasons of using AMQShortString was so that I could
> swap out the implementation at will.  Using a CharSequence interface (or
> similar) will lead us to code which does instanceof checks when we know that
> particular implementations can be used more efficiently.

Well this gets down to my prior question. Other than the decoding 
optimization (which doesn't require an instanceof) what code actually 
needs to know the difference?

Please note this isn't a rhetorical question, I would like to adjust the 
0-10 code generation to do something more suitable than just using 
String. So far it seems to me like the best option is to use the 
CharSequence interface and switch to a fast-path impl like 
AMQShortString when all the bytes happen to be in the right range, but 
if that will legitimately make the code unusable for 0-10 support inside 
the broker it would be good to know.

Alternatively I think it would also be reasonable to stick with String 
and pursue other optimization strategies, e.g. treat exchange-name and 
routing-key specially, or use tokenization.

--Rafael

Re: Use of AMQShortString in client side code

Posted by Robert Godfrey <ro...@gmail.com>.

On 19/09/2007, Robert Greig <ro...@gmail.com> wrote:
>
> On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>
> > Yes, except 0-10 adds another level of this since frames are the unit
> > you read off of the wire, and those can't be directly parsed without
> > aggregating them.
>
> Presumably there is a size so you can pull in all the chunks before
> starting parsing?
>
> > So for the 0-10 transport code you either end up doing
> > two full copies if you want to provide contiguous byte buffers to the
> > codec, or you make the codec deal with non contiguous byte buffers. I
> > chose the latter, and also stopped using the CumulativeProtocolDecoder
> > as there was no need anymore.
>
> That makes sense.
>
> > I have something similar to this. It's not an implementation of
> > ByteBuffer, it's just a Decoder object that knows how to read primitive
> > AMQP types out of a list of multiple ByteBuffers. It's a bit simpler
> > than a full ByteBuffer impl, but it has much the same effect.
>
> Fair enough, I think a full ByteBuffer like the above would be useful
> for MINA though if you are feeling community-spirited.
>
> > You're right, in theory I could modify or subclass the MINA ByteBuffer
> > to provide this functionality. That is something I would rather avoid
> > doing though as to date I've managed to keep MINA dependencies quite
> > isolated.
>
> I think MINA should be able to move away from their ByteBuffer
> wrapper, or at least make their ByteBuffer extend java.nio.ByteBuffer.
> Maybe that is something we should modify and suggest to them since I
> agree it is a pain to have MINA bytebuffers scattered around the
> place.
>
> > I actually don't want to bother going through AMQShortString at all. If
> > the user passes me a String the most efficient thing for me to do is
> > encode that directly onto the wire.
>
> OK, I buy that.
>
> > The other issue is that the generated API is at this point quite usable
> > on its own, however usage of AMQShortString would make it unsuitable as
> > a public API.
>
> Of course (without having seen your API admittedly) only people who
> have a burning desire to couple themselves to the protocol would want
> to use such an API and presumably they would be happy to couple
> themselves to AMQShortString too. If they weren't they could use the
> One True API viz. "Extended AMQP JMS".
>
> :-)
>
> > So I'm forced to make something of a choice here, either
> > generate code that is unusable as a public API, or generate code that is
> > unusable by the broker because it is too slow.
>
> To me, CharSequence is fine unless someone comes up with a method in
> AMQShortString that needs to be there but isn't in CharSequence.
>
> RG
>



What encoding has been defined for shortstrs in 0-10?  In particular do we
know precisely how to encode a sequence of unicode characters (which is what
a CharSequence is)?

BTW I agree that we don't want to be going via a ByteBuffer if not
necessary... One of the reasons of using AMQShortString was so that I could
swap out the implementation at will.  Using a CharSequence interface (or
similar) will lead us to code which does instanceof checks when we know that
particular implementations can be used more efficiently.


-- Rob

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:

> Yes, except 0-10 adds another level of this since frames are the unit
> you read off of the wire, and those can't be directly parsed without
> aggregating them.

Presumably there is a size so you can pull in all the chunks before
starting parsing?

> So for the 0-10 transport code you either end up doing
> two full copies if you want to provide contiguous byte buffers to the
> codec, or you make the codec deal with non contiguous byte buffers. I
> chose the latter, and also stopped using the CumulativeProtocolDecoder
> as there was no need anymore.

That makes sense.

> I have something similar to this. It's not an implementation of
> ByteBuffer, it's just a Decoder object that knows how to read primitive
> AMQP types out of a list of multiple ByteBuffers. It's a bit simpler
> than a full ByteBuffer impl, but it has much the same effect.

Fair enough, I think a full ByteBuffer like the above would be useful
for MINA though if you are feeling community-spirited.

> You're right, in theory I could modify or subclass the MINA ByteBuffer
> to provide this functionality. That is something I would rather avoid
> doing though as to date I've managed to keep MINA dependencies quite
> isolated.

I think MINA should be able to move away from their ByteBuffer
wrapper, or at least make their ByteBuffer extend java.nio.ByteBuffer.
Maybe that is something we should modify and suggest to them since I
agree it is a pain to have MINA bytebuffers scattered around the
place.

> I actually don't want to bother going through AMQShortString at all. If
> the user passes me a String the most efficient thing for me to do is
> encode that directly onto the wire.

OK, I buy that.

> The other issue is that the generated API is at this point quite usable
> on its own, however usage of AMQShortString would make it unsuitable as
> a public API.

Of course (without having seen your API admittedly) only people who
have a burning desire to couple themselves to the protocol would want
to use such an API and presumably they would be happy to couple
themselves to AMQShortString too. If they weren't they could use the
One True API viz. "Extended AMQP JMS".

:-)

> So I'm forced to make something of a choice here, either
> generate code that is unusable as a public API, or generate code that is
> unusable by the broker because it is too slow.

To me, CharSequence is fine unless someone comes up with a method in
AMQShortString that needs to be there but isn't in CharSequence.

RG

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Rupert Smith wrote:
> Rafael,
> 
> Is there much difference between your decoder that works on multiple byte
> buffers, and a decoder that works on a byte buffer but that could be passed
> a special aggregating byte buffer that pulls together several byte buffers
> and presents them as if they are one buffer? That is, do you think the
> latter option could be just as efficient as the former one?
> 
> The aggregating byte buffer concept sounds like a very useful re-usable
> building block, and conceptually neater than a special decoder that works on
> many buffers; it can simply be substituted in wherever a ByteBuffer can be
> used. I'd like to pull it out of the code, and write a few tests to exercise
> it, as an experiment.
> 
> Writing out the message, from an aggregated list of byte buffers, sounds
> like it can be done efficiently as a gathering write.

An aggregating byte buffer sounds nice, but I don't think it would work 
so well in practice. Despite not being final, the ByteBuffer class isn't 
actually designed to be extended. It contains the internal 
representation for heap buffers, and it is package visible to the other 
nio classes which directly access it. Since an aggregating ByteBuffer 
wouldn't use that, it could break in strange ways when interacting with 
other classes.

Also, performance-wise an aggregating ByteBuffer probably wouldn't do so 
well since you'd have to return false/null from hasArray()/array(), and 
so you'd nullify the benefit from bulk get/put operations.

And finally it would be a lot of work since you'd need to also create 
specialized implementations of all the view classes (e.g. 
asCharBuffer(), asShortBuffer(), etc), and that still wouldn't guarantee 
that your impl would work correctly with the rest of the nio package.

With my own FragmentDecoder it's a fairly simple piece of code, and it's 
easy for me to continue to use bulk get/put operations. The class is 
org.apache.qpidity.codec.FragmentDecoder (in common) if you want to have 
a look.

--Rafael

Re: Use of AMQShortString in client side code

Posted by Rupert Smith <ru...@googlemail.com>.

Rafael,

Is there much difference between your decoder that works on multiple byte
buffers, and a decoder that works on a byte buffer but that could be passed
a special aggregating byte buffer that pulls together several byte buffers
and presents them as if they are one buffer? That is, do you think the
latter option could be just as efficient as the former one?

The aggregating byte buffer concept sounds like a very useful re-usable
building block, and conceptually neater than a special decoder that works on
many buffers; it can simply be substituted in wherever a ByteBuffer can be
used. I'd like to pull it out of the code, and write a few tests to exercise
it, as an experiment.

Writing out the message, from an aggregated list of byte buffers, sounds
like it can be done efficiently as a gathering write.

Rupert

On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>
> Yes, except 0-10 adds another level of this since frames are the unit
> you read off of the wire, and those can't be directly parsed without
> aggregating them. So for the 0-10 transport code you either end up doing
> two full copies if you want to provide contiguous byte buffers to the
> codec, or you make the codec deal with non contiguous byte buffers. I
> chose the latter, and also stopped using the CumulativeProtocolDecoder
> as there was no need anymore.
>
> > I have discussed in the past using a special implementation of
> > ByteBuffer that can aggregate sub byte buffers to avoid copying (that
> > is currently done in the CumulativeProtocolDecoder or something like
> > that).
>
> I have something similar to this. It's not an implementation of
> ByteBuffer, it's just a Decoder object that knows how to read primitive
> AMQP types out of a list of multiple ByteBuffers. It's a bit simpler
> than a full ByteBuffer impl, but it has much the same effect.

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Robert Greig wrote:
> On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> 
>> At a minimum it would be nice if FieldTable implemented the Map
>> interface and had a copy constructor that took a regular Map.
> 
> In fact FieldTable used to be a subclass of HashMap (or AbstractMap or
> something like that). I cannot remember why that was changed now - Rob
> can you recall?
> 
>> This analysis gets a little more complicated in 0-10. In 0-10 a frame
>> boundary can appear anywhere, so it isn't possible to simply wrap the
>> bytes read in off the wire in an AMQShortString object and pass that up.
> 
> In a sense we have this today (if I understand you correctly) since
> you may only read a partial frame off the wire. That is a lower level
> obviously but is AFAICS exactly the same problem.

Yes, except 0-10 adds another level of this since frames are the unit 
you read off of the wire, and those can't be directly parsed without 
aggregating them. So for the 0-10 transport code you either end up doing 
two full copies if you want to provide contiguous byte buffers to the 
codec, or you make the codec deal with non contiguous byte buffers. I 
chose the latter, and also stopped using the CumulativeProtocolDecoder 
as there was no need anymore.

> I have discussed in the past using a special implementation of
> ByteBuffer that can aggregate sub byte buffers to avoid copying (that
> is currently done in the CumulativeProtocolDecoder or something like
> that).

I have something similar to this. It's not an implementation of 
ByteBuffer, it's just a Decoder object that knows how to read primitive 
AMQP types out of a list of multiple ByteBuffers. It's a bit simpler 
than a full ByteBuffer impl, but it has much the same effect.

>> You either need to copy the bytes into a newly allocated ByteBuffer, or
>> have a more sophisticated AMQShortString implementation that operates on
>> a list of fragments. Obviously it is impossible to do the latter if
>> AMQShortString is used directly throughout the code.
> 
> I don't follow the last sentence? AMQShortString would contain a
> subclass of ByteBuffer that under the hood was really multiple
> java.nio.ByteBuffer slices.

You're right, in theory I could modify or subclass the MINA ByteBuffer 
to provide this functionality. That is something I would rather avoid 
doing though as to date I've managed to keep MINA dependencies quite 
isolated.

>> I actually do want to expose a non JMS interface that doesn't use
>> AMQShortString. There is a lot of ground between providing an AMQP
>> specific messaging API and providing an AMQP specific String API.
> 
> And you want to postpone the creation of an AMQShortString? Your API
> layer could presumably hide that from the user of your API in the same
> way JMS does?

I actually don't want to bother going through AMQShortString at all. If 
the user passes me a String the most efficient thing for me to do is 
encode that directly onto the wire.

> Why does postponing the creation of the AMQShortString help? Are there
> cases where you might choose not to encode the string into a
> bytebuffer? Or are you avoiding the case of copying into a bytebuffer
> then copying that buffer into the buffer that contains the whole
> frame?

Yes, I'm avoiding that extra copy.

The other issue is that the generated API is at this point quite usable 
on its own, however usage of AMQShortString would make it unsuitable as 
a public API. So I'm forced to make something of a choice here, either 
generate code that is unusable as a public API, or generate code that is 
unusable by the broker because it is too slow.

That's why I'm trying to figure out if there are any options that 
satisfy both constraints, e.g. using CharSequence. It would make the 
generated code a whole lot more usable as a public API since on the 
input side you could just pass in a String directly without wrapping it, 
and it would permit the same optimized decoding that AMQShortString 
does. Obviously on the output side if you actually needed a concrete 
String you would still need to call toString(), but many of the java 
APIs will let you pass in a CharSequence anyways.

--Rafael

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:

> At a minimum it would be nice if FieldTable implemented the Map
> interface and had a copy constructor that took a regular Map.

In fact FieldTable used to be a subclass of HashMap (or AbstractMap or
something like that). I cannot remember why that was changed now - Rob
can you recall?

> This analysis gets a little more complicated in 0-10. In 0-10 a frame
> boundary can appear anywhere, so it isn't possible to simply wrap the
> bytes read in off the wire in an AMQShortString object and pass that up.

In a sense we have this today (if I understand you correctly) since
you may only read a partial frame off the wire. That is a lower level
obviously but is AFAICS exactly the same problem.

I have discussed in the past using a special implementation of
ByteBuffer that can aggregate sub byte buffers to avoid copying (that
is currently done in the CumulativeProtocolDecoder or something like
that).

> You either need to copy the bytes into a newly allocated ByteBuffer, or
> have a more sophisticated AMQShortString implementation that operates on
> a list of fragments. Obviously it is impossible to do the latter if
> AMQShortString is used directly throughout the code.

I don't follow the last sentence? AMQShortString would contain a
subclass of ByteBuffer that under the hood was really multiple
java.nio.ByteBuffer slices.

> I actually do want to expose a non JMS interface that doesn't use
> AMQShortString. There is a lot of ground between providing an AMQP
> specific messaging API and providing an AMQP specific String API.

And you want to postpone the creation of an AMQShortString? Your API
layer could presumably hide that from the user of your API in the same
way JMS does?

Why does postponing the creation of the AMQShortString help? Are there
cases where you might choose not to encode the string into a
bytebuffer? Or are you avoiding the case of copying into a bytebuffer
then copying that buffer into the buffer that contains the whole
frame?

RG

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Robert Godfrey wrote:
> OK - let's try to examine this in detail...
> 
> The objection to using AMQShortString is, I presume, as much aesthetic as
> anything else.  We want to provide through the Java API a type which is
> natural to the Java programmer.  [As an aside, I presume therefor that where
> the argument is a Field Table you wish to allow the programmer to pass in
> any valid Map implementation].

At a minimum it would be nice if FieldTable implemented the Map 
interface and had a copy constructor that took a regular Map.

> AMQShortString (which should have a length <= 255 check in it, I admit)
> represents the type that is sent down the wire.  In the same way the
> FieldTable class represents what an AMQPFieldTable is.  Where the object
> never escapes the qpid library we do not want to convert this into a String
> (nor convert our field table into a general purpose map).  Only at the
> interafce between our internal library and the calling application do we
> have an issue.   Here we have a choice

This analysis gets a little more complicated in 0-10. In 0-10 a frame 
boundary can appear anywhere, so it isn't possible to simply wrap the 
bytes read in off the wire in an AMQShortString object and pass that up. 
You either need to copy the bytes into a newly allocated ByteBuffer, or 
have a more sophisticated AMQShortString implementation that operates on 
a list of fragments. Obviously it is impossible to do the latter if 
AMQShortString is used directly throughout the code.

> 1) we can always hide our internals and present an interface that simply
> uses the Java types
> 2) we can have an API which is purely in terms of our AMQP domain objects
> and makes no concession to Java idioms
> 3) we can present an interface that explicitly offers the programmer the
> choice
> 4) we can present an interface defined in terms of common interfaces that
> both the AMQP domain types and the Java types implement
> 
> 
> If you believe 1) the you should necessarily also believe that we should
> only be producing a JMS interface as you have no desire to program to AMQP
> but only to Java conventions :-)

I actually do want to expose a non JMS interface that doesn't use 
AMQShortString. There is a lot of ground between providing an AMQP 
specific messaging API and providing an AMQP specific String API.

> 2) is ruled out by our need to implement JMS
> 
> 3) requires double the number of methods in the interface
> 
> 4) requires the library to be full of code such as
> 
> if(parameter instanceof AMQShortString)
> {
> ...
> }
> else
> {
> ...
> }

Why would the library need to be full of code like this? The interface 
AMQShortString presents is just a CharSequence. The only place that 
would need to know the difference is at the point of encoding/decoding. 
When decoding we would always construct the more efficient impl of 
CharSequence that directly wraps a ByteBuffer or a list of fragments as 
the case may be. When encoding we could just use the CharSequence 
interface since you need to copy anyways, or if there was some 
optimization necessary we could dispatch to type specific codecs.

> Personally I'd be more inclined to go for option 3 because it makes explicit
> the options that the programmer has.
> 
> There should be no need to convert the short string into a string multiple
> times.

If I knew what the choice was I might be inclined to agree, however it 
seems like the choice we'd be offering is between one concrete impl of 
CharSequence and another concrete impl of CharSequence. Am I missing 
some aspect of the AMQShortString interface that we depend on that isn't 
part of the CharSequence interface?

--Rafael

Re: Use of AMQShortString in client side code

Posted by Robert Godfrey <ro...@gmail.com>.

OK - let's try to examine this in detail...

The objection to using AMQShortString is, I presume, as much aesthetic as
anything else.  We want to provide through the Java API a type which is
natural to the Java programmer.  [As an aside, I presume therefor that where
the argument is a Field Table you wish to allow the programmer to pass in
any valid Map implementation].

AMQShortString (which should have a length <= 255 check in it, I admit)
represents the type that is sent down the wire.  In the same way the
FieldTable class represents what an AMQPFieldTable is.  Where the object
never escapes the qpid library we do not want to convert this into a String
(nor convert our field table into a general purpose map).  Only at the
interafce between our internal library and the calling application do we
have an issue.   Here we have a choice

1) we can always hide our internals and present an interface that simply
uses the Java types
2) we can have an API which is purely in terms of our AMQP domain objects
and makes no concession to Java idioms
3) we can present an interface that explicitly offers the programmer the
choice
4) we can present an interface defined in terms of common interfaces that
both the AMQP domain types and the Java types implement


If you believe 1) the you should necessarily also believe that we should
only be producing a JMS interface as you have no desire to program to AMQP
but only to Java conventions :-)

2) is ruled out by our need to implement JMS

3) requires double the number of methods in the interface

4) requires the library to be full of code such as

if(parameter instanceof AMQShortString)
{
...
}
else
{
...
}


Personally I'd be more inclined to go for option 3 because it makes explicit
the options that the programmer has.

There should be no need to convert the short string into a string multiple
times.

Hope this helps,
Rob

Re: Use of AMQShortString in client side code

Posted by John O'Hara <jo...@gmail.com>.

Its also a space optimisation on the wire for when we cared about that...
for high volume messaging esp with TCP/IP and serial unicast those bytes
start to matter.

As for Tokenising, to support the notion of trivial clients the idea was to
let the client assert which short strings were tokenized; usually relating
to routing keys which could repeat a lot.

The original argument you made was to do with typing.
You point out that AMQPShortString has benefits for the broker.
It would also make sense to have that symetry in the client.

There seems to be no compelling reason to change this.
It's late and I'm tired so I won't go on more

G'd night
John


On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>
> John O'Hara wrote:
> > I agree with Rob that the lower levels of the stack should be
> implemented in
> > AMQPShortString *where it occurs in the protocol* for the following
> reasons:
> >
> > 1) It provides the opportunity to validate the semantics; just because
> we're
> > not checking length today doesn't mean we shouldn't
>
> AMQShortString really isn't the appropriate place to validate domain
> level semantics. Different uses of shortstr have different domain level
> constraints. Also, any validation we put in AMQShortString is forced to
> run for every single shortstr field that passes through a broker. This
> isn't particularly useful because when decoding fields off the wire,
> such validation is unnecessary as it is already performed by the codec
> in a more efficient manner that is specialized to the wire format.
>
> > 2) We may introduce AMQPShortStrong Tokenisation in the protocol in the
> > future (has been discussed often, I think it's quite likely).  Doing
> this we
> > can collapse a shortstring to 2 bytes and reduce garbage.
>
> I presume you're referring to some scheme for caching commonly used
> strings? If so this is a decoding optimization that would equally well
> apply when decoding directly to Strings, or any other type for that
> matter. In fact such an optimization would likely nullify any
> performance advantage rendered by AMQShortString since decoding/encoding
> of anything would only be necessary when there is a cache miss.
>
> > 3) I'm unsure of the memory ownership semantics but I believe the JMS
> spec
> > explicitly requires a copy of the message to be take to prevent grim
> race
> > conditions on message reuse.  Some products have the option to turn this
> > off, but that's not the spec.  It's like not DMA'ing from userspace
> without
> > extreme care.
>
> I'm unsure how this relates to the use of AMQShortString. Any such
> copying would happen well past the point where raw types are decoded off
> the wire.
>
> > Also, Rob has said it has been proven to be faster in the past.
> > In the absence of a measured, demonstrable issue why change this
> arguably
> > more correct implementation?
>
> As it stands today AMQShortString is really just an optimization for the
> broker, and one that comes at a pretty high cost to the client. So if
> there is a better way to solve the performance issue for the broker
> without encumbering the client, it's certainly worth investigating.
>
> That's why I asked about the original problem being solved. For example
> I'd guess that in the critical path the broker really never needs to
> decode much more than the exchange name and routing key in order to
> deliver a message, so it might be possible to limit the use of
> AMQShortString to just those fields (or decode to specific Exchange and
> RoutingKey classes) and get the necessary performance benefit in the
> broker, with much less impact on the client.
>
> --Rafael
>
> > Cheers
> > John
> >
> >
> > On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> >> Robert Godfrey wrote:
> >>> On 13/09/2007, Rajith Attapattu <ra...@gmail.com> wrote:
> >>>> I am wondering why we are using AMQShortString indiscriminately all
> >> over
> >>>> the
> >>>> client side code?
> >>>> There is no performance benefit of using AMQShortString (based on the
> >> way
> >>>> it
> >>>> is used) on the client side and is purely used for encoding.
> >>>
> >>>
> >>> Rajith,
> >>>
> >>> as we have discussed before - there *is* a significant performance
> >> benefit
> >>> which we have tested and proved previously.
> >> Can you point me to the previous discussion? I'd like to learn more
> >> about the original issue.
> >>
> >>    Many short strings are re-used
> >>> frequently within the client library, and by using our own type we can
> >>> exploit this.
> >> Unless we're excessively copying them I don't see how this matters. For
> >> both an AMQShortString and a String we should just be passing around
> >> pointers when they are reused.
> >>
> >>    Further, the domain for many parameters in AMQP is *not* a
> >>> unicode string, but is tightly defined as upto 255 bytes of data with
> a
> >>> particular encoding.  Java Strings are not the appropriate type to use
> >> for
> >>> this.  Encoding and decoding Java Strings is expensive, and also prone
> >> to
> >>> error (i.e. you need to make sure that you *always* use the correct
> >> explicit
> >>> encoding).
> >> Despite the name AMQShortString, I don't think the AMQShortString class
> >> actually represents the AMQP type short-string, for example there is no
> >> length limit for an AMQShortString. It's really just a generic
> >> implementation of CharSequence that is optimized specifically for rapid
> >> decoding from a ByteBuffer. From a domain restriction perspective,
> using
> >> an ordinary String is just as correct.
> >>
> >>> It makes sense to use it on Broker side as you deal at bytes level and
> I
> >> can
> >>>> understand the performance benefit of not having convert back and
> forth
> >>>> into
> >>>> a String.
> >>>
> >>> The low level API should be using correct AMQ domains.  High level
> APIs
> >>> (such as JMS) will obviously want to present these parameters as java
> >>> Strings.
> >>>
> >>>
> >>> On the client side we just merely wrap/unwrap a String using
> >> AMQShortString.
> >>>> Why can't we do that at the encoding/decoding level for the client
> side
> >> ?
> >>>
> >>> In some cases this may be true, but in others certainly not.  When
> >>> converting into JMS Destinations on receipt of a message, for
> instance,
> >> one
> >>> never needs to convert to a String... it is *much* faster to simply
> use
> >> the
> >>> correct type of AMQShortString/
> >> Unfortunately using AMQShortString imposes additional overhead whenever
> >> we need to en/decode to/from an ordinary String. It basically requires
> >> an additional copy when compared with directly encoding/decoding
> to/from
> >>   a String. As the common case on the client side is dealing with
> >> Strings, I'm not at all convinced that ubiquitous use of AMQShortString
> >> is a net win for the client.
> >>
> >> I believe what would be optimal is to use the CharSequence interface
> >> everywhere. This way String values passed to us by an application could
> >> be directly passed all the way down the stack and encoded directly onto
> >> the wire without an additional copy, and incoming data could be
> >> efficiently decoded into a private impl of CharSequence that could be
> >> converted to a String on demand.
> >>
> >> --Rafael
> >>
> >
>

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:

> Sorry, my question really is, given that AMQShortString already
> implements CharSequence, why do we need to use AMQShortString all over
> the code rather than the CharSequence interface?

I'm not sure. I'm inclined to say we could use CharSequence but I
don't recall whether there is a dependency (either now or planned) on
extra methods not in CharSequence. Rob can probably give a better
answer than I on this point.

RG

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Robert Greig wrote:
> On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
> 
>> That is what I meant by converting to a String "on demand."
> 
> It was the use of conditional tense that threw me since that is what
> we do already.

Sorry, my question really is, given that AMQShortString already 
implements CharSequence, why do we need to use AMQShortString all over 
the code rather than the CharSequence interface?

--Rafael

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:

> That is what I meant by converting to a String "on demand."

It was the use of conditional tense that threw me since that is what
we do already.

RG

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Robert Greig wrote:
>>>> I believe what would be optimal is to use the CharSequence interface
>>>> everywhere. This way String values passed to us by an application could
>>>> be directly passed all the way down the stack and encoded directly onto
>>>> the wire without an additional copy, and incoming data could be
>>>> efficiently decoded into a private impl of CharSequence that could be
>>>> converted to a String on demand.
> 
> It already implements CharSequence but in the client implementation I
> don't see why using that interface is particularly useful.
> 
> Anyway there are sound performance reasons even in the client for
> using AMQShortString to wrap an underlying buffer slice since it means
> we don't have to decode things up front.
> 
> Consider a client reading a message a message which has say 10 string
> message properties (JMS properties). These are encoded in a field
> table in AMQP. The client app may or may not use these properties - in
> my experience you generally don't need all of them particularly if the
> message is generic. By using the short string you can lazily decode
> only when the client app asks for a property value.

That is what I meant by converting to a String "on demand."

--Rafael

Re: Use of AMQShortString in client side code

Posted by Rupert Smith <ru...@googlemail.com>.

Also, consider the example of the low-level back to back client, that moves
messages between brokers. It wants to do as little decoding as possible.
Good reason to have lazy decoding at least at the low-level API.

Rupert

On 19/09/2007, Robert Greig <ro...@gmail.com> wrote:
>
> > >> I believe what would be optimal is to use the CharSequence interface
> > >> everywhere. This way String values passed to us by an application
> could
> > >> be directly passed all the way down the stack and encoded directly
> onto
> > >> the wire without an additional copy, and incoming data could be
> > >> efficiently decoded into a private impl of CharSequence that could be
> > >> converted to a String on demand.
>
> It already implements CharSequence but in the client implementation I
> don't see why using that interface is particularly useful.
>
> Anyway there are sound performance reasons even in the client for
> using AMQShortString to wrap an underlying buffer slice since it means
> we don't have to decode things up front.
>
> Consider a client reading a message a message which has say 10 string
> message properties (JMS properties). These are encoded in a field
> table in AMQP. The client app may or may not use these properties - in
> my experience you generally don't need all of them particularly if the
> message is generic. By using the short string you can lazily decode
> only when the client app asks for a property value.
>
> RG
>

Re: Use of AMQShortString in client side code

Posted by Robert Greig <ro...@gmail.com>.

> >> I believe what would be optimal is to use the CharSequence interface
> >> everywhere. This way String values passed to us by an application could
> >> be directly passed all the way down the stack and encoded directly onto
> >> the wire without an additional copy, and incoming data could be
> >> efficiently decoded into a private impl of CharSequence that could be
> >> converted to a String on demand.

It already implements CharSequence but in the client implementation I
don't see why using that interface is particularly useful.

Anyway there are sound performance reasons even in the client for
using AMQShortString to wrap an underlying buffer slice since it means
we don't have to decode things up front.

Consider a client reading a message a message which has say 10 string
message properties (JMS properties). These are encoded in a field
table in AMQP. The client app may or may not use these properties - in
my experience you generally don't need all of them particularly if the
message is generic. By using the short string you can lazily decode
only when the client app asks for a property value.

RG

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

John O'Hara wrote:
> I agree with Rob that the lower levels of the stack should be implemented in
> AMQPShortString *where it occurs in the protocol* for the following reasons:
> 
> 1) It provides the opportunity to validate the semantics; just because we're
> not checking length today doesn't mean we shouldn't

AMQShortString really isn't the appropriate place to validate domain 
level semantics. Different uses of shortstr have different domain level 
constraints. Also, any validation we put in AMQShortString is forced to 
run for every single shortstr field that passes through a broker. This 
isn't particularly useful because when decoding fields off the wire, 
such validation is unnecessary as it is already performed by the codec 
in a more efficient manner that is specialized to the wire format.

> 2) We may introduce AMQPShortStrong Tokenisation in the protocol in the
> future (has been discussed often, I think it's quite likely).  Doing this we
> can collapse a shortstring to 2 bytes and reduce garbage.

I presume you're referring to some scheme for caching commonly used 
strings? If so this is a decoding optimization that would equally well 
apply when decoding directly to Strings, or any other type for that 
matter. In fact such an optimization would likely nullify any 
performance advantage rendered by AMQShortString since decoding/encoding 
of anything would only be necessary when there is a cache miss.

> 3) I'm unsure of the memory ownership semantics but I believe the JMS spec
> explicitly requires a copy of the message to be take to prevent grim race
> conditions on message reuse.  Some products have the option to turn this
> off, but that's not the spec.  It's like not DMA'ing from userspace without
> extreme care.

I'm unsure how this relates to the use of AMQShortString. Any such 
copying would happen well past the point where raw types are decoded off 
the wire.

> Also, Rob has said it has been proven to be faster in the past.
> In the absence of a measured, demonstrable issue why change this arguably
> more correct implementation?

As it stands today AMQShortString is really just an optimization for the 
broker, and one that comes at a pretty high cost to the client. So if 
there is a better way to solve the performance issue for the broker 
without encumbering the client, it's certainly worth investigating.

That's why I asked about the original problem being solved. For example 
I'd guess that in the critical path the broker really never needs to 
decode much more than the exchange name and routing key in order to 
deliver a message, so it might be possible to limit the use of 
AMQShortString to just those fields (or decode to specific Exchange and 
RoutingKey classes) and get the necessary performance benefit in the 
broker, with much less impact on the client.

--Rafael

> Cheers
> John
> 
> 
> On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>> Robert Godfrey wrote:
>>> On 13/09/2007, Rajith Attapattu <ra...@gmail.com> wrote:
>>>> I am wondering why we are using AMQShortString indiscriminately all
>> over
>>>> the
>>>> client side code?
>>>> There is no performance benefit of using AMQShortString (based on the
>> way
>>>> it
>>>> is used) on the client side and is purely used for encoding.
>>>
>>>
>>> Rajith,
>>>
>>> as we have discussed before - there *is* a significant performance
>> benefit
>>> which we have tested and proved previously.
>> Can you point me to the previous discussion? I'd like to learn more
>> about the original issue.
>>
>>    Many short strings are re-used
>>> frequently within the client library, and by using our own type we can
>>> exploit this.
>> Unless we're excessively copying them I don't see how this matters. For
>> both an AMQShortString and a String we should just be passing around
>> pointers when they are reused.
>>
>>    Further, the domain for many parameters in AMQP is *not* a
>>> unicode string, but is tightly defined as upto 255 bytes of data with a
>>> particular encoding.  Java Strings are not the appropriate type to use
>> for
>>> this.  Encoding and decoding Java Strings is expensive, and also prone
>> to
>>> error (i.e. you need to make sure that you *always* use the correct
>> explicit
>>> encoding).
>> Despite the name AMQShortString, I don't think the AMQShortString class
>> actually represents the AMQP type short-string, for example there is no
>> length limit for an AMQShortString. It's really just a generic
>> implementation of CharSequence that is optimized specifically for rapid
>> decoding from a ByteBuffer. From a domain restriction perspective, using
>> an ordinary String is just as correct.
>>
>>> It makes sense to use it on Broker side as you deal at bytes level and I
>> can
>>>> understand the performance benefit of not having convert back and forth
>>>> into
>>>> a String.
>>>
>>> The low level API should be using correct AMQ domains.  High level APIs
>>> (such as JMS) will obviously want to present these parameters as java
>>> Strings.
>>>
>>>
>>> On the client side we just merely wrap/unwrap a String using
>> AMQShortString.
>>>> Why can't we do that at the encoding/decoding level for the client side
>> ?
>>>
>>> In some cases this may be true, but in others certainly not.  When
>>> converting into JMS Destinations on receipt of a message, for instance,
>> one
>>> never needs to convert to a String... it is *much* faster to simply use
>> the
>>> correct type of AMQShortString/
>> Unfortunately using AMQShortString imposes additional overhead whenever
>> we need to en/decode to/from an ordinary String. It basically requires
>> an additional copy when compared with directly encoding/decoding to/from
>>   a String. As the common case on the client side is dealing with
>> Strings, I'm not at all convinced that ubiquitous use of AMQShortString
>> is a net win for the client.
>>
>> I believe what would be optimal is to use the CharSequence interface
>> everywhere. This way String values passed to us by an application could
>> be directly passed all the way down the stack and encoded directly onto
>> the wire without an additional copy, and incoming data could be
>> efficiently decoded into a private impl of CharSequence that could be
>> converted to a String on demand.
>>
>> --Rafael
>>
>

Re: Use of AMQShortString in client side code

Posted by John O'Hara <jo...@gmail.com>.

I agree with Rob that the lower levels of the stack should be implemented in
AMQPShortString *where it occurs in the protocol* for the following reasons:

1) It provides the opportunity to validate the semantics; just because we're
not checking length today doesn't mean we shouldn't
2) We may introduce AMQPShortStrong Tokenisation in the protocol in the
future (has been discussed often, I think it's quite likely).  Doing this we
can collapse a shortstring to 2 bytes and reduce garbage.
3) I'm unsure of the memory ownership semantics but I believe the JMS spec
explicitly requires a copy of the message to be take to prevent grim race
conditions on message reuse.  Some products have the option to turn this
off, but that's not the spec.  It's like not DMA'ing from userspace without
extreme care.

Also, Rob has said it has been proven to be faster in the past.
In the absence of a measured, demonstrable issue why change this arguably
more correct implementation?

Cheers
John


On 19/09/2007, Rafael Schloming <ra...@redhat.com> wrote:
>
> Robert Godfrey wrote:
> > On 13/09/2007, Rajith Attapattu <ra...@gmail.com> wrote:
> >> I am wondering why we are using AMQShortString indiscriminately all
> over
> >> the
> >> client side code?
> >> There is no performance benefit of using AMQShortString (based on the
> way
> >> it
> >> is used) on the client side and is purely used for encoding.
> >
> >
> >
> > Rajith,
> >
> > as we have discussed before - there *is* a significant performance
> benefit
> > which we have tested and proved previously.
>
> Can you point me to the previous discussion? I'd like to learn more
> about the original issue.
>
>    Many short strings are re-used
> > frequently within the client library, and by using our own type we can
> > exploit this.
>
> Unless we're excessively copying them I don't see how this matters. For
> both an AMQShortString and a String we should just be passing around
> pointers when they are reused.
>
>    Further, the domain for many parameters in AMQP is *not* a
> > unicode string, but is tightly defined as upto 255 bytes of data with a
> > particular encoding.  Java Strings are not the appropriate type to use
> for
> > this.  Encoding and decoding Java Strings is expensive, and also prone
> to
> > error (i.e. you need to make sure that you *always* use the correct
> explicit
> > encoding).
>
> Despite the name AMQShortString, I don't think the AMQShortString class
> actually represents the AMQP type short-string, for example there is no
> length limit for an AMQShortString. It's really just a generic
> implementation of CharSequence that is optimized specifically for rapid
> decoding from a ByteBuffer. From a domain restriction perspective, using
> an ordinary String is just as correct.
>
> > It makes sense to use it on Broker side as you deal at bytes level and I
> can
> >> understand the performance benefit of not having convert back and forth
> >> into
> >> a String.
> >
> >
> > The low level API should be using correct AMQ domains.  High level APIs
> > (such as JMS) will obviously want to present these parameters as java
> > Strings.
> >
> >
> > On the client side we just merely wrap/unwrap a String using
> AMQShortString.
> >> Why can't we do that at the encoding/decoding level for the client side
> ?
> >
> >
> > In some cases this may be true, but in others certainly not.  When
> > converting into JMS Destinations on receipt of a message, for instance,
> one
> > never needs to convert to a String... it is *much* faster to simply use
> the
> > correct type of AMQShortString/
>
> Unfortunately using AMQShortString imposes additional overhead whenever
> we need to en/decode to/from an ordinary String. It basically requires
> an additional copy when compared with directly encoding/decoding to/from
>   a String. As the common case on the client side is dealing with
> Strings, I'm not at all convinced that ubiquitous use of AMQShortString
> is a net win for the client.
>
> I believe what would be optimal is to use the CharSequence interface
> everywhere. This way String values passed to us by an application could
> be directly passed all the way down the stack and encoded directly onto
> the wire without an additional copy, and incoming data could be
> efficiently decoded into a private impl of CharSequence that could be
> converted to a String on demand.
>
> --Rafael
>

Re: Use of AMQShortString in client side code

Posted by Rajith Attapattu <ra...@gmail.com>.

>As the common case on the client side is dealing with Strings, I'm not at
all convinced that ubiquitous use of AMQShortString
> is a net win for the client.

It is exactly my point.
Also as I mentioned we sometimes do the asString() method in logging etc
which does the String encoding every time.
Using Strings or CharSequence is fine.

Rajith


> > On 13/09/2007, Rajith Attapattu <ra...@gmail.com> wrote:
> >> I am wondering why we are using AMQShortString indiscriminately all
> over
> >> the
> >> client side code?
> >> There is no performance benefit of using AMQShortString (based on the
> way
> >> it
> >> is used) on the client side and is purely used for encoding.
> >
> >
> >
> > Rajith,
> >
> > as we have discussed before - there *is* a significant performance
> benefit
> > which we have tested and proved previously.
>
> Can you point me to the previous discussion? I'd like to learn more
> about the original issue.
>
>    Many short strings are re-used
> > frequently within the client library, and by using our own type we can
> > exploit this.
>
> Unless we're excessively copying them I don't see how this matters. For
> both an AMQShortString and a String we should just be passing around
> pointers when they are reused.
>
>    Further, the domain for many parameters in AMQP is *not* a
> > unicode string, but is tightly defined as upto 255 bytes of data with a
> > particular encoding.  Java Strings are not the appropriate type to use
> for
> > this.  Encoding and decoding Java Strings is expensive, and also prone
> to
> > error (i.e. you need to make sure that you *always* use the correct
> explicit
> > encoding).
>
> Despite the name AMQShortString, I don't think the AMQShortString class
> actually represents the AMQP type short-string, for example there is no
> length limit for an AMQShortString. It's really just a generic
> implementation of CharSequence that is optimized specifically for rapid
> decoding from a ByteBuffer. From a domain restriction perspective, using
> an ordinary String is just as correct.
>
> > It makes sense to use it on Broker side as you deal at bytes level and I
> can
> >> understand the performance benefit of not having convert back and forth
> >> into
> >> a String.
> >
> >
> > The low level API should be using correct AMQ domains.  High level APIs
> > (such as JMS) will obviously want to present these parameters as java
> > Strings.
> >
> >
> > On the client side we just merely wrap/unwrap a String using
> AMQShortString.
> >> Why can't we do that at the encoding/decoding level for the client side
> ?
> >
> >
> > In some cases this may be true, but in others certainly not.  When
> > converting into JMS Destinations on receipt of a message, for instance,
> one
> > never needs to convert to a String... it is *much* faster to simply use
> the
> > correct type of AMQShortString/
>
> Unfortunately using AMQShortString imposes additional overhead whenever
> we need to en/decode to/from an ordinary String. It basically requires
> an additional copy when compared with directly encoding/decoding to/from
>   a String. As the common case on the client side is dealing with
> Strings, I'm not at all convinced that ubiquitous use of AMQShortString
> is a net win for the client.
>
> I believe what would be optimal is to use the CharSequence interface
> everywhere. This way String values passed to us by an application could
> be directly passed all the way down the stack and encoded directly onto
> the wire without an additional copy, and incoming data could be
> efficiently decoded into a private impl of CharSequence that could be
> converted to a String on demand.
>
> --Rafael
>

Re: Use of AMQShortString in client side code

Posted by Rafael Schloming <ra...@redhat.com>.

Robert Godfrey wrote:
> On 13/09/2007, Rajith Attapattu <ra...@gmail.com> wrote:
>> I am wondering why we are using AMQShortString indiscriminately all over
>> the
>> client side code?
>> There is no performance benefit of using AMQShortString (based on the way
>> it
>> is used) on the client side and is purely used for encoding.
> 
> 
> 
> Rajith,
> 
> as we have discussed before - there *is* a significant performance benefit
> which we have tested and proved previously.

Can you point me to the previous discussion? I'd like to learn more 
about the original issue.

   Many short strings are re-used
> frequently within the client library, and by using our own type we can
> exploit this.

Unless we're excessively copying them I don't see how this matters. For 
both an AMQShortString and a String we should just be passing around 
pointers when they are reused.

   Further, the domain for many parameters in AMQP is *not* a
> unicode string, but is tightly defined as upto 255 bytes of data with a
> particular encoding.  Java Strings are not the appropriate type to use for
> this.  Encoding and decoding Java Strings is expensive, and also prone to
> error (i.e. you need to make sure that you *always* use the correct explicit
> encoding).

Despite the name AMQShortString, I don't think the AMQShortString class 
actually represents the AMQP type short-string, for example there is no 
length limit for an AMQShortString. It's really just a generic 
implementation of CharSequence that is optimized specifically for rapid 
decoding from a ByteBuffer. From a domain restriction perspective, using 
an ordinary String is just as correct.

> It makes sense to use it on Broker side as you deal at bytes level and I can
>> understand the performance benefit of not having convert back and forth
>> into
>> a String.
> 
> 
> The low level API should be using correct AMQ domains.  High level APIs
> (such as JMS) will obviously want to present these parameters as java
> Strings.
> 
> 
> On the client side we just merely wrap/unwrap a String using AMQShortString.
>> Why can't we do that at the encoding/decoding level for the client side ?
> 
> 
> In some cases this may be true, but in others certainly not.  When
> converting into JMS Destinations on receipt of a message, for instance, one
> never needs to convert to a String... it is *much* faster to simply use the
> correct type of AMQShortString/

Unfortunately using AMQShortString imposes additional overhead whenever 
we need to en/decode to/from an ordinary String. It basically requires 
an additional copy when compared with directly encoding/decoding to/from 
  a String. As the common case on the client side is dealing with 
Strings, I'm not at all convinced that ubiquitous use of AMQShortString 
is a net win for the client.

I believe what would be optimal is to use the CharSequence interface 
everywhere. This way String values passed to us by an application could 
be directly passed all the way down the stack and encoded directly onto 
the wire without an additional copy, and incoming data could be 
efficiently decoded into a private impl of CharSequence that could be 
converted to a String on demand.

--Rafael

Re: Use of AMQShortString in client side code

Posted by Rajith Attapattu <ra...@gmail.com>.

Hello Robert,

On 9/14/07, Robert Godfrey <ro...@gmail.com> wrote:
>
> On 13/09/2007, Rajith Attapattu <ra...@gmail.com> wrote:
> >
> > I am wondering why we are using AMQShortString indiscriminately all over
> > the
> > client side code?
> > There is no performance benefit of using AMQShortString (based on the
> way
> > it
> > is used) on the client side and is purely used for encoding.
>
>
>
> Rajith,
>
> as we have discussed before - there *is* a significant performance benefit
> which we have tested and proved previously.

I agree and understand the performance benifit here and I am not opposed to
the use of  AMQShortString where it makes sense.
However I am opposed to the **indiscriminate** use of AMQShortString inside
the client library.

Many short strings are re-used frequently within the client library, and by
> using our own type we can
> exploit this.  Further, the domain for many parameters in AMQP is *not* a
> unicode string, but is tightly defined as upto 255 bytes of data with a
> particular encoding.  Java Strings are not the appropriate type to use for
> this.  Encoding and decoding Java Strings is expensive, and also prone to
> error (i.e. you need to make sure that you *always* use the correct
> explicit
> encoding).

I understand that encoding/decoding to java String is expensive.
So we only gain that benefit if we hang around to the AMQShortString in
bytes without doing any encoding into String.
Now that is a justifiable reason to pass the data in AMQShortString format
around. Ex. the Destination use case you mentioned.
If we are eventually going to use it as a String we could easily do the
conversion once and pass it as a java String.
Since u have to do the conversion once, it doesn't matter where u do it.

I also see frequent use of the asString() method especially in log messages.
Each time u call this, you are doing the String encoding again.
One optimisation is once we create the String we can hang on to it and
return it inside the asString().

On the same token, when we create AMQShortStrings out of Java Strings we can
do it at the lower layers as you have alluded to in your reply.
We need to do the conversion once, and it doesn't matter which layer you do
it.
This will free the upper layers from having to use AMQShortString unless it
is necessary.

It makes sense to use it on Broker side as you deal at bytes level and I can
> > understand the performance benefit of not having convert back and forth
> > into
> > a String.
>
>
> The low level API should be using correct AMQ domains.  High level APIs
> (such as JMS) will obviously want to present these parameters as java
> Strings.

Exactly my point. Appologies if I didn't communicate this properly.

On the client side we just merely wrap/unwrap a String using AMQShortString.
> > Why can't we do that at the encoding/decoding level for the client side
> ?
>
>
> In some cases this may be true, but in others certainly not.  When
> converting into JMS Destinations on receipt of a message, for instance,
> one
> never needs to convert to a String... it is *much* faster to simply use
> the
> correct type of AMQShortString/

Again, I am only talking about the cases where you say it is true.

I really hate typing more than I need to :)
>
>
> I'm not sure that you laziness is sufficient motivation :-)
>
> -- Rob
>
> Regards,
> >
> > Rajith
> >
>

Re: Use of AMQShortString in client side code

Posted by Robert Godfrey <ro...@gmail.com>.

On 13/09/2007, Rajith Attapattu <ra...@gmail.com> wrote:
>
> I am wondering why we are using AMQShortString indiscriminately all over
> the
> client side code?
> There is no performance benefit of using AMQShortString (based on the way
> it
> is used) on the client side and is purely used for encoding.

Rajith,

as we have discussed before - there *is* a significant performance benefit
which we have tested and proved previously.  Many short strings are re-used
frequently within the client library, and by using our own type we can
exploit this.  Further, the domain for many parameters in AMQP is *not* a
unicode string, but is tightly defined as upto 255 bytes of data with a
particular encoding.  Java Strings are not the appropriate type to use for
this.  Encoding and decoding Java Strings is expensive, and also prone to
error (i.e. you need to make sure that you *always* use the correct explicit
encoding).

It makes sense to use it on Broker side as you deal at bytes level and I can
> understand the performance benefit of not having convert back and forth
> into
> a String.

The low level API should be using correct AMQ domains.  High level APIs
(such as JMS) will obviously want to present these parameters as java
Strings.

On the client side we just merely wrap/unwrap a String using AMQShortString.
> Why can't we do that at the encoding/decoding level for the client side ?

In some cases this may be true, but in others certainly not.  When
converting into JMS Destinations on receipt of a message, for instance, one
never needs to convert to a String... it is *much* faster to simply use the
correct type of AMQShortString/

I really hate typing more than I need to :)

I'm not sure that you laziness is sufficient motivation :-)

-- Rob

Regards,
>
> Rajith
>