Posted to proton@qpid.apache.org by Rafael Schloming <rh...@alum.mit.edu> on 2014/05/24 17:49:10 UTC

codec updates

Hi Everyone,

I've been doing a bit more exploration around some of the codec strategies
I posted about a few weeks ago and I'd like to share some results.

There are still some gaps to fill in, but all the complex data types
(lists, maps, arrays, described types, etc.) are handled for both encode
and decode. Those are important for evaluating performance since they are
the most complex to encode/decode and so have the biggest impact on the
results.

You can look at what I've done here:

  - https://github.com/rhs/qpid-proton/tree/codec/proton-j/src/main/java/org/apache/qpid/proton/codec2

Note that the codec2 package name is temporary; it's just there so the new
code can live alongside the existing codec in the same codebase.

I've put together a basic benchmark to compare against the existing codec
performance here:

  - https://github.com/rhs/qpid-proton/blob/codec/proton-j/src/main/java/org/apache/qpid/proton/codec2/Benchmark.java

The benchmark encodes and decodes a list of 10 integers and a UUID. My hope
is that this is a reasonable approximation of what is in a common frame,
e.g. a transfer or flow frame. So far the results are encouraging. On my
system the new codec is roughly 8 to 9 times faster than the existing codec
on encode, and about 5 times faster than the existing codec for decode:

  [rhs@venture build]$ java -cp proton-j/proton-j.jar org.apache.qpid.proton.codec2.Benchmark 100000000 all
  new encode: 9270 millis
  new decode: 7764 millis
  existing encode: 78725 millis
  existing decode: 40175 millis

The above Benchmark invocation is running through 100 million
encode/decodes and you can see the timing results for a typical run.

In addition to the raw performance considerations demonstrated by the
Benchmark, there are some interesting and potentially key aspects of the
design that would enable higher performance usage patterns.

The way the decoder works is that it scans the encoded byte stream and
calls into the data handler when types are encountered. The data handler is
not actually passed the decoded value; instead it is passed a Decoder
(which is just a reference into the stream). The handler can then use the
decoder to extract the desired value from the data stream. This design
allows for a couple of nice things.

For one thing, there is zero intermediate garbage created by the decoding
process itself; the only garbage produced is at the request of the handler.
For example, if the handler wants to extract a string as a full blown
String object it is free to do that and will incur the associated overhead,
but it could also just choose to copy the UTF-8 bytes directly to some
final destination and avoid any conversion overhead. This also provides an
added measure of convenience and robustness, since the 'type on the wire'
can be converted directly to the desired Java type, e.g. if it's an
integral type on the wire, your handler can just call getInt() or getLong()
and the decoder will convert/coerce automatically.
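
To make that concrete, here is a minimal sketch of what a handler might
look like. The interface and method names below (Decoder, DataHandler,
onLong, onString, getStringBytes) are illustrative assumptions, not the
actual codec2 API:

  import java.nio.ByteBuffer;

  // Illustrative shapes only; the real codec2 interfaces differ.
  interface Decoder {
      long getLong();              // coerces any integral wire type
      String getString();          // materializes a String (allocates)
      ByteBuffer getStringBytes(); // view of the raw UTF-8 bytes (no copy)
  }

  interface DataHandler {
      void onLong(Decoder dec);
      void onString(Decoder dec);
  }

  // A handler that produces no intermediate garbage: integral values are
  // read as primitives, strings are copied as raw UTF-8 into an output buffer.
  final class ZeroCopyHandler implements DataHandler {
      private final ByteBuffer out = ByteBuffer.allocate(4096);
      private long sum;

      @Override public void onLong(Decoder dec)   { sum += dec.getLong(); }
      @Override public void onString(Decoder dec) { out.put(dec.getStringBytes()); }
  }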

Another nice thing about this design is that there is minimal decode
overhead if the handler doesn't decode the type. This makes it possible to
scan quite efficiently for particular value(s) deep inside an encoded
stream. For example, it should be possible to write a handler that
evaluates a predicate against the message properties extremely efficiently,
for things like selectors or content-based routing rules.
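
As a sketch of what such a handler could look like (repeating the
hypothetical Decoder/DataHandler shapes from above, and using a made-up
property key):

  // Hypothetical shapes again; only the slice needed here is shown.
  interface Decoder { String getString(); }
  interface DataHandler { void onString(Decoder dec); }

  // Scans map entries in wire order (key, value, key, value, ...) and only
  // materializes the value that follows the key we are selecting on.
  final class SelectorHandler implements DataHandler {
      private int position;      // even = key, odd = value within the map
      private boolean wantValue;
      private String selected;

      @Override
      public void onString(Decoder dec) {
          boolean isKey = (position++ % 2) == 0;
          if (isKey) {
              wantValue = "routing-key".equals(dec.getString());
          } else if (wantValue) {
              selected = dec.getString(); // only the matching value is decoded
              wantValue = false;
          }
          // Non-matching values are skipped without being decoded at all.
      }

      String selected() { return selected; }
  }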

It should also be possible to write a handler that very efficiently copies
a data stream while modifying only a few values, e.g. copy a message from
an input buffer to an output buffer while updating just the ttl and
delivery count and adding some sort of trace header. We could even extend
the design to allow extremely efficient in-place modification of fixed
width values if we find that to be useful.

In addition to these lower level usage scenarios, it is also quite
straightforward to transform an encoded data stream into a full blown
object representation if performance is less critical. The codec includes a
POJOBuilder which implements the DataHandler interface and transforms an
AMQP byte stream into simple Java objects. I've put together an example
usage here:

  - https://github.com/rhs/qpid-proton/blob/codec/proton-j/src/main/java/org/apache/qpid/proton/codec2/Example.java

I'd like to get people's feedback on these ideas. I would like this codec
layer to be usable/useful as a first-class API in its own right, and not
just an implementation detail of the protocol engine. If people are happy
with the design and the API, I think it would be a relatively
straightforward process to generate some DataHandler implementations from
the protocol XML that would effectively replace the existing codec layer in
the engine and hopefully provide a significant performance improvement as a
result.

--Rafael

Re: codec updates

Posted by Rafael Schloming <rh...@alum.mit.edu>.
On Tue, May 27, 2014 at 5:15 PM, Robbie Gemmell <ro...@gmail.com> wrote:

> I haven't had time to run any of this or look through it properly yet, but
> after a very very quick scroll through...
>
> - Should we not use LinkedHashMap instances, due to the ordering
> restriction on equality checks for AMQP maps?
>

Probably, although I should say that I didn't intend for POJOBuilder to
retain 100% fidelity of AMQP type information, but rather to transform the
data stream into something that is natural to process in Java. The idea is
that this gives you a high degree of convenience (as well as avoiding
overhead) most of the time, when full fidelity doesn't matter.

As a general note, the overall design is intended to make transformation
efficient and relatively easy, so it should be possible to go directly from
the input byte stream to the domain specific objects you want to work with
in your program without unnecessary intermediate representations. This also
means that there isn't one single canonical transformer, but multiple
transformers depending on what you want to do. For example an application
might use a POJO style transformer or even a domain specific transformer
for decoding and processing an application message, but a broker might use
a transformer that goes directly from input message bytes to output message
bytes (modulo updated ttl), or the protocol engine might use a transformer
that directly executes performatives.


> - It looked like it only uses list32, array32, string32 for encoding...do
> you envisage supporting the others?
>

Yes, however there is a time/space-on-the-wire trade-off here. One of the
reasons this codec is faster on encode than the existing codec is that it
doesn't attempt to compute the encoded size of a given value in advance. It
simply reserves the space on the wire and then goes and backfills it when
it is done encoding the value. This alone turns into something close to a
factor of two in speedup because computing the size has approximately the
same cost as performing the encoding. The trade-off of course is that it
doesn't know in advance what width to reserve for the size and count, so it
needs to assume the wider width. There may be some clever things we can do
here, e.g. there may be an inexpensive way to make a mostly accurate guess
and then shift things around if it turns out to be wrong; however, those
kinds of things would require a more representative sample of data to
validate against.
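
As a rough illustration of the reserve-and-backfill idea (a sketch, not the
actual Encoder implementation), encoding a list32 of plain ints could look
something like this, using the standard AMQP 1.0 format codes (0xd0 for
list32, 0x71 for int):

  import java.nio.ByteBuffer;

  // Sketch only: reserve the size field, encode the elements, then backfill.
  final class BackfillSketch {
      static void encodeIntList(ByteBuffer buf, int[] values) {
          buf.put((byte) 0xd0);        // list32 format code
          int sizePos = buf.position();
          buf.putInt(0);               // reserve 4 bytes for the size
          buf.putInt(values.length);   // count
          for (int v : values) {
              buf.put((byte) 0x71);    // int format code (smallint ignored here)
              buf.putInt(v);
          }
          // Backfill: size = bytes following the size field (count + elements).
          buf.putInt(sizePos, buf.position() - (sizePos + 4));
      }
  }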

Ultimately I suspect what is optimal for speed may never be optimal for
size, so the fallback here would be to simply provide two flavors of
implementation of the Encoder interface, one tailored for speed, and one
tailored for space-on-the-wire.


> - Glancing at the example, this made me do a double take - is that an array
> of string with a url in it, or somehow an array of url?
>   (I guess I need to look at what the code actually does with this :P)
>
>         enc.putArray(Type.STRING);
>         enc.putDescriptor();
>         enc.putSymbol("url");
>         enc.putString("http://one");
>         enc.putString("http://two");
>         enc.putString("http://three");
>         enc.end();
>

This is actually an array of described strings where the descriptor is the
symbol "url".


>
> - For the maps, it feels a little weird adding the keys and values
> independently, though I guess that's to let us get away without additional
> map-specific methods?
>
>         enc.putMap();
>         enc.putString("key");
>         enc.putString("value");
>         enc.putString("pi");
>         enc.putDouble(3.14159265359);
>         enc.end();
>

I'm not sure how map-specific methods would work. I can't think of a way to
do it offhand that wouldn't result in a fairly ugly combinatorial explosion
of signatures. I think there are two key things about this aspect of the
design. First, the API deals with values presented in serial order, so the
implementation can simply encode them directly onto the wire as they are
presented and only needs to keep around minimal internal state. Secondly,
the API is designed to be composable, e.g. imagine you have some DomainType
that knows how to serialize itself. With this interface you can do things
like:

  enc.putMap();
  enc.putString("key");
  domainType.encodeYourself(enc);
  enc.putString("next-key");
  enc.putString("next-value");
  enc.end();

You can also do:

  enc.putList();
  ...
  domainType.encodeYourself(enc);
  ...
  enc.end();

The general idea is to allow domain specific encoding pretty much the same
way that people write toString() methods, and so you need a given value to
render the same way regardless of whether it appears in a list or in a map
or even as a key in a map vs a value in a map.
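
For instance, a domain type might render itself along these lines; the
Encoder slice below just reuses the method names from the snippets above,
and GeoPoint is a made-up type:

  // Minimal slice of the Encoder surface used in the examples above.
  interface Encoder {
      void putMap();
      void putString(String s);
      void putDouble(double d);
      void end();
  }

  final class GeoPoint {
      final double lat, lon;
      GeoPoint(double lat, double lon) { this.lat = lat; this.lon = lon; }

      // Renders identically wherever it appears: as a list element, a map
      // value, or even a map key, much like a toString() implementation would.
      void encodeYourself(Encoder enc) {
          enc.putMap();
          enc.putString("lat"); enc.putDouble(lat);
          enc.putString("lon"); enc.putDouble(lon);
          enc.end();
      }
  }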

I think this model becomes more natural if you think of this as an API for
expressing map/list literals rather than as an API for constructing
map/list objects, e.g.:

 { "key": "value", "pi": 3.14159265359 } -> enc.putMap();
enc.putString("key"); enc.putString("value"); enc.putString("pi");
enc.putDouble(3.14159265359); enc.end();



> - Out of interest, did you try larger heap sizes to see how much difference
> (if any) there was in their relative performance?
>

I did, and it didn't seem to make a difference. I think isolating/measuring
the overhead of "garbage" is likely to be pretty tricky outside of a real
application. In these micro benchmarks there is very little live data at
any given time, so the cost of traversing the entire set of live objects is
likely quite cheap compared to what it would be in a real application with
a larger set of live objects. Hypothetically I suppose we could try
creating artificial data sets, but I didn't bother.


> - Well into personal preference territory, can't say I'm a huge fan of the
> decoder converting nulls/booleans/etc into integers (albeit, if asked to).
>

That isn't intended to be the one and only way of implementing the decoder
interface, but I definitely wanted the design to be able to support that
level of coercion/conversion without significant penalties. I do believe it
is a very important use case, because when you are automatically
serializing based on introspection and you're attempting to interoperate
between languages with different type systems, it's really kind of a crap
shoot what type you will get. JavaScript doesn't distinguish between
floating point and integral types, Perl doesn't always distinguish well
between numbers and strings-that-look-like-numbers, Java doesn't have
unsigned types, and Python doesn't really do fixed width types, so pretty
much every language that serializes based on introspection has the
potential to encode a number into an AMQP type that some other language
doesn't natively understand. So for reasonable interop based on
introspective encode/decode I think we really need to follow the robustness
principle, i.e. be precise in how we encode types and be liberal in how we
decode them. The level of conversion I put in is really geared at ensuring
that we can be sufficiently liberal. For something like the protocol
engine, which wouldn't be doing encode/decode based on introspection but
rather based on a shared schema, we could certainly use a stricter
implementation of the decoder interface that doesn't perform the same
conversions.
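
To illustrate what "liberal on decode" could mean in practice (again just a
sketch using the standard AMQP 1.0 fixed-width format codes, not the actual
decoder code):

  import java.nio.ByteBuffer;

  // A liberal getLong(): any signed integral wire type coerces to a Java long.
  final class LiberalDecodeSketch {
      static long getLong(ByteBuffer buf) {
          int code = buf.get() & 0xff;
          switch (code) {
              case 0x51: return buf.get();      // byte
              case 0x61: return buf.getShort(); // short
              case 0x71: return buf.getInt();   // int
              case 0x81: return buf.getLong();  // long
              default: throw new IllegalArgumentException(
                  String.format("cannot coerce format code 0x%02x to long", code));
          }
      }
  }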

--Rafael

Re: codec updates

Posted by Robbie Gemmell <ro...@gmail.com>.
I haven't had time to run any of this or look through it properly yet, but
after a very very quick scroll through...

- Should we not use LinkedHashMap instances, due to the ordering
restriction on equality checks for AMQP maps?
- It looked like it only uses list32, array32, string32 for encoding...do
you envisage supporting the others?
- Glancing at the example, this made me do a double take - is that an array
of string with a url in it, or somehow an array of url?
  (I guess I need to look at what the code actually does with this :P)

        enc.putArray(Type.STRING);
        enc.putDescriptor();
        enc.putSymbol("url");
        enc.putString("http://one");
        enc.putString("http://two");
        enc.putString("http://three");
        enc.end();

- For the maps, it feels a little weird adding the keys and values
independently, though I guess that's to let us get away without additional
map-specific methods?

        enc.putMap();
        enc.putString("key");
        enc.putString("value");
        enc.putString("pi");
        enc.putDouble(3.14159265359);
        enc.end();

- Out of interest, did you try larger heap sizes to see how much difference
(if any) there was in their relative performance?
- Well into personal preference territory, can't say I'm a huge fan of the
decoder converting nulls/booleans/etc into integers (albeit, if asked to).

Robbie

