Posted to proton@qpid.apache.org by Clebert Suconic <cs...@redhat.com> on 2014/04/29 15:27:28 UTC

Optimizations on Proton-j

I did some work last week optimizing the codec, and I think I've gotten some interesting results.


- The Decoder is now stateless, meaning the same instance can be used over and over (no more need for one instance per connection). Bozo Dragojevic actually saw how heavy it is to create a Decoder and recently optimized MessageImpl to always take the same instance through ThreadLocals. This optimization goes a step further.
- I have changed the ListDecoders so that you won't need intermediate objects to parse types. For now I have only made Transfer work that way, but I could do the same for all the other types at some point.
- There were a few hotspots that I found in the test, and I refactored accordingly, with no semantic changes.

As a result of these optimizations, DecoderImpl won't have a setBuffer method any longer. Instead, each read method will take a ReadableBuffer parameter in addition to its old signature.


And speaking of ReadableBuffer: I have introduced a ReadableBuffer interface. When integrating with the broker, I ran into a situation where I won't have a ByteBuffer, and this interface will let me further optimize the parser later, since I could use a Netty buffer (aka ByteBuf).
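As an illustration of the idea (a sketch only, not the actual interface from the branch — the method names here are assumptions), decoding against a minimal read abstraction instead of java.nio.ByteBuffer lets other buffer types be adapted without copying:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a ReadableBuffer-style abstraction; the real
// interface in the branch may differ. Decoding against this instead of
// ByteBuffer means a Netty ByteBuf could be adapted the same way.
interface ReadableBuffer {
    byte get();               // read one byte and advance
    int getInt();             // read a 4-byte int and advance
    boolean hasRemaining();
}

// Trivial adapter over java.nio.ByteBuffer.
final class ByteBufferReadable implements ReadableBuffer {
    private final ByteBuffer buf;
    ByteBufferReadable(ByteBuffer buf) { this.buf = buf; }
    public byte get() { return buf.get(); }
    public int getInt() { return buf.getInt(); }
    public boolean hasRemaining() { return buf.hasRemaining(); }
}
```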


You will find these optimizations on my branch on github: https://github.com/clebertsuconic/qpid-proton/tree/optimizations


There you will find two commits:

I - a micro benchmark, where I added a test case doing a direct read on the buffer without any framework. I actually wrote a simple parser that only works for the byte array I have, but that's very close to reading directly from the bytes.
   I used it to compare raw reading and interpreting of the buffer against the current framework we had.
   I was mainly concerned about the number of intermediate objects, so I used it to map those differences.

https://github.com/clebertsuconic/qpid-proton/commit/7b2b02649e5bdd35aa2e4cc487ffb91c01e75685


II - a commit with the actual optimizations:


https://github.com/clebertsuconic/qpid-proton/commit/305ecc6aaa5192fc0a1ae42b90cb4eb8ddfe046e








Without these optimizations, my micro benchmark, parsing 10,000,000 Transfer instances without reallocating any buffers, completed on my laptop in:

- 3480 milliseconds, against 750 milliseconds with raw reading


After these optimizations:
- 1927 milliseconds, against 750 milliseconds with raw reading



Note that this will also reduce the footprint of the codec, but I'm not measuring that here.





I'm looking forward to working with this group. I actually had a meeting with Rafi and Ted last week, and I plan to work more closely with you all on this.



Clebert Suconic






Re: Optimizations on Proton-j

Posted by Rafael Schloming <rh...@alum.mit.edu>.
Hi,

Comments inline...

On Mon, May 5, 2014 at 5:51 PM, Clebert Suconic <cs...@redhat.com> wrote:

> I have some ideas as well:
>
>
> - Calculating size prior to sending:
>
>  - We could write zeroes, write to the buffer... come back to the previous
> position.. write the size instead of calculating it.
>

Yeah, this is what I've done before. The only tricky thing here is figuring
out how much space to reserve for the size. In order to minimize the size
of the encoded data, it's better to use an encoding with a 1 byte size when
you can, but of course you don't know in advance if the size of the encoded
data will fit within a single byte.
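The reserve-and-backpatch approach can be sketched as follows (illustrative Java, not the actual proton-j encoder; it sidesteps the 1-byte-vs-4-byte question by always reserving the wide 4-byte size, trading a few bytes of compactness for skipping the size pre-computation):

```java
import java.nio.ByteBuffer;

// Reserve-then-backpatch: write a placeholder size, write the body, then
// patch the real size in with an absolute put. No up-front size pass.
final class BackpatchEncoder {
    static int writeSized(ByteBuffer out, byte[] body) {
        int sizePos = out.position();
        out.putInt(0);                    // placeholder for the size
        out.put(body);                    // encode the body
        int size = out.position() - sizePos - 4;
        out.putInt(sizePos, size);        // backpatch the actual size
        return size;
    }
}
```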


>
> I have read this code a lot.. and I wouldn't rewrite the code.. just
> optimize these cases... I wouldn't optimize it for every possible case
> TBH.. just on message Delivery and Settling unless you want to also
> optimize other cases for use cases that I'm not aware at the moment.
>

I think the ability to access key data without doing a full decode is
likely to be useful at some point. I should also say that I think the
actual codec interface is not terribly useful/friendly right now for end
users. I don't particularly mind whether we iterate or replace the current
implementation, but I do think we need a solid idea of the end goal. To
that end I've put together a mock up of a few different strategies that
I've posted in another thread.


>
>
> other things that could boost performance based on the micro benchmark I
> wrote:
>
>
> - Using Integer, Long.. etc..inside of UnsignedInt, UnsignedLong would
> give you a good boost in performance. The JDK is already optimized to box
> these types... while UnsignedInt, UnsignedLong.. etc.. its not well
> optimized.
>

I haven't noticed much of a difference between Integer and UnsignedInteger
in any of the profiling I've done, but using the unboxed variants would
definitely make a difference.


>
> - Reusing buffers.. maybe adding a framework where we could reuse
> buffers.. or delegate into other frameworks (e.g. Netty).
>

Yeah, we should look at this in the context of copying as well.

--Rafael

Re: Optimizations on Proton-j

Posted by Clebert Suconic <cs...@redhat.com>.
I have some ideas as well:


- Calculating size prior to sending:

 - We could write zeroes, write to the buffer... come back to the previous position, and write the size, instead of calculating it up front.

I have read this code a lot... and I wouldn't rewrite it, just optimize these cases. I wouldn't optimize every possible case, TBH, just message Delivery and Settling, unless you want to also optimize other cases for use cases I'm not aware of at the moment.



Other things that could boost performance, based on the micro benchmark I wrote:


- Using Integer, Long, etc. instead of UnsignedInt, UnsignedLong would give you a good boost in performance. The JDK is already optimized to box these types, while UnsignedInt, UnsignedLong, etc. are not well optimized.

- Reusing buffers: maybe adding a framework where we could reuse buffers, or delegating to other frameworks (e.g. Netty).
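For what it's worth, the JDK point about boxing can be seen with Integer.valueOf's small-value cache (-128..127 under default JVM settings); a wrapper type like UnsignedInteger gets no such treatment unless it adds its own cache. A standalone demonstration, not proton-j code:

```java
// Integer.valueOf returns a pooled instance for small values, so boxing
// them allocates nothing; values outside the cache allocate a fresh object
// each time (under default JVM settings).
final class BoxingDemo {
    static boolean pooled(int v) {
        return Integer.valueOf(v) == Integer.valueOf(v); // same object?
    }
}
```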








On May 5, 2014, at 5:40 PM, Clebert Suconic <cs...@redhat.com> wrote:

> 
> On May 5, 2014, at 5:02 PM, Rafael Schloming <rh...@alum.mit.edu> wrote:
> 
>> Hi Clebert,
>> 
>> Sorry for taking so long to follow up with the benchmark. I've been tweaking it and using it to do some memory and CPU profiling.
>> 
>> The good news is I was able to significantly reduce memory utilization by making some simple changes to CollectorImpl. The benchmark triggers over 42 million events. The way CollectorImpl was coded initially this would create two throwaway objects for each event. This was ending up somewhere north of 84 million throw away objects over the course of the benchmark. One being the event itself, and the other being the linked list node. I changed CollectorImpl to use a simple chain instead of java.util's linked list and also to pool/reuse popped events. The same benchmark now only results in about 250 actual event objects being created in total in order to handle the same 42 million events.
>> 
>> While this reduces memory pressure a lot, surprisingly enough, the event related objects were not the biggest source of garbage. At the top is java.nio.HeapByteBuffer (I haven't looked into this, but I suspect we're throwing away some of these that we could be reusing), and the second biggest source of garbage is org.apache.qpid.proton.amqp.UnsignedInteger. The latter one I find a bit more concerning as its use is fairly interwingled into the design of the current codec and as a result it is likely less straightforward to address.
>> 
>> On the CPU profiling front I've noticed a couple of interesting things. First off it does appear that much of the processing time is spent in codec, roughly 63%, but the cost is not symmetrically divided between encode and decode. Encoding accounts for about 46% of the total, and decoding about 17%. I think this may be why its hard to measure the effect of your patch on this benchmark. The decoding cost just isn't all that high compared to encoding. I did the same profiling again with your patch applied and decoding dropped to about 10% of the total while encoding increased to about 50% of the total.
>> 
>> Digging into the encoding side a bit, it appears that a significant amount of time is being spent calculating the encoded size of a value prior to writing its encoded representation to the wire. One of the optimizations I've had success with in the past (both on previous Java codecs and in the C codec) is to avoid calculating the size up front and instead simply reserve the necessary space for it and fill it in after the encoded representation has been written. In the past this has close to doubled the performance of encode since calculating the encoded size is often as expensive as simply doing the encoding. Unfortunately I'm guessing this kind of thing would probably require a major rework of the codec.
>> 
>> To summarize I think there are really two design related issues we will need to address in order to achieve optimum performance. On the memory front, I think the fact that every described type is rendered into a tree of generic objects on both decode/encode is going to be problematic. The strategy you've taken in your patch to special case certain frames and eliminate the intermediate list objects helps with this, but I think we could do a whole lot better if we were to adopt a design that did not require any intermediate objects at all. On the CPU front, I think we'll get the biggest bang for our buck if we look into a design that doesn't require calculating the size up front.
>> 
>> I have some ideas in mind for a new design that I hope will address both of these issues. I'm going to write them up in a separate post.
>> 
>> Regarding your patch, I'm happy to apply it, but I suspect that much of the current codec layer would need to be modified and/or replaced to address the above findings. Let me know how you would like to proceed.
> 
> 
> 
> It's all consistent with what I have seen... I have also realized the calculating encode size prior to sending.
> 
> 
> I would prefer if we could evolve in top of what did. I think the patch that I did is one step further on avoiding intermediate objects...   I have seen already other cases where we could avoid it...  It's all an evolution... and I'm still working into other cases...  if you take this patch now we would move a step further.
> 
> the patch is not only addressing the codec optimizations, but it's also avoiding multiple instances of the codec itself.. what makes it lighter.
> 
> I'm now working on a benchmark based on yours that will be closer to what I will be using.


Re: Optimizations on Proton-j

Posted by Clebert Suconic <cs...@redhat.com>.
On May 5, 2014, at 5:02 PM, Rafael Schloming <rh...@alum.mit.edu> wrote:

> Hi Clebert,
> 
> Sorry for taking so long to follow up with the benchmark. I've been tweaking it and using it to do some memory and CPU profiling.
> 
> The good news is I was able to significantly reduce memory utilization by making some simple changes to CollectorImpl. The benchmark triggers over 42 million events. The way CollectorImpl was coded initially this would create two throwaway objects for each event. This was ending up somewhere north of 84 million throw away objects over the course of the benchmark. One being the event itself, and the other being the linked list node. I changed CollectorImpl to use a simple chain instead of java.util's linked list and also to pool/reuse popped events. The same benchmark now only results in about 250 actual event objects being created in total in order to handle the same 42 million events.
> 
> While this reduces memory pressure a lot, surprisingly enough, the event related objects were not the biggest source of garbage. At the top is java.nio.HeapByteBuffer (I haven't looked into this, but I suspect we're throwing away some of these that we could be reusing), and the second biggest source of garbage is org.apache.qpid.proton.amqp.UnsignedInteger. The latter one I find a bit more concerning as its use is fairly interwingled into the design of the current codec and as a result it is likely less straightforward to address.
> 
> On the CPU profiling front I've noticed a couple of interesting things. First off it does appear that much of the processing time is spent in codec, roughly 63%, but the cost is not symmetrically divided between encode and decode. Encoding accounts for about 46% of the total, and decoding about 17%. I think this may be why its hard to measure the effect of your patch on this benchmark. The decoding cost just isn't all that high compared to encoding. I did the same profiling again with your patch applied and decoding dropped to about 10% of the total while encoding increased to about 50% of the total.
> 
> Digging into the encoding side a bit, it appears that a significant amount of time is being spent calculating the encoded size of a value prior to writing its encoded representation to the wire. One of the optimizations I've had success with in the past (both on previous Java codecs and in the C codec) is to avoid calculating the size up front and instead simply reserve the necessary space for it and fill it in after the encoded representation has been written. In the past this has close to doubled the performance of encode since calculating the encoded size is often as expensive as simply doing the encoding. Unfortunately I'm guessing this kind of thing would probably require a major rework of the codec.
> 
> To summarize I think there are really two design related issues we will need to address in order to achieve optimum performance. On the memory front, I think the fact that every described type is rendered into a tree of generic objects on both decode/encode is going to be problematic. The strategy you've taken in your patch to special case certain frames and eliminate the intermediate list objects helps with this, but I think we could do a whole lot better if we were to adopt a design that did not require any intermediate objects at all. On the CPU front, I think we'll get the biggest bang for our buck if we look into a design that doesn't require calculating the size up front.
> 
> I have some ideas in mind for a new design that I hope will address both of these issues. I'm going to write them up in a separate post.
> 
> Regarding your patch, I'm happy to apply it, but I suspect that much of the current codec layer would need to be modified and/or replaced to address the above findings. Let me know how you would like to proceed.



It's all consistent with what I have seen... I had also noticed the cost of calculating the encoded size prior to sending.


I would prefer if we could evolve on top of what I did. I think the patch I did is one step further toward avoiding intermediate objects... I have already seen other cases where we could avoid them... It's all an evolution, and I'm still working on other cases... If you take this patch now, we would move a step further.

The patch is not only addressing the codec optimizations; it's also avoiding multiple instances of the codec itself, which makes it lighter.

I'm now working on a benchmark based on yours that will be closer to what I will be using.

Re: Optimizations on Proton-j

Posted by Rafael Schloming <rh...@alum.mit.edu>.
Hi Clebert,

Sorry for taking so long to follow up with the benchmark. I've been
tweaking it and using it to do some memory and CPU profiling.

The good news is I was able to significantly reduce memory utilization by
making some simple changes to CollectorImpl. The benchmark triggers over 42
million events. The way CollectorImpl was coded initially this would create
two throwaway objects for each event. This was ending up somewhere north of
84 million throw away objects over the course of the benchmark. One being
the event itself, and the other being the linked list node. I changed
CollectorImpl to use a simple chain instead of java.util's linked list and
also to pool/reuse popped events. The same benchmark now only results in
about 250 actual event objects being created in total in order to handle
the same 42 million events.
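That change — an intrusive chain plus a free list of popped events — might be sketched as follows (hypothetical names; the real CollectorImpl differs in detail):

```java
// Hypothetical sketch of the CollectorImpl change described above: events
// form an intrusive singly linked chain (no java.util.LinkedList node
// objects), and popped events go on a free list for reuse instead of
// becoming garbage.
final class EventPool {
    static final class Event {
        Object payload;
        Event next;           // intrusive link: no separate list node
    }

    private Event head, tail; // pending events, FIFO order
    private Event free;       // popped events available for reuse
    int allocations;          // how many Event objects were ever created

    void put(Object payload) {
        Event e;
        if (free != null) { e = free; free = e.next; } // reuse a popped event
        else { e = new Event(); allocations++; }       // allocate a new one
        e.payload = payload;
        e.next = null;
        if (tail == null) { head = tail = e; } else { tail.next = e; tail = e; }
    }

    Object pop() {
        Event e = head;
        if (e == null) return null;
        head = e.next;
        if (head == null) tail = null;
        Object p = e.payload;
        e.payload = null;
        e.next = free;         // return the event to the free list
        free = e;
        return p;
    }
}
```

With a put/pop cycle like the benchmark's event loop, one Event object ends up serving millions of events.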

While this reduces memory pressure a lot, surprisingly enough, the event
related objects were not the biggest source of garbage. At the top is
java.nio.HeapByteBuffer (I haven't looked into this, but I suspect we're
throwing away some of these that we could be reusing), and the second
biggest source of garbage is org.apache.qpid.proton.amqp.UnsignedInteger.
The latter one I find a bit more concerning, as its use is fairly
intertwined with the design of the current codec, and as a result it is
likely less straightforward to address.

On the CPU profiling front I've noticed a couple of interesting things.
First off it does appear that much of the processing time is spent in
codec, roughly 63%, but the cost is not symmetrically divided between
encode and decode. Encoding accounts for about 46% of the total, and
decoding about 17%. I think this may be why it's hard to measure the effect
of your patch on this benchmark. The decoding cost just isn't all that high
compared to encoding. I did the same profiling again with your patch
applied and decoding dropped to about 10% of the total while encoding
increased to about 50% of the total.

Digging into the encoding side a bit, it appears that a significant amount
of time is being spent calculating the encoded size of a value prior to
writing its encoded representation to the wire. One of the optimizations
I've had success with in the past (both on previous Java codecs and in the
C codec) is to avoid calculating the size up front and instead simply
reserve the necessary space for it and fill it in after the encoded
representation has been written. In the past this has close to doubled the
performance of encode since calculating the encoded size is often as
expensive as simply doing the encoding. Unfortunately I'm guessing this
kind of thing would probably require a major rework of the codec.

To summarize I think there are really two design related issues we will
need to address in order to achieve optimum performance. On the memory
front, I think the fact that every described type is rendered into a tree
of generic objects on both decode/encode is going to be problematic. The
strategy you've taken in your patch to special case certain frames and
eliminate the intermediate list objects helps with this, but I think we
could do a whole lot better if we were to adopt a design that did not
require any intermediate objects at all. On the CPU front, I think we'll
get the biggest bang for our buck if we look into a design that doesn't
require calculating the size up front.

I have some ideas in mind for a new design that I hope will address both of
these issues. I'm going to write them up in a separate post.

Regarding your patch, I'm happy to apply it, but I suspect that much of the
current codec layer would need to be modified and/or replaced to address
the above findings. Let me know how you would like to proceed.

--Rafael

On Fri, May 2, 2014 at 10:45 AM, Clebert Suconic <cs...@redhat.com> wrote:

> These shuld be all cleared now..
>
>
> My github branch and PR are up to date now:
>
> https://github.com/apache/qpid-proton/pull/1
>
>
> .... And isn't git is beautiful. It's already rebased with Rafi's last
> commit!
>
>
>
> On May 1, 2014, at 5:32 PM, Clebert Suconic <cs...@redhat.com> wrote:
>
> > I will do some cleanup on this .. I already fixed the headers here on my
> copy and I will do some cleanup on imports that were't supposed to be done.
> > On May 1, 2014, at 5:11 PM, Robbie Gemmell <ro...@gmail.com>
> wrote:
> >
> >> As no mail arrived here or qpid-dev, and none seems to have arrived at
> what
> >> used to be the default location (infra-dev) either, I had a quick look
> and
> >> it seems like they might have changed the process slightly and we will
> need
> >> to ask for the mails to be enabled at all:
> >>
> https://blogs.apache.org/infra/entry/improved_integration_between_apache_and
> >>
> >> I particularly like the mention of the new comment syncing between our
> >> mailing list and the Pull Requests.
> >>
> >> Regarding closing the pull requests, it seems like something along the
> >> lines of "This closes #<request number> at GitHub" added to the end of
> the
> >> svn commit message should do the trick:
> >> https://help.github.com/articles/closing-issues-via-commit-messages
> >>
> >> I havent had a chance to really look at the actual code change but when
> I
> >> was quickly scrolling down the PR, in addition to the licence headers on
> >> the new files that Rafi already mentioned (which I spotted due to the
> >> Copyright notices we wouldnt typically have) I noticed Encoder.java
> having
> >> its existing licence header corrupted a little by some wayward code.
> >>
> >> Robbie
> >> I just submitted it as a git PR:
> >>
> >> https://github.com/apache/qpid-proton/pull/1
> >>
> >>
> >>
> >> On Apr 30, 2014, at 10:47 AM, Robbie Gemmell <ro...@gmail.com>
> >> wrote:
> >>
> >>> I think anyone can sign up for ReviewBoard themselves. It certainly
> didn't
> >>> used to be linked to the ASF LDAP in the past, presumably for that
> reason.
> >>>
> >>> Its probably also worth noting you can initiate pull requests against
> the
> >>> github mirrors. If it hasn't already been done for the proton mirror,
> we
> >>> can have the emails that would generate be directed to this list (e.g.
> >>>
> >>
> http://mail-archives.apache.org/mod_mbox/qpid-dev/201401.mbox/%3C20140130180355.3CF9E9160F7@tyr.zones.apache.org%3E
> >> ).
> >>> We obviously can't merge the pull request via github, but you can use
> >>> the reviewing tools etc and the resultant patch can be downloaded or
> >>> attached to a JIRA and then applied in the usual fashion (I believe
> there
> >>> is a commit message syntax that can be used to trigger closing the pull
> >>> request).
> >>>
> >>> Robbie
> >>>
> >>> On 30 April 2014 15:22, Rafael Schloming <rh...@alum.mit.edu> wrote:
> >>>
> >>>> On Wed, Apr 30, 2014 at 8:35 AM, Clebert Suconic <csuconic@redhat.com
> >>>>> wrote:
> >>>>
> >>>>> @Rafi: I see there is a patch review  process within Apache (based on
> >>>> your
> >>>>> other thread on Java8)
> >>>>>
> >>>>> Should we make this through the patch process at some point?
> >>>>>
> >>>>
> >>>> I'm fine looking at it on your git branch, but if you'd like to play
> with
> >>>> the review tool then feel free.  Just let me know if you need an
> account
> >>>> and I will try to remember how to set one up (or who to bug to get you
> >>>> one). ;-)
> >>>>
> >>>> --Rafael
> >>>>
> >
>
>

Re: Optimizations on Proton-j

Posted by Clebert Suconic <cs...@redhat.com>.
These should all be cleared now.


My github branch and PR are up to date now:

https://github.com/apache/qpid-proton/pull/1


... And isn't git beautiful? It's already rebased on Rafi's last commit!



On May 1, 2014, at 5:32 PM, Clebert Suconic <cs...@redhat.com> wrote:

> I will do some cleanup on this .. I already fixed the headers here on my copy and I will do some cleanup on imports that were't supposed to be done.
> On May 1, 2014, at 5:11 PM, Robbie Gemmell <ro...@gmail.com> wrote:
> 
>> As no mail arrived here or qpid-dev, and none seems to have arrived at what
>> used to be the default location (infra-dev) either, I had a quick look and
>> it seems like they might have changed the process slightly and we will need
>> to ask for the mails to be enabled at all:
>> https://blogs.apache.org/infra/entry/improved_integration_between_apache_and
>> 
>> I particularly like the mention of the new comment syncing between our
>> mailing list and the Pull Requests.
>> 
>> Regarding closing the pull requests, it seems like something along the
>> lines of "This closes #<request number> at GitHub" added to the end of the
>> svn commit message should do the trick:
>> https://help.github.com/articles/closing-issues-via-commit-messages
>> 
>> I havent had a chance to really look at the actual code change but when I
>> was quickly scrolling down the PR, in addition to the licence headers on
>> the new files that Rafi already mentioned (which I spotted due to the
>> Copyright notices we wouldnt typically have) I noticed Encoder.java having
>> its existing licence header corrupted a little by some wayward code.
>> 
>> Robbie
>> I just submitted it as a git PR:
>> 
>> https://github.com/apache/qpid-proton/pull/1
>> 
>> 
>> 
>> On Apr 30, 2014, at 10:47 AM, Robbie Gemmell <ro...@gmail.com>
>> wrote:
>> 
>>> I think anyone can sign up for ReviewBoard themselves. It certainly didn't
>>> used to be linked to the ASF LDAP in the past, presumably for that reason.
>>> 
>>> Its probably also worth noting you can initiate pull requests against the
>>> github mirrors. If it hasn't already been done for the proton mirror, we
>>> can have the emails that would generate be directed to this list (e.g.
>>> 
>> http://mail-archives.apache.org/mod_mbox/qpid-dev/201401.mbox/%3C20140130180355.3CF9E9160F7@tyr.zones.apache.org%3E
>> ).
>>> We obviously can't merge the pull request via github, but you can use
>>> the reviewing tools etc and the resultant patch can be downloaded or
>>> attached to a JIRA and then applied in the usual fashion (I believe there
>>> is a commit message syntax that can be used to trigger closing the pull
>>> request).
>>> 
>>> Robbie
>>> 
>>> On 30 April 2014 15:22, Rafael Schloming <rh...@alum.mit.edu> wrote:
>>> 
>>>> On Wed, Apr 30, 2014 at 8:35 AM, Clebert Suconic <csuconic@redhat.com
>>>>> wrote:
>>>> 
>>>>> @Rafi: I see there is a patch review  process within Apache (based on
>>>> your
>>>>> other thread on Java8)
>>>>> 
>>>>> Should we make this through the patch process at some point?
>>>>> 
>>>> 
>>>> I'm fine looking at it on your git branch, but if you'd like to play with
>>>> the review tool then feel free.  Just let me know if you need an account
>>>> and I will try to remember how to set one up (or who to bug to get you
>>>> one). ;-)
>>>> 
>>>> --Rafael
>>>> 
> 


Re: Optimizations on Proton-j

Posted by Clebert Suconic <cs...@redhat.com>.
I will do some cleanup on this... I already fixed the headers here on my copy, and I will clean up some imports that weren't supposed to be there.
On May 1, 2014, at 5:11 PM, Robbie Gemmell <ro...@gmail.com> wrote:

> As no mail arrived here or qpid-dev, and none seems to have arrived at what
> used to be the default location (infra-dev) either, I had a quick look and
> it seems like they might have changed the process slightly and we will need
> to ask for the mails to be enabled at all:
> https://blogs.apache.org/infra/entry/improved_integration_between_apache_and
> 
> I particularly like the mention of the new comment syncing between our
> mailing list and the Pull Requests.
> 
> Regarding closing the pull requests, it seems like something along the
> lines of "This closes #<request number> at GitHub" added to the end of the
> svn commit message should do the trick:
> https://help.github.com/articles/closing-issues-via-commit-messages
> 
> I havent had a chance to really look at the actual code change but when I
> was quickly scrolling down the PR, in addition to the licence headers on
> the new files that Rafi already mentioned (which I spotted due to the
> Copyright notices we wouldnt typically have) I noticed Encoder.java having
> its existing licence header corrupted a little by some wayward code.
> 
> Robbie
> I just submitted it as a git PR:
> 
> https://github.com/apache/qpid-proton/pull/1
> 
> 
> 
> On Apr 30, 2014, at 10:47 AM, Robbie Gemmell <ro...@gmail.com>
> wrote:
> 
>> I think anyone can sign up for ReviewBoard themselves. It certainly didn't
>> used to be linked to the ASF LDAP in the past, presumably for that reason.
>> 
>> Its probably also worth noting you can initiate pull requests against the
>> github mirrors. If it hasn't already been done for the proton mirror, we
>> can have the emails that would generate be directed to this list (e.g.
>> 
> http://mail-archives.apache.org/mod_mbox/qpid-dev/201401.mbox/%3C20140130180355.3CF9E9160F7@tyr.zones.apache.org%3E
> ).
>> We obviously can't merge the pull request via github, but you can use
>> the reviewing tools etc and the resultant patch can be downloaded or
>> attached to a JIRA and then applied in the usual fashion (I believe there
>> is a commit message syntax that can be used to trigger closing the pull
>> request).
>> 
>> Robbie
>> 
>> On 30 April 2014 15:22, Rafael Schloming <rh...@alum.mit.edu> wrote:
>> 
>>> On Wed, Apr 30, 2014 at 8:35 AM, Clebert Suconic <csuconic@redhat.com
>>>> wrote:
>>> 
>>>> @Rafi: I see there is a patch review  process within Apache (based on
>>> your
>>>> other thread on Java8)
>>>> 
>>>> Should we make this through the patch process at some point?
>>>> 
>>> 
>>> I'm fine looking at it on your git branch, but if you'd like to play with
>>> the review tool then feel free.  Just let me know if you need an account
>>> and I will try to remember how to set one up (or who to bug to get you
>>> one). ;-)
>>> 
>>> --Rafael
>>> 


Re: Optimizations on Proton-j

Posted by Robbie Gemmell <ro...@gmail.com>.
As no mail arrived here or qpid-dev, and none seems to have arrived at what
used to be the default location (infra-dev) either, I had a quick look and
it seems like they might have changed the process slightly and we will need
to ask for the mails to be enabled at all:
https://blogs.apache.org/infra/entry/improved_integration_between_apache_and

I particularly like the mention of the new comment syncing between our
mailing list and the Pull Requests.

Regarding closing the pull requests, it seems like something along the
lines of "This closes #<request number> at GitHub" added to the end of the
svn commit message should do the trick:
https://help.github.com/articles/closing-issues-via-commit-messages

I haven't had a chance to really look at the actual code change, but when I
was quickly scrolling down the PR, in addition to the licence headers on
the new files that Rafi already mentioned (which I spotted due to the
Copyright notices we wouldn't typically have), I noticed Encoder.java having
its existing licence header corrupted a little by some wayward code.

Robbie
I just submitted it as a git PR:

https://github.com/apache/qpid-proton/pull/1



On Apr 30, 2014, at 10:47 AM, Robbie Gemmell <ro...@gmail.com>
wrote:

> I think anyone can sign up for ReviewBoard themselves. It certainly didn't
> used to be linked to the ASF LDAP in the past, presumably for that reason.
>
> Its probably also worth noting you can initiate pull requests against the
> github mirrors. If it hasn't already been done for the proton mirror, we
> can have the emails that would generate be directed to this list (e.g.
>
http://mail-archives.apache.org/mod_mbox/qpid-dev/201401.mbox/%3C20140130180355.3CF9E9160F7@tyr.zones.apache.org%3E
).
> We obviously can't merge the pull request via github, but you can use
> the reviewing tools etc and the resultant patch can be downloaded or
> attached to a JIRA and then applied in the usual fashion (I believe there
> is a commit message syntax that can be used to trigger closing the pull
> request).
>
> Robbie
>
> On 30 April 2014 15:22, Rafael Schloming <rh...@alum.mit.edu> wrote:
>
>> On Wed, Apr 30, 2014 at 8:35 AM, Clebert Suconic <csuconic@redhat.com
>>> wrote:
>>
>>> @Rafi: I see there is a patch review  process within Apache (based on
>> your
>>> other thread on Java8)
>>>
>>> Should we make this through the patch process at some point?
>>>
>>
>> I'm fine looking at it on your git branch, but if you'd like to play with
>> the review tool then feel free.  Just let me know if you need an account
>> and I will try to remember how to set one up (or who to bug to get you
>> one). ;-)
>>
>> --Rafael
>>

Re: Optimizations on Proton-j

Posted by Clebert Suconic <cs...@redhat.com>.
Ah... I will fix the license headers shortly.





Re: Optimizations on Proton-j

Posted by Clebert Suconic <cs...@redhat.com>.
For now I have pretty much optimized the Transfer type only, no other types.

For instance, I see that the Disposition type needs optimization as well...

For us, though, the codec speedup is the biggest advantage of the patch I'm making.

If you send a 1K message, you won't have much of the codec optimization being exercised.

We could parse 10 million Transfers in 3 seconds before, against 1.5 seconds with the patch, on my laptop. If transferring 10 million 1K messages takes 40 seconds, that 1.5-second saving is spread across the whole delivery and you wouldn't be able to see a difference.


Why don't you try sending empty messages? That is, messages received with an empty body; with no payload, the codec cost should dominate.
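The arithmetic behind that can be sketched quickly. The sketch below just replays the numbers quoted in this thread (3s versus 1.5s of codec time, 40s end to end); the figures are illustrative, not new measurements:

```java
// Why a 2x codec speedup disappears inside a 1K-message benchmark.
// The constants are the figures quoted in this thread; nothing is measured.
public class CodecSpeedupMath {

    // Codec-only time to decode 10 million Transfer frames, in seconds.
    static final double CODEC_BEFORE = 3.0;
    static final double CODEC_AFTER = 1.5;

    // End-to-end time for 10 million 1K messages, in seconds.
    static final double TOTAL_BEFORE = 40.0;

    /** End-to-end time if only the codec portion gets faster. */
    static double totalAfter() {
        double nonCodec = TOTAL_BEFORE - CODEC_BEFORE; // delivery, framing, buffers...
        return nonCodec + CODEC_AFTER;
    }

    /** Overall speedup the benchmark would actually observe. */
    static double overallSpeedup() {
        return TOTAL_BEFORE / totalAfter();
    }

    public static void main(String[] args) {
        System.out.printf("end-to-end after patch: %.1fs%n", totalAfter());  // 38.5s
        System.out.printf("overall speedup: %.3fx%n", overallSpeedup());     // 1.039x
    }
}
```

A 2x win on the codec turns into roughly a 4% end-to-end improvement here, which is easily lost in run-to-run noise; with empty bodies the codec fraction, and therefore the visible speedup, grows.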



Re: Optimizations on Proton-j

Posted by Rafael Schloming <rh...@alum.mit.edu>.
Hi Clebert,

I've been (amongst other things) doing a little bit of investigation on
this topic over the past couple of days. I wrote a microbenchmark that
takes two engines and directly wires their transports together. It then
pumps about 10 million 1K messages from one engine to the other. I ran this
benchmark under JProfiler, and codec definitely came up as a hot spot, but
when I apply your patch, I don't see any measurable difference in results.
Either way it's taking about 40 seconds to pump all the messages through.

I'm not quite sure what is going on, but I'm guessing either the code path
you've optimized isn't coming up enough to make much of a difference, or
I've somehow messed up the measurements. I will post the benchmark shortly,
so hopefully you can check up on my measurements yourself.
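In outline, the harness wires the two sides directly and pumps frames in a loop, something like the toy sketch below (the Transport here is a stand-in queue, not proton-j's actual engine API):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy stand-in for wiring two transports back to back and pumping frames
// through with no sockets involved. This is NOT proton-j's Transport API;
// it only mimics the shape of the benchmark loop.
public class BackToBackPump {

    static class Transport {
        final Queue<byte[]> pending = new ArrayDeque<>();
        long received;
        void write(byte[] frame) { pending.offer(frame); } // app -> wire
        byte[] output()          { return pending.poll(); } // wire bytes out
        void input(byte[] frame) { received++; }            // wire bytes in
    }

    /** Pump n frames of payloadSize bytes from sender to receiver; return count delivered. */
    static long pump(int n, int payloadSize) {
        Transport sender = new Transport();
        Transport receiver = new Transport();
        byte[] payload = new byte[payloadSize];
        for (int i = 0; i < n; i++) {
            sender.write(payload);
            byte[] frame;
            while ((frame = sender.output()) != null) {
                receiver.input(frame); // directly wired: output fed straight to input
            }
        }
        return receiver.received;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long delivered = pump(1_000_000, 1024);
        long ms = (System.nanoTime() - t0) / 1_000_000;
        System.out.println(delivered + " messages in " + ms + " ms");
    }
}
```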

On a more mundane note, Andrew pointed out that the new files you've added
in your patch use an outdated license header. You can take a look at some
existing files in the repo to get a current license header.

--Rafael




Re: Optimizations on Proton-j

Posted by Clebert Suconic <cs...@redhat.com>.
I just submitted it as a git PR:

https://github.com/apache/qpid-proton/pull/1





Re: Optimizations on Proton-j

Posted by Robbie Gemmell <ro...@gmail.com>.
I think anyone can sign up for ReviewBoard themselves. It certainly didn't
use to be linked to the ASF LDAP, presumably for that reason.

It's probably also worth noting that you can initiate pull requests against
the github mirrors. If it hasn't already been done for the proton mirror, we
can have the emails those would generate directed to this list (e.g.
http://mail-archives.apache.org/mod_mbox/qpid-dev/201401.mbox/%3C20140130180355.3CF9E9160F7@tyr.zones.apache.org%3E).
We obviously can't merge the pull request via github, but you can use
the reviewing tools etc and the resultant patch can be downloaded or
attached to a JIRA and then applied in the usual fashion (I believe there
is a commit message syntax that can be used to trigger closing the pull
request).

Robbie


Re: Optimizations on Proton-j

Posted by Rafael Schloming <rh...@alum.mit.edu>.
On Wed, Apr 30, 2014 at 8:35 AM, Clebert Suconic <cs...@redhat.com>wrote:

> @Rafi: I see there is a patch review  process within Apache (based on your
> other thread on Java8)
>
> Should we make this through the patch process at some point?
>

I'm fine looking at it on your git branch, but if you'd like to play with
the review tool then feel free.  Just let me know if you need an account
and I will try to remember how to set one up (or who to bug to get you
one). ;-)

--Rafael

Re: Optimizations on Proton-j

Posted by Clebert Suconic <cs...@redhat.com>.
@Rafi: I see there is a patch review process within Apache (based on your other thread on Java 8).

Should we take this through the patch process at some point?




Re: Optimizations on Proton-j

Posted by Clebert Suconic <cs...@redhat.com>.
On Apr 29, 2014, at 4:00 PM, Rafael Schloming <rh...@alum.mit.edu> wrote:
>> 
>> Notice that this will also minimize the footprint of the codec but I'm not
>> measuring that here.
>> 
> 
> Just out of curiosity, when you say minimize the footprint, are you
> referring to the in memory overhead, or do you mean the encoded size on the
> wire?

I didn't change any encoding in this refactoring; the code should produce the same bytes as it did before.


I meant memory overhead. Before my changes you had to instantiate a Decoder per Connection and a Decoder per Message (the latter was recently changed to one per thread). The decoder holds the DefinedTypes and registered objects... with this change you can have a single Decoder and Encoder instance per JVM, since they are stateless now.
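As a rough sketch of what that buys (illustrative names only; the real code lives in proton-j's codec package and has a much richer interface): the buffer travels as an argument instead of living inside the decoder, so one instance can serve every connection:

```java
import java.nio.ByteBuffer;

// Sketch of the stateless-decoder idea: no setBuffer(), the buffer is
// passed to every read call, so a single decoder can be shared across
// connections and threads. Names are illustrative, not proton-j's API.
public class StatelessCodecSketch {

    /** Minimal buffer abstraction, so a Netty ByteBuf could back it too. */
    public interface ReadableBuffer {
        byte get();
        int getInt();
        boolean hasRemaining();
    }

    /** ByteBuffer-backed implementation. */
    public static final class ByteBufferReader implements ReadableBuffer {
        private final ByteBuffer buf;
        public ByteBufferReader(ByteBuffer buf) { this.buf = buf; }
        public byte get() { return buf.get(); }
        public int getInt() { return buf.getInt(); }
        public boolean hasRemaining() { return buf.hasRemaining(); }
    }

    /** Stateless decoder: holds no buffer, so one instance per JVM suffices. */
    public static final class Decoder {
        public static final Decoder INSTANCE = new Decoder();

        // Old signature, plus the buffer as an explicit argument.
        public int readInt(ReadableBuffer in) {
            return in.getInt();
        }
    }

    public static void main(String[] args) {
        // Two "connections", one shared decoder instance.
        ByteBuffer a = ByteBuffer.allocate(4).putInt(42);
        a.flip();
        ByteBuffer b = ByteBuffer.allocate(4).putInt(7);
        b.flip();
        Decoder d = Decoder.INSTANCE;
        System.out.println(d.readInt(new ByteBufferReader(a))); // 42
        System.out.println(d.readInt(new ByteBufferReader(b))); // 7
    }
}
```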


Re: Optimizations on Proton-j

Posted by Rafael Schloming <rh...@alum.mit.edu>.
On Tue, Apr 29, 2014 at 9:27 AM, Clebert Suconic <cs...@redhat.com>wrote:

> I have done some work last week on optimizing the Codec.. and I think i've
> gotten some interesting results.
>
>
> - The Decoder is now stateless, meaning the same instance can be used over
> and over (no more need for one instance per connection). Bozo Dragojefic
> had seen how heavy it is to create a Decoder and recently
> optimized MessageImpl to reuse the same instance through
> ThreadLocals. This optimization goes a step further.
> - I have changed the ListDecoders so that you won't need intermediate
> objects to parse types. For now I have only done this for Transfer, but I
> could do it for all the other types at some point.
> - There were a few hotspots that I found in the test and I have refactored
> accordingly, with no semantic changes.
>
> As a result of these optimizations, DecoderImpl won't have a setBuffer
> method any longer. Instead, each read method will take the buffer as an
> argument: read(ReadableBuffer, <old signature>).
>
>
> And speaking of ReadableBuffer: I have introduced the ReadableBuffer
> interface. When integrating with the broker, I had a situation where I
> won't have a ByteBuffer, and this interface will allow me to further
> optimize the parser later, as I could use a Netty buffer (aka
> ByteBuf) instead.
>
>
> You will find these optimizations on my branch on github:
> https://github.com/clebertsuconic/qpid-proton/tree/optimizations
>
>
> Where I will have two commits:
>
> I - a micro benchmark where I added a testcase doing a direct read on the
> buffer without any framework. I actually wrote a simple parser that
> works for the byte array I have, but that's very close to reading
> directly from the bytes.
>    I used that to compare raw reading and interpreting of the buffer
> against the current framework we had.
>    I was actually concerned about the number of intermediate objects, so I
> used that to measure these differences.
>
>
> https://github.com/clebertsuconic/qpid-proton/commit/7b2b02649e5bdd35aa2e4cc487ffb91c01e75685
>
>
> II - a commit with the actual optimizations:
>
>
>
> https://github.com/clebertsuconic/qpid-proton/commit/305ecc6aaa5192fc0a1ae42b90cb4eb8ddfe046e
>
>
> Without these optimizations, my microbenchmark (parsing 10,000,000
> instances of Transfer without reallocating any buffers) completed on my
> laptop in:
>
> - 3480 milliseconds , against 750 milliseconds with raw reading
>
>
> After these optimizations:
> - 1927 milliseconds, against 750 milliseconds with raw reading
>

This sounds very promising and an excellent excuse for me to dust off some
of my somewhat rusty git skills. ;-)


>
> Notice that this will also minimize the footprint of the codec but I'm not
> measuring that here.
>

Just out of curiosity, when you say minimize the footprint, are you
referring to the in memory overhead, or do you mean the encoded size on the
wire?


> I'm looking forward to working with this group. I actually had a meeting
> with Rafi and Ted last week, and I plan to work more closely with you guys on this
>

Excellent! I'm also looking forward to digging in a bit more on the Java
side of things.

--Rafael