Posted to dev@arrow.apache.org by Gonzalo Ortiz Jaureguizar <go...@gmail.com> on 2017/09/15 11:21:31 UTC

A simple benchmark on Java implementation

Hi there,

I have created a little JMH test to check Arrow's performance. You can
find it here. The idea is to test an API with implementations backed by heap
arrays, NIO buffers (that follow the Arrow format) and Arrow. At the
moment the API only supports nullable int buffers and contains read-only
methods.

The benchmark runs on automatically generated vectors of 2^10, 2^20 and 2^26
never-null integers, and it tests three different access patterns:

   - Random access: a random element is read
   - Sequential access: a random index is chosen and then the
   following 32 elements are read
   - Sum access: similar to sequential, but instead of simply reading the
   elements, they are added into a long.
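To make the three patterns concrete, here is a minimal sketch of what each one does, written over a plain heap array. This is an illustration only (the class and method names are mine, not the benchmark's; the real code, including the Arrow and ByteBuffer implementations, is in the linked repository):

```java
import java.util.Random;

public class AccessPatterns {
    static final int WINDOW = 32; // the sequential/sum patterns read 32 consecutive elements

    // Random access: read a single element at a random index.
    static int randomAccess(int[] vector, Random rnd) {
        return vector[rnd.nextInt(vector.length)];
    }

    // Sequential access: pick a random start index, then read the next 32 elements.
    static int sequentialAccess(int[] vector, Random rnd) {
        int start = rnd.nextInt(vector.length - WINDOW);
        int last = 0;
        for (int i = start; i < start + WINDOW; i++) {
            last = vector[i]; // in JMH, each read would be fed to a Blackhole
        }
        return last;
    }

    // Sum access: like sequential, but the 32 elements are accumulated into a long.
    static long sumAccess(int[] vector, Random rnd) {
        int start = rnd.nextInt(vector.length - WINDOW);
        long sum = 0;
        for (int i = start; i < start + WINDOW; i++) {
            sum += vector[i];
        }
        return sum;
    }
}
```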

Disclaimer: microbenchmarks are error-prone, I'm not an expert on JMH, and
this benchmark was put together in a couple of hours.

Results
On all charts the Y axis is the ratio of the throughput of the
off-heap versions to the heap version (so higher is better).

TL;DR: it seems that the complex structures of Arrow are preventing some
optimizations in the JVM.

Random
Random access performance is quite good. The heap version is a little bit
better, but both off-heap solutions seem pretty similar.

        1K      1M      64M
Array   75.139  53.025  10.872
Arrow   67.399  43.491  10.42
Buf     82.877  38.092  10.753
[image: inline chart 1]

Sequential
Looking at the absolute values, it is clear that JMH's Blackhole is
preventing any JVM optimization of the loop. I think that's fine, as it
simulates several calls to the vector in a *non-optimized* scenario.
It seems that the JVM is not smart enough to optimize off-heap sequential
access as much as it does with heap structures. Although both off-heap
implementations are worse than the heap version, the one that uses Arrow is
noticeably worse than the one that uses ByteBuffers directly:
        1K      1M      64M
Array   6.335   4.563   3.145
Arrow   2.664   2.453   1.989
Buf     4.456   3.971   3.018
[image: inline chart 2]

Sum
The result is awful. It seems that the JVM is able to optimize (I guess
by vectorizing) the heap and ByteBuffer implementations (at least with small
vectors), but not the Arrow version. I guess this is due to the
indirections and the deeper call stack required to execute the same code on
Arrow.

        1K      1M      64M
Array   44.833  26.617  9.787
Arrow   3.426   3.265   2.521
Buf     38.288  19.295  5.668
[image: inline chart 4]
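The extra per-element work is easy to see if one sketches what a nullable read involves. The following is a simplified illustration (my own sketch, not Arrow's actual code): a plain heap sum is one tight loop over one array, while an Arrow-style nullable vector has to consult a validity bitmap and go through a buffer indirection on every element, which gives the JIT a much harder loop to auto-vectorize:

```java
import java.nio.ByteBuffer;

public class NullableSum {
    // Plain heap sum: a tight loop over one array, easy for the JIT to unroll/vectorize.
    static long sumHeap(int[] values) {
        long sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Arrow-style sum: per element, test a bit in a validity bitmap, then read
    // the value through a ByteBuffer. The branch on the bitmap and the extra
    // indirection are the kind of work that can defeat auto-vectorization.
    static long sumNullable(ByteBuffer validity, ByteBuffer values, int length) {
        long sum = 0;
        for (int i = 0; i < length; i++) {
            boolean isSet = (validity.get(i >> 3) & (1 << (i & 7))) != 0;
            if (isSet) {
                sum += values.getInt(i * 4); // 4 bytes per int
            }
        }
        return sum;
    }
}
```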

Re: A simple benchmark on Java implementation

Posted by Gonzalo Ortiz Jaureguizar <go...@gmail.com>.
Yeah... I said "you can find it here" but forgot to add the link. My bad.
You can find it here <https://github.com/gortiz/arrow-jmh>.

2017-09-15 15:33 GMT+02:00 Wes McKinney <we...@gmail.com>:

> hi Gonzalo,
>
> This is interesting, thank you. Do you have code available to reproduce
> these results?
>
> - Wes
>
> On Fri, Sep 15, 2017 at 9:28 AM, Gonzalo Ortiz Jaureguizar <
> golthiryus@gmail.com> wrote:
>
> > I forgot to say that test were executed on my Ubuntu 17.04 laptop on
> > Oracle JDK 1.8.0_144-b01.
> >
> > 2017-09-15 13:21 GMT+02:00 Gonzalo Ortiz Jaureguizar <
> golthiryus@gmail.com
> > >:
> >
> >> [original message quoted in full; trimmed]
>

Re: A simple benchmark on Java implementation

Posted by Wes McKinney <we...@gmail.com>.
hi Gonzalo,

This is interesting, thank you. Do you have code available to reproduce
these results?

- Wes

On Fri, Sep 15, 2017 at 9:28 AM, Gonzalo Ortiz Jaureguizar <
golthiryus@gmail.com> wrote:

> I forgot to say that test were executed on my Ubuntu 17.04 laptop on
> Oracle JDK 1.8.0_144-b01.
>
> 2017-09-15 13:21 GMT+02:00 Gonzalo Ortiz Jaureguizar <golthiryus@gmail.com
> >:
>
>> [original message quoted in full; trimmed]
>

Re: A simple benchmark on Java implementation

Posted by Gonzalo Ortiz Jaureguizar <go...@gmail.com>.
I forgot to say that the tests were executed on my Ubuntu 17.04 laptop on
Oracle JDK 1.8.0_144-b01.

2017-09-15 13:21 GMT+02:00 Gonzalo Ortiz Jaureguizar <go...@gmail.com>:

> [original message quoted in full; trimmed]