Posted to dev@arrow.apache.org by Micah Kornfield <em...@gmail.com> on 2019/11/15 09:07:41 UTC

Re: [Discuss][Java] 64-bit lengths for ValueVectors

Apologies for the long delay; I chose to do the minimal work of limiting
this change [1] to allowing 64-bit lengths in ArrowBuf.  This would unblock
work on LargeString and LargeBinary.  If this change looks OK, I think
there is some follow-up work to add more thorough unit/integration tests.
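
For concreteness, a rough sketch of the kind of allocation the ArrowBuf change
is meant to allow (the long-based method shapes and package locations shown
here are assumptions for illustration, not necessarily the exact API in the PR):

  import org.apache.arrow.memory.ArrowBuf;
  import org.apache.arrow.memory.BufferAllocator;
  import org.apache.arrow.memory.RootAllocator;

  public class LargeBufferSketch {
    public static void main(String[] args) {
      try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
           // A single buffer larger than Integer.MAX_VALUE bytes, which an
           // int-length ArrowBuf cannot represent.
           ArrowBuf buf = allocator.buffer(3L * 1024 * 1024 * 1024)) {
        buf.setByte(2_500_000_000L, 1); // byte offsets only fit in a long
      }
    }
  }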

As an aside, it does seem like the 2GB limit is affecting some users in
Spark [2][3], so hopefully LargeString would help with this.

Allowing more than MAX_INT elements in Vectors/arrays is still a blocker for
making LargeList useful.

Thanks,
Micah

[1]  https://github.com/apache/arrow/pull/5020
[2]
https://stackoverflow.com/questions/58739888/spark-is-it-possible-to-increase-pyarrow-buffer#comment103812119_58739888
[3] https://issues.apache.org/jira/browse/ARROW-4890

On Fri, Aug 23, 2019 at 8:33 AM Jacques Nadeau <ja...@apache.org> wrote:

>
>
> On Fri, Aug 23, 2019, 8:55 PM Micah Kornfield <em...@gmail.com>
> wrote:
>
>> The vector indexes being limited to 32 bits doesn't limit the addressing
>>> to 32-bit chunks of memory. For example, your prime example before was
>>> image data. Having 2 billion 1 MB images would still be supported
>>> without changing the index addressing.
>>
>> This might be pre-coffee math, but I think we are limited to
>> approximately 2000 images because an ArrowBuf only holds up to 2 billion
>> bytes [1].  While we have plenty of room for the offsets, we quickly run
>> out of contiguous data storage space. For LargeString and LargeBinary this
>> could be fixed by changing ArrowBuf.
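>>
>> A quick sanity check of that math in Java (assuming roughly 1 MiB per
>> image):
>>
>>   long maxArrowBufBytes = Integer.MAX_VALUE;  // current ArrowBuf limit, ~2 billion bytes
>>   long bytesPerImage = 1L << 20;              // ~1 MiB per image
>>   System.out.println(maxArrowBufBytes / bytesPerImage); // prints 2047, i.e. roughly 2000 images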
>>
>> LargeArray faces the same problem, only it applies to its child vectors.
>> Supporting LargeArray properly is really what drove the large-scale
>> interface change.
>>
>
> My expressed concern about these changes was specifically about the use of
> long for get/set in the vector interfaces. I'm not saying that we should
> constrain memory/ArrowBufs to 32 bits.
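>
> (To make the distinction concrete, a hypothetical accessor sketch, not code
> from the PR, where element indexes stay int while byte addressing into the
> buffer is long:)
>
>   class Float8Accessor {
>     private final org.apache.arrow.memory.ArrowBuf valueBuffer;
>
>     Float8Accessor(org.apache.arrow.memory.ArrowBuf buf) { this.valueBuffer = buf; }
>
>     // 32-bit element index, 64-bit byte offset into the underlying memory.
>     double get(int index) {
>       long offset = (long) index * 8L; // widen before multiplying to avoid int overflow
>       return valueBuffer.getDouble(offset);
>     }
>   }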
>
>
>> Rebase would help if possible.
>>
>> I'll try to get to this in the next few days.
>>
>> [1]
>> https://github.com/apache/arrow/blob/95175fe7cb8439eebe6d2f6e0495f551d6864380/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L164
>>
>> On Fri, Aug 23, 2019 at 4:55 AM Jacques Nadeau <ja...@apache.org>
>> wrote:
>>
>>>
>>>
>>> On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield <em...@gmail.com>
>>> wrote:
>>>
>>>> I don't think we should couple this discussion with the implementation
>>>>> of large list, etc since I think those two concepts are independent.
>>>>
>>>> I'm still trying to balance in my mind which is a worse experience for
>>>> consumers of the libraries for these types: claiming that Java supports
>>>> these types but throwing an exception when the Vectors exceed 32 bits, or
>>>> just saying they aren't supported until we have 64-bit support in Java.
>>>>
>>>
>>> The vector indexes being limited to 32 bits doesn't limit the addressing
>>> to 32-bit chunks of memory. For example, your prime example before was
>>> image data. Having 2 billion 1 MB images would still be supported
>>> without changing the index addressing.
>>>
>>>
>>>
>>>>
>>>>> I've asked some others on my team their opinions on the risk here. I
>>>>> think we should probably review some of our more complex vector interactions
>>>>> and see how the JVM's assembly changes with this kind of change. Using
>>>>> microbenchmarking is good but I think we also need to see whether we're
>>>>> constantly inserting additional instructions or if in most cases, this
>>>>> actually doesn't impact instruction count.
>>>>
>>>>
>>>> Is this something that your team will take on?
>>>>
>>>
>>>
>>> Yeah, we need to look at this I think.
>>>
>>> Do you need a rebased version of the PR or is the existing one
>>>> sufficient?
>>>>
>>>
>>> Rebase would help if possible.
>>>
>>>
>>>
>>>> Thanks,
>>>> Micah
>>>>
>>>> On Thu, Aug 22, 2019 at 8:56 PM Jacques Nadeau <ja...@apache.org>
>>>> wrote:
>>>>
>>>>> I don't think we should couple this discussion with the implementation
>>>>> of large list, etc since I think those two concepts are independent.
>>>>>
>>>>> I've asked some others on my team their opinions on the risk here. I
>>>>> think we should probably review some of our more complex vector interactions
>>>>> and see how the JVM's assembly changes with this kind of change. Using
>>>>> microbenchmarking is good but I think we also need to see whether we're
>>>>> constantly inserting additional instructions or if in most cases, this
>>>>> actually doesn't impact instruction count.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 21, 2019 at 12:18 PM Micah Kornfield <
>>>>> emkornfield@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>>> With regards to the reference implementation point. It is a good
>>>>>>> point. I'm on vacation this week. Unless you're pushing hard on this, can
>>>>>>> we pick this up and discuss more next week?
>>>>>>
>>>>>>
>>>>>> Hi Jacques, I hope you had a good rest.  Any more thoughts on the
>>>>>> reference implementation aspect of this?
>>>>>>
>>>>>>
>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>>>>> would be best to decouple this discussion from the release timeline
>>>>>>> given how many people we have relying on regular releases coming out.
>>>>>>> We can continue making major 0.x releases until we're ready to
>>>>>>> release 1.0.0.
>>>>>>
>>>>>>
>>>>>> I'm OK with it as long as other stakeholders are. Timed releases are
>>>>>> the way to go.  As stated on the release thread [1] we need a better
>>>>>> mechanism to avoid this type of issue arising again.  The release thread
>>>>>> also had some more discussion on compatibility.
>>>>>>
>>>>>> Thanks,
>>>>>> Micah
>>>>>>
>>>>>> [1]
>>>>>> https://lists.apache.org/thread.html/d70feeceaf2570906ade117030b29887af7c77ca5c4a976e6d555920@%3Cdev.arrow.apache.org%3E
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 14, 2019 at 3:23 PM Wes McKinney <we...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield <
>>>>>>> emkornfield@gmail.com> wrote:
>>>>>>> >
>>>>>>> > Hi Wes and Jacques,
>>>>>>> > See responses below.
>>>>>>> >
>>>>>>> > With regards to the reference implementation point. It is a good
>>>>>>> point. I'm
>>>>>>> > > on vacation this week. Unless you're pushing hard on this, can
>>>>>>> we pick this
>>>>>>> > > up and discuss more next week?
>>>>>>> >
>>>>>>> >
>>>>>>> > Sure thing, enjoy your vacation.  I think the only practical implication
>>>>>>> > is that it delays choices around implementing LargeList, LargeBinary,
>>>>>>> > LargeString in Java, which in turn might push out the 0.15.0 release.
>>>>>>> >
>>>>>>>
>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>>>>> would be best to decouple this discussion from the release timeline
>>>>>>> given how many people we have relying on regular releases coming out.
>>>>>>> We can continue making major 0.x releases until we're ready to
>>>>>>> release 1.0.0.
>>>>>>>
>>>>>>> > My stance on this is that I don't know how important it is for
>>>>>>> Java to
>>>>>>> > > support vectors over INT32_MAX elements. The use cases enabled by
>>>>>>> > > having very large arrays seem to be concentrated in the native
>>>>>>> code
>>>>>>> > > world (e.g. C/C++/Rust) -- that could just be
>>>>>>> implementation-centrism
>>>>>>> > > on my part, though.
>>>>>>> >
>>>>>>> >
>>>>>>> > A data point against this view is that Spark has done work to eliminate 2GB
>>>>>>> > memory limits on its block sizes [1].  I don't claim to understand the
>>>>>>> > implications of this. Bryan, might you have any thoughts here?  I'm OK with
>>>>>>> > INT32_MAX as well, but I think we should think about what this means for
>>>>>>> > adding Large types to Java and the implications for reference implementations.
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Micah
>>>>>>> >
>>>>>>> > [1] https://issues.apache.org/jira/browse/SPARK-6235
>>>>>>> >
>>>>>>> > On Sun, Aug 11, 2019 at 6:31 PM Jacques Nadeau <ja...@apache.org>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > > Hey Micah,
>>>>>>> > >
>>>>>>> > > Appreciate the offer on the compiling. The reality is I'm more
>>>>>>> concerned
>>>>>>> > > about the unknowns than the compiling issue itself. Any time
>>>>>>> you've been
>>>>>>> > > tuning for a while, changing something like this could be
>>>>>>> totally fine or
>>>>>>> > > cause a couple of major issues. For example, we've done a very
>>>>>>> large amount
>>>>>>> > > of work reducing heap memory footprint of the vectors. Our target is to
>>>>>>> > > actually get it down to 24 bytes per ArrowBuf and 24 bytes heap
>>>>>>> per vector
>>>>>>> > > (not including arrow bufs).
>>>>>>> > >
>>>>>>> > > With regards to the reference implementation point. It is a good
>>>>>>> point.
>>>>>>> > > I'm on vacation this week. Unless you're pushing hard on this,
>>>>>>> can we pick
>>>>>>> > > this up and discuss more next week?
>>>>>>> > >
>>>>>>> > > thanks,
>>>>>>> > > Jacques
>>>>>>> > >
>>>>>>> > > On Sat, Aug 10, 2019 at 7:39 PM Micah Kornfield <
>>>>>>> emkornfield@gmail.com>
>>>>>>> > > wrote:
>>>>>>> > >
>>>>>>> > >> Hi Jacques,
>>>>>>> > >> I definitely understand these concerns and this change is risky
>>>>>>> because it
>>>>>>> > >> is so large.  Perhaps creating a new hierarchy might be the cleanest way
>>>>>>> > >> of dealing with this.  This could have other benefits, like cleaning up
>>>>>>> > >> some cruft around dictionary encoding and "orphaned" methods.   Per
>>>>>>> past e-mail
>>>>>>> > >> threads I agree it is beneficial to have 2 separate reference
>>>>>>> > >> implementations that can communicate fully, and my intent here
>>>>>>> was to
>>>>>>> > >> close
>>>>>>> > >> that gap.
>>>>>>> > >>
>>>>>>> > >> Trying to
>>>>>>> > >> > determine the ramifications of these changes would be
>>>>>>> challenging and
>>>>>>> > >> time
>>>>>>> > >> > consuming against all the different ways we interact with the
>>>>>>> Arrow Java
>>>>>>> > >> > library.
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >> Understood.  I took a quick look at Dremio-OSS; it seems like it has a
>>>>>>> > >> simple Java build system?  If it is helpful, I can try to get a fork
>>>>>>> > >> running that at least compiles against this PR.  My plan would be to cast
>>>>>>> > >> any place that was changed to return a long back to an int, so in essence
>>>>>>> > >> the Dremio algorithms would remain 32-bit implementations.
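>>>>>>> > >>
>>>>>>> > >> (As a sketch, at a hypothetical call site the cast-back shim would look like:)
>>>>>>> > >>
>>>>>>> > >>   // Before the change: int count = vector.getValueCount();
>>>>>>> > >>   // After it returns a long, cast back so downstream code stays 32-bit:
>>>>>>> > >>   int count = Math.toIntExact(vector.getValueCount()); // fails fast above 2^31 - 1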
>>>>>>> > >>
>>>>>>> > >> I don't  have the infrastructure to test this change properly
>>>>>>> from a
>>>>>>> > >> distributed systems perspective, so it would still take some
>>>>>>> time from
>>>>>>> > >> Dremio to validate for regressions.
>>>>>>> > >>
>>>>>>> > >> I'm not saying I'm against this but want to make sure we've
>>>>>>> > >> > explored all less disruptive options before considering
>>>>>>> changing
>>>>>>> > >> something
>>>>>>> > >> > this fundamental (especially when I generally hold the view
>>>>>>> that large
>>>>>>> > >> cell
>>>>>>> > >> > counts against massive contiguous memory is an anti pattern
>>>>>>> to scalable
>>>>>>> > >> > analytical processing--purely subjective of course).
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >> I'm open to other ideas here, as well. I don't think it is out
>>>>>>> of the
>>>>>>> > >> question to leave the Java implementation as 32-bit, but if we
>>>>>>> do, then I
>>>>>>> > >> think we should consider a different strategy for reference
>>>>>>> > >> implementations.
>>>>>>> > >>
>>>>>>> > >> Thanks,
>>>>>>> > >> Micah
>>>>>>> > >>
>>>>>>> > >> On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau <
>>>>>>> jacques@apache.org>
>>>>>>> > >> wrote:
>>>>>>> > >>
>>>>>>> > >> > Hey Micah, I didn't have a particular path in mind. Was
>>>>>>> thinking more
>>>>>>> > >> along
>>>>>>> > >> > the lines of extra methods as opposed to separate classes.
>>>>>>> > >> >
>>>>>>> > >> > Arrow hasn't historically been a place where we're writing
>>>>>>> algorithms in
>>>>>>> > >> > Java so the fact that they aren't there doesn't mean they
>>>>>>> don't exist.
>>>>>>> > >> We
>>>>>>> > >> > have a large amount of code that depends on the current
>>>>>>> behavior that is
>>>>>>> > >> > deployed in hundreds of customer clusters (you can peruse our
>>>>>>> dremio
>>>>>>> > >> repo
>>>>>>> > >> > to see how extensively we leverage Arrow if interested).
>>>>>>> Trying to
>>>>>>> > >> > determine the ramifications of these changes would be
>>>>>>> challenging and
>>>>>>> > >> time
>>>>>>> > >> > consuming against all the different ways we interact with the
>>>>>>> Arrow Java
>>>>>>> > >> > library. I'm not saying I'm against this but want to make
>>>>>>> sure we've
>>>>>>> > >> > explored all less disruptive options before considering
>>>>>>> changing
>>>>>>> > >> something
>>>>>>> > >> > this fundamental (especially when I generally hold the view
>>>>>>> that large
>>>>>>> > >> cell
>>>>>>> > >> > counts against massive contiguous memory is an anti pattern
>>>>>>> to scalable
>>>>>>> > >> > analytical processing--purely subjective of course).
>>>>>>> > >> >
>>>>>>> > >> > On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield <
>>>>>>> emkornfield@gmail.com>
>>>>>>> > >> > wrote:
>>>>>>> > >> >
>>>>>>> > >> > > Hi Jacques,
>>>>>>> > >> > > What avenue were you thinking for supporting both paths?
>>>>>>>  I didn't
>>>>>>> > >> want
>>>>>>> > >> > > to pursue a different class hierarchy, because I felt like
>>>>>>> that would
>>>>>>> > >> > > effectively fork the code base, but that is potentially an
>>>>>>> option that
>>>>>>> > >> > > would allow us to have a complete reference implementation
>>>>>>> in Java
>>>>>>> > >> that
>>>>>>> > >> > can
>>>>>>> > >> > > fully interact with C++, without major changes to this code.
>>>>>>> > >> > >
>>>>>>> > >> > > For supporting both APIs on the same classes/interfaces, I think the
>>>>>>> > >> > > changes roughly fall into three categories: changes to input parameters,
>>>>>>> > >> > > changes to output parameters, and algorithm changes.
>>>>>>> > >> > >
>>>>>>> > >> > > For inputs, changing from int to long is essentially a
>>>>>>> no-op from the
>>>>>>> > >> > > compiler perspective.  From the limited micro-benchmarking
>>>>>>> this also
>>>>>>> > >> > > doesn't seem to have a performance impact.  So we could
>>>>>>> keep two
>>>>>>> > >> versions
>>>>>>> > >> > > of the methods that only differ on inputs, but it is not
>>>>>>> clear what
>>>>>>> > >> the
>>>>>>> > >> > > value of that would be.
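>>>>>>> > >> > >
>>>>>>> > >> > > (A tiny illustration of why widening the inputs is source-compatible; the
>>>>>>> > >> > > signatures here are hypothetical:)
>>>>>>> > >> > >
>>>>>>> > >> > >   // Old: void set(int index, double value)   New: void set(long index, double value)
>>>>>>> > >> > >   int i = 42;
>>>>>>> > >> > >   vector.set(i, 1.0); // still compiles unchanged: the int argument widens to long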
>>>>>>> > >> > >
>>>>>>> > >> > > For outputs, we can't support the methods "long getLength()" and "int
>>>>>>> > >> > > getLength()" in the same class, so we would be forced into something
>>>>>>> > >> > > like "long getLength(boolean dummy)", which I think is less desirable.
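>>>>>>> > >> > >
>>>>>>> > >> > > (For example, this is a compile error in Java, which is what forces the
>>>>>>> > >> > > dummy-parameter workaround:)
>>>>>>> > >> > >
>>>>>>> > >> > >   class ValueCounts {
>>>>>>> > >> > >     int getLength() { return 0; }
>>>>>>> > >> > >     // long getLength() { return 0L; }  // won't compile: Java cannot overload
>>>>>>> > >> > >     //                                  // on return type alone
>>>>>>> > >> > >     long getLength(boolean dummy) { return 0L; } // the awkward workaround
>>>>>>> > >> > >   }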
>>>>>>> > >> > >
>>>>>>> > >> > > For algorithm changes, there did not appear to be too many places where we
>>>>>>> > >> > > actually loop over all elements (it is quite possible I missed something
>>>>>>> > >> > > here); for the ones that I did find, I was able to mitigate the performance
>>>>>>> > >> > > penalties as noted above.  Some of the current implementation will get a
>>>>>>> > >> > > lot slower for "large arrays", but we can likely fix those later or in this
>>>>>>> > >> > > PR with a nested while loop instead of two for loops.
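>>>>>>> > >> > >
>>>>>>> > >> > > (Roughly, the nested-loop shape would be something like this sketch, with
>>>>>>> > >> > > process() standing in for the per-element work:)
>>>>>>> > >> > >
>>>>>>> > >> > >   long remaining = totalElements;      // may exceed Integer.MAX_VALUE
>>>>>>> > >> > >   long base = 0;
>>>>>>> > >> > >   while (remaining > 0) {
>>>>>>> > >> > >     int chunk = (int) Math.min(remaining, Integer.MAX_VALUE);
>>>>>>> > >> > >     for (int i = 0; i < chunk; i++) {  // int counter keeps the inner loop JIT-friendly
>>>>>>> > >> > >       process(base + i);
>>>>>>> > >> > >     }
>>>>>>> > >> > >     base += chunk;
>>>>>>> > >> > >     remaining -= chunk;
>>>>>>> > >> > >   }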
>>>>>>> > >> > >
>>>>>>> > >> > > Thanks,
>>>>>>> > >> > > Micah
>>>>>>> > >> > >
>>>>>>> > >> > >
>>>>>>> > >> > > On Saturday, August 10, 2019, Jacques Nadeau <
>>>>>>> jacques@apache.org>
>>>>>>> > >> wrote:
>>>>>>> > >> > >
>>>>>>> > >> > >> This is a pretty massive change to the apis. I wonder how
>>>>>>> nasty it
>>>>>>> > >> would
>>>>>>> > >> > >> be to just support both paths. Have you evaluated how
>>>>>>> complex that
>>>>>>> > >> > would be?
>>>>>>> > >> > >>
>>>>>>> > >> > >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield <
>>>>>>> > >> emkornfield@gmail.com>
>>>>>>> > >> > >> wrote:
>>>>>>> > >> > >>
>>>>>>> > >> > >>> After more investigation, it looks like Float8Benchmarks
>>>>>>> at least
>>>>>>> > >> on my
>>>>>>> > >> > >>> machine are within the range of noise.
>>>>>>> > >> > >>>
>>>>>>> > >> > >>> For BitVectorHelper I pushed a new commit [1], which seems to bring the
>>>>>>> > >> > >>> BitVectorHelper benchmarks back in line (and even with some improvement
>>>>>>> > >> > >>> for getNullCountBenchmark).
>>>>>>> > >> > >>>
>>>>>>> > >> > >>> Benchmark                                        Mode  Cnt   Score   Error  Units
>>>>>>> > >> > >>> BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   3.821 ± 0.031  ns/op
>>>>>>> > >> > >>> BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  14.884 ± 0.141  ns/op
>>>>>>> > >> > >>>
>>>>>>> > >> > >>> I applied the same pattern to other loops that I could find, and for any
>>>>>>> > >> > >>> "for (long" loop on the critical path, I broke it up into two loops: the
>>>>>>> > >> > >>> first loop iterates with an int counter, and the second finishes off any
>>>>>>> > >> > >>> remaining long values.  As a side note, it seems like optimization of loops
>>>>>>> > >> > >>> using long counters has a semi-recent open bug for the JVM [2].
>>>>>>> > >> > >>>
>>>>>>> > >> > >>> Thanks,
>>>>>>> > >> > >>> Micah
>>>>>>> > >> > >>>
>>>>>>> > >> > >>> [1]
>>>>>>> > >> > >>>
>>>>>>> > >> > >>>
>>>>>>> > >> >
>>>>>>> > >>
>>>>>>> https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797
>>>>>>> > >> > >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051
>>>>>>> > >> > >>>
>>>>>>> > >> > >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield <
>>>>>>> > >> emkornfield@gmail.com>
>>>>>>> > >> > >>> wrote:
>>>>>>> > >> > >>>
>>>>>>> > >> > >>> > Indeed, the BoundChecking and CheckNullForGet variables
>>>>>>> can make a
>>>>>>> > >> > big
>>>>>>> > >> > >>> > difference.  I didn't initially run the benchmarks with
>>>>>>> these
>>>>>>> > >> turned
>>>>>>> > >> > on
>>>>>>> > >> > >>> > (you can see the result from above with
>>>>>>> Float8Benchmarks).  Here
>>>>>>> > >> are
>>>>>>> > >> > >>> new
>>>>>>> > >> > >>> > numbers including with the flags enabled.  It looks
>>>>>>> like using
>>>>>>> > >> longs
>>>>>>> > >> > >>> might
>>>>>>> > >> > >>> > be a little bit slower, I'll see what I can do to
>>>>>>> mitigate this.
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> > Ravindra also volunteered to try to benchmark the
>>>>>>> changes with
>>>>>>> > >> > Dremio's
>>>>>>> > >> > >>> > code on today's sync call.
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> > New
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> > Benchmark                                        Mode  Cnt   Score   Error  Units
>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   4.176 ± 1.292  ns/op
>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  26.102 ± 0.700  ns/op
>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark               avgt    5   7.398 ± 0.084  us/op
>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark              avgt    5   2.711 ± 0.057  us/op
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> > old
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> > Benchmark                                        Mode  Cnt   Score   Error  Units
>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   3.828 ± 0.030  ns/op
>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  20.611 ± 0.188  ns/op
>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark               avgt    5   6.597 ± 0.462  us/op
>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark              avgt    5   2.615 ± 0.027  us/op
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya <
>>>>>>> liya.fan03@gmail.com>
>>>>>>> > >> > wrote:
>>>>>>> > >> > >>> >
>>>>>>> > >> > >>> >> Hi Gonzalo,
>>>>>>> > >> > >>> >>
>>>>>>> > >> > >>> >> Thanks for sharing the performance results.
>>>>>>> > >> > >>> >> I am wondering if you have turned off the flag
>>>>>>> > >> > >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED.
>>>>>>> > >> > >>> >> If not, the lower throughput should be expected.
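>>>>>>> > >> > >>> >>
>>>>>>> > >> > >>> >> (A small guard for benchmark setup code; this assumes the constant lives in
>>>>>>> > >> > >>> >> org.apache.arrow.memory.BoundsChecking and, if I remember correctly, is backed
>>>>>>> > >> > >>> >> by the arrow.enable_unsafe_memory_access system property:)
>>>>>>> > >> > >>> >>
>>>>>>> > >> > >>> >>   if (org.apache.arrow.memory.BoundsChecking.BOUNDS_CHECKING_ENABLED) {
>>>>>>> > >> > >>> >>     System.err.println("Bounds checking is ON; the numbers will include the"
>>>>>>> > >> > >>> >>         + " checks rather than just the raw get/set path.");
>>>>>>> > >> > >>> >>   }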
>>>>>>> > >> > >>> >>
>>>>>>> > >> > >>> >> Best,
>>>>>>> > >> > >>> >> Liya Fan
>>>>>>> > >> > >>> >>
>>>>>>> > >> > >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield <
>>>>>>> > >> > >>> emkornfield@gmail.com>
>>>>>>> > >> > >>> >> wrote:
>>>>>>> > >> > >>> >>
>>>>>>> > >> > >>> >>> Hi Gonzalo,
>>>>>>> > >> > >>> >>> Thank you for the feedback.  I wasn't aware of the JIT
>>>>>>> > >> > >>> implications.   At
>>>>>>> > >> > >>> >>> least on the benchmark run they don't seem to have an
>>>>>>> impact.
>>>>>>> > >> > >>> >>>
>>>>>>> > >> > >>> >>> If there are other benchmarks that people have that
>>>>>>> can
>>>>>>> > >> validate if
>>>>>>> > >> > >>> this
>>>>>>> > >> > >>> >>> change will be problematic I would appreciate trying
>>>>>>> to run them
>>>>>>> > >> > >>> with the
>>>>>>> > >> > >>> >>> PR.  I will try to run the ones for zeroing/popcnt
>>>>>>> tonight to
>>>>>>> > >> see
>>>>>>> > >> > if
>>>>>>> > >> > >>> >>> there
>>>>>>> > >> > >>> >>> is a change in those.
>>>>>>> > >> > >>> >>>
>>>>>>> > >> > >>> >>> -Micah
>>>>>>> > >> > >>> >>>
>>>>>>> > >> > >>> >>>
>>>>>>> > >> > >>> >>>
>>>>>>> > >> > >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz
>>>>>>> Jaureguizar <
>>>>>>> > >> > >>> >>> golthiryus@gmail.com> wrote:
>>>>>>> > >> > >>> >>>
>>>>>>> > >> > >>> >>> > I would recommend taking care with this kind of change.
>>>>>>> > >> > >>> >>> >
>>>>>>> > >> > >>> >>> > I didn't try Arrow in more than one year, but by then the performance was
>>>>>>> > >> > >>> >>> > quite bad in comparison with plain byte buffer access
>>>>>>> > >> > >>> >>> > (see http://git.net/apache-arrow-development/msg02353.html *) and
>>>>>>> > >> > >>> >>> > there are several optimizations that the JVM (specifically, C2) does not
>>>>>>> > >> > >>> >>> > apply when dealing with longs instead of ints. Among the most common are
>>>>>>> > >> > >>> >>> > loop unrolling and vectorization.
>>>>>>> > >> > >>> >>> >
>>>>>>> > >> > >>> >>> > * This doesn't seem like the best way to reference an old email on the
>>>>>>> > >> > >>> >>> > list, but it is the only result shown by Google.
>>>>>>> > >> > >>> >>> >
>>>>>>> > >> > >>> >>> > On Wed., Aug 7, 2019 at 11:42, Fan Liya (<liya.fan03@gmail.com>) wrote:
>>>>>>> > >> > >>> >>> >
>>>>>>> > >> > >>> >>> >> Hi Micah,
>>>>>>> > >> > >>> >>> >>
>>>>>>> > >> > >>> >>> >> Thanks for your effort. The performance result
>>>>>>> looks good.
>>>>>>> > >> > >>> >>> >>
>>>>>>> > >> > >>> >>> >> As you indicated, ArrowBuf will take an additional 12
>>>>>>> bytes (4
>>>>>>> > >> > bytes
>>>>>>> > >> > >>> for
>>>>>>> > >> > >>> >>> each
>>>>>>> > >> > >>> >>> >> of length, write index, and read index).
>>>>>>> > >> > >>> >>> >> Similar overheads also exist for vectors like
>>>>>>> > >> > >>> BaseFixedWidthVector,
>>>>>>> > >> > >>> >>> >> BaseVariableWidthVector, etc.
>>>>>>> > >> > >>> >>> >>
>>>>>>> > >> > >>> >>> >> IMO, such overheads are small enough to justify
>>>>>>> the change.
>>>>>>> > >> > >>> >>> >> Let's check if there are other overheads.
>>>>>>> > >> > >>> >>> >>
>>>>>>> > >> > >>> >>> >> Best,
>>>>>>> > >> > >>> >>> >> Liya Fan
>>>>>>> > >> > >>> >>> >>
>>>>>>> > >> > >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield <
>>>>>>> > >> > >>> emkornfield@gmail.com
>>>>>>> > >> > >>> >>> >
>>>>>>> > >> > >>> >>> >> wrote:
>>>>>>> > >> > >>> >>> >>
>>>>>>> > >> > >>> >>> >> > Hi Liya Fan,
>>>>>>> > >> > >>> >>> >> > Based on the Float8Benchmark there does not seem
>>>>>>> to be any
>>>>>>> > >> > >>> >>> meaningful
>>>>>>> > >> > >>> >>> >> > performance difference on my machine.  At least
>>>>>>> for me, the
>>>>>>> > >> > >>> >>> benchmarks
>>>>>>> > >> > >>> >>> >> are
>>>>>>> > >> > >>> >>> >> > not stable enough to say one is faster than the
>>>>>>> other (I've
>>>>>>> > >> > >>> pasted
>>>>>>> > >> > >>> >>> >> results
>>>>>>> > >> > >>> >>> >> > below).  That being said my machine isn't
>>>>>>> necessarily the
>>>>>>> > >> most
>>>>>>> > >> > >>> >>> reliable
>>>>>>> > >> > >>> >>> >> for
>>>>>>> > >> > >>> >>> >> > benchmarking.
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> > On an intuitive level, this makes sense to me; for the most part it seems
>>>>>>> > >> > >>> >>> >> > like the change just moves casting from "int" to "long" further up the
>>>>>>> > >> > >>> >>> >> > stack for "PlatformDependent" operations.  If there are other benchmarks
>>>>>>> > >> > >>> >>> >> > that you think are worth running, let me know.
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> > One downside performance-wise I think for this change is that it increases
>>>>>>> > >> > >>> >>> >> > the size of ArrowBuf objects, which I suppose could influence cache misses
>>>>>>> > >> > >>> >>> >> > at some level or increase the size of call-stacks, but this doesn't seem to
>>>>>>> > >> > >>> >>> >> > show up in the benchmark.
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> > Thanks,
>>>>>>> > >> > >>> >>> >> > Micah
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> > Sample benchmark numbers:
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> > [New Code]
>>>>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt   Score   Error  Units
>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  15.441 ± 0.469  us/op
>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  14.057 ± 0.115  us/op
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> > [Old code]
>>>>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt   Score   Error  Units
>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  16.248 ± 1.409  us/op
>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  14.150 ± 0.084  us/op
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya <
>>>>>>> > >> liya.fan03@gmail.com
>>>>>>> > >> > >
>>>>>>> > >> > >>> >>> wrote:
>>>>>>> > >> > >>> >>> >> >
>>>>>>> > >> > >>> >>> >> >> Hi Micah,
>>>>>>> > >> > >>> >>> >> >>
>>>>>>> > >> > >>> >>> >> >> Thanks a lot for doing this.
>>>>>>> > >> > >>> >>> >> >>
>>>>>>> > >> > >>> >>> >> >> I am a little concerned about whether there is any negative performance
>>>>>>> > >> > >>> >>> >> >> impact on the current 32-bit-length based applications.
>>>>>>> > >> > >>> >>> >> >> Can we do some performance comparisons on our existing benchmarks?
>>>>>>> > >> > >>> >>> >> >>
>>>>>>> > >> > >>> >>> >> >> Best,
>>>>>>> > >> > >>> >>> >> >> Liya Fan
>>>>>>> > >> > >>> >>> >> >>
>>>>>>> > >> > >>> >>> >> >>
>>>>>>> > >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield <
>>>>>>> > >> > >>> >>> emkornfield@gmail.com>
>>>>>>> > >> > >>> >>> >> >> wrote:
>>>>>>> > >> > >>> >>> >> >>
>>>>>>> > >> > >>> >>> >> >>> There have been some previous discussions on the mailing list about
>>>>>>> > >> > >>> >>> >> >>> supporting 64-bit lengths for Java ValueVectors (this is what the IPC
>>>>>>> > >> > >>> >>> >> >>> specification and C++ support).  I created a PR [1] that changes all APIs
>>>>>>> > >> > >>> >>> >> >>> that I could find that take an index to take a "long" instead of an "int"
>>>>>>> > >> > >>> >>> >> >>> (and similarly changes "size/rowcount" APIs).
>>>>>>> > >> > >>> >>> >> >>>
>>>>>>> > >> > >>> >>> >> >>> It is a big change, so I think it is worth
>>>>>>> discussing if
>>>>>>> > >> it
>>>>>>> > >> > is
>>>>>>> > >> > >>> >>> >> something
>>>>>>> > >> > >>> >>> >> >>> we
>>>>>>> > >> > >>> >>> >> >>> still want to move forward with.  It would be
>>>>>>> nice to
>>>>>>> > >> come
>>>>>>> > >> > to
>>>>>>> > >> > >>> a
>>>>>>> > >> > >>> >>> >> >>> conclusion
>>>>>>> > >> > >>> >>> >> >>> quickly, ideally in the next few days, to
>>>>>>> avoid a lot of
>>>>>>> > >> > merge
>>>>>>> > >> > >>> >>> >> conflicts.
>>>>>>> > >> > >>> >>> >> >>>
>>>>>>> > >> > >>> >>> >> >>> The reason I did this work now is the C++
>>>>>>> implementation
>>>>>>> > >> has
>>>>>>> > >> > >>> added
>>>>>>> > >> > >>> >>> >> >>> support
>>>>>>> > >> > >>> >>> >> >>> for LargeList, LargeBinary and LargeString
>>>>>>> arrays and
>>>>>>> > >> based
>>>>>>> > >> > on
>>>>>>> > >> > >>> >>> prior
>>>>>>> > >> > >>> >>> >> >>> discussions we need to have similar support in
>>>>>>> Java
>>>>>>> > >> before
>>>>>>> > >> > our
>>>>>>> > >> > >>> >>> next
>>>>>>> > >> > >>> >>> >> >>> release.  Supporting 64-bit indexes means we can have full compatibility
>>>>>>> > >> > >>> >>> >> >>> and make the most use of the types in Java.
>>>>>>> > >> > >>> >>> >> >>>
>>>>>>> > >> > >>> >>> >> >>> Look forward to hearing feedback.
>>>>>>> > >> > >>> >>> >> >>>
>>>>>>> > >> > >>> >>> >> >>> Thanks,
>>>>>>> > >> > >>> >>> >> >>> Micah
>>>>>>> > >> > >>> >>> >> >>>
>>>>>>> > >> > >>> >>> >> >>> [1] https://github.com/apache/arrow/pull/5020
>>>>>>> > >> > >>> >>> >> >>>
>>>>>>> > >> > >>> >>> >> >>
>>>>>>> > >> > >>> >>> >>
>>>>>>> > >> > >>> >>> >
>>>>>>> > >> > >>> >>>
>>>>>>> > >> > >>> >>
>>>>>>> > >> > >>>
>>>>>>> > >> > >>
>>>>>>> > >> >
>>>>>>> > >>
>>>>>>> > >
>>>>>>>
>>>>>>

Re: [Discuss][Java] 64-bit lengths for ValueVectors

Posted by Micah Kornfield <em...@gmail.com>.
I'll also note this isn't quite in final form; I'd still like to add some
more unit tests.

On Fri, Nov 22, 2019 at 11:36 AM Wes McKinney <we...@gmail.com> wrote:

> hi Micah -- it makes sense to limit the scope for the time being to
> permitting LargeString/Binary work to proceed. Jacques, have you had a
> chance to look at this?
> >>>>>>>> > >> > >>> >>> >> >>
> >>>>>>>> > >> > >>> >>> >> >> Thanks a lot for doing this.
> >>>>>>>> > >> > >>> >>> >> >>
> >>>>>>>> > >> > >>> >>> >> >> I am a little concerned about if there is
> any negative
> >>>>>>>> > >> > >>> performance
> >>>>>>>> > >> > >>> >>> >> impact
> >>>>>>>> > >> > >>> >>> >> >> on the current 32-bit-length based
> applications.
> >>>>>>>> > >> > >>> >>> >> >> Can we do some performance comparison on our
> existing
> >>>>>>>> > >> > >>> benchmarks?
> >>>>>>>> > >> > >>> >>> >> >>
> >>>>>>>> > >> > >>> >>> >> >> Best,
> >>>>>>>> > >> > >>> >>> >> >> Liya Fan
> >>>>>>>> > >> > >>> >>> >> >>
> >>>>>>>> > >> > >>> >>> >> >>
> >>>>>>>> > >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah
> Kornfield <
> >>>>>>>> > >> > >>> >>> emkornfield@gmail.com>
> >>>>>>>> > >> > >>> >>> >> >> wrote:
> >>>>>>>> > >> > >>> >>> >> >>
> >>>>>>>> > >> > >>> >>> >> >>> There have been some previous discussions
> on the mailing
> >>>>>>>> > >> > about
> >>>>>>>> > >> > >>> >>> >> supporting
> >>>>>>>> > >> > >>> >>> >> >>> 64-bit lengths for  Java ValueVectors (this
> is what the
> >>>>>>>> > >> IPC
> >>>>>>>> > >> > >>> >>> >> specification
> >>>>>>>> > >> > >>> >>> >> >>> and C++ support).  I created a PR [1] that
> changes all
> >>>>>>>> > >> APIs
> >>>>>>>> > >> > >>> that I
> >>>>>>>> > >> > >>> >>> >> could
> >>>>>>>> > >> > >>> >>> >> >>> find that take an index to take an "long"
> instead of an
> >>>>>>>> > >> > "int"
> >>>>>>>> > >> > >>> (and
> >>>>>>>> > >> > >>> >>> >> >>> similarly change "size/rowcount" APIs).
> >>>>>>>> > >> > >>> >>> >> >>>
> >>>>>>>> > >> > >>> >>> >> >>> It is a big change, so I think it is worth
> discussing if
> >>>>>>>> > >> it
> >>>>>>>> > >> > is
> >>>>>>>> > >> > >>> >>> >> something
> >>>>>>>> > >> > >>> >>> >> >>> we
> >>>>>>>> > >> > >>> >>> >> >>> still want to move forward with.  It would
> be nice to
> >>>>>>>> > >> come
> >>>>>>>> > >> > to
> >>>>>>>> > >> > >>> a
> >>>>>>>> > >> > >>> >>> >> >>> conclusion
> >>>>>>>> > >> > >>> >>> >> >>> quickly, ideally in the next few days, to
> avoid a lot of
> >>>>>>>> > >> > merge
> >>>>>>>> > >> > >>> >>> >> conflicts.
> >>>>>>>> > >> > >>> >>> >> >>>
> >>>>>>>> > >> > >>> >>> >> >>> The reason I did this work now is the C++
> implementation
> >>>>>>>> > >> has
> >>>>>>>> > >> > >>> added
> >>>>>>>> > >> > >>> >>> >> >>> support
> >>>>>>>> > >> > >>> >>> >> >>> for LargeList, LargeBinary and LargeString
> arrays and
> >>>>>>>> > >> based
> >>>>>>>> > >> > on
> >>>>>>>> > >> > >>> >>> prior
> >>>>>>>> > >> > >>> >>> >> >>> discussions we need to have similar support
> in Java
> >>>>>>>> > >> before
> >>>>>>>> > >> > our
> >>>>>>>> > >> > >>> >>> next
> >>>>>>>> > >> > >>> >>> >> >>> release. Support 64-bit indexes means we
> can have full
> >>>>>>>> > >> > >>> >>> compatibility
> >>>>>>>> > >> > >>> >>> >> and
> >>>>>>>> > >> > >>> >>> >> >>> make the most use of the types in Java.
> >>>>>>>> > >> > >>> >>> >> >>>
> >>>>>>>> > >> > >>> >>> >> >>> Look forward to hearing feedback.
> >>>>>>>> > >> > >>> >>> >> >>>
> >>>>>>>> > >> > >>> >>> >> >>> Thanks,
> >>>>>>>> > >> > >>> >>> >> >>> Micah
> >>>>>>>> > >> > >>> >>> >> >>>
> >>>>>>>> > >> > >>> >>> >> >>> [1]
> https://github.com/apache/arrow/pull/5020
> >>>>>>>> > >> > >>> >>> >> >>>
> >>>>>>>> > >> > >>> >>> >> >>
> >>>>>>>> > >> > >>> >>> >>
> >>>>>>>> > >> > >>> >>> >
> >>>>>>>> > >> > >>> >>>
> >>>>>>>> > >> > >>> >>
> >>>>>>>> > >> > >>>
> >>>>>>>> > >> > >>
> >>>>>>>> > >> >
> >>>>>>>> > >>
> >>>>>>>> > >
>

Re: [Discuss][Java] 64-bit lengths for ValueVectors

Posted by Wes McKinney <we...@gmail.com>.
hi Micah -- it makes sense to limit the scope for the time being to
permitting LargeString/Binary work to proceed. Jacques, have you had a
chance to look at this?
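
A rough sketch of what 64-bit ArrowBuf lengths buy the Large types (illustrative
only; these helpers are assumptions for the example, not the API in the PR): with
32-bit offsets the variable-width data buffer can never exceed Integer.MAX_VALUE
bytes in total, while 64-bit offsets remove that cap, which is why the buffer
itself has to accept long lengths and indexes.

    final class OffsetWidthSketch {
      // VarBinary-style addressing: int offsets cap the data buffer at ~2GB.
      static int valueEnd32(int[] offsets, int index) {
        return offsets[index + 1];
      }

      // LargeVarBinary-style addressing: long offsets can point past 2GB,
      // so the underlying buffer must accept long lengths/indexes.
      static long valueEnd64(long[] offsets, int index) {
        return offsets[index + 1];
      }

      public static void main(String[] args) {
        long[] offsets = {0L, 3_000_000_000L};       // a single 3GB value
        System.out.println(valueEnd64(offsets, 0));  // representable only with long offsets
      }
    }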

On Fri, Nov 15, 2019 at 3:07 AM Micah Kornfield <em...@gmail.com> wrote:
>
> Apologies for the long delay, I chose to do the minimal work of limiting this change [1] to allowing ArrowBuf to 64-bit lengths.  This would unblock work on LargeString and LargeBinary.  If this change looks OK, I think there is some follow-up work to add more thorough unit/integration tests.
>
> As an aside, it does seem like the 2GB limit is affecting some users in Spark [2][3], so hopefully LargeString would help with this.
>
> Allowing more than MAX_INT elements in Vectors/arrays is still a blocker for making LargeList useful.
>
> Thanks,
> Micah
>
> [1]  https://github.com/apache/arrow/pull/5020
> [2] https://stackoverflow.com/questions/58739888/spark-is-it-possible-to-increase-pyarrow-buffer#comment103812119_58739888
> [3] https://issues.apache.org/jira/browse/ARROW-4890
>
> On Fri, Aug 23, 2019 at 8:33 AM Jacques Nadeau <ja...@apache.org> wrote:
>>
>>
>>
>> On Fri, Aug 23, 2019, 8:55 PM Micah Kornfield <em...@gmail.com> wrote:
>>>>
>>>> The vector indexes being limited to 32 bits doesn't limit the addressing to 32 bit chunks of memory. For example, you're prime example before was image data. Having 2 billion images of 1mb images would still be supported without changing the index addressing.
>>>
>>> This might be pre-coffee math, but I think we are limited to approximately 2000 images because an ArrowBuf only holds up-to 2 billion bytes [1].  While we have plenty of room for the offsets, we quickly run out of contiguous data storage space. For LargeString and LargeBinary this could be fixed by changing ArrowBuf.
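
(The arithmetic behind that estimate, assuming roughly 1 MiB per image:
2,147,483,647 bytes per ArrowBuf / 1,048,576 bytes per image ~ 2,000 images per
data buffer, even though the offsets could index billions of elements.)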
>>>
>>> LargeArray faces the same problem only it applies to its child vectors.  Supporting LargeArray properly is really what drove the large-scale interface change.
>>
>>
>> My expressed concern about these changes was specifically about the use of long for get/set in the vector interfaces. I'm not saying that we constrain memory/ArrowBufs to 32bits.
>>
>>>
>>>> Rebase would help if possible.
>>>
>>> I'll try to get to this in the next few days.
>>>
>>> [1] https://github.com/apache/arrow/blob/95175fe7cb8439eebe6d2f6e0495f551d6864380/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L164
>>>
>>> On Fri, Aug 23, 2019 at 4:55 AM Jacques Nadeau <ja...@apache.org> wrote:
>>>>
>>>>
>>>>
>>>> On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield <em...@gmail.com> wrote:
>>>>>>
>>>>>> I don't think we should couple this discussion with the implementation of large list, etc since I think those two concepts are independent.
>>>>>
>>>>> I'm still trying to balance in my mind which is a worse experience for consumers of the libraries for these types.  Claiming that Java supports these types but throwing an exception when the Vectors exceed 32-bits or just say they aren't supported until we have 64-bit support in Java.
>>>>
>>>>
>>>> The vector indexes being limited to 32 bits doesn't limit the addressing to 32 bit chunks of memory. For example, you're prime example before was image data. Having 2 billion images of 1mb images would still be supported without changing the index addressing.
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> I've asked some others on my team their opinions on the risk here. I think we should probably review some of our more complex vector interactions and see how the jvm's assembly changes with this kind of change. Using microbenchmarking is good but I think we also need to see whether we're constantly inserting additional instructions or if in most cases, this actually doesn't impact instruction count.
>>>>>
>>>>>
>>>>> Is this something that your team will take on?
>>>>
>>>>
>>>>
>>>> Yeah, we need to look at this I think.
>>>>
>>>>> Do you need a rebased version of the PR or is the existing one sufficient?
>>>>
>>>>
>>>> Rebase would help if possible.
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Micah
>>>>>
>>>>> On Thu, Aug 22, 2019 at 8:56 PM Jacques Nadeau <ja...@apache.org> wrote:
>>>>>>
>>>>>> I don't think we should couple this discussion with the implementation of large list, etc since I think those two concepts are independent.
>>>>>>
>>>>>> I've asked some others on my team their opinions on the risk here. I think we should probably review some of our more complex vector interactions and see how the jvm's assembly changes with this kind of change. Using microbenchmarking is good but I think we also need to see whether we're constantly inserting additional instructions or if in most cases, this actually doesn't impact instruction count.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 21, 2019 at 12:18 PM Micah Kornfield <em...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> With regards to the reference implementation point. It is a good point. I'm on vacation this week. Unless you're pushing hard on this, can we pick this up and discuss more next week?
>>>>>>>
>>>>>>>
>>>>>>> Hi Jacques, I hope you had a good rest.  Any more thoughts on the reference implementation aspect of this?
>>>>>>>
>>>>>>>>
>>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>>>>>> would be best to decouple this discussion from the release timeline
>>>>>>>> given how many people we have relying on regular releases coming out.
>>>>>>>> We can keep making major 0.x releases until we're ready to
>>>>>>>> release 1.0.0.
>>>>>>>
>>>>>>>
>>>>>>> I'm OK with it as long as other stakeholders are. Timed releases are the way to go.  As stated on the release thread [1] we need a better mechanism to avoid this type of issue arising again.  The release thread also had some more discussion on compatibility.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Micah
>>>>>>>
>>>>>>> [1] https://lists.apache.org/thread.html/d70feeceaf2570906ade117030b29887af7c77ca5c4a976e6d555920@%3Cdev.arrow.apache.org%3E
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 14, 2019 at 3:23 PM Wes McKinney <we...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield <em...@gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > Hi Wes and Jacques,
>>>>>>>> > See responses below.
>>>>>>>> >
>>>>>>>> > With regards to the reference implementation point. It is a good point. I'm
>>>>>>>> > > on vacation this week. Unless you're pushing hard on this, can we pick this
>>>>>>>> > > up and discuss more next week?
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Sure thing, enjoy your vacation.  I think the only practical implications
>>>>>>>> > are it delays choices around implementing LargeList, LargeBinary,
>>>>>>>> > LargeString in Java, which in turn might push out the 0.15.0 release.
>>>>>>>> >
>>>>>>>>
>>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>>>>>> would be best to decouple this discussion from the release timeline
>>>>>>>> given how many people we have relying on regular releases coming out.
>>>>>>>> We can keep making major 0.x releases until we're ready to
>>>>>>>> release 1.0.0.
>>>>>>>>
>>>>>>>> > My stance on this is that I don't know how important it is for Java to
>>>>>>>> > > support vectors over INT32_MAX elements. The use cases enabled by
>>>>>>>> > > having very large arrays seem to be concentrated in the native code
>>>>>>>> > > world (e.g. C/C++/Rust) -- that could just be implementation-centrism
>>>>>>>> > > on my part, though.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > A data point against this view is Spark has done work to eliminate 2GB
>>>>>>>> > memory limits on its block sizes [1].  I don't claim to understand the
>>>>>>>> > implications of this. Bryan might you have any thoughts here?  I'm OK with
>>>>>>>> > INT32_MAX, as well, I think we should think about what this means for
>>>>>>>> > adding Large types to Java and implications for reference implementations.
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > Micah
>>>>>>>> >
>>>>>>>> > [1] https://issues.apache.org/jira/browse/SPARK-6235
>>>>>>>> >
>>>>>>>> > On Sun, Aug 11, 2019 at 6:31 PM Jacques Nadeau <ja...@apache.org> wrote:
>>>>>>>> >
>>>>>>>> > > Hey Micah,
>>>>>>>> > >
>>>>>>>> > > Appreciate the offer on the compiling. The reality is I'm more concerned
>>>>>>>> > > about the unknowns than the compiling issue itself. Any time you've been
>>>>>>>> > > tuning for a while, changing something like this could be totally fine or
>>>>>>>> > > cause a couple of major issues. For example, we've done a very large amount
>>>>>>>> > > of work reducing heap memory footprint of the vectors. Our target is to
>>>>>>>> > > actually get it down to 24 bytes per ArrowBuf and 24 bytes heap per vector
>>>>>>>> > > (not including arrow bufs).
>>>>>>>> > >
>>>>>>>> > > With regards to the reference implementation point. It is a good point.
>>>>>>>> > > I'm on vacation this week. Unless you're pushing hard on this, can we pick
>>>>>>>> > > this up and discuss more next week?
>>>>>>>> > >
>>>>>>>> > > thanks,
>>>>>>>> > > Jacques
>>>>>>>> > >
>>>>>>>> > > On Sat, Aug 10, 2019 at 7:39 PM Micah Kornfield <em...@gmail.com>
>>>>>>>> > > wrote:
>>>>>>>> > >
>>>>>>>> > >> Hi Jacques,
>>>>>>>> > >> I definitely understand these concerns and this change is risky because it
>>>>>>>> > >> is so large.  Perhaps, creating a new hierarchy, might be the cleanest way
>>>>>>>> > >> of dealing with this.  This could have other benefits like cleaning up
>>>>>>>> > >> some
>>>>>>>> > >> cruft around dictionary encode and "orphaned" method.   Per past e-mail
>>>>>>>> > >> threads I agree it is beneficial to have 2 separate reference
>>>>>>>> > >> implementations that can communicate fully, and my intent here was to
>>>>>>>> > >> close
>>>>>>>> > >> that gap.
>>>>>>>> > >>
>>>>>>>> > >> Trying to
>>>>>>>> > >> > determine the ramifications of these changes would be challenging and
>>>>>>>> > >> time
>>>>>>>> > >> > consuming against all the different ways we interact with the Arrow Java
>>>>>>>> > >> > library.
>>>>>>>> > >>
>>>>>>>> > >>
>>>>>>>> > >> Understood.  I took a quick look at Dremio-OSS and it seems like it has a
>>>>>>>> > >> simple java build system?  If it is helpful, I can try to get a fork
>>>>>>>> > >> running that at least compiles against this PR.  My plan would be to cast
>>>>>>>> > >> any place that was changed to return a long back to an int, so in essence
>>>>>>>> > >> the Dremio algorithms would remain 32-bit implementations.
>>>>>>>> > >>
>>>>>>>> > >> I don't  have the infrastructure to test this change properly from a
>>>>>>>> > >> distributed systems perspective, so it would still take some time from
>>>>>>>> > >> Dremio to validate for regressions.
>>>>>>>> > >>
>>>>>>>> > >> I'm not saying I'm against this but want to make sure we've
>>>>>>>> > >> > explored all less disruptive options before considering changing
>>>>>>>> > >> something
>>>>>>>> > >> > this fundamental (especially when I generally hold the view that large
>>>>>>>> > >> cell
>>>>>>>> > >> > counts against massive contiguous memory is an anti pattern to scalable
>>>>>>>> > >> > analytical processing--purely subjective of course).
>>>>>>>> > >>
>>>>>>>> > >>
>>>>>>>> > >> I'm open to other ideas here, as well. I don't think it is out of the
>>>>>>>> > >> question to leave the Java implementation as 32-bit, but if we do, then I
>>>>>>>> > >> think we should consider a different strategy for reference
>>>>>>>> > >> implementations.
>>>>>>>> > >>
>>>>>>>> > >> Thanks,
>>>>>>>> > >> Micah
>>>>>>>> > >>
>>>>>>>> > >> On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau <ja...@apache.org>
>>>>>>>> > >> wrote:
>>>>>>>> > >>
>>>>>>>> > >> > Hey Micah, I didn't have a particular path in mind. Was thinking more
>>>>>>>> > >> along
>>>>>>>> > >> > the lines of extra methods as opposed to separate classes.
>>>>>>>> > >> >
>>>>>>>> > >> > Arrow hasn't historically been a place where we're writing algorithms in
>>>>>>>> > >> > Java so the fact that they aren't there doesn't mean they don't exist.
>>>>>>>> > >> We
>>>>>>>> > >> > have a large amount of code that depends on the current behavior that is
>>>>>>>> > >> > deployed in hundreds of customer clusters (you can peruse our dremio
>>>>>>>> > >> repo
>>>>>>>> > >> > to see how extensively we leverage Arrow if interested). Trying to
>>>>>>>> > >> > determine the ramifications of these changes would be challenging and
>>>>>>>> > >> time
>>>>>>>> > >> > consuming against all the different ways we interact with the Arrow Java
>>>>>>>> > >> > library. I'm not saying I'm against this but want to make sure we've
>>>>>>>> > >> > explored all less disruptive options before considering changing
>>>>>>>> > >> something
>>>>>>>> > >> > this fundamental (especially when I generally hold the view that large
>>>>>>>> > >> cell
>>>>>>>> > >> > counts against massive contiguous memory is an anti pattern to scalable
>>>>>>>> > >> > analytical processing--purely subjective of course).
>>>>>>>> > >> >
>>>>>>>> > >> > On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield <em...@gmail.com>
>>>>>>>> > >> > wrote:
>>>>>>>> > >> >
>>>>>>>> > >> > > Hi Jacques,
>>>>>>>> > >> > > What avenue were you thinking for supporting both paths?   I didn't want
>>>>>>>> > >> > > to pursue a different class hierarchy, because I felt like that would
>>>>>>>> > >> > > effectively fork the code base, but that is potentially an option that
>>>>>>>> > >> > > would allow us to have a complete reference implementation in Java that
>>>>>>>> > >> > > can fully interact with C++, without major changes to this code.
>>>>>>>> > >> > >
>>>>>>>> > >> > > For supporting both APIs on the same classes/interfaces, I think they
>>>>>>>> > >> > > roughly fall into three categories, changes to input parameters, changes
>>>>>>>> > >> > > to output parameters and algorithm changes.
>>>>>>>> > >> > >
>>>>>>>> > >> > > For inputs, changing from int to long is essentially a no-op from the
>>>>>>>> > >> > > compiler perspective.  From the limited micro-benchmarking this also
>>>>>>>> > >> > > doesn't seem to have a performance impact.  So we could keep two versions
>>>>>>>> > >> > > of the methods that only differ on inputs, but it is not clear what the
>>>>>>>> > >> > > value of that would be.
>>>>>>>> > >> > >
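
A tiny sketch of why widening an input parameter is source-compatible (the
method below is a stand-in for illustration, not a specific vector API):

    final class WideningSketch {
      static void setSafe(long index, double value) {
        // an index-based write would go here
      }

      public static void main(String[] args) {
        int i = 42;
        setSafe(i, 1.0);  // the int argument widens to long, so existing call sites still compile
      }
    }
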
>>>>>>>> > >> > > For outputs, we can't support methods "long getLength()" and "int
>>>>>>>> > >> > > getLength()" in the same class, so we would be forced into something like
>>>>>>>> > >> > > "long getLength(boolean dummy)", which I think is less desirable.
>>>>>>>> > >> > >
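
The constraint above is just Java's rule that methods cannot be overloaded on
return type alone; a minimal illustration:

    final class LengthSketch {
      int getLength() { return 0; }
      // long getLength() { return 0L; }  // does not compile: getLength() is already defined
    }
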
>>>>>>>> > >> > > For algorithm changes, there did not appear to be too many places where
>>>>>>>> > >> > > we actually loop over all elements (it is quite possible I missed
>>>>>>>> > >> > > something here), the ones that I did find I was able to mitigate
>>>>>>>> > >> > > performance penalties as noted above.  Some of the current implementation
>>>>>>>> > >> > > will get a lot slower for "large arrays", but we can likely fix those
>>>>>>>> > >> > > later or in this PR with a nested while loop instead of 2 for loops.
>>>>>>>> > >> > >
>>>>>>>> > >> > > Thanks,
>>>>>>>> > >> > > Micah
>>>>>>>> > >> > >
>>>>>>>> > >> > >
>>>>>>>> > >> > > On Saturday, August 10, 2019, Jacques Nadeau <ja...@apache.org>
>>>>>>>> > >> wrote:
>>>>>>>> > >> > >
>>>>>>>> > >> > >> This is a pretty massive change to the apis. I wonder how nasty it
>>>>>>>> > >> would
>>>>>>>> > >> > >> be to just support both paths. Have you evaluated how complex that
>>>>>>>> > >> > would be?
>>>>>>>> > >> > >>
>>>>>>>> > >> > >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield <
>>>>>>>> > >> emkornfield@gmail.com>
>>>>>>>> > >> > >> wrote:
>>>>>>>> > >> > >>
>>>>>>>> > >> > >>> After more investigation, it looks like Float8Benchmarks at least
>>>>>>>> > >> on my
>>>>>>>> > >> > >>> machine are within the range of noise.
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> For BitVectorHelper I pushed a new commit [1], seems to bring the
>>>>>>>> > >> > >>> BitVectorHelper benchmarks back inline (and even with some
>>>>>>>> > >> improvement
>>>>>>>> > >> > >>> for
>>>>>>>> > >> > >>> getNullCountBenchmark).
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> Benchmark                                        Mode  Cnt   Score   Error  Units
>>>>>>>> > >> > >>> BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   3.821 ± 0.031  ns/op
>>>>>>>> > >> > >>> BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  14.884 ± 0.141  ns/op
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> I applied the same pattern to other loops that I could find, and for any
>>>>>>>> > >> > >>> "for (long" loop on the critical path, I broke it up into two loops. the
>>>>>>>> > >> > >>> first loop does iteration by integer, the second finishes off for any long
>>>>>>>> > >> > >>> values.  As a side note it seems like optimization for loops using long
>>>>>>>> > >> > >>> counters at least have a semi-recent open bug for the JVM [2]
>>>>>>>> > >> > >>>
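
A minimal sketch of the loop-splitting pattern described above (illustrative
only; valueAt is a stand-in accessor, not an Arrow API):

    import java.util.function.LongUnaryOperator;

    final class SplitLoopSketch {
      static long sum(LongUnaryOperator valueAt, long length) {
        long total = 0;
        int intLength = (int) Math.min(length, Integer.MAX_VALUE);
        // First loop: int counter, which C2 unrolls/vectorizes more readily.
        for (int i = 0; i < intLength; i++) {
          total += valueAt.applyAsLong(i);
        }
        // Second loop: finishes off any elements beyond Integer.MAX_VALUE.
        for (long i = intLength; i < length; i++) {
          total += valueAt.applyAsLong(i);
        }
        return total;
      }
    }
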
>>>>>>>> > >> > >>> Thanks,
>>>>>>>> > >> > >>> Micah
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> [1]
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>>
>>>>>>>> > >> >
>>>>>>>> > >> https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797
>>>>>>>> > >> > >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield <
>>>>>>>> > >> emkornfield@gmail.com>
>>>>>>>> > >> > >>> wrote:
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> > Indeed, the BoundChecking and CheckNullForGet variables can make a
>>>>>>>> > >> > big
>>>>>>>> > >> > >>> > difference.  I didn't initially run the benchmarks with these
>>>>>>>> > >> turned
>>>>>>>> > >> > on
>>>>>>>> > >> > >>> > (you can see the result from above with Float8Benchmarks).  Here
>>>>>>>> > >> are
>>>>>>>> > >> > >>> new
>>>>>>>> > >> > >>> > numbers including with the flags enabled.  It looks like using
>>>>>>>> > >> longs
>>>>>>>> > >> > >>> might
>>>>>>>> > >> > >>> > be a little bit slower, I'll see what I can do to mitigate this.
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Ravindra also volunteered to try to benchmark the changes with
>>>>>>>> > >> > Dremio's
>>>>>>>> > >> > >>> > code on today's sync call.
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > New
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Benchmark                                        Mode  Cnt   Score   Error  Units
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   4.176 ± 1.292  ns/op
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  26.102 ± 0.700  ns/op
>>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark               avgt    5   7.398 ± 0.084  us/op
>>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark              avgt    5   2.711 ± 0.057  us/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > old
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   3.828 ± 0.030  ns/op
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  20.611 ± 0.188  ns/op
>>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark               avgt    5   6.597 ± 0.462  us/op
>>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark              avgt    5   2.615 ± 0.027  us/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya <li...@gmail.com>
>>>>>>>> > >> > wrote:
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> >> Hi Gonzalo,
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >> Thanks for sharing the performance results.
>>>>>>>> > >> > >>> >> I am wondering if you have turned off the flag
>>>>>>>> > >> > >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED.
>>>>>>>> > >> > >>> >> If not, the lower throughput should be expected.
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >> Best,
>>>>>>>> > >> > >>> >> Liya Fan
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield <
>>>>>>>> > >> > >>> emkornfield@gmail.com>
>>>>>>>> > >> > >>> >> wrote:
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >>> Hi Gonzalo,
>>>>>>>> > >> > >>> >>> Thank you for the feedback.  I wasn't aware of the JIT
>>>>>>>> > >> > >>> implications.   At
>>>>>>>> > >> > >>> >>> least on the benchmark run they don't seem to have an impact.
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> If there are other benchmarks that people have that can
>>>>>>>> > >> validate if
>>>>>>>> > >> > >>> this
>>>>>>>> > >> > >>> >>> change will be problematic I would appreciate trying to run them
>>>>>>>> > >> > >>> with the
>>>>>>>> > >> > >>> >>> PR.  I will try to run the ones for zeroing/popcnt tonight to
>>>>>>>> > >> see
>>>>>>>> > >> > if
>>>>>>>> > >> > >>> >>> there
>>>>>>>> > >> > >>> >>> is a change in those.
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> -Micah
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz Jaureguizar <
>>>>>>>> > >> > >>> >>> golthiryus@gmail.com> wrote:
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> > I would recommend to take care with this kind of changes.
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> > I didn't try Arrow in more than one year, but by then the
>>>>>>>> > >> > >>> performance
>>>>>>>> > >> > >>> >>> was
>>>>>>>> > >> > >>> >>> > quite bad in comparison with plain byte buffer access
>>>>>>>> > >> > >>> >>> > (see http://git.net/apache-arrow-development/msg02353.html *)
>>>>>>>> > >> > and
>>>>>>>> > >> > >>> >>> > there are several optimizations that the JVM (specifically,
>>>>>>>> > >> C2)
>>>>>>>> > >> > >>> does
>>>>>>>> > >> > >>> >>> not
>>>>>>>> > >> > >>> >>> > apply when dealing with int instead of longs. One of the
>>>>>>>> > >> > >>> >>> > most commons is the loop unrolling and vectorization.
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> > * It doesn't seem the best way to reference an old email on
>>>>>>>> > >> the
>>>>>>>> > >> > >>> list,
>>>>>>>> > >> > >>> >>> but
>>>>>>>> > >> > >>> >>> > it is the only result shown by Google
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> > El mié., 7 ago. 2019 a las 11:42, Fan Liya (<
>>>>>>>> > >> > liya.fan03@gmail.com
>>>>>>>> > >> > >>> >)
>>>>>>>> > >> > >>> >>> > escribió:
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> >> Hi Micah,
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> Thanks for your effort. The performance result looks good.
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> As you indicated, ArrowBuf will take additional 12 bytes (4
>>>>>>>> > >> > bytes
>>>>>>>> > >> > >>> for
>>>>>>>> > >> > >>> >>> each
>>>>>>>> > >> > >>> >>> >> of length, write index, and read index).
>>>>>>>> > >> > >>> >>> >> Similar overheads also exist for vectors like
>>>>>>>> > >> > >>> BaseFixedWidthVector,
>>>>>>>> > >> > >>> >>> >> BaseVariableWidthVector, etc.
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> IMO, such overheads are small enough to justify the change.
>>>>>>>> > >> > >>> >>> >> Let's check if there are other overheads.
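
(For reference, the 12 bytes are just the three widened fields: length, writer
index and reader index each grow from 4 to 8 bytes, i.e. 3 x (8 - 4) = 12 extra
bytes of heap per ArrowBuf instance.)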
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> Best,
>>>>>>>> > >> > >>> >>> >> Liya Fan
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield <
>>>>>>>> > >> > >>> emkornfield@gmail.com
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> >> wrote:
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> > Hi Liya Fan,
>>>>>>>> > >> > >>> >>> >> > Based on the Float8Benchmark there does not seem to be any
>>>>>>>> > >> > >>> >>> meaningful
>>>>>>>> > >> > >>> >>> >> > performance difference on my machine.  At least for me, the
>>>>>>>> > >> > >>> >>> benchmarks
>>>>>>>> > >> > >>> >>> >> are
>>>>>>>> > >> > >>> >>> >> > not stable enough to say one is faster than the other (I've
>>>>>>>> > >> > >>> pasted
>>>>>>>> > >> > >>> >>> >> results
>>>>>>>> > >> > >>> >>> >> > below).  That being said my machine isn't necessarily the
>>>>>>>> > >> most
>>>>>>>> > >> > >>> >>> reliable
>>>>>>>> > >> > >>> >>> >> for
>>>>>>>> > >> > >>> >>> >> > benchmarking.
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > On an intuitive level, this makes sense to me,  for the
>>>>>>>> > >> most
>>>>>>>> > >> > >>> part it
>>>>>>>> > >> > >>> >>> >> seems
>>>>>>>> > >> > >>> >>> >> > like the change just moves casting from "int" to "long"
>>>>>>>> > >> > further
>>>>>>>> > >> > >>> up
>>>>>>>> > >> > >>> >>> the
>>>>>>>> > >> > >>> >>> >> > stack  for  "PlatformDepdendent" operations.  If there are
>>>>>>>> > >> > other
>>>>>>>> > >> > >>> >>> >> benchmarks
>>>>>>>> > >> > >>> >>> >> > that you think are worth running let me know.
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > One downside performance wise I think for his change is it
>>>>>>>> > >> > >>> >>> increases the
>>>>>>>> > >> > >>> >>> >> > size of ArrowBuf objects, which I suppose could influence
>>>>>>>> > >> > cache
>>>>>>>> > >> > >>> >>> misses
>>>>>>>> > >> > >>> >>> >> at
>>>>>>>> > >> > >>> >>> >> > some level or increase the size of call-stacks, but this
>>>>>>>> > >> > doesn't
>>>>>>>> > >> > >>> >>> seem to
>>>>>>>> > >> > >>> >>> >> > show up in the benchmark..
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > Thanks,
>>>>>>>> > >> > >>> >>> >> > Micah
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > Sample benchmark numbers:
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > [New Code]
>>>>>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt   Score
>>>>>>>> > >>  Error
>>>>>>>> > >> > >>> >>> Units
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  15.441 ±
>>>>>>>> > >> 0.469
>>>>>>>> > >> > >>> >>> us/op
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  14.057 ±
>>>>>>>> > >> 0.115
>>>>>>>> > >> > >>> >>> us/op
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > [Old code]
>>>>>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt   Score
>>>>>>>> > >>  Error
>>>>>>>> > >> > >>> >>> Units
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  16.248 ±
>>>>>>>> > >> 1.409
>>>>>>>> > >> > >>> >>> us/op
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  14.150 ±
>>>>>>>> > >> 0.084
>>>>>>>> > >> > >>> >>> us/op
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya <
>>>>>>>> > >> liya.fan03@gmail.com
>>>>>>>> > >> > >
>>>>>>>> > >> > >>> >>> wrote:
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> >> Hi Micah,
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> Thanks a lot for doing this.
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> I am a little concerned about if there is any negative
>>>>>>>> > >> > >>> performance
>>>>>>>> > >> > >>> >>> >> impact
>>>>>>>> > >> > >>> >>> >> >> on the current 32-bit-length based applications.
>>>>>>>> > >> > >>> >>> >> >> Can we do some performance comparison on our existing
>>>>>>>> > >> > >>> benchmarks?
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> Best,
>>>>>>>> > >> > >>> >>> >> >> Liya Fan
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield <
>>>>>>>> > >> > >>> >>> emkornfield@gmail.com>
>>>>>>>> > >> > >>> >>> >> >> wrote:
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >>> There have been some previous discussions on the mailing
>>>>>>>> > >> > about
>>>>>>>> > >> > >>> >>> >> supporting
>>>>>>>> > >> > >>> >>> >> >>> 64-bit lengths for  Java ValueVectors (this is what the
>>>>>>>> > >> IPC
>>>>>>>> > >> > >>> >>> >> specification
>>>>>>>> > >> > >>> >>> >> >>> and C++ support).  I created a PR [1] that changes all
>>>>>>>> > >> APIs
>>>>>>>> > >> > >>> that I
>>>>>>>> > >> > >>> >>> >> could
>>>>>>>> > >> > >>> >>> >> >>> find that take an index to take an "long" instead of an
>>>>>>>> > >> > "int"
>>>>>>>> > >> > >>> (and
>>>>>>>> > >> > >>> >>> >> >>> similarly change "size/rowcount" APIs).
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> It is a big change, so I think it is worth discussing if
>>>>>>>> > >> it
>>>>>>>> > >> > is
>>>>>>>> > >> > >>> >>> >> something
>>>>>>>> > >> > >>> >>> >> >>> we
>>>>>>>> > >> > >>> >>> >> >>> still want to move forward with.  It would be nice to
>>>>>>>> > >> come
>>>>>>>> > >> > to
>>>>>>>> > >> > >>> a
>>>>>>>> > >> > >>> >>> >> >>> conclusion
>>>>>>>> > >> > >>> >>> >> >>> quickly, ideally in the next few days, to avoid a lot of
>>>>>>>> > >> > merge
>>>>>>>> > >> > >>> >>> >> conflicts.
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> The reason I did this work now is the C++ implementation
>>>>>>>> > >> has
>>>>>>>> > >> > >>> added
>>>>>>>> > >> > >>> >>> >> >>> support
>>>>>>>> > >> > >>> >>> >> >>> for LargeList, LargeBinary and LargeString arrays and
>>>>>>>> > >> based
>>>>>>>> > >> > on
>>>>>>>> > >> > >>> >>> prior
>>>>>>>> > >> > >>> >>> >> >>> discussions we need to have similar support in Java
>>>>>>>> > >> before
>>>>>>>> > >> > our
>>>>>>>> > >> > >>> >>> next
>>>>>>>> > >> > >>> >>> >> >>> release. Support 64-bit indexes means we can have full
>>>>>>>> > >> > >>> >>> compatibility
>>>>>>>> > >> > >>> >>> >> and
>>>>>>>> > >> > >>> >>> >> >>> make the most use of the types in Java.
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> Look forward to hearing feedback.
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> Thanks,
>>>>>>>> > >> > >>> >>> >> >>> Micah
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> [1] https://github.com/apache/arrow/pull/5020
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>
>>>>>>>> > >> >
>>>>>>>> > >>
>>>>>>>> > >

Re: [Discuss][Java] 64-bit lengths for ValueVectors

Posted by Fan Liya <li...@gmail.com>.
I think the 2GB limit is overly restrictive for modern computers.
This is a problem we must face anyway.

Best,
Liya Fan
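
A quick illustration of how easily a single value buffer crosses that 2GB line
(the row count and value size below are hypothetical, chosen only for the
arithmetic):

    public class TwoGbLimitSketch {
      public static void main(String[] args) {
        long rows = 300_000_000L;                  // rows in one record batch
        long avgBytesPerValue = 10L;               // average UTF-8 string size
        long dataBytes = rows * avgBytesPerValue;  // 3,000,000,000 bytes
        // A 32-bit length cannot describe this buffer; a 64-bit length can.
        System.out.println(dataBytes > Integer.MAX_VALUE);  // prints true
      }
    }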

On Fri, Nov 15, 2019 at 5:07 PM Micah Kornfield <em...@gmail.com>
wrote:

> Apologies for the long delay, I chose to do the minimal work of limiting
> this change [1] to allowing ArrowBuf to 64-bit lengths.  This would unblock
> work on LargeString and LargeBinary.  If this change looks OK, I think
> there is some follow-up work to add more thorough unit/integration tests.
>
> As an aside, it does seem like the 2GB limit is affecting some users in
> Spark [2][3], so hopefully LargeString would help with this.
>
> Allowing more than MAX_INT elements in Vectors/arrays is still a blocker for
> making LargeList useful.
>
> Thanks,
> Micah
>
> [1]  https://github.com/apache/arrow/pull/5020
> [2]
> https://stackoverflow.com/questions/58739888/spark-is-it-possible-to-increase-pyarrow-buffer#comment103812119_58739888
> [3] https://issues.apache.org/jira/browse/ARROW-4890
>
> On Fri, Aug 23, 2019 at 8:33 AM Jacques Nadeau <ja...@apache.org> wrote:
>
>>
>>
>> On Fri, Aug 23, 2019, 8:55 PM Micah Kornfield <em...@gmail.com>
>> wrote:
>>
>>> The vector indexes being limited to 32 bits doesn't limit the addressing
>>>> to 32 bit chunks of memory. For example, you're prime example before was
>>>> image data. Having 2 billion images of 1mb images would still be supported
>>>> without changing the index addressing.
>>>
>>> This might be pre-coffee math, but I think we are limited to
>>> approximately 2000 images because an ArrowBuf only holds up-to 2 billion
>>> bytes [1].  While we have plenty of room for the offsets, we quickly run
>>> out of contiguous data storage space. For LargeString and LargeBinary this
>>> could be fixed by changing ArrowBuf.
>>>
>>> LargeArray faces the same problem only it applies to its child vectors.
>>> Supporting LargeArray properly is really what drove the large-scale
>>> interface change.
>>>
>>
>> My expressed concern about these changes was specifically about the use
>> of long for get/set in the vector interfaces. I'm not saying that we
>> constrain memory/ArrowBufs to 32bits.
>>
>>
>>> Rebase would help if possible.
>>>
>>> I'll try to get to this in the next few days.
>>>
>>> [1]
>>> https://github.com/apache/arrow/blob/95175fe7cb8439eebe6d2f6e0495f551d6864380/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L164
>>>
>>> On Fri, Aug 23, 2019 at 4:55 AM Jacques Nadeau <ja...@apache.org>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield <em...@gmail.com>
>>>> wrote:
>>>>
>>>>> I don't think we should couple this discussion with the implementation
>>>>>> of large list, etc since I think those two concepts are independent.
>>>>>
>>>>> I'm still trying to balance in my mind which is a worse experience for
>>>>> consumers of the libraries for these types.  Claiming that Java supports
>>>>> these types but throwing an exception when the Vectors exceed 32-bits or
>>>>> just say they aren't supported until we have 64-bit support in Java.
>>>>>
>>>>
>>>> The vector indexes being limited to 32 bits doesn't limit the
>>>> addressing to 32 bit chunks of memory. For example, you're prime example
>>>> before was image data. Having 2 billion images of 1mb images would still be
>>>> supported without changing the index addressing.
>>>>
>>>>
>>>>
>>>>>
>>>>>> I've asked some others on my team their opinions on the risk here. I
>>>>>> think we should probably review some our more complex vector interactions
>>>>>> and see how the jvm's assembly changes with this kind of change. Using
>>>>>> microbenchmarking is good but I think we also need to see whether we're
>>>>>> constantly inserting additional instructions or if in most cases, this
>>>>>> actually doesn't impact instruction count.
>>>>>
>>>>>
>>>>> Is this something that your team will take on?
>>>>>
>>>>
>>>>
>>>> Yeah, we need to look at this I think.
>>>>
>>>> Do you need a rebased version of the PR or is the existing one
>>>>> sufficient?
>>>>>
>>>>
>>>> Rebase would help if possible.
>>>>
>>>>
>>>>
>>>>> Thanks,
>>>>> Micah
>>>>>
>>>>> On Thu, Aug 22, 2019 at 8:56 PM Jacques Nadeau <ja...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> I don't think we should couple this discussion with the
>>>>>> implementation of large list, etc since I think those two concepts are
>>>>>> independent.
>>>>>>
>>>>>> I've asked some others on my team their opinions on the risk here. I
>>>>>> think we should probably review some our more complex vector interactions
>>>>>> and see how the jvm's assembly changes with this kind of change. Using
>>>>>> microbenchmarking is good but I think we also need to see whether we're
>>>>>> constantly inserting additional instructions or if in most cases, this
>>>>>> actually doesn't impact instruction count.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 21, 2019 at 12:18 PM Micah Kornfield <
>>>>>> emkornfield@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>> With regards to the reference implementation point. It is a good
>>>>>>>> point. I'm on vacation this week. Unless you're pushing hard on this, can
>>>>>>>> we pick this up and discuss more next week?
>>>>>>>
>>>>>>>
>>>>>>> Hi Jacques, I hope you had a good rest.  Any more thoughts on the
>>>>>>> reference implementation aspect of this?
>>>>>>>
>>>>>>>
>>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>>>>>> would be best to decouple this discussion from the release timeline
>>>>>>>> given how many people we have relying on regular releases coming
>>>>>>>> out.
>>>>>>>> We can keep continue making major 0.x releases until we're ready to
>>>>>>>> release 1.0.0.
>>>>>>>
>>>>>>>
>>>>>>> I'm OK with it as long as other stakeholders are. Timed releases are
>>>>>>> the way to go.  As stated on the release thread [1] we need a better
>>>>>>> mechanism to avoid this type of issue arising again.  The release thread
>>>>>>> also had some more discussion on compatibility.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Micah
>>>>>>>
>>>>>>> [1]
>>>>>>> https://lists.apache.org/thread.html/d70feeceaf2570906ade117030b29887af7c77ca5c4a976e6d555920@%3Cdev.arrow.apache.org%3E
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 14, 2019 at 3:23 PM Wes McKinney <we...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield <
>>>>>>>> emkornfield@gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > Hi Wes and Jacques,
>>>>>>>> > See responses below.
>>>>>>>> >
>>>>>>>> > With regards to the reference implementation point. It is a good
>>>>>>>> point. I'm
>>>>>>>> > > on vacation this week. Unless you're pushing hard on this, can
>>>>>>>> we pick this
>>>>>>>> > > up and discuss more next week?
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Sure thing, enjoy your vacation.  I think the only practical
>>>>>>>> implications
>>>>>>>> > are it delays choices around implementing LargeList, LargeBinary,
>>>>>>>> > LargeString in Java, which in turn might push out the 0.15.0
>>>>>>>> release.
>>>>>>>> >
>>>>>>>>
>>>>>>>> To copy the sentiments from the 0.15.0 release thread, I think it
>>>>>>>> would be best to decouple this discussion from the release timeline
>>>>>>>> given how many people we have relying on regular releases coming
>>>>>>>> out.
>>>>>>>> We can keep continue making major 0.x releases until we're ready to
>>>>>>>> release 1.0.0.
>>>>>>>>
>>>>>>>> > My stance on this is that I don't know how important it is for
>>>>>>>> Java to
>>>>>>>> > > support vectors over INT32_MAX elements. The use cases enabled
>>>>>>>> by
>>>>>>>> > > having very large arrays seem to be concentrated in the native
>>>>>>>> code
>>>>>>>> > > world (e.g. C/C++/Rust) -- that could just be
>>>>>>>> implementation-centrism
>>>>>>>> > > on my part, though.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > A data point against this view is Spark has done work to
>>>>>>>> eliminate 2GB
>>>>>>>> > memory limits on its block sizes [1].  I don't claim to
>>>>>>>> understand the
>>>>>>>> > implications of this. Bryan might you have any thoughts here?
>>>>>>>> I'm OK with
>>>>>>>> > INT32_MAX, as well, I think we should think about what this means
>>>>>>>> for
>>>>>>>> > adding Large types to Java and implications for reference
>>>>>>>> implementations.
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > Micah
>>>>>>>> >
>>>>>>>> > [1] https://issues.apache.org/jira/browse/SPARK-6235
>>>>>>>> >
>>>>>>>> > On Sun, Aug 11, 2019 at 6:31 PM Jacques Nadeau <
>>>>>>>> jacques@apache.org> wrote:
>>>>>>>> >
>>>>>>>> > > Hey Micah,
>>>>>>>> > >
>>>>>>>> > > Appreciate the offer on the compiling. The reality is I'm more
>>>>>>>> concerned
>>>>>>>> > > about the unknowns than the compiling issue itself. Any time
>>>>>>>> you've been
>>>>>>>> > > tuning for a while, changing something like this could be
>>>>>>>> totally fine or
>>>>>>>> > > cause a couple of major issues. For example, we've done a very
>>>>>>>> large amount
>>>>>>>> > > of work reducing heap memory footprint of the vectors. Are
>>>>>>>> target is to
>>>>>>>> > > actually get it down to 24 bytes per ArrowBuf and 24 bytes heap
>>>>>>>> per vector
>>>>>>>> > > (not including arrow bufs).
>>>>>>>> > >
>>>>>>>> > > With regards to the reference implementation point. It is a
>>>>>>>> good point.
>>>>>>>> > > I'm on vacation this week. Unless you're pushing hard on this,
>>>>>>>> can we pick
>>>>>>>> > > this up and discuss more next week?
>>>>>>>> > >
>>>>>>>> > > thanks,
>>>>>>>> > > Jacques
>>>>>>>> > >
>>>>>>>> > > On Sat, Aug 10, 2019 at 7:39 PM Micah Kornfield <
>>>>>>>> emkornfield@gmail.com>
>>>>>>>> > > wrote:
>>>>>>>> > >
>>>>>>>> > >> Hi Jacques,
>>>>>>>> > >> I definitely understand these concerns and this change is
>>>>>>>> risky because it
>>>>>>>> > >> is so large.  Perhaps, creating a new hierarchy, might be the
>>>>>>>> cleanest way
>>>>>>>> > >> of dealing with this.  This could have other benefits like
>>>>>>>> cleaning up
>>>>>>>> > >> some
>>>>>>>> > >> cruft around dictionary encode and "orphaned" method.   Per
>>>>>>>> past e-mail
>>>>>>>> > >> threads I agree it is beneficial to have 2 separate reference
>>>>>>>> > >> implementations that can communicate fully, and my intent here
>>>>>>>> was to
>>>>>>>> > >> close
>>>>>>>> > >> that gap.
>>>>>>>> > >>
>>>>>>>> > >> Trying to
>>>>>>>> > >> > determine the ramifications of these changes would be
>>>>>>>> challenging and
>>>>>>>> > >> time
>>>>>>>> > >> > consuming against all the different ways we interact with
>>>>>>>> the Arrow Java
>>>>>>>> > >> > library.
>>>>>>>> > >>
>>>>>>>> > >>
>>>>>>>> > >> Understood.  I took a quick look at Dremio-OSS it seems like
>>>>>>>> it has a
>>>>>>>> > >> simple java build system?  If it is helpful, I can try to get
>>>>>>>> a fork
>>>>>>>> > >> running that at least compiles against this PR.  My plan would
>>>>>>>> be to cast
>>>>>>>> > >> any place that was changed to return a long back to an int, so
>>>>>>>> in essence
>>>>>>>> > >> the Dremio algorithms would reman 32-bit implementations.
>>>>>>>> > >>
>>>>>>>> > >> I don't  have the infrastructure to test this change properly
>>>>>>>> from a
>>>>>>>> > >> distributed systems perspective, so it would still take some
>>>>>>>> time from
>>>>>>>> > >> Dremio to validate for regressions.
>>>>>>>> > >>
>>>>>>>> > >> I'm not saying I'm against this but want to make sure we've
>>>>>>>> > >> > explored all less disruptive options before considering
>>>>>>>> changing
>>>>>>>> > >> something
>>>>>>>> > >> > this fundamental (especially when I generally hold the view
>>>>>>>> that large
>>>>>>>> > >> cell
>>>>>>>> > >> > counts against massive contiguous memory is an anti pattern
>>>>>>>> to scalable
>>>>>>>> > >> > analytical processing--purely subjective of course).
>>>>>>>> > >>
>>>>>>>> > >>
>>>>>>>> > >> I'm open to other ideas here, as well. I don't think it is out
>>>>>>>> of the
>>>>>>>> > >> question to leave the Java implementation as 32-bit, but if we
>>>>>>>> do, then I
>>>>>>>> > >> think we should consider a different strategy for reference
>>>>>>>> > >> implementations.
>>>>>>>> > >>
>>>>>>>> > >> Thanks,
>>>>>>>> > >> Micah
>>>>>>>> > >>
>>>>>>>> > >> On Sat, Aug 10, 2019 at 5:09 PM Jacques Nadeau <
>>>>>>>> jacques@apache.org>
>>>>>>>> > >> wrote:
>>>>>>>> > >>
>>>>>>>> > >> > Hey Micah, I didn't have a particular path in mind. Was
>>>>>>>> thinking more
>>>>>>>> > >> along
>>>>>>>> > >> > the lines of extra methods as opposed to separate classes.
>>>>>>>> > >> >
>>>>>>>> > >> > Arrow hasn't historically been a place where we're writing
>>>>>>>> algorithms in
>>>>>>>> > >> > Java so the fact that they aren't there doesn't mean they
>>>>>>>> don't exist.
>>>>>>>> > >> We
>>>>>>>> > >> > have a large amount of code that depends on the current
>>>>>>>> behavior that is
>>>>>>>> > >> > deployed in hundreds of customer clusters (you can peruse
>>>>>>>> our dremio
>>>>>>>> > >> repo
>>>>>>>> > >> > to see how extensively we leverage Arrow if interested).
>>>>>>>> Trying to
>>>>>>>> > >> > determine the ramifications of these changes would be
>>>>>>>> challenging and
>>>>>>>> > >> time
>>>>>>>> > >> > consuming against all the different ways we interact with
>>>>>>>> the Arrow Java
>>>>>>>> > >> > library. I'm not saying I'm against this but want to make
>>>>>>>> sure we've
>>>>>>>> > >> > explored all less disruptive options before considering
>>>>>>>> changing
>>>>>>>> > >> something
>>>>>>>> > >> > this fundamental (especially when I generally hold the view
>>>>>>>> that large
>>>>>>>> > >> cell
>>>>>>>> > >> > counts against massive contiguous memory is an anti pattern
>>>>>>>> to scalable
>>>>>>>> > >> > analytical processing--purely subjective of course).
>>>>>>>> > >> >
>>>>>>>> > >> > On Sat, Aug 10, 2019, 4:17 PM Micah Kornfield <
>>>>>>>> emkornfield@gmail.com>
>>>>>>>> > >> > wrote:
>>>>>>>> > >> >
>>>>>>>> > >> > > Hi Jacques,
>>>>>>>> > >> > > What avenue were you thinking for supporting both paths?
>>>>>>>>  I didn't
>>>>>>>> > >> want
>>>>>>>> > >> > > to pursue a different class hierarchy, because I felt like
>>>>>>>> that would
>>>>>>>> > >> > > effectively fork the code base, but that is potentially an
>>>>>>>> option that
>>>>>>>> > >> > > would allow us to have a complete reference implementation
>>>>>>>> in Java
>>>>>>>> > >> that
>>>>>>>> > >> > can
>>>>>>>> > >> > > fully interact with C++, without major changes to this
>>>>>>>> code.
>>>>>>>> > >> > >
>>>>>>>> > >> > > For supporting both APIs on the same classes/interfaces, I
>>>>>>>> think they
>>>>>>>> > >> > > roughly fall into three categories, changes to input
>>>>>>>> parameters,
>>>>>>>> > >> changes
>>>>>>>> > >> > to
>>>>>>>> > >> > > output parameters and algorithm changes.
>>>>>>>> > >> > >
>>>>>>>> > >> > > For inputs, changing from int to long is essentially a
>>>>>>>> no-op from the
>>>>>>>> > >> > > compiler perspective.  From the limited micro-benchmarking
>>>>>>>> this also
>>>>>>>> > >> > > doesn't seem to have a performance impact.  So we could
>>>>>>>> keep two
>>>>>>>> > >> versions
>>>>>>>> > >> > > of the methods that only differ on inputs, but it is not
>>>>>>>> clear what
>>>>>>>> > >> the
>>>>>>>> > >> > > value of that would be.
>>>>>>>> > >> > >
>>>>>>>> > >> > > For outputs, we can't support methods "long getLength()"
>>>>>>>> and "int
>>>>>>>> > >> > > getLength()" in the same class, so we would be forced into
>>>>>>>> something
>>>>>>>> > >> like
>>>>>>>> > >> > > "long getLength(boolean dummy)" which I think is a less
>>>>>>>> desirable.
>>>>>>>> > >> > >
>>>>>>>> > >> > > For algorithm changes, there did not appear to be too many
>>>>>>>> places
>>>>>>>> > >> where
>>>>>>>> > >> > we
>>>>>>>> > >> > > actually loop over all elements (it is quite possible I
>>>>>>>> missed
>>>>>>>> > >> something
>>>>>>>> > >> > > here), the ones that I did find I was able to mitigate
>>>>>>>> performance
>>>>>>>> > >> > > penalties as noted above.  Some of the current
>>>>>>>> implementation will
>>>>>>>> > >> get a
>>>>>>>> > >> > > lot slower for "large arrays", but we can likely fix those
>>>>>>>> later or in
>>>>>>>> > >> > this
>>>>>>>> > >> > > PR with a nested while loop instead of 2 for loops.
>>>>>>>> > >> > >
>>>>>>>> > >> > > Thanks,
>>>>>>>> > >> > > Micah
>>>>>>>> > >> > >
>>>>>>>> > >> > >
>>>>>>>> > >> > > On Saturday, August 10, 2019, Jacques Nadeau <
>>>>>>>> jacques@apache.org>
>>>>>>>> > >> wrote:
>>>>>>>> > >> > >
>>>>>>>> > >> > >> This is a pretty massive change to the apis. I wonder how
>>>>>>>> nasty it
>>>>>>>> > >> would
>>>>>>>> > >> > >> be to just support both paths. Have you evaluated how
>>>>>>>> complex that
>>>>>>>> > >> > would be?
>>>>>>>> > >> > >>
>>>>>>>> > >> > >> On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield <
>>>>>>>> > >> emkornfield@gmail.com>
>>>>>>>> > >> > >> wrote:
>>>>>>>> > >> > >>
>>>>>>>> > >> > >>> After more investigation, it looks like Float8Benchmarks
>>>>>>>> at least
>>>>>>>> > >> on my
>>>>>>>> > >> > >>> machine are within the range of noise.
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> For BitVectorHelper I pushed a new commit [1], seems to
>>>>>>>> bring the
>>>>>>>> > >> > >>> BitVectorHelper benchmarks back inline (and even with
>>>>>>>> some
>>>>>>>> > >> improvement
>>>>>>>> > >> > >>> for
>>>>>>>> > >> > >>> getNullCountBenchmark).
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> Benchmark                                        Mode  Cnt   Score   Error  Units
>>>>>>>> > >> > >>> BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   3.821 ± 0.031  ns/op
>>>>>>>> > >> > >>> BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  14.884 ± 0.141  ns/op
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> I applied the same pattern to other loops that I could find, and for any
>>>>>>>> > >> > >>> "for (long" loop on the critical path, I broke it up into two loops. the
>>>>>>>> > >> > >>> first loop does iteration by integer, the second finishes off for any long
>>>>>>>> > >> > >>> values.  As a side note it seems like optimization for loops using long
>>>>>>>> > >> > >>> counters at least have a semi-recent open bug for the JVM [2]
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> Thanks,
>>>>>>>> > >> > >>> Micah
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> [1]
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>>
>>>>>>>> > >> >
>>>>>>>> > >>
>>>>>>>> https://github.com/apache/arrow/pull/5020/commits/2ea2c1ae83e3baa7b9a99a6d06276d968df41797
>>>>>>>> > >> > >>> [2] https://bugs.openjdk.java.net/browse/JDK-8223051
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> On Wed, Aug 7, 2019 at 8:11 PM Micah Kornfield <
>>>>>>>> > >> emkornfield@gmail.com>
>>>>>>>> > >> > >>> wrote:
>>>>>>>> > >> > >>>
>>>>>>>> > >> > >>> > Indeed, the BoundsChecking and CheckNullForGet variables can make a
>>>>>>>> > >> > >>> > big difference.  I didn't initially run the benchmarks with these
>>>>>>>> > >> > >>> > turned on (you can see the result from above with Float8Benchmarks).
>>>>>>>> > >> > >>> > Here are new numbers with the flags enabled.  It looks like using
>>>>>>>> > >> > >>> > longs might be a little bit slower; I'll see what I can do to
>>>>>>>> > >> > >>> > mitigate this.
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Ravindra also volunteered to try to benchmark the changes with
>>>>>>>> > >> > >>> > Dremio's code on today's sync call.
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > New
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Benchmark                                        Mode  Cnt   Score   Error  Units
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   4.176 ± 1.292  ns/op
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  26.102 ± 0.700  ns/op
>>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark               avgt    5   7.398 ± 0.084  us/op
>>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark              avgt    5   2.711 ± 0.057  us/op
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Old
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> > Benchmark                                        Mode  Cnt   Score   Error  Units
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.allBitsNullBenchmark   avgt    5   3.828 ± 0.030  ns/op
>>>>>>>> > >> > >>> > BitVectorHelperBenchmarks.getNullCountBenchmark  avgt    5  20.611 ± 0.188  ns/op
>>>>>>>> > >> > >>> > Float8Benchmarks.copyFromBenchmark               avgt    5   6.597 ± 0.462  us/op
>>>>>>>> > >> > >>> > Float8Benchmarks.readWriteBenchmark              avgt    5   2.615 ± 0.027  us/op
>>>>>>>> > >> > >>> >
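For anyone re-running these numbers: both flags are read once at class-load time from system properties, so they need to be set on the JVM command line before any Arrow classes are loaded. The snippet below is just a sanity check; the property name in the comment is the one I believe BoundsChecking reads (the null-check-for-get flag has an analogous property), so verify against the source tree if it does not take effect.

    import org.apache.arrow.memory.BoundsChecking;

    public class CheckBenchmarkFlags {
      public static void main(String[] args) {
        // Expected to print "false" when the JVM is started with something like
        //   -Darrow.enable_unsafe_memory_access=true
        // (property name from memory; confirm in the BoundsChecking class itself).
        System.out.println("bounds checking enabled: " + BoundsChecking.BOUNDS_CHECKING_ENABLED);
      }
    }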
>>>>>>>> > >> > >>> > On Wed, Aug 7, 2019 at 7:13 PM Fan Liya <liya.fan03@gmail.com> wrote:
>>>>>>>> > >> > >>> >
>>>>>>>> > >> > >>> >> Hi Gonzalo,
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >> Thanks for sharing the performance results.
>>>>>>>> > >> > >>> >> I am wondering if you have turned off the flag
>>>>>>>> > >> > >>> >> BoundsChecking#BOUNDS_CHECKING_ENABLED.
>>>>>>>> > >> > >>> >> If not, the lower throughput should be expected.
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >> Best,
>>>>>>>> > >> > >>> >> Liya Fan
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >> On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield <emkornfield@gmail.com> wrote:
>>>>>>>> > >> > >>> >>
>>>>>>>> > >> > >>> >>> Hi Gonzalo,
>>>>>>>> > >> > >>> >>> Thank you for the feedback.  I wasn't aware of the JIT implications.
>>>>>>>> > >> > >>> >>> At least on the benchmark run they don't seem to have an impact.
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> If there are other benchmarks that people have that can validate
>>>>>>>> > >> > >>> >>> whether this change will be problematic, I would appreciate trying
>>>>>>>> > >> > >>> >>> to run them with the PR.  I will try to run the ones for
>>>>>>>> > >> > >>> >>> zeroing/popcnt tonight to see if there is a change in those.
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> -Micah
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> On Wednesday, August 7, 2019, Gonzalo Ortiz Jaureguizar <golthiryus@gmail.com> wrote:
>>>>>>>> > >> > >>> >>>
>>>>>>>> > >> > >>> >>> > I would recommend taking care with this kind of change.
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> > I didn't try Arrow in more than one year, but by then the performance
>>>>>>>> > >> > >>> >>> > was quite bad in comparison with plain byte buffer access
>>>>>>>> > >> > >>> >>> > (see http://git.net/apache-arrow-development/msg02353.html *),
>>>>>>>> > >> > >>> >>> > and there are several optimizations that the JVM (specifically, C2)
>>>>>>>> > >> > >>> >>> > does not apply when dealing with longs instead of ints. Among the
>>>>>>>> > >> > >>> >>> > most common are loop unrolling and vectorization.
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> > * It doesn't seem the best way to reference an old email on the
>>>>>>>> > >> > >>> >>> > list, but it is the only result shown by Google.
>>>>>>>> > >> > >>> >>> >
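Gonzalo's point is the crux of the JIT concern: HotSpot's C2 treats an int-counted loop as a "counted loop" that it can unroll and often auto-vectorize, while a loop with a long induction variable generally misses those optimizations (the same limitation behind JDK-8223051, referenced earlier in this thread). A contrived illustration, not tied to any Arrow code:

    final class CountedLoopExample {
      // Int counter: C2 recognizes this as a counted loop and can unroll/vectorize it.
      static void scaleWithIntCounter(double[] src, double[] dst) {
        for (int i = 0; i < src.length; i++) {
          dst[i] = src[i] * 2.0;
        }
      }

      // Long counter: same work, but typically not treated as a counted loop today.
      static void scaleWithLongCounter(double[] src, double[] dst) {
        for (long i = 0; i < src.length; i++) {
          dst[(int) i] = src[(int) i] * 2.0;
        }
      }
    }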
>>>>>>>> > >> > >>> >>> > On Wed, Aug 7, 2019 at 11:42, Fan Liya (<liya.fan03@gmail.com>) wrote:
>>>>>>>> > >> > >>> >>> >
>>>>>>>> > >> > >>> >>> >> Hi Micah,
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> Thanks for your effort. The performance result looks good.
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> As you indicated, ArrowBuf will take an additional 12 bytes
>>>>>>>> > >> > >>> >>> >> (4 bytes for each of length, write index, and read index).
>>>>>>>> > >> > >>> >>> >> Similar overheads also exist for vectors like BaseFixedWidthVector,
>>>>>>>> > >> > >>> >>> >> BaseVariableWidthVector, etc.
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> IMO, such overheads are small enough to justify the change.
>>>>>>>> > >> > >>> >>> >> Let's check if there are other overheads.
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> Best,
>>>>>>>> > >> > >>> >>> >> Liya Fan
>>>>>>>> > >> > >>> >>> >>
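The 12-byte figure comes from three per-buffer fields each widening from a 4-byte int to an 8-byte long. Schematically (the field names are simplified and the real ArrowBuf has more state; object alignment may also absorb part of the difference):

    // Before: 3 x 4 bytes of length/index state per buffer.
    final class Buf32 { int length; int readerIndex; int writerIndex; }

    // After: 3 x 8 bytes, i.e. 12 additional bytes per ArrowBuf instance.
    final class Buf64 { long length; long readerIndex; long writerIndex; }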
>>>>>>>> > >> > >>> >>> >> On Wed, Aug 7, 2019 at 3:30 PM Micah Kornfield <emkornfield@gmail.com> wrote:
>>>>>>>> > >> > >>> >>> >>
>>>>>>>> > >> > >>> >>> >> > Hi Liya Fan,
>>>>>>>> > >> > >>> >>> >> > Based on the Float8Benchmark there does not seem to be any
>>>>>>>> > >> > >>> >>> >> > meaningful performance difference on my machine.  At least for
>>>>>>>> > >> > >>> >>> >> > me, the benchmarks are not stable enough to say one is faster
>>>>>>>> > >> > >>> >>> >> > than the other (I've pasted results below).  That being said,
>>>>>>>> > >> > >>> >>> >> > my machine isn't necessarily the most reliable for benchmarking.
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > On an intuitive level this makes sense to me; for the most part
>>>>>>>> > >> > >>> >>> >> > it seems like the change just moves casting from "int" to "long"
>>>>>>>> > >> > >>> >>> >> > further up the stack for "PlatformDependent" operations.  If
>>>>>>>> > >> > >>> >>> >> > there are other benchmarks that you think are worth running,
>>>>>>>> > >> > >>> >>> >> > let me know.
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > One downside performance-wise I think for this change is that it
>>>>>>>> > >> > >>> >>> >> > increases the size of ArrowBuf objects, which I suppose could
>>>>>>>> > >> > >>> >>> >> > influence cache misses at some level or increase the size of
>>>>>>>> > >> > >>> >>> >> > call stacks, but this doesn't seem to show up in the benchmarks.
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > Thanks,
>>>>>>>> > >> > >>> >>> >> > Micah
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > Sample benchmark numbers:
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > [New Code]
>>>>>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt   Score   Error  Units
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  15.441 ± 0.469  us/op
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  14.057 ± 0.115  us/op
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> > [Old code]
>>>>>>>> > >> > >>> >>> >> > Benchmark                            Mode  Cnt   Score   Error  Units
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.copyFromBenchmark   avgt    5  16.248 ± 1.409  us/op
>>>>>>>> > >> > >>> >>> >> > Float8Benchmarks.readWriteBenchmark  avgt    5  14.150 ± 0.084  us/op
>>>>>>>> > >> > >>> >>> >> >
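A rough picture of what "moves casting from int to long further up the stack" means (schematic, not the PR's actual code): the absolute address handed to the PlatformDependent layer was always a long, so the change mainly shifts where the element index gets widened.

    final class AddressingExample {
      // Before: the caller passes an int index and the widening happens when the
      // buffer computes the absolute address.
      static long elementAddress(long bufferAddress, int index, int typeWidth) {
        return bufferAddress + (long) index * typeWidth;
      }

      // After: the index is already a long at the vector API boundary; the address
      // arithmetic itself is unchanged.
      static long elementAddress(long bufferAddress, long index, int typeWidth) {
        return bufferAddress + index * typeWidth;
      }
    }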
>>>>>>>> > >> > >>> >>> >> > On Tue, Aug 6, 2019 at 1:18 AM Fan Liya <liya.fan03@gmail.com> wrote:
>>>>>>>> > >> > >>> >>> >> >
>>>>>>>> > >> > >>> >>> >> >> Hi Micah,
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> Thanks a lot for doing this.
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> I am a little concerned about whether there is any negative
>>>>>>>> > >> > >>> >>> >> >> performance impact on the current 32-bit-length based applications.
>>>>>>>> > >> > >>> >>> >> >> Can we do some performance comparison on our existing benchmarks?
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> Best,
>>>>>>>> > >> > >>> >>> >> >> Liya Fan
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >> On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield <emkornfield@gmail.com> wrote:
>>>>>>>> > >> > >>> >>> >> >>
>>>>>>>> > >> > >>> >>> >> >>> There have been some previous discussions on the mailing list
>>>>>>>> > >> > >>> >>> >> >>> about supporting 64-bit lengths for Java ValueVectors (this is
>>>>>>>> > >> > >>> >>> >> >>> what the IPC specification and C++ support).  I created a PR [1]
>>>>>>>> > >> > >>> >>> >> >>> that changes all APIs that I could find that take an index to
>>>>>>>> > >> > >>> >>> >> >>> take a "long" instead of an "int" (and similarly changes
>>>>>>>> > >> > >>> >>> >> >>> "size/rowcount" APIs).
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> It is a big change, so I think it is worth discussing whether it
>>>>>>>> > >> > >>> >>> >> >>> is something we still want to move forward with.  It would be
>>>>>>>> > >> > >>> >>> >> >>> nice to come to a conclusion quickly, ideally in the next few
>>>>>>>> > >> > >>> >>> >> >>> days, to avoid a lot of merge conflicts.
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> The reason I did this work now is that the C++ implementation has
>>>>>>>> > >> > >>> >>> >> >>> added support for LargeList, LargeBinary and LargeString arrays,
>>>>>>>> > >> > >>> >>> >> >>> and based on prior discussions we need to have similar support in
>>>>>>>> > >> > >>> >>> >> >>> Java before our next release.  Supporting 64-bit indexes means we
>>>>>>>> > >> > >>> >>> >> >>> can have full compatibility and make the most use of these types
>>>>>>>> > >> > >>> >>> >> >>> in Java.
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> Look forward to hearing feedback.
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> Thanks,
>>>>>>>> > >> > >>> >>> >> >>> Micah
>>>>>>>> > >> > >>> >>> >> >>>
>>>>>>>> > >> > >>> >>> >> >>> [1] https://github.com/apache/arrow/pull/5020
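For anyone skimming [1], the shape of the change is essentially that index and count parameters (and the corresponding return types) widen from int to long across the vector APIs. The interfaces below are hypothetical, just to show the before/after shape; they are not the PR's actual signatures.

    // Hypothetical before/after, for illustration only.
    interface Float8Access32 {
      int getValueCount();
      double get(int index);
      void set(int index, double value);
    }

    interface Float8Access64 {
      long getValueCount();
      double get(long index);
      void set(long index, double value);
    }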