You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by Micah Kornfield <em...@gmail.com> on 2019/10/24 04:22:10 UTC

[DISCUSS][Java] Builders for java classes

As part a PR Ji Liu has made to help populate data for test cases [1], the
question came up on whether we should provide a more  builder classes in
java for ValueVectors.  The proposed implementation would wrap the existing
Writer classes.

Do people think this would be a valuable addition to the java library? I
imagine it would be a builder per ValueVectorType.  The main benefit I see
to this is making the library potentially slightly easier to use for
new-comers, but might not be the most efficient.  A straw-man interface is
listed below.

Thoughts?

Thanks,
Micah

class IntVectorBuilder {
   public IntVectorBuilder(BufferAllocator allocator);

   IntVectorBuilder add(int value);
    IntVectorBuilder addAll(int[] values);
    IntVectorBuilder addNull();
    // handles null values in array
    IntVectorBuilder addAll(Integer... values);
    IntVectorBuilder addAll(List<Integer> values);
    IntVector build(String name);
}

Re: [DISCUSS][Java] Builders for java classes

Posted by Micah Kornfield <em...@gmail.com>.
>
>  Just to clarify, how will this be different than the current vector
> writers that they are wrapping? Is it just the ability to add multiple
> values at once, or more efficiently?


Based on the discussion on the thread it sounds like we should only wrap
the writers if they can provide best possible performance (which I think
for builders they can't beat specific code written for addition).  I think
it also provides an easier API to use since it automatically would
automatically advance the current index.

I'm still not clear if the builders would be used much in practice which is
something I would like to make sure of before investing time in them.  I
suppose they could be used to make some of the code in adapters cleaner,
but I'm not sure how many more of those we will have.



> Also, if we are going to be adding new APIs, maybe we can try to match
> more closely the existing builders in C++?  I believe they are pretty
> similar, only use "append*", "appendValues", etc.


We could try to match the C++ APIs, but I think that makes the APIs less
familiar to Java programmers, where "addAll" is used ubiquitously in
java.util.* collections and other collection libraries I'm familiar with.

On Tue, Oct 29, 2019 at 1:16 PM Bryan Cutler <cu...@gmail.com> wrote:

> Just to clarify, how will this be different than the current vector
> writers that they are wrapping? Is it just the ability to add multiple
> values at once, or more efficiently?
> Also, if we are going to be adding new APIs, maybe we can try to match
> more closely the existing builders in C++?  I believe they are pretty
> similar, only use "append*", "appendValues", etc.
>
> Bryan
>
> On Sun, Oct 27, 2019 at 1:03 PM Jacques Nadeau <ja...@apache.org> wrote:
>
>> +1 on the idea of enhancing builder interfaces.
>>
>> >>IntVectorBuilder addAll(int[] values);
>>
>> Let's make sure that anything like the above is efficient. People will
>> judge the quality of the project on the efficiency of the methods we
>> provide. If everybody starts using int[] to build Arrow vectors, we should
>> make sure it is good. The complexwriter tries to be dynamically typed in a
>> statically typed language and had to give up some efficiency. For methods
>> where we're adding multiple values (as above), we should make sure to
>> improve the layers to maximize things.
>>
>> On Thu, Oct 24, 2019 at 3:08 AM Fan Liya <li...@gmail.com> wrote:
>>
>> > Hi Micah,
>> >
>> > IMO, we need an adapter from on-heap array to off-heap array.
>> > This is useful because many third-party Java libraries populate data to
>> an
>> > on-heap array.
>> >
>> > And I see this API in your design:
>> >
>> > IntVectorBuilder addAll(int[] values);
>> >
>> > So I am +1 for this.
>> >
>> > Best,
>> > Liya Fan
>> >
>> > On Thu, Oct 24, 2019 at 12:31 PM Micah Kornfield <emkornfield@gmail.com
>> >
>> > wrote:
>> >
>> > > As part a PR Ji Liu has made to help populate data for test cases [1],
>> > the
>> > > question came up on whether we should provide a more  builder classes
>> in
>> > > java for ValueVectors.  The proposed implementation would wrap the
>> > existing
>> > > Writer classes.
>> > >
>> > > Do people think this would be a valuable addition to the java
>> library? I
>> > > imagine it would be a builder per ValueVectorType.  The main benefit I
>> > see
>> > > to this is making the library potentially slightly easier to use for
>> > > new-comers, but might not be the most efficient.  A straw-man
>> interface
>> > is
>> > > listed below.
>> > >
>> > > Thoughts?
>> > >
>> > > Thanks,
>> > > Micah
>> > >
>> > > class IntVectorBuilder {
>> > >    public IntVectorBuilder(BufferAllocator allocator);
>> > >
>> > >    IntVectorBuilder add(int value);
>> > >     IntVectorBuilder addAll(int[] values);
>> > >     IntVectorBuilder addNull();
>> > >     // handles null values in array
>> > >     IntVectorBuilder addAll(Integer... values);
>> > >     IntVectorBuilder addAll(List<Integer> values);
>> > >     IntVector build(String name);
>> > > }
>> > >
>> >
>>
>

Re: [DISCUSS][Java] Builders for java classes

Posted by Bryan Cutler <cu...@gmail.com>.
Just to clarify, how will this be different than the current vector writers
that they are wrapping? Is it just the ability to add multiple values at
once, or more efficiently?
Also, if we are going to be adding new APIs, maybe we can try to match more
closely the existing builders in C++?  I believe they are pretty similar,
only use "append*", "appendValues", etc.

Bryan

On Sun, Oct 27, 2019 at 1:03 PM Jacques Nadeau <ja...@apache.org> wrote:

> +1 on the idea of enhancing builder interfaces.
>
> >>IntVectorBuilder addAll(int[] values);
>
> Let's make sure that anything like the above is efficient. People will
> judge the quality of the project on the efficiency of the methods we
> provide. If everybody starts using int[] to build Arrow vectors, we should
> make sure it is good. The complexwriter tries to be dynamically typed in a
> statically typed language and had to give up some efficiency. For methods
> where we're adding multiple values (as above), we should make sure to
> improve the layers to maximize things.
>
> On Thu, Oct 24, 2019 at 3:08 AM Fan Liya <li...@gmail.com> wrote:
>
> > Hi Micah,
> >
> > IMO, we need an adapter from on-heap array to off-heap array.
> > This is useful because many third-party Java libraries populate data to
> an
> > on-heap array.
> >
> > And I see this API in your design:
> >
> > IntVectorBuilder addAll(int[] values);
> >
> > So I am +1 for this.
> >
> > Best,
> > Liya Fan
> >
> > On Thu, Oct 24, 2019 at 12:31 PM Micah Kornfield <em...@gmail.com>
> > wrote:
> >
> > > As part a PR Ji Liu has made to help populate data for test cases [1],
> > the
> > > question came up on whether we should provide a more  builder classes
> in
> > > java for ValueVectors.  The proposed implementation would wrap the
> > existing
> > > Writer classes.
> > >
> > > Do people think this would be a valuable addition to the java library?
> I
> > > imagine it would be a builder per ValueVectorType.  The main benefit I
> > see
> > > to this is making the library potentially slightly easier to use for
> > > new-comers, but might not be the most efficient.  A straw-man interface
> > is
> > > listed below.
> > >
> > > Thoughts?
> > >
> > > Thanks,
> > > Micah
> > >
> > > class IntVectorBuilder {
> > >    public IntVectorBuilder(BufferAllocator allocator);
> > >
> > >    IntVectorBuilder add(int value);
> > >     IntVectorBuilder addAll(int[] values);
> > >     IntVectorBuilder addNull();
> > >     // handles null values in array
> > >     IntVectorBuilder addAll(Integer... values);
> > >     IntVectorBuilder addAll(List<Integer> values);
> > >     IntVector build(String name);
> > > }
> > >
> >
>

Re: [DISCUSS][Java] Builders for java classes

Posted by Jacques Nadeau <ja...@apache.org>.
+1 on the idea of enhancing builder interfaces.

>>IntVectorBuilder addAll(int[] values);

Let's make sure that anything like the above is efficient. People will
judge the quality of the project on the efficiency of the methods we
provide. If everybody starts using int[] to build Arrow vectors, we should
make sure it is good. The complexwriter tries to be dynamically typed in a
statically typed language and had to give up some efficiency. For methods
where we're adding multiple values (as above), we should make sure to
improve the layers to maximize things.

On Thu, Oct 24, 2019 at 3:08 AM Fan Liya <li...@gmail.com> wrote:

> Hi Micah,
>
> IMO, we need an adapter from on-heap array to off-heap array.
> This is useful because many third-party Java libraries populate data to an
> on-heap array.
>
> And I see this API in your design:
>
> IntVectorBuilder addAll(int[] values);
>
> So I am +1 for this.
>
> Best,
> Liya Fan
>
> On Thu, Oct 24, 2019 at 12:31 PM Micah Kornfield <em...@gmail.com>
> wrote:
>
> > As part a PR Ji Liu has made to help populate data for test cases [1],
> the
> > question came up on whether we should provide a more  builder classes in
> > java for ValueVectors.  The proposed implementation would wrap the
> existing
> > Writer classes.
> >
> > Do people think this would be a valuable addition to the java library? I
> > imagine it would be a builder per ValueVectorType.  The main benefit I
> see
> > to this is making the library potentially slightly easier to use for
> > new-comers, but might not be the most efficient.  A straw-man interface
> is
> > listed below.
> >
> > Thoughts?
> >
> > Thanks,
> > Micah
> >
> > class IntVectorBuilder {
> >    public IntVectorBuilder(BufferAllocator allocator);
> >
> >    IntVectorBuilder add(int value);
> >     IntVectorBuilder addAll(int[] values);
> >     IntVectorBuilder addNull();
> >     // handles null values in array
> >     IntVectorBuilder addAll(Integer... values);
> >     IntVectorBuilder addAll(List<Integer> values);
> >     IntVector build(String name);
> > }
> >
>

Re: [DISCUSS][Java] Builders for java classes

Posted by Fan Liya <li...@gmail.com>.
Hi Micah,

IMO, we need an adapter from on-heap array to off-heap array.
This is useful because many third-party Java libraries populate data to an
on-heap array.

And I see this API in your design:

IntVectorBuilder addAll(int[] values);

So I am +1 for this.

Best,
Liya Fan

On Thu, Oct 24, 2019 at 12:31 PM Micah Kornfield <em...@gmail.com>
wrote:

> As part a PR Ji Liu has made to help populate data for test cases [1], the
> question came up on whether we should provide a more  builder classes in
> java for ValueVectors.  The proposed implementation would wrap the existing
> Writer classes.
>
> Do people think this would be a valuable addition to the java library? I
> imagine it would be a builder per ValueVectorType.  The main benefit I see
> to this is making the library potentially slightly easier to use for
> new-comers, but might not be the most efficient.  A straw-man interface is
> listed below.
>
> Thoughts?
>
> Thanks,
> Micah
>
> class IntVectorBuilder {
>    public IntVectorBuilder(BufferAllocator allocator);
>
>    IntVectorBuilder add(int value);
>     IntVectorBuilder addAll(int[] values);
>     IntVectorBuilder addNull();
>     // handles null values in array
>     IntVectorBuilder addAll(Integer... values);
>     IntVectorBuilder addAll(List<Integer> values);
>     IntVector build(String name);
> }
>

Re: [DISCUSS][Java] Builders for java classes

Posted by Ravindra Pindikura <ra...@dremio.com>.
On Thu, Oct 24, 2019 at 10:01 AM Micah Kornfield <em...@gmail.com>
wrote:

> As part a PR Ji Liu has made to help populate data for test cases [1], the
> question came up on whether we should provide a more  builder classes in
> java for ValueVectors.  The proposed implementation would wrap the existing
> Writer classes.
>
> Do people think this would be a valuable addition to the java library? I
> imagine it would be a builder per ValueVectorType.  The main benefit I see
> to this is making the library potentially slightly easier to use for
> new-comers, but might not be the most efficient.  A straw-man interface is
> listed below.
>
> Thoughts?
>

I can see that it makes writing tests easier, and ease-of-use (esp.
handling the setSafe/setValueCount).

In dremio, we mostly populate value vectors either :

   - from arrow buffers (eg. read from parquet)
   - from other value vectors (eg. selection vector removal or transfers)
   - directly populate the constituent arrow buffers (eg. gandiva)

so, we haven't had a need for explicit builders.



>
> Thanks,
> Micah
>
> class IntVectorBuilder {
>    public IntVectorBuilder(BufferAllocator allocator);
>
>    IntVectorBuilder add(int value);
>     IntVectorBuilder addAll(int[] values);
>     IntVectorBuilder addNull();
>     // handles null values in array
>     IntVectorBuilder addAll(Integer... values);
>     IntVectorBuilder addAll(List<Integer> values);
>     IntVector build(String name);
> }
>


-- 
Thanks and regards,
Ravindra.