You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by Ji Liu <ni...@aliyun.com.INVALID> on 2019/08/11 00:24:49 UTC

[DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

Hi, all

While working on the issue to implement dictionary-encoded subfields[1] [2], I found FixedSizeListVector not extends ListVector(Thanks Micah pointing this out and curious why implemented FixedSizeListVector this way 
before). Since FixedSizeListVector is a specific case of ListVector, should we make former extends the latter to reduce the plenty duplicated logic in these two and writer/reader classes? 


Thanks,
Ji Liu

[1] https://issues.apache.org/jira/browse/ARROW-1175[2]https://github.com/apache/arrow/pull/4972


Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

Posted by Ji Liu <ni...@aliyun.com.INVALID>.
My original thoughts is that introduce a new interface makes the hierarchy a little confused (FixedListVector->BaseListVector, ListVector->BaseRepeatedVector->BaseListVector) and should try to avoid introducing new classes.

And you are right, FixedSizeListVector should not include offsetBuffer, I am fine with the approach you suggested, BaseRepeatedVector/ListVector and FixedSizeListVector both inherit from new interface(BaseListVector) they can be used interchangeably in dictionary encoding.Let see if others have different opinions.


Thanks,
Ji Liu


------------------------------------------------------------------
From:Micah Kornfield <em...@gmail.com>
Send Time:2019年8月14日(星期三) 15:10
To:Ji Liu <ni...@aliyun.com>
Cc:dev <de...@arrow.apache.org>
Subject:Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

You are right, the mainly difference between FixSizedListVector and ListVector is the offsetBuffer, but I think this could be avoided through allocateNewSafe() overwrite which calls allocateOffsetBuffer() in BaseRepeatedValueVector, in this way, offsetBuffer in FixSizedListVector will remain allocator.getEmpty(). 

I think there other methods that FixedSizeList shouldn't be implementing that are on List as well.   In an ideal world, I think the parent class/interface would be called ListVector and there would then be specific children of FixedSizeList and VariableSizeList.  I think that is too big a change to something core, but we should try to keep the relationship in that shape, so we don't need to override methods just to throw NotSupportedExceptions.
On Sun, Aug 11, 2019 at 7:35 AM Ji Liu <ni...@aliyun.com> wrote:

Thanks Jacques, to avoid complex call paths for getObject, should keep getObject for both classes. I'll also checked for other methods.

Thanks,
Ji Liu

------------------------------------------------------------------
From:Jacques Nadeau <ja...@apache.org>
Send Time:2019年8月11日(星期日) 21:43
To:dev <de...@arrow.apache.org>; Ji Liu <ni...@aliyun.com>
Cc:emkornfield <em...@gmail.com>
Subject:Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

We tried to get away from this kind of back and forth with subclassing as
much as possible. (call getObject on base class which then calls getIndex
on child class which then calls something else on base class). I haven't
looked through the code but let's try to avoid having complex call paths
for the vectors.

On Sat, Aug 10, 2019 at 6:07 PM Ji Liu <ni...@aliyun.com.invalid> wrote:

> Hi Micah, thanks for your suggestion.
> You are right, the mainly difference between FixSizedListVector and
> ListVector is the offsetBuffer, but I think this could be avoided through
> allocateNewSafe() overwrite which calls allocateOffsetBuffer() in
> BaseRepeatedValueVector, in this way, offsetBuffer in FixSizedListVector
> will remain allocator.getEmpty().
>
> Meanwhile, we could add getStartIndex(int index)/getEndIndex(int index)
> API to handle read data logic respectively which could be used in
> getObject(int index) or encoding parts. What’s more, no new interface need
> to be introduced.
>
> What do you think?
>
>
> Thanks,
> Ji Liu
>
>
> ------------------------------------------------------------------
> From:Micah Kornfield <em...@gmail.com>
> Send Time:2019年8月11日(星期日) 08:47
> To:dev <de...@arrow.apache.org>; Ji Liu <ni...@aliyun.com>
> Subject:Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from
> ListVector
>
> Hi Ji Liu,
> I think have a common interface/base-class for the two makes sense (but
> don't have historical context) from a reading data perspective.
>
> I think the change would need to be something above
> BaseRepeatedValueVector, since the FixedSizeListVector doesn't contain an
> offset buffer, and that field is contained on BaseRepeatedValueVector.
>
> Thanks,
> Micah
> On Sat, Aug 10, 2019 at 5:25 PM Ji Liu <ni...@aliyun.com.invalid> wrote:
> Hi, all
>
>  While working on the issue to implement dictionary-encoded subfields[1]
> [2], I found FixedSizeListVector not extends ListVector(Thanks Micah
> pointing this out and curious why implemented FixedSizeListVector this way
>  before). Since FixedSizeListVector is a specific case of ListVector,
> should we make former extends the latter to reduce the plenty duplicated
> logic in these two and writer/reader classes?
>
>
>  Thanks,
>  Ji Liu
>
>  [1]
> https://issues.apache.org/jira/browse/ARROW-1175[2]https://github.com/apache/arrow/pull/4972
>
>

Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

Posted by Micah Kornfield <em...@gmail.com>.
>
> You are right, the mainly difference between FixSizedListVector and
> ListVector is the offsetBuffer, but I think this could be avoided through
> allocateNewSafe() overwrite which calls allocateOffsetBuffer() in
> BaseRepeatedValueVector, in this way, offsetBuffer in FixSizedListVector
> will remain allocator.getEmpty().


I think there other methods that FixedSizeList shouldn't be implementing
that are on List as well.   In an ideal world, I think the parent
class/interface would be called ListVector and there would then be specific
children of FixedSizeList and VariableSizeList.  I think that is too big a
change to something core, but we should try to keep the relationship in
that shape, so we don't need to override methods just to throw
NotSupportedExceptions.

On Sun, Aug 11, 2019 at 7:35 AM Ji Liu <ni...@aliyun.com> wrote:

> Thanks Jacques, to avoid complex call paths for getObject, should keep
> getObject for both classes. I'll also checked for other methods.
>
> Thanks,
> Ji Liu
>
> ------------------------------------------------------------------
> From:Jacques Nadeau <ja...@apache.org>
> Send Time:2019年8月11日(星期日) 21:43
> To:dev <de...@arrow.apache.org>; Ji Liu <ni...@aliyun.com>
> Cc:emkornfield <em...@gmail.com>
> Subject:Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from
> ListVector
>
> We tried to get away from this kind of back and forth with subclassing as
> much as possible. (call getObject on base class which then calls getIndex
> on child class which then calls something else on base class). I haven't
> looked through the code but let's try to avoid having complex call paths
> for the vectors.
>
> On Sat, Aug 10, 2019 at 6:07 PM Ji Liu <ni...@aliyun.com.invalid> wrote:
>
> > Hi Micah, thanks for your suggestion.
> > You are right, the mainly difference between FixSizedListVector and
> > ListVector is the offsetBuffer, but I think this could be avoided through
> > allocateNewSafe() overwrite which calls allocateOffsetBuffer() in
> > BaseRepeatedValueVector, in this way, offsetBuffer in FixSizedListVector
> > will remain allocator.getEmpty().
> >
> > Meanwhile, we could add getStartIndex(int index)/getEndIndex(int index)
> > API to handle read data logic respectively which could be used in
>
> > getObject(int index) or encoding parts. What’s more, no new interface need
> > to be introduced.
> >
> > What do you think?
> >
> >
> > Thanks,
> > Ji Liu
> >
> >
> > ------------------------------------------------------------------
> > From:Micah Kornfield <em...@gmail.com>
> > Send Time:2019年8月11日(星期日) 08:47
> > To:dev <de...@arrow.apache.org>; Ji Liu <ni...@aliyun.com>
> > Subject:Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from
> > ListVector
> >
> > Hi Ji Liu,
> > I think have a common interface/base-class for the two makes sense (but
> > don't have historical context) from a reading data perspective.
> >
> > I think the change would need to be something above
> > BaseRepeatedValueVector, since the FixedSizeListVector doesn't contain an
> > offset buffer, and that field is contained on BaseRepeatedValueVector.
> >
> > Thanks,
> > Micah
> > On Sat, Aug 10, 2019 at 5:25 PM Ji Liu <niki.lj@aliyun.com.invalid
> > wrote:
> > Hi, all
> >
> >  While working on the issue to implement dictionary-encoded subfields[1]
> > [2], I found FixedSizeListVector not extends ListVector(Thanks Micah
>
> > pointing this out and curious why implemented FixedSizeListVector this way
> >  before). Since FixedSizeListVector is a specific case of ListVector,
> > should we make former extends the latter to reduce the plenty duplicated
> > logic in these two and writer/reader classes?
> >
> >
> >  Thanks,
> >  Ji Liu
> >
> >  [1]
> >
> https://issues.apache.org/jira/browse/ARROW-1175[2]https://github.com/apache/arrow/pull/4972
> >
> >
>
>

Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

Posted by Ji Liu <ni...@aliyun.com.INVALID>.
Thanks Jacques, to avoid complex call paths for getObject, should keep getObject for both classes. I'll also checked for other methods.

Thanks,
Ji Liu


------------------------------------------------------------------
From:Jacques Nadeau <ja...@apache.org>
Send Time:2019年8月11日(星期日) 21:43
To:dev <de...@arrow.apache.org>; Ji Liu <ni...@aliyun.com>
Cc:emkornfield <em...@gmail.com>
Subject:Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

We tried to get away from this kind of back and forth with subclassing as
much as possible. (call getObject on base class which then calls getIndex
on child class which then calls something else on base class). I haven't
looked through the code but let's try to avoid having complex call paths
for the vectors.

On Sat, Aug 10, 2019 at 6:07 PM Ji Liu <ni...@aliyun.com.invalid> wrote:

> Hi Micah, thanks for your suggestion.
> You are right, the mainly difference between FixSizedListVector and
> ListVector is the offsetBuffer, but I think this could be avoided through
> allocateNewSafe() overwrite which calls allocateOffsetBuffer() in
> BaseRepeatedValueVector, in this way, offsetBuffer in FixSizedListVector
> will remain allocator.getEmpty().
>
> Meanwhile, we could add getStartIndex(int index)/getEndIndex(int index)
> API to handle read data logic respectively which could be used in
> getObject(int index) or encoding parts. What’s more, no new interface need
> to be introduced.
>
> What do you think?
>
>
> Thanks,
> Ji Liu
>
>
> ------------------------------------------------------------------
> From:Micah Kornfield <em...@gmail.com>
> Send Time:2019年8月11日(星期日) 08:47
> To:dev <de...@arrow.apache.org>; Ji Liu <ni...@aliyun.com>
> Subject:Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from
> ListVector
>
> Hi Ji Liu,
> I think have a common interface/base-class for the two makes sense (but
> don't have historical context) from a reading data perspective.
>
> I think the change would need to be something above
> BaseRepeatedValueVector, since the FixedSizeListVector doesn't contain an
> offset buffer, and that field is contained on BaseRepeatedValueVector.
>
> Thanks,
> Micah
> On Sat, Aug 10, 2019 at 5:25 PM Ji Liu <ni...@aliyun.com.invalid> wrote:
> Hi, all
>
>  While working on the issue to implement dictionary-encoded subfields[1]
> [2], I found FixedSizeListVector not extends ListVector(Thanks Micah
> pointing this out and curious why implemented FixedSizeListVector this way
>  before). Since FixedSizeListVector is a specific case of ListVector,
> should we make former extends the latter to reduce the plenty duplicated
> logic in these two and writer/reader classes?
>
>
>  Thanks,
>  Ji Liu
>
>  [1]
> https://issues.apache.org/jira/browse/ARROW-1175[2]https://github.com/apache/arrow/pull/4972
>
>

Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

Posted by Jacques Nadeau <ja...@apache.org>.
We tried to get away from this kind of back and forth with subclassing as
much as possible. (call getObject on base class which then calls getIndex
on child class which then calls something else on base class). I haven't
looked through the code but let's try to avoid having complex call paths
for the vectors.

On Sat, Aug 10, 2019 at 6:07 PM Ji Liu <ni...@aliyun.com.invalid> wrote:

> Hi Micah, thanks for your suggestion.
> You are right, the mainly difference between FixSizedListVector and
> ListVector is the offsetBuffer, but I think this could be avoided through
> allocateNewSafe() overwrite which calls allocateOffsetBuffer() in
> BaseRepeatedValueVector, in this way, offsetBuffer in FixSizedListVector
> will remain allocator.getEmpty().
>
> Meanwhile, we could add getStartIndex(int index)/getEndIndex(int index)
> API to handle read data logic respectively which could be used in
> getObject(int index) or encoding parts. What’s more, no new interface need
> to be introduced.
>
> What do you think?
>
>
> Thanks,
> Ji Liu
>
>
> ------------------------------------------------------------------
> From:Micah Kornfield <em...@gmail.com>
> Send Time:2019年8月11日(星期日) 08:47
> To:dev <de...@arrow.apache.org>; Ji Liu <ni...@aliyun.com>
> Subject:Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from
> ListVector
>
> Hi Ji Liu,
> I think have a common interface/base-class for the two makes sense (but
> don't have historical context) from a reading data perspective.
>
> I think the change would need to be something above
> BaseRepeatedValueVector, since the FixedSizeListVector doesn't contain an
> offset buffer, and that field is contained on BaseRepeatedValueVector.
>
> Thanks,
> Micah
> On Sat, Aug 10, 2019 at 5:25 PM Ji Liu <ni...@aliyun.com.invalid> wrote:
> Hi, all
>
>  While working on the issue to implement dictionary-encoded subfields[1]
> [2], I found FixedSizeListVector not extends ListVector(Thanks Micah
> pointing this out and curious why implemented FixedSizeListVector this way
>  before). Since FixedSizeListVector is a specific case of ListVector,
> should we make former extends the latter to reduce the plenty duplicated
> logic in these two and writer/reader classes?
>
>
>  Thanks,
>  Ji Liu
>
>  [1]
> https://issues.apache.org/jira/browse/ARROW-1175[2]https://github.com/apache/arrow/pull/4972
>
>

Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

Posted by Ji Liu <ni...@aliyun.com.INVALID>.
Hi Micah, thanks for your suggestion. 
You are right, the mainly difference between FixSizedListVector and ListVector is the offsetBuffer, but I think this could be avoided through allocateNewSafe() overwrite which calls allocateOffsetBuffer() in BaseRepeatedValueVector, in this way, offsetBuffer in FixSizedListVector will remain allocator.getEmpty(). 

Meanwhile, we could add getStartIndex(int index)/getEndIndex(int index) API to handle read data logic respectively which could be used in getObject(int index) or encoding parts. What’s more, no new interface need to be introduced.

What do you think?


Thanks,
Ji Liu


------------------------------------------------------------------
From:Micah Kornfield <em...@gmail.com>
Send Time:2019年8月11日(星期日) 08:47
To:dev <de...@arrow.apache.org>; Ji Liu <ni...@aliyun.com>
Subject:Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

Hi Ji Liu,
I think have a common interface/base-class for the two makes sense (but don't have historical context) from a reading data perspective. 

I think the change would need to be something above BaseRepeatedValueVector, since the FixedSizeListVector doesn't contain an offset buffer, and that field is contained on BaseRepeatedValueVector.

Thanks,
Micah
On Sat, Aug 10, 2019 at 5:25 PM Ji Liu <ni...@aliyun.com.invalid> wrote:
Hi, all

 While working on the issue to implement dictionary-encoded subfields[1] [2], I found FixedSizeListVector not extends ListVector(Thanks Micah pointing this out and curious why implemented FixedSizeListVector this way 
 before). Since FixedSizeListVector is a specific case of ListVector, should we make former extends the latter to reduce the plenty duplicated logic in these two and writer/reader classes? 


 Thanks,
 Ji Liu

 [1] https://issues.apache.org/jira/browse/ARROW-1175[2]https://github.com/apache/arrow/pull/4972


Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

Posted by Micah Kornfield <em...@gmail.com>.
Hi Ji Liu,
I think have a common interface/base-class for the two makes sense (but
don't have historical context) from a reading data perspective.

I think the change would need to be something above
BaseRepeatedValueVector, since the FixedSizeListVector doesn't contain an
offset buffer, and that field is contained on BaseRepeatedValueVector.

Thanks,
Micah

On Sat, Aug 10, 2019 at 5:25 PM Ji Liu <ni...@aliyun.com.invalid> wrote:

> Hi, all
>
> While working on the issue to implement dictionary-encoded subfields[1]
> [2], I found FixedSizeListVector not extends ListVector(Thanks Micah
> pointing this out and curious why implemented FixedSizeListVector this way
> before). Since FixedSizeListVector is a specific case of ListVector,
> should we make former extends the latter to reduce the plenty duplicated
> logic in these two and writer/reader classes?
>
>
> Thanks,
> Ji Liu
>
> [1]
> https://issues.apache.org/jira/browse/ARROW-1175[2]https://github.com/apache/arrow/pull/4972
>
>