You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by Fan Liya <li...@gmail.com> on 2019/07/05 02:31:38 UTC

[Discuss][Java] Make the semantics of lastSet consistent

There are two lastSet member variables in the code. One is in
BaseVariableWidthVector and the other is in ListVector. In
BaseVariableWidthVector, the lastSet refers to the last index that is
actually set, while in ListVector, the lastSet refers to the next index
that will be set. So there is an inconsistency.


According to the name, lastSet should refer to the last index that is
actually set. So the semantics in ListVector should be revised. However,
the setLastSet and getLastSet methods in ListVector have been made public,
so they cannot be changed freely.


My initial idea is that: we first change the internal semantics of
ListVector, leaving the external semantics (setLastSet and getLastSet
methods) unchanged. Meanwhile, we make the setLastset & getLastSet methods
deprecated. Changing the external semantics will be performed later as a
long process.


Would you please give some comments? Do you have some other ideas?


Thank you in advance.


Liya Fan

Re: [Discuss][Java] Make the semantics of lastSet consistent

Posted by Fan Liya <li...@gmail.com>.
@Ravindra Pindikura, thanks a lot for your confirmation.
We have prepared a PR <https://github.com/apache/arrow/pull/4797> for this.
Would you please help take a look?

Best,
Liya Fan

On Mon, Jul 8, 2019 at 4:25 PM Ravindra Pindikura <ra...@dremio.com>
wrote:

> On Sat, Jul 6, 2019 at 5:48 AM Jacques Nadeau <ja...@apache.org> wrote:
>
> > Ravindra, Praveen and Prudhvi, can you confirm the ramifications of this
> > change and what impact this inconsistency has had downstream?
> >
>
> Looks like the ListVector treats lastSet as the "last set index" in the
> offsets buffer. It treats it this way in the other internal functions too
> (eg. startNewValue, setValueCount, ..).
>
> dremio doesn't use the getLastSet fn. It does setLastSet in one place (with
> the compensatory +1).
>
>
> https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/org/apache/arrow/vector/complex/ListVectorHelper.java#L65
>
> I'm fine with treating this as bug and fixing the external semantics too.
>
>
> > On Thu, Jul 4, 2019 at 7:32 PM Fan Liya <li...@gmail.com> wrote:
> >
> > > There are two lastSet member variables in the code. One is in
> > > BaseVariableWidthVector and the other is in ListVector. In
> > > BaseVariableWidthVector, the lastSet refers to the last index that is
> > > actually set, while in ListVector, the lastSet refers to the next index
> > > that will be set. So there is an inconsistency.
> > >
> > >
> > > According to the name, lastSet should refer to the last index that is
> > > actually set. So the semantics in ListVector should be revised.
> However,
> > > the setLastSet and getLastSet methods in ListVector have been made
> > public,
> > > so they cannot be changed freely.
> > >
> > >
> > > My initial idea is that: we first change the internal semantics of
> > > ListVector, leaving the external semantics (setLastSet and getLastSet
> > > methods) unchanged. Meanwhile, we make the setLastset & getLastSet
> > methods
> > > deprecated. Changing the external semantics will be performed later as
> a
> > > long process.
> > >
> > >
> > > Would you please give some comments? Do you have some other ideas?
> > >
> > >
> > > Thank you in advance.
> > >
> > >
> > > Liya Fan
> > >
> >
>
>
> --
> Thanks and regards,
> Ravindra.
>

Re: [Discuss][Java] Make the semantics of lastSet consistent

Posted by Ravindra Pindikura <ra...@dremio.com>.
On Sat, Jul 6, 2019 at 5:48 AM Jacques Nadeau <ja...@apache.org> wrote:

> Ravindra, Praveen and Prudhvi, can you confirm the ramifications of this
> change and what impact this inconsistency has had downstream?
>

Looks like the ListVector treats lastSet as the "last set index" in the
offsets buffer. It treats it this way in the other internal functions too
(eg. startNewValue, setValueCount, ..).

dremio doesn't use the getLastSet fn. It does setLastSet in one place (with
the compensatory +1).

https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/org/apache/arrow/vector/complex/ListVectorHelper.java#L65

I'm fine with treating this as bug and fixing the external semantics too.


> On Thu, Jul 4, 2019 at 7:32 PM Fan Liya <li...@gmail.com> wrote:
>
> > There are two lastSet member variables in the code. One is in
> > BaseVariableWidthVector and the other is in ListVector. In
> > BaseVariableWidthVector, the lastSet refers to the last index that is
> > actually set, while in ListVector, the lastSet refers to the next index
> > that will be set. So there is an inconsistency.
> >
> >
> > According to the name, lastSet should refer to the last index that is
> > actually set. So the semantics in ListVector should be revised. However,
> > the setLastSet and getLastSet methods in ListVector have been made
> public,
> > so they cannot be changed freely.
> >
> >
> > My initial idea is that: we first change the internal semantics of
> > ListVector, leaving the external semantics (setLastSet and getLastSet
> > methods) unchanged. Meanwhile, we make the setLastset & getLastSet
> methods
> > deprecated. Changing the external semantics will be performed later as a
> > long process.
> >
> >
> > Would you please give some comments? Do you have some other ideas?
> >
> >
> > Thank you in advance.
> >
> >
> > Liya Fan
> >
>


-- 
Thanks and regards,
Ravindra.

Re: [Discuss][Java] Make the semantics of lastSet consistent

Posted by Jacques Nadeau <ja...@apache.org>.
Ravindra, Praveen and Prudhvi, can you confirm the ramifications of this
change and what impact this inconsistency has had downstream?

On Thu, Jul 4, 2019 at 7:32 PM Fan Liya <li...@gmail.com> wrote:

> There are two lastSet member variables in the code. One is in
> BaseVariableWidthVector and the other is in ListVector. In
> BaseVariableWidthVector, the lastSet refers to the last index that is
> actually set, while in ListVector, the lastSet refers to the next index
> that will be set. So there is an inconsistency.
>
>
> According to the name, lastSet should refer to the last index that is
> actually set. So the semantics in ListVector should be revised. However,
> the setLastSet and getLastSet methods in ListVector have been made public,
> so they cannot be changed freely.
>
>
> My initial idea is that: we first change the internal semantics of
> ListVector, leaving the external semantics (setLastSet and getLastSet
> methods) unchanged. Meanwhile, we make the setLastset & getLastSet methods
> deprecated. Changing the external semantics will be performed later as a
> long process.
>
>
> Would you please give some comments? Do you have some other ideas?
>
>
> Thank you in advance.
>
>
> Liya Fan
>