Posted to dev@arrow.apache.org by Paul Taylor <pt...@gmail.com> on 2021/11/02 20:12:02 UTC

Re: [VOTE] Remove compute from Arrow JS

+1 from me as well

> On Oct 27, 2021, at 6:58 PM, Brian Hulette <bh...@apache.org> wrote:
> 
> 
> +1
> 
> I don't think there's much reason to keep the compute code around when there's a more performant, easier to use alternative. I think the only unique feature of the arrow compute code was the ability to optimize queries on dictionary-encoded columns, but Jeff added this to Arquero almost a year ago now [1].
> 
> Brian
> 
> [1] https://github.com/uwdata/arquero/issues/86
> 
>> On Wed, Oct 27, 2021 at 4:46 PM Dominik Moritz <do...@apache.org> wrote:
>> Dear Arrow community,
>> 
>> We are proposing to remove the compute code from Arrow JS. Right now, the compute code is encapsulated in a DataFrame class that extends Table. The DataFrame implements a few functions such as filtering and counting with expressions. However, the predicate code is not very efficient (it’s interpreted) and most people only use Arrow to read data but don’t need compute. There are also more complete alternatives for doing compute on Arrow data structures such as Arquero (https://github.com/uwdata/arquero). By removing the compute code, we can focus on the IPC reading/writing and primitive types.
>> 
>> The vote will be open for at least 72 hours.
>> 
>> [ ] +1 Remove compute from Arrow JS
>> [ ] +0
>> [ ] -1 Do not remove compute because…
>> 
>> Thank you,
>> Dominik

Re: [VOTE] Remove compute from Arrow JS

Posted by Dominik Moritz <do...@cmu.edu>.
Thank you, Micah. We plan to rewrite a lot of the API in Arrow 7 so I think
option 1 would add a lot of overhead. I added a note to the Arrow 6 release
notes in https://github.com/apache/arrow-site/pull/153/files#r741928109.

On Nov 2, 2021 at 22:30:01, Micah Kornfield <em...@gmail.com> wrote:

> I'd suggest maybe two things:
> 1.  If possible, add deprecation warnings in the next release and delete
> the code in the release after (we don't have a formal policy, but it would
> be good to give users a heads-up before outright deletion).
> 2.  If 1 isn't an option then please add something to the 6.0.0 release
> notes indicating the removal.
>
> Cheers,
> Micah

Re: [VOTE] Remove compute from Arrow JS

Posted by Dominik Moritz <do...@apache.org>.
Hi Micah, I just sent a separate message to the mailing list about the
changes. The discussion is happening on the GitHub pull request and Jira.

On Nov 5, 2021 at 18:25:58, Micah Kornfield <em...@gmail.com> wrote:

>> I added a note to the Arrow 6 release notes in
>> https://github.com/apache/arrow-site/pull/153/files#r741928109.
>
> Thank you.
>
>> We plan to rewrite a lot of the API in Arrow 7
>
> Sorry, I missed these on the mailing list. Were these plans discussed here
> (sometimes I filter out JavaScript discussions) or in another forum?
>
> -Micah

Re: [VOTE] Remove compute from Arrow JS

Posted by Micah Kornfield <em...@gmail.com>.
>
> I added a note to the Arrow 6 release notes in
> https://github.com/apache/arrow-site/pull/153/files#r741928109.

Thank you.


> We plan to rewrite a lot of the API in Arrow 7


Sorry, I missed these on the mailing list. Were these plans discussed here
(sometimes I filter out JavaScript discussions) or in another forum?

-Micah

On Wed, Nov 3, 2021 at 9:23 AM Dominik Moritz <do...@apache.org> wrote:

> Thank you, Micah. We plan to rewrite a lot of the API in Arrow 7 so I
> think option 1 would add a lot of overhead. I added a note to the Arrow 6
> release notes in
> https://github.com/apache/arrow-site/pull/153/files#r741928109.
>
> On Nov 2, 2021 at 22:30:01, Micah Kornfield <em...@gmail.com> wrote:
>
>> I'd suggest maybe two things:
>> 1.  If possible, add deprecation warnings in the next release and delete
>> the code in the release after (we don't have a formal policy, but it would
>> be good to give users a heads-up before outright deletion).
>> 2.  If 1 isn't an option then please add something to the 6.0.0 release
>> notes indicating the removal.
>>
>> Cheers,
>> Micah

Re: [VOTE] Remove compute from Arrow JS

Posted by Dominik Moritz <do...@apache.org>.
Thank you, Micah. We plan to rewrite a lot of the API in Arrow 7 so I
think option 1 would add a lot of overhead. I added a note to the Arrow 6
release notes in
https://github.com/apache/arrow-site/pull/153/files#r741928109.

On Nov 2, 2021 at 22:30:01, Micah Kornfield <em...@gmail.com> wrote:

> I'd suggest maybe two things:
> 1.  If possible, add deprecation warnings in the next release and delete
> the code in the release after (we don't have a formal policy, but it would
> be good to give users a heads-up before outright deletion).
> 2.  If 1 isn't an option then please add something to the 6.0.0 release
> notes indicating the removal.
>
> Cheers,
> Micah

Re: [VOTE] Remove compute from Arrow JS

Posted by Micah Kornfield <em...@gmail.com>.
I'd suggest maybe two things:
1.  If possible, add deprecation warnings in the next release and delete
the code in the release after (we don't have a formal policy, but it would
be good to give users a heads-up before outright deletion).
2.  If 1 isn't an option then please add something to the 6.0.0 release
notes indicating the removal.

Cheers,
Micah

On Tue, Nov 2, 2021 at 1:54 PM Dominik Moritz <do...@apache.org> wrote:

> +1 from me as well.
>
> That brings us to 3 times +1 and no -1 or +0.
>
> Thank you, all. We will remove the compute code in the next Arrow version.
>
> On Nov 2, 2021 at 16:12:02, Paul Taylor <pt...@gmail.com> wrote:
>
> > +1 from me as well
> >
> > On Oct 27, 2021, at 6:58 PM, Brian Hulette <bh...@apache.org> wrote:
> >
> > 
> > +1
> >
> > I don't think there's much reason to keep the compute code around when
> > there's a more performant, easier to use alternative. I think the only
> > unique feature of the arrow compute code was the ability to optimize
> > queries on dictionary-encoded columns, but Jeff added this to Arquero
> > almost a year ago now [1].
> >
> > Brian
> >
> > [1] https://github.com/uwdata/arquero/issues/86
> >
> > On Wed, Oct 27, 2021 at 4:46 PM Dominik Moritz <do...@apache.org>
> > wrote:
> >
> >> Dear Arrow community,
> >>
> >> We are proposing to remove the compute code from Arrow JS. Right now, the
> >> compute code is encapsulated in a DataFrame class that extends Table. The
> >> DataFrame implements a few functions such as filtering and counting with
> >> expressions. However, the predicate code is not very efficient (it’s
> >> interpreted) and most people only use Arrow to read data but don’t need
> >> compute. There are also more complete alternatives for doing compute on
> >> Arrow data structures such as Arquero (https://github.com/uwdata/arquero).
> >> By removing the compute code, we can focus on the IPC reading/writing and
> >> primitive types.
> >>
> >> The vote will be open for at least 72 hours.
> >>
> >> [ ] +1 Remove compute from Arrow JS
> >> [ ] +0
> >> [ ] -1 Do not remove compute because…
> >>
> >> Thank you,
> >> Dominik
> >>
> >
>

Re: [VOTE] Remove compute from Arrow JS

Posted by Dominik Moritz <do...@apache.org>.
+1 from me as well.

That brings us to 3 times +1 and no -1 or +0.

Thank you, all. We will remove the compute code in the next Arrow version.

On Nov 2, 2021 at 16:12:02, Paul Taylor <pt...@gmail.com> wrote:

> +1 from me as well
>
> On Oct 27, 2021, at 6:58 PM, Brian Hulette <bh...@apache.org> wrote:
>
> 
> +1
>
> I don't think there's much reason to keep the compute code around when
> there's a more performant, easier to use alternative. I think the only
> unique feature of the arrow compute code was the ability to optimize
> queries on dictionary-encoded columns, but Jeff added this to Arquero
> almost a year ago now [1].
>
> Brian
>
> [1] https://github.com/uwdata/arquero/issues/86
>
> On Wed, Oct 27, 2021 at 4:46 PM Dominik Moritz <do...@apache.org>
> wrote:
>
>> Dear Arrow community,
>>
>> We are proposing to remove the compute code from Arrow JS. Right now, the
>> compute code is encapsulated in a DataFrame class that extends Table. The
>> DataFrame implements a few functions such as filtering and counting with
>> expressions. However, the predicate code is not very efficient (it’s
>> interpreted) and most people only use Arrow to read data but don’t need
>> compute. There are also more complete alternatives for doing compute on
>> Arrow data structures such as Arquero (https://github.com/uwdata/arquero).
>> By removing the compute code, we can focus on the IPC reading/writing and
>> primitive types.
>>
>> The vote will be open for at least 72 hours.
>>
>> [ ] +1 Remove compute from Arrow JS
>> [ ] +0
>> [ ] -1 Do not remove compute because…
>>
>> Thank you,
>> Dominik
>>
>