Posted to dev@arrow.apache.org by Lirong Jian <ji...@gmail.com> on 2020/05/07 07:52:01 UTC

Re: [jira] [Created] (ARROW-7165) [C++] Arrow Compute Group By Support

Is there any update on this task?

We are very interested in this feature. Thanks.

Lirong Jian
HashData Inc.


Chendi.Xue (Jira) <ji...@apache.org> wrote on Thu, Nov 14, 2019 at 2:02 PM:

> Chendi.Xue created ARROW-7165:
> ---------------------------------
>
>              Summary: [C++] Arrow Compute Group By Support
>                  Key: ARROW-7165
>                  URL: https://issues.apache.org/jira/browse/ARROW-7165
>              Project: Apache Arrow
>           Issue Type: New Feature
>           Components: C++ - Compute
>             Reporter: Chendi.Xue
>
>
> Is there any plan to support group-by in Arrow?
>
> Here are some to-do items I have in mind:
>  # Make the current arrow/compute/kernels/hash code accept a memo_table as
> input, so that multiple arrays can get their dictionary encoding and value
> counts from the same hash map with a unified index.
>  # Add a split-array function, instead of calling take multiple times to
> split one array into several.
>  # The output arrays can then use the existing functions under
> compute/kernels, such as sum/count/sort, to support group-by.
>
> This is only my basic idea. I would like to know whether there is an
> ongoing plan for this.
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
>
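
The three to-do items in the quoted issue can be illustrated with a small sketch. It is plain Python using only the standard library, not Arrow code: MemoTable, dict_encode, split_by_group and group_by are hypothetical names chosen for illustration and do not correspond to any Arrow API. The sketch assumes the keys and values arrive as several chunks, encodes all key chunks against one shared memo table so that group indices agree across chunks, splits the values in a single pass, and then applies an ordinary aggregation to each group.

```python
class MemoTable:
    """Hypothetical shared hash table: maps each distinct key to a dense index."""

    def __init__(self):
        self._index = {}
        self.keys = []  # distinct keys, in first-seen order

    def get_or_insert(self, key):
        idx = self._index.get(key)
        if idx is None:
            idx = len(self.keys)
            self._index[key] = idx
            self.keys.append(key)
        return idx


def dict_encode(keys, memo):
    """Step 1: encode keys against a caller-supplied memo table."""
    return [memo.get_or_insert(k) for k in keys]


def split_by_group(values, codes, num_groups):
    """Step 2: split one array into per-group arrays in a single pass."""
    groups = [[] for _ in range(num_groups)]
    for value, code in zip(values, codes):
        groups[code].append(value)
    return groups


def group_by(key_chunks, value_chunks, agg=sum):
    """Step 3: apply an existing aggregation (sum, len, sorted, ...) per group."""
    memo = MemoTable()
    # Encode every chunk first so the final number of groups is known.
    codes_per_chunk = [dict_encode(chunk, memo) for chunk in key_chunks]
    num_groups = len(memo.keys)
    merged = [[] for _ in range(num_groups)]
    for codes, values in zip(codes_per_chunk, value_chunks):
        for g, part in enumerate(split_by_group(values, codes, num_groups)):
            merged[g].extend(part)
    return {key: agg(vals) for key, vals in zip(memo.keys, merged)}
```

Because the memo table is shared, a key that first appears in one chunk keeps the same index in every later chunk, which is what lets the per-chunk splits be merged by position.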

Re: [jira] [Created] (ARROW-7165) [C++] Arrow Compute Group By Support

Posted by Lirong Jian <ji...@gmail.com>.
Thanks.

Lirong Jian
HashData Inc.


Wes McKinney <we...@gmail.com> wrote on Thu, May 7, 2020 at 10:44 PM:

> Feel free to follow ARROW-5002

Re: [jira] [Created] (ARROW-7165) [C++] Arrow Compute Group By Support

Posted by Wes McKinney <we...@gmail.com>.
Feel free to follow ARROW-5002

On Thu, May 7, 2020 at 2:52 AM Lirong Jian <ji...@gmail.com> wrote:
>
> Is there any update on this task?
>
> We are very interested in this feature. Thanks.
>
> Lirong Jian
> HashData Inc.