Posted to dev@drill.apache.org by Hanifi Gunes <hg...@maprtech.com> on 2015/06/01 21:55:12 UTC

Re: question about correlated arrays and flatten

The idea of having functional primitives in Drill sounds really handy. It
would be great if we could support left and right folding as well. I can see
many great use cases for project/map, fold/reduce, zip, and flatten when
combined.

On Sat, May 30, 2015 at 12:57 AM, Ted Dunning <te...@gmail.com> wrote:

> OK.  I will file a JIRA for a zip function.  No idea if I will be able to
> get one written in the available cracks of time.
>
>
>
> On Fri, May 29, 2015 at 7:17 PM, Steven Phillips <sp...@maprtech.com>
> wrote:
>
> > I think your use case could be solved by adding a UDF that can combine
> > multiple arrays into a single array. The result of this function could
> > then be handled by our current implementation of flatten.
> >
> > I think this is preferable to enhancing flatten itself to handle it,
> > since flatten is not an ordinary UDF, and thus more difficult to modify
> > and maintain.
> >
> > On Fri, May 29, 2015 at 3:20 PM, Ted Dunning <te...@gmail.com>
> > wrote:
> >
> > > My particular use case can throw an error if the lists are different
> > > lengths.
> > >
> > > I think our real goal should be to have a logically complete set of
> > > simple primitives that allows any sort of back-and-forth conversion of
> > > this kind.
> > >
> > >
> > >
> > >
> > > On Fri, May 29, 2015 at 9:58 AM, Jason Altekruse <altekrusejason@gmail.com>
> > > wrote:
> > >
> > > > I understand what you want to do; unfortunately, we don't have support
> > > > for this right now. A UDF is the best I can suggest at this point.
> > > >
> > > > Just to explore the idea a little further for the sake of creating a
> > > > complete feature request, I assume you would just want nulls filled in
> > > > for the cases where the lists were different lengths?
> > > >
> > > > On Fri, May 29, 2015 at 8:58 AM, Ted Dunning <te...@gmail.com>
> > > > wrote:
> > > >
> > > > > Input is here: https://gist.github.com/tdunning/07ce66e7e4d4af41afd7
> > > > >
> > > > > Output is here: https://gist.github.com/tdunning/3aa841c56bfcdc0ab90e
> > > > >
> > > > > log-synth schema for generating input data is here:
> > > > > https://gist.github.com/tdunning/638dd52c00569ffa9582
> > > > >
> > > > >
> > > > > Preferred syntax would be like
> > > > >
> > > > > select flatten(t, v1, v2) from ...
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, May 29, 2015 at 7:04 AM, Neeraja Rentachintala <
> > > > > nrentachintala@maprtech.com> wrote:
> > > > >
> > > > > > > Ted,
> > > > > > > can you please give an example with a few data elements in a and b,
> > > > > > > and the expected output you are looking for from the query?
> > > > > >
> > > > > > -Neeraja
> > > > > >
> > > > > > > On Fri, May 29, 2015 at 6:43 AM, Ted Dunning <ted.dunning@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I have two arrays.  Their elements are correlated times and values.
> > > > > > > > I would like to flatten them into rows, each with two elements.
> > > > > > >
> > > > > > > The query
> > > > > > >
> > > > > > >    select flatten(a), flatten(b) from ...
> > > > > > >
> > > > > > > > doesn't work because I get the cartesian product (of course).  The
> > > > > > > > query
> > > > > > >
> > > > > > >    select flatten(a, b) from ...
> > > > > > >
> > > > > > > > also doesn't work because flatten doesn't have a multi-argument form.
> > > > > > > >
> > > > > > > > Going crazy, this query kind of sort of almost works, but not really:
> > > > > > >
> > > > > > >      select r.x.`key`, flatten(r.x.`value`)  from (
> > > > > > >
> > > > > > >          select flatten(kvgen(x)) as x from ...) r;
> > > > > > >
> > > > > > > What I really want to see is something like this:
> > > > > > >    select zip(flatten(a), flatten(b)) from ...
> > > > > > >
> > > > > > > Any pointers?  Is my next step to write a UDF?
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> >  Steven Phillips
> >  Software Engineer
> >
> >  mapr.com
> >
>
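
To make the difference in the quoted question concrete, here is a minimal
sketch in Python (not Drill code) of what two flattens produce for one record
versus what a zip would produce. The function names are hypothetical and are
not part of any Drill API. zip_strict follows the "throw an error if the
lists are different lengths" behavior mentioned in the thread, and
zip_null_fill follows the null-filling alternative that was also raised.

```python
from itertools import product, zip_longest

def flatten_pair_cartesian(a, b):
    """Model of flatten(a), flatten(b) on one record: every pairing of elements."""
    return list(product(a, b))

def zip_strict(*arrays):
    """Pair elements by position; raise if the arrays differ in length."""
    lengths = {len(arr) for arr in arrays}
    if len(lengths) > 1:
        raise ValueError(f"arrays have different lengths: {sorted(lengths)}")
    return [list(row) for row in zip(*arrays)]

def zip_null_fill(*arrays):
    """Pair elements by position; pad shorter arrays with None (null)."""
    return [list(row) for row in zip_longest(*arrays)]

# Illustrative data only: three times and three correlated values.
t = [1, 2, 3]
v = [10.0, 20.0, 30.0]

cartesian = flatten_pair_cartesian(t, v)  # 3 * 3 = 9 rows
zipped = zip_strict(t, v)                 # 3 rows, one per position
```

A UDF along the lines of zip_strict would return a single array of
[time, value] pairs, which the existing single-argument flatten could then
turn into rows.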

Re: question about correlated arrays and flatten

Posted by Hanifi Gunes <hg...@maprtech.com>.
That's right. I guess that's what I am implicitly proposing here. I am not
sure how feasible this would be, but we should be able to interpret inline
lambda-like expressions. This is something to discuss as we improve Drill's
complex data handling capabilities. I see great value here, especially for
computationally intensive workloads.

select fold(t.numbers, 0, (r, c) => r + c), map(t.numbers, (n) => n*n) from
dfs.`some/table` t
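
As a rough sketch of what the fold and map calls in that query would compute
for a single row, here is the same idea in Python. This is only a model of
the proposed semantics: fold, map, and the lambda syntax are proposals in
this thread, and the helper names and sample data below are assumptions made
for illustration.

```python
from functools import reduce

def fold(values, initial, fn):
    """Left fold: combine elements into one result, starting from `initial`."""
    return reduce(fn, values, initial)

def map_array(values, fn):
    """Apply `fn` to each element and return a new array of the same length."""
    return [fn(v) for v in values]

# Stand-in for t.numbers on one row (illustrative data only).
numbers = [1, 2, 3, 4]

total = fold(numbers, 0, lambda r, c: r + c)    # models fold(t.numbers, 0, (r, c) => r + c)
squares = map_array(numbers, lambda n: n * n)   # models map(t.numbers, (n) => n*n)
```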

-Hanifi

On Mon, Jun 1, 2015 at 3:28 PM, Ted Dunning <te...@gmail.com> wrote:

> How could we make functional primitives work without lambda?

Re: question about correlated arrays and flatten

Posted by Ted Dunning <te...@gmail.com>.
How could we make functional primitives work without lambda?


