You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@drill.apache.org by Charles Givre <cg...@gmail.com> on 2019/04/11 00:26:04 UTC

Query Question

Hello Drillers,
I have a query question for you.  I have some really ugly data that has a field like this:

compound_field : { “field_1”: [1,2,3],
    “field_2”:[4,5,6]
)

I would like to map fields 1 and 2 to columns so that the end result is:

field1 | field2
1        | 4
2       |  5
3       |  5

I thought flatten() would be the answer, however, if I flatten the columns, I get the following result:

field1 | field2
1       |  4
1       |  5
1       |  6

Does anyone have any suggestions?
Thanks,
—C

Re: Query Question

Posted by Ted Dunning <te...@gmail.com>.

On Fri, Apr 12, 2019 at 9:51 AM Paul Rogers <pa...@yahoo.com.invalid>
wrote:

> Hi All,
>
> The trick here, of course, is that Drill does not have the tuple concept
> of Python:
>
> zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
>
>
> There is no good way in Drill, to represent the array of (1, 4), (2, 5)
> pairs. The best we can do is:
>
> * A map of two columns, with element names the same as the original array
> columns.
>

This sounds lovely to me.


> ... Also, since functions in Drill are typed, we'd need a "flatten2"
> function for all supported combinations of array types (int, int), (int,
> Varchar), (Varchar, int), (Varchar, Varchar), etc.
>
> Seems like there is an opportunity here to provide a better way to handle
> these kinds of vector (as in "array") operations.
>

The current situations is *definitely* a pain in the ass. It would be
*lovely* to have a simpler way to build UDFs even at the cost of some
performance.

Re: Query Question

Posted by Ted Dunning <te...@gmail.com>.

On Fri, Apr 12, 2019 at 9:51 AM Paul Rogers <pa...@yahoo.com.invalid>
wrote:

> Hi All,
>
> The trick here, of course, is that Drill does not have the tuple concept
> of Python:
>
> zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
>
>
> There is no good way in Drill, to represent the array of (1, 4), (2, 5)
> pairs. The best we can do is:
>
> * A map of two columns, with element names the same as the original array
> columns.
>

This sounds lovely to me.


> ... Also, since functions in Drill are typed, we'd need a "flatten2"
> function for all supported combinations of array types (int, int), (int,
> Varchar), (Varchar, int), (Varchar, Varchar), etc.
>
> Seems like there is an opportunity here to provide a better way to handle
> these kinds of vector (as in "array") operations.
>

The current situations is *definitely* a pain in the ass. It would be
*lovely* to have a simpler way to build UDFs even at the cost of some
performance.

Re: Query Question

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.

Hi All,

The trick here, of course, is that Drill does not have the tuple concept of Python:

zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]

There is no good way in Drill, to represent the array of (1, 4), (2, 5) pairs. The best we can do is:

* A map of two columns, with element names the same as the original array columns.
* A repeated list (but repeated list is not fully supported by all operators)

Maybe what we need is a "flatten2" function that will take two arguments and do the zip and flatten in a single step.

(a: [1, 3 4], b: [4, 5, 6]) --> flatten2(a, b) -->
(a: 1, b: 4)
(a: 2, b: 5)
(a: 3, b 6)

Ideally we'd want a "flattenN" that takes any number of arguments, but I think that may not be achievable due to the nuances of code generation, etc.

Also, since functions in Drill are typed, we'd need a "flatten2" function for all supported combinations of array types (int, int), (int, Varchar), (Varchar, int), (Varchar, Varchar), etc.

Seems like there is an opportunity here to provide a better way to handle these kinds of vector (as in "array") operations.

Thanks,
- Paul

    On Thursday, April 11, 2019, 5:49:14 PM PDT, Ted Dunning <te...@gmail.com> wrote:  

 The semantics for zip with different length arguments tend to be either
ignore tail of longer argument
<https://docs.python.org/3.3/library/functions.html#zip> as Python does
with zip or to reuse shorter arguments to fill out to the length of the
longest argument as R does with cbind and rbind
<https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/cbind>.
R is nice about returning with a warning if the reused value doesn't come
out even at the end.

> cbind(c(1,2), c(4,5,6))
>      [,1] [,2]
> [1,]    1    4
> [2,]    2    5
> [3,]    1    6
> Warning message:
> In cbind(c(1, 2), c(4, 5, 6)) :
>  number of rows of result is not a multiple of vector length (arg 1)

I think that either definition is fine, but that python's truncate style is
probably easier and makes more sense in a database environment. The most
common use case for R's semantics is to build tables with rows that have
all combinations of sets of values (i.e. the cross product). IN a database,
we already have a better mechanism to build the cross product so having zip
behave like Python is nice.

On Thu, Apr 11, 2019 at 8:40 AM Aman Sinha <am...@gmail.com> wrote:

> > I thought flatten() would be the answer, however, if I flatten the
> columns, I get the following result:
>
> Regarding the flatten() output, this is expected because doing a 'SELECT
> flatten(a),  flatten(b) FROM T'  is equivalent to doing a cross-product of
> the 2 arrays.
>
> In your example, both arrays are the same length, but what would you expect
> the output to be if they were different ?  I don't see a direct SQL way of
> doing it but
> even with UDFs the semantics should be defined.
>
> Aman
>
> On Thu, Apr 11, 2019 at 6:37 AM Charles Givre <cg...@gmail.com> wrote:
>
> > That’s a good idea.  I’ll work on a equivalent ZIP() function and submit
> > as a separate PR.
> > — C
> >
> > > On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID>
> > wrote:
> > >
> > > Hi Charles,
> > >
> > > In Python [1], the "zip" function does this task:
> > >
> > >
> > > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> > >
> > >
> > > When you gathered the list of functions for the Drill book, did you
> come
> > across anything like this in Drill? I presume you didn't, hence the
> > question. I did a quick (incomplete) check and didn't see any likely
> > candidates.
> > >
> > > Perhaps you could create such a function.
> > >
> > > Once you have the zipped result, you could flatten to get the pairs as
> > rows.
> > >
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > >    On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <
> > cgivre@gmail.com> wrote:
> > >
> > > Hello Drillers,
> > > I have a query question for you.  I have some really ugly data that has
> > a field like this:
> > >
> > > compound_field : { “field_1”: [1,2,3],
> > >    “field_2”:[4,5,6]
> > > )
> > >
> > > I would like to map fields 1 and 2 to columns so that the end result
> is:
> > >
> > > field1 | field2
> > > 1        | 4
> > > 2      |  5
> > > 3      |  5
> > >
> > > I thought flatten() would be the answer, however, if I flatten the
> > columns, I get the following result:
> > >
> > > field1 | field2
> > > 1      |  4
> > > 1      |  5
> > > 1      |  6
> > >
> > > Does anyone have any suggestions?
> > > Thanks,
> > > —C
> >
> >
>

Re: Query Question

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.

Hi All,

The trick here, of course, is that Drill does not have the tuple concept of Python:

zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]

There is no good way in Drill, to represent the array of (1, 4), (2, 5) pairs. The best we can do is:

* A map of two columns, with element names the same as the original array columns.
* A repeated list (but repeated list is not fully supported by all operators)

Maybe what we need is a "flatten2" function that will take two arguments and do the zip and flatten in a single step.

(a: [1, 3 4], b: [4, 5, 6]) --> flatten2(a, b) -->
(a: 1, b: 4)
(a: 2, b: 5)
(a: 3, b 6)

Ideally we'd want a "flattenN" that takes any number of arguments, but I think that may not be achievable due to the nuances of code generation, etc.

Also, since functions in Drill are typed, we'd need a "flatten2" function for all supported combinations of array types (int, int), (int, Varchar), (Varchar, int), (Varchar, Varchar), etc.

Seems like there is an opportunity here to provide a better way to handle these kinds of vector (as in "array") operations.

Thanks,
- Paul

    On Thursday, April 11, 2019, 5:49:14 PM PDT, Ted Dunning <te...@gmail.com> wrote:  

 The semantics for zip with different length arguments tend to be either
ignore tail of longer argument
<https://docs.python.org/3.3/library/functions.html#zip> as Python does
with zip or to reuse shorter arguments to fill out to the length of the
longest argument as R does with cbind and rbind
<https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/cbind>.
R is nice about returning with a warning if the reused value doesn't come
out even at the end.

> cbind(c(1,2), c(4,5,6))
>      [,1] [,2]
> [1,]    1    4
> [2,]    2    5
> [3,]    1    6
> Warning message:
> In cbind(c(1, 2), c(4, 5, 6)) :
>  number of rows of result is not a multiple of vector length (arg 1)

I think that either definition is fine, but that python's truncate style is
probably easier and makes more sense in a database environment. The most
common use case for R's semantics is to build tables with rows that have
all combinations of sets of values (i.e. the cross product). IN a database,
we already have a better mechanism to build the cross product so having zip
behave like Python is nice.

On Thu, Apr 11, 2019 at 8:40 AM Aman Sinha <am...@gmail.com> wrote:

> > I thought flatten() would be the answer, however, if I flatten the
> columns, I get the following result:
>
> Regarding the flatten() output, this is expected because doing a 'SELECT
> flatten(a),  flatten(b) FROM T'  is equivalent to doing a cross-product of
> the 2 arrays.
>
> In your example, both arrays are the same length, but what would you expect
> the output to be if they were different ?  I don't see a direct SQL way of
> doing it but
> even with UDFs the semantics should be defined.
>
> Aman
>
> On Thu, Apr 11, 2019 at 6:37 AM Charles Givre <cg...@gmail.com> wrote:
>
> > That’s a good idea.  I’ll work on a equivalent ZIP() function and submit
> > as a separate PR.
> > — C
> >
> > > On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID>
> > wrote:
> > >
> > > Hi Charles,
> > >
> > > In Python [1], the "zip" function does this task:
> > >
> > >
> > > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> > >
> > >
> > > When you gathered the list of functions for the Drill book, did you
> come
> > across anything like this in Drill? I presume you didn't, hence the
> > question. I did a quick (incomplete) check and didn't see any likely
> > candidates.
> > >
> > > Perhaps you could create such a function.
> > >
> > > Once you have the zipped result, you could flatten to get the pairs as
> > rows.
> > >
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > >    On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <
> > cgivre@gmail.com> wrote:
> > >
> > > Hello Drillers,
> > > I have a query question for you.  I have some really ugly data that has
> > a field like this:
> > >
> > > compound_field : { “field_1”: [1,2,3],
> > >    “field_2”:[4,5,6]
> > > )
> > >
> > > I would like to map fields 1 and 2 to columns so that the end result
> is:
> > >
> > > field1 | field2
> > > 1        | 4
> > > 2      |  5
> > > 3      |  5
> > >
> > > I thought flatten() would be the answer, however, if I flatten the
> > columns, I get the following result:
> > >
> > > field1 | field2
> > > 1      |  4
> > > 1      |  5
> > > 1      |  6
> > >
> > > Does anyone have any suggestions?
> > > Thanks,
> > > —C
> >
> >
>

Re: Query Question

Posted by Ted Dunning <te...@gmail.com>.

The semantics for zip with different length arguments tend to be either
ignore tail of longer argument
<https://docs.python.org/3.3/library/functions.html#zip> as Python does
with zip or to reuse shorter arguments to fill out to the length of the
longest argument as R does with cbind and rbind
<https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/cbind>.
R is nice about returning with a warning if the reused value doesn't come
out even at the end.

> cbind(c(1,2), c(4,5,6))
>      [,1] [,2]
> [1,]    1    4
> [2,]    2    5
> [3,]    1    6
> Warning message:
> In cbind(c(1, 2), c(4, 5, 6)) :
>   number of rows of result is not a multiple of vector length (arg 1)


I think that either definition is fine, but that python's truncate style is
probably easier and makes more sense in a database environment. The most
common use case for R's semantics is to build tables with rows that have
all combinations of sets of values (i.e. the cross product). IN a database,
we already have a better mechanism to build the cross product so having zip
behave like Python is nice.


On Thu, Apr 11, 2019 at 8:40 AM Aman Sinha <am...@gmail.com> wrote:

> > I thought flatten() would be the answer, however, if I flatten the
> columns, I get the following result:
>
> Regarding the flatten() output, this is expected because doing a 'SELECT
> flatten(a),  flatten(b) FROM T'  is equivalent to doing a cross-product of
> the 2 arrays.
>
> In your example, both arrays are the same length, but what would you expect
> the output to be if they were different ?   I don't see a direct SQL way of
> doing it but
> even with UDFs the semantics should be defined.
>
> Aman
>
> On Thu, Apr 11, 2019 at 6:37 AM Charles Givre <cg...@gmail.com> wrote:
>
> > That’s a good idea.  I’ll work on a equivalent ZIP() function and submit
> > as a separate PR.
> > — C
> >
> > > On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID>
> > wrote:
> > >
> > > Hi Charles,
> > >
> > > In Python [1], the "zip" function does this task:
> > >
> > >
> > > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> > >
> > >
> > > When you gathered the list of functions for the Drill book, did you
> come
> > across anything like this in Drill? I presume you didn't, hence the
> > question. I did a quick (incomplete) check and didn't see any likely
> > candidates.
> > >
> > > Perhaps you could create such a function.
> > >
> > > Once you have the zipped result, you could flatten to get the pairs as
> > rows.
> > >
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > >    On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <
> > cgivre@gmail.com> wrote:
> > >
> > > Hello Drillers,
> > > I have a query question for you.  I have some really ugly data that has
> > a field like this:
> > >
> > > compound_field : { “field_1”: [1,2,3],
> > >     “field_2”:[4,5,6]
> > > )
> > >
> > > I would like to map fields 1 and 2 to columns so that the end result
> is:
> > >
> > > field1 | field2
> > > 1        | 4
> > > 2      |  5
> > > 3      |  5
> > >
> > > I thought flatten() would be the answer, however, if I flatten the
> > columns, I get the following result:
> > >
> > > field1 | field2
> > > 1      |  4
> > > 1      |  5
> > > 1      |  6
> > >
> > > Does anyone have any suggestions?
> > > Thanks,
> > > —C
> >
> >
>

Re: Query Question

Posted by Ted Dunning <te...@gmail.com>.

The semantics for zip with different length arguments tend to be either
ignore tail of longer argument
<https://docs.python.org/3.3/library/functions.html#zip> as Python does
with zip or to reuse shorter arguments to fill out to the length of the
longest argument as R does with cbind and rbind
<https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/cbind>.
R is nice about returning with a warning if the reused value doesn't come
out even at the end.

> cbind(c(1,2), c(4,5,6))
>      [,1] [,2]
> [1,]    1    4
> [2,]    2    5
> [3,]    1    6
> Warning message:
> In cbind(c(1, 2), c(4, 5, 6)) :
>   number of rows of result is not a multiple of vector length (arg 1)


I think that either definition is fine, but that python's truncate style is
probably easier and makes more sense in a database environment. The most
common use case for R's semantics is to build tables with rows that have
all combinations of sets of values (i.e. the cross product). IN a database,
we already have a better mechanism to build the cross product so having zip
behave like Python is nice.


On Thu, Apr 11, 2019 at 8:40 AM Aman Sinha <am...@gmail.com> wrote:

> > I thought flatten() would be the answer, however, if I flatten the
> columns, I get the following result:
>
> Regarding the flatten() output, this is expected because doing a 'SELECT
> flatten(a),  flatten(b) FROM T'  is equivalent to doing a cross-product of
> the 2 arrays.
>
> In your example, both arrays are the same length, but what would you expect
> the output to be if they were different ?   I don't see a direct SQL way of
> doing it but
> even with UDFs the semantics should be defined.
>
> Aman
>
> On Thu, Apr 11, 2019 at 6:37 AM Charles Givre <cg...@gmail.com> wrote:
>
> > That’s a good idea.  I’ll work on a equivalent ZIP() function and submit
> > as a separate PR.
> > — C
> >
> > > On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID>
> > wrote:
> > >
> > > Hi Charles,
> > >
> > > In Python [1], the "zip" function does this task:
> > >
> > >
> > > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> > >
> > >
> > > When you gathered the list of functions for the Drill book, did you
> come
> > across anything like this in Drill? I presume you didn't, hence the
> > question. I did a quick (incomplete) check and didn't see any likely
> > candidates.
> > >
> > > Perhaps you could create such a function.
> > >
> > > Once you have the zipped result, you could flatten to get the pairs as
> > rows.
> > >
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > >    On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <
> > cgivre@gmail.com> wrote:
> > >
> > > Hello Drillers,
> > > I have a query question for you.  I have some really ugly data that has
> > a field like this:
> > >
> > > compound_field : { “field_1”: [1,2,3],
> > >     “field_2”:[4,5,6]
> > > )
> > >
> > > I would like to map fields 1 and 2 to columns so that the end result
> is:
> > >
> > > field1 | field2
> > > 1        | 4
> > > 2      |  5
> > > 3      |  5
> > >
> > > I thought flatten() would be the answer, however, if I flatten the
> > columns, I get the following result:
> > >
> > > field1 | field2
> > > 1      |  4
> > > 1      |  5
> > > 1      |  6
> > >
> > > Does anyone have any suggestions?
> > > Thanks,
> > > —C
> >
> >
>

Re: Query Question

Posted by Aman Sinha <am...@gmail.com>.

> I thought flatten() would be the answer, however, if I flatten the
columns, I get the following result:

Regarding the flatten() output, this is expected because doing a 'SELECT
flatten(a),  flatten(b) FROM T'  is equivalent to doing a cross-product of
the 2 arrays.

In your example, both arrays are the same length, but what would you expect
the output to be if they were different ?   I don't see a direct SQL way of
doing it but
even with UDFs the semantics should be defined.

Aman

On Thu, Apr 11, 2019 at 6:37 AM Charles Givre <cg...@gmail.com> wrote:

> That’s a good idea.  I’ll work on a equivalent ZIP() function and submit
> as a separate PR.
> — C
>
> > On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID>
> wrote:
> >
> > Hi Charles,
> >
> > In Python [1], the "zip" function does this task:
> >
> >
> > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> >
> >
> > When you gathered the list of functions for the Drill book, did you come
> across anything like this in Drill? I presume you didn't, hence the
> question. I did a quick (incomplete) check and didn't see any likely
> candidates.
> >
> > Perhaps you could create such a function.
> >
> > Once you have the zipped result, you could flatten to get the pairs as
> rows.
> >
> >
> > Thanks,
> > - Paul
> >
> >
> >
> >    On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <
> cgivre@gmail.com> wrote:
> >
> > Hello Drillers,
> > I have a query question for you.  I have some really ugly data that has
> a field like this:
> >
> > compound_field : { “field_1”: [1,2,3],
> >     “field_2”:[4,5,6]
> > )
> >
> > I would like to map fields 1 and 2 to columns so that the end result is:
> >
> > field1 | field2
> > 1        | 4
> > 2      |  5
> > 3      |  5
> >
> > I thought flatten() would be the answer, however, if I flatten the
> columns, I get the following result:
> >
> > field1 | field2
> > 1      |  4
> > 1      |  5
> > 1      |  6
> >
> > Does anyone have any suggestions?
> > Thanks,
> > —C
>
>

Re: Query Question

Posted by Aman Sinha <am...@gmail.com>.

> I thought flatten() would be the answer, however, if I flatten the
columns, I get the following result:

Regarding the flatten() output, this is expected because doing a 'SELECT
flatten(a),  flatten(b) FROM T'  is equivalent to doing a cross-product of
the 2 arrays.

In your example, both arrays are the same length, but what would you expect
the output to be if they were different ?   I don't see a direct SQL way of
doing it but
even with UDFs the semantics should be defined.

Aman

On Thu, Apr 11, 2019 at 6:37 AM Charles Givre <cg...@gmail.com> wrote:

> That’s a good idea.  I’ll work on a equivalent ZIP() function and submit
> as a separate PR.
> — C
>
> > On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID>
> wrote:
> >
> > Hi Charles,
> >
> > In Python [1], the "zip" function does this task:
> >
> >
> > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> >
> >
> > When you gathered the list of functions for the Drill book, did you come
> across anything like this in Drill? I presume you didn't, hence the
> question. I did a quick (incomplete) check and didn't see any likely
> candidates.
> >
> > Perhaps you could create such a function.
> >
> > Once you have the zipped result, you could flatten to get the pairs as
> rows.
> >
> >
> > Thanks,
> > - Paul
> >
> >
> >
> >    On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <
> cgivre@gmail.com> wrote:
> >
> > Hello Drillers,
> > I have a query question for you.  I have some really ugly data that has
> a field like this:
> >
> > compound_field : { “field_1”: [1,2,3],
> >     “field_2”:[4,5,6]
> > )
> >
> > I would like to map fields 1 and 2 to columns so that the end result is:
> >
> > field1 | field2
> > 1        | 4
> > 2      |  5
> > 3      |  5
> >
> > I thought flatten() would be the answer, however, if I flatten the
> columns, I get the following result:
> >
> > field1 | field2
> > 1      |  4
> > 1      |  5
> > 1      |  6
> >
> > Does anyone have any suggestions?
> > Thanks,
> > —C
>
>

Re: Query Question

Posted by Charles Givre <cg...@gmail.com>.

That’s a good idea.  I’ll work on a equivalent ZIP() function and submit as a separate PR.
— C

> On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID> wrote:
> 
> Hi Charles,
> 
> In Python [1], the "zip" function does this task:
> 
> 
> zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> 
> 
> When you gathered the list of functions for the Drill book, did you come across anything like this in Drill? I presume you didn't, hence the question. I did a quick (incomplete) check and didn't see any likely candidates.
> 
> Perhaps you could create such a function.
> 
> Once you have the zipped result, you could flatten to get the pairs as rows.
> 
> 
> Thanks,
> - Paul
> 
> 
> 
>    On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <cg...@gmail.com> wrote:  
> 
> Hello Drillers,
> I have a query question for you.  I have some really ugly data that has a field like this:
> 
> compound_field : { “field_1”: [1,2,3],
>     “field_2”:[4,5,6]
> )
> 
> I would like to map fields 1 and 2 to columns so that the end result is:
> 
> field1 | field2
> 1        | 4
> 2      |  5
> 3      |  5
> 
> I thought flatten() would be the answer, however, if I flatten the columns, I get the following result:
> 
> field1 | field2
> 1      |  4
> 1      |  5
> 1      |  6
> 
> Does anyone have any suggestions?
> Thanks,
> —C

Re: Query Question

Posted by Charles Givre <cg...@gmail.com>.

That’s a good idea.  I’ll work on a equivalent ZIP() function and submit as a separate PR.
— C

> On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID> wrote:
> 
> Hi Charles,
> 
> In Python [1], the "zip" function does this task:
> 
> 
> zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> 
> 
> When you gathered the list of functions for the Drill book, did you come across anything like this in Drill? I presume you didn't, hence the question. I did a quick (incomplete) check and didn't see any likely candidates.
> 
> Perhaps you could create such a function.
> 
> Once you have the zipped result, you could flatten to get the pairs as rows.
> 
> 
> Thanks,
> - Paul
> 
> 
> 
>    On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <cg...@gmail.com> wrote:  
> 
> Hello Drillers,
> I have a query question for you.  I have some really ugly data that has a field like this:
> 
> compound_field : { “field_1”: [1,2,3],
>     “field_2”:[4,5,6]
> )
> 
> I would like to map fields 1 and 2 to columns so that the end result is:
> 
> field1 | field2
> 1        | 4
> 2      |  5
> 3      |  5
> 
> I thought flatten() would be the answer, however, if I flatten the columns, I get the following result:
> 
> field1 | field2
> 1      |  4
> 1      |  5
> 1      |  6
> 
> Does anyone have any suggestions?
> Thanks,
> —C

Re: Query Question

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.

Hi Charles,

In Python [1], the "zip" function does this task:


 zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]


When you gathered the list of functions for the Drill book, did you come across anything like this in Drill? I presume you didn't, hence the question. I did a quick (incomplete) check and didn't see any likely candidates.

Perhaps you could create such a function.

Once you have the zipped result, you could flatten to get the pairs as rows.


Thanks,
- Paul

 

    On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <cg...@gmail.com> wrote:  
 
 Hello Drillers,
I have a query question for you.  I have some really ugly data that has a field like this:

compound_field : { “field_1”: [1,2,3],
    “field_2”:[4,5,6]
)

I would like to map fields 1 and 2 to columns so that the end result is:

field1 | field2
1        | 4
2      |  5
3      |  5

I thought flatten() would be the answer, however, if I flatten the columns, I get the following result:

field1 | field2
1      |  4
1      |  5
1      |  6

Does anyone have any suggestions?
Thanks,
—C

Re: Query Question

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.

Hi Charles,

In Python [1], the "zip" function does this task:


 zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]


When you gathered the list of functions for the Drill book, did you come across anything like this in Drill? I presume you didn't, hence the question. I did a quick (incomplete) check and didn't see any likely candidates.

Perhaps you could create such a function.

Once you have the zipped result, you could flatten to get the pairs as rows.


Thanks,
- Paul

 

    On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <cg...@gmail.com> wrote:  
 
 Hello Drillers,
I have a query question for you.  I have some really ugly data that has a field like this:

compound_field : { “field_1”: [1,2,3],
    “field_2”:[4,5,6]
)

I would like to map fields 1 and 2 to columns so that the end result is:

field1 | field2
1        | 4
2      |  5
3      |  5

I thought flatten() would be the answer, however, if I flatten the columns, I get the following result:

field1 | field2
1      |  4
1      |  5
1      |  6

Does anyone have any suggestions?
Thanks,
—C