You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Charles Givre <cg...@gmail.com> on 2019/04/11 00:26:04 UTC
Query Question
Hello Drillers,
I have a query question for you. I have some really ugly data that has a field like this:
compound_field : { “field_1”: [1,2,3],
“field_2”:[4,5,6]
)
I would like to map fields 1 and 2 to columns so that the end result is:
field1 | field2
1 | 4
2 | 5
3 | 5
I thought flatten() would be the answer, however, if I flatten the columns, I get the following result:
field1 | field2
1 | 4
1 | 5
1 | 6
Does anyone have any suggestions?
Thanks,
—C
Re: Query Question
Posted by Ted Dunning <te...@gmail.com>.
On Fri, Apr 12, 2019 at 9:51 AM Paul Rogers <pa...@yahoo.com.invalid>
wrote:
> Hi All,
>
> The trick here, of course, is that Drill does not have the tuple concept
> of Python:
>
> zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
>
>
> There is no good way in Drill, to represent the array of (1, 4), (2, 5)
> pairs. The best we can do is:
>
> * A map of two columns, with element names the same as the original array
> columns.
>
This sounds lovely to me.
> ... Also, since functions in Drill are typed, we'd need a "flatten2"
> function for all supported combinations of array types (int, int), (int,
> Varchar), (Varchar, int), (Varchar, Varchar), etc.
>
> Seems like there is an opportunity here to provide a better way to handle
> these kinds of vector (as in "array") operations.
>
The current situations is *definitely* a pain in the ass. It would be
*lovely* to have a simpler way to build UDFs even at the cost of some
performance.
Re: Query Question
Posted by Ted Dunning <te...@gmail.com>.
On Fri, Apr 12, 2019 at 9:51 AM Paul Rogers <pa...@yahoo.com.invalid>
wrote:
> Hi All,
>
> The trick here, of course, is that Drill does not have the tuple concept
> of Python:
>
> zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
>
>
> There is no good way in Drill, to represent the array of (1, 4), (2, 5)
> pairs. The best we can do is:
>
> * A map of two columns, with element names the same as the original array
> columns.
>
This sounds lovely to me.
> ... Also, since functions in Drill are typed, we'd need a "flatten2"
> function for all supported combinations of array types (int, int), (int,
> Varchar), (Varchar, int), (Varchar, Varchar), etc.
>
> Seems like there is an opportunity here to provide a better way to handle
> these kinds of vector (as in "array") operations.
>
The current situations is *definitely* a pain in the ass. It would be
*lovely* to have a simpler way to build UDFs even at the cost of some
performance.
Re: Query Question
Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi All,
The trick here, of course, is that Drill does not have the tuple concept of Python:
zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
There is no good way in Drill, to represent the array of (1, 4), (2, 5) pairs. The best we can do is:
* A map of two columns, with element names the same as the original array columns.
* A repeated list (but repeated list is not fully supported by all operators)
Maybe what we need is a "flatten2" function that will take two arguments and do the zip and flatten in a single step.
(a: [1, 3 4], b: [4, 5, 6]) --> flatten2(a, b) -->
(a: 1, b: 4)
(a: 2, b: 5)
(a: 3, b 6)
Ideally we'd want a "flattenN" that takes any number of arguments, but I think that may not be achievable due to the nuances of code generation, etc.
Also, since functions in Drill are typed, we'd need a "flatten2" function for all supported combinations of array types (int, int), (int, Varchar), (Varchar, int), (Varchar, Varchar), etc.
Seems like there is an opportunity here to provide a better way to handle these kinds of vector (as in "array") operations.
Thanks,
- Paul
On Thursday, April 11, 2019, 5:49:14 PM PDT, Ted Dunning <te...@gmail.com> wrote:
The semantics for zip with different length arguments tend to be either
ignore tail of longer argument
<https://docs.python.org/3.3/library/functions.html#zip> as Python does
with zip or to reuse shorter arguments to fill out to the length of the
longest argument as R does with cbind and rbind
<https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/cbind>.
R is nice about returning with a warning if the reused value doesn't come
out even at the end.
> cbind(c(1,2), c(4,5,6))
> [,1] [,2]
> [1,] 1 4
> [2,] 2 5
> [3,] 1 6
> Warning message:
> In cbind(c(1, 2), c(4, 5, 6)) :
> number of rows of result is not a multiple of vector length (arg 1)
I think that either definition is fine, but that python's truncate style is
probably easier and makes more sense in a database environment. The most
common use case for R's semantics is to build tables with rows that have
all combinations of sets of values (i.e. the cross product). IN a database,
we already have a better mechanism to build the cross product so having zip
behave like Python is nice.
On Thu, Apr 11, 2019 at 8:40 AM Aman Sinha <am...@gmail.com> wrote:
> > I thought flatten() would be the answer, however, if I flatten the
> columns, I get the following result:
>
> Regarding the flatten() output, this is expected because doing a 'SELECT
> flatten(a), flatten(b) FROM T' is equivalent to doing a cross-product of
> the 2 arrays.
>
> In your example, both arrays are the same length, but what would you expect
> the output to be if they were different ? I don't see a direct SQL way of
> doing it but
> even with UDFs the semantics should be defined.
>
> Aman
>
> On Thu, Apr 11, 2019 at 6:37 AM Charles Givre <cg...@gmail.com> wrote:
>
> > That’s a good idea. I’ll work on a equivalent ZIP() function and submit
> > as a separate PR.
> > — C
> >
> > > On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID>
> > wrote:
> > >
> > > Hi Charles,
> > >
> > > In Python [1], the "zip" function does this task:
> > >
> > >
> > > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> > >
> > >
> > > When you gathered the list of functions for the Drill book, did you
> come
> > across anything like this in Drill? I presume you didn't, hence the
> > question. I did a quick (incomplete) check and didn't see any likely
> > candidates.
> > >
> > > Perhaps you could create such a function.
> > >
> > > Once you have the zipped result, you could flatten to get the pairs as
> > rows.
> > >
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > > On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <
> > cgivre@gmail.com> wrote:
> > >
> > > Hello Drillers,
> > > I have a query question for you. I have some really ugly data that has
> > a field like this:
> > >
> > > compound_field : { “field_1”: [1,2,3],
> > > “field_2”:[4,5,6]
> > > )
> > >
> > > I would like to map fields 1 and 2 to columns so that the end result
> is:
> > >
> > > field1 | field2
> > > 1 | 4
> > > 2 | 5
> > > 3 | 5
> > >
> > > I thought flatten() would be the answer, however, if I flatten the
> > columns, I get the following result:
> > >
> > > field1 | field2
> > > 1 | 4
> > > 1 | 5
> > > 1 | 6
> > >
> > > Does anyone have any suggestions?
> > > Thanks,
> > > —C
> >
> >
>
Re: Query Question
Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi All,
The trick here, of course, is that Drill does not have the tuple concept of Python:
zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
There is no good way in Drill, to represent the array of (1, 4), (2, 5) pairs. The best we can do is:
* A map of two columns, with element names the same as the original array columns.
* A repeated list (but repeated list is not fully supported by all operators)
Maybe what we need is a "flatten2" function that will take two arguments and do the zip and flatten in a single step.
(a: [1, 3 4], b: [4, 5, 6]) --> flatten2(a, b) -->
(a: 1, b: 4)
(a: 2, b: 5)
(a: 3, b 6)
Ideally we'd want a "flattenN" that takes any number of arguments, but I think that may not be achievable due to the nuances of code generation, etc.
Also, since functions in Drill are typed, we'd need a "flatten2" function for all supported combinations of array types (int, int), (int, Varchar), (Varchar, int), (Varchar, Varchar), etc.
Seems like there is an opportunity here to provide a better way to handle these kinds of vector (as in "array") operations.
Thanks,
- Paul
On Thursday, April 11, 2019, 5:49:14 PM PDT, Ted Dunning <te...@gmail.com> wrote:
The semantics for zip with different length arguments tend to be either
ignore tail of longer argument
<https://docs.python.org/3.3/library/functions.html#zip> as Python does
with zip or to reuse shorter arguments to fill out to the length of the
longest argument as R does with cbind and rbind
<https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/cbind>.
R is nice about returning with a warning if the reused value doesn't come
out even at the end.
> cbind(c(1,2), c(4,5,6))
> [,1] [,2]
> [1,] 1 4
> [2,] 2 5
> [3,] 1 6
> Warning message:
> In cbind(c(1, 2), c(4, 5, 6)) :
> number of rows of result is not a multiple of vector length (arg 1)
I think that either definition is fine, but that python's truncate style is
probably easier and makes more sense in a database environment. The most
common use case for R's semantics is to build tables with rows that have
all combinations of sets of values (i.e. the cross product). IN a database,
we already have a better mechanism to build the cross product so having zip
behave like Python is nice.
On Thu, Apr 11, 2019 at 8:40 AM Aman Sinha <am...@gmail.com> wrote:
> > I thought flatten() would be the answer, however, if I flatten the
> columns, I get the following result:
>
> Regarding the flatten() output, this is expected because doing a 'SELECT
> flatten(a), flatten(b) FROM T' is equivalent to doing a cross-product of
> the 2 arrays.
>
> In your example, both arrays are the same length, but what would you expect
> the output to be if they were different ? I don't see a direct SQL way of
> doing it but
> even with UDFs the semantics should be defined.
>
> Aman
>
> On Thu, Apr 11, 2019 at 6:37 AM Charles Givre <cg...@gmail.com> wrote:
>
> > That’s a good idea. I’ll work on a equivalent ZIP() function and submit
> > as a separate PR.
> > — C
> >
> > > On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID>
> > wrote:
> > >
> > > Hi Charles,
> > >
> > > In Python [1], the "zip" function does this task:
> > >
> > >
> > > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> > >
> > >
> > > When you gathered the list of functions for the Drill book, did you
> come
> > across anything like this in Drill? I presume you didn't, hence the
> > question. I did a quick (incomplete) check and didn't see any likely
> > candidates.
> > >
> > > Perhaps you could create such a function.
> > >
> > > Once you have the zipped result, you could flatten to get the pairs as
> > rows.
> > >
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > > On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <
> > cgivre@gmail.com> wrote:
> > >
> > > Hello Drillers,
> > > I have a query question for you. I have some really ugly data that has
> > a field like this:
> > >
> > > compound_field : { “field_1”: [1,2,3],
> > > “field_2”:[4,5,6]
> > > )
> > >
> > > I would like to map fields 1 and 2 to columns so that the end result
> is:
> > >
> > > field1 | field2
> > > 1 | 4
> > > 2 | 5
> > > 3 | 5
> > >
> > > I thought flatten() would be the answer, however, if I flatten the
> > columns, I get the following result:
> > >
> > > field1 | field2
> > > 1 | 4
> > > 1 | 5
> > > 1 | 6
> > >
> > > Does anyone have any suggestions?
> > > Thanks,
> > > —C
> >
> >
>
Re: Query Question
Posted by Ted Dunning <te...@gmail.com>.
The semantics for zip with different length arguments tend to be either
ignore tail of longer argument
<https://docs.python.org/3.3/library/functions.html#zip> as Python does
with zip or to reuse shorter arguments to fill out to the length of the
longest argument as R does with cbind and rbind
<https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/cbind>.
R is nice about returning with a warning if the reused value doesn't come
out even at the end.
> cbind(c(1,2), c(4,5,6))
> [,1] [,2]
> [1,] 1 4
> [2,] 2 5
> [3,] 1 6
> Warning message:
> In cbind(c(1, 2), c(4, 5, 6)) :
> number of rows of result is not a multiple of vector length (arg 1)
I think that either definition is fine, but that python's truncate style is
probably easier and makes more sense in a database environment. The most
common use case for R's semantics is to build tables with rows that have
all combinations of sets of values (i.e. the cross product). IN a database,
we already have a better mechanism to build the cross product so having zip
behave like Python is nice.
On Thu, Apr 11, 2019 at 8:40 AM Aman Sinha <am...@gmail.com> wrote:
> > I thought flatten() would be the answer, however, if I flatten the
> columns, I get the following result:
>
> Regarding the flatten() output, this is expected because doing a 'SELECT
> flatten(a), flatten(b) FROM T' is equivalent to doing a cross-product of
> the 2 arrays.
>
> In your example, both arrays are the same length, but what would you expect
> the output to be if they were different ? I don't see a direct SQL way of
> doing it but
> even with UDFs the semantics should be defined.
>
> Aman
>
> On Thu, Apr 11, 2019 at 6:37 AM Charles Givre <cg...@gmail.com> wrote:
>
> > That’s a good idea. I’ll work on a equivalent ZIP() function and submit
> > as a separate PR.
> > — C
> >
> > > On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID>
> > wrote:
> > >
> > > Hi Charles,
> > >
> > > In Python [1], the "zip" function does this task:
> > >
> > >
> > > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> > >
> > >
> > > When you gathered the list of functions for the Drill book, did you
> come
> > across anything like this in Drill? I presume you didn't, hence the
> > question. I did a quick (incomplete) check and didn't see any likely
> > candidates.
> > >
> > > Perhaps you could create such a function.
> > >
> > > Once you have the zipped result, you could flatten to get the pairs as
> > rows.
> > >
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > > On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <
> > cgivre@gmail.com> wrote:
> > >
> > > Hello Drillers,
> > > I have a query question for you. I have some really ugly data that has
> > a field like this:
> > >
> > > compound_field : { “field_1”: [1,2,3],
> > > “field_2”:[4,5,6]
> > > )
> > >
> > > I would like to map fields 1 and 2 to columns so that the end result
> is:
> > >
> > > field1 | field2
> > > 1 | 4
> > > 2 | 5
> > > 3 | 5
> > >
> > > I thought flatten() would be the answer, however, if I flatten the
> > columns, I get the following result:
> > >
> > > field1 | field2
> > > 1 | 4
> > > 1 | 5
> > > 1 | 6
> > >
> > > Does anyone have any suggestions?
> > > Thanks,
> > > —C
> >
> >
>
Re: Query Question
Posted by Ted Dunning <te...@gmail.com>.
The semantics for zip with different length arguments tend to be either
ignore tail of longer argument
<https://docs.python.org/3.3/library/functions.html#zip> as Python does
with zip or to reuse shorter arguments to fill out to the length of the
longest argument as R does with cbind and rbind
<https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/cbind>.
R is nice about returning with a warning if the reused value doesn't come
out even at the end.
> cbind(c(1,2), c(4,5,6))
> [,1] [,2]
> [1,] 1 4
> [2,] 2 5
> [3,] 1 6
> Warning message:
> In cbind(c(1, 2), c(4, 5, 6)) :
> number of rows of result is not a multiple of vector length (arg 1)
I think that either definition is fine, but that python's truncate style is
probably easier and makes more sense in a database environment. The most
common use case for R's semantics is to build tables with rows that have
all combinations of sets of values (i.e. the cross product). IN a database,
we already have a better mechanism to build the cross product so having zip
behave like Python is nice.
On Thu, Apr 11, 2019 at 8:40 AM Aman Sinha <am...@gmail.com> wrote:
> > I thought flatten() would be the answer, however, if I flatten the
> columns, I get the following result:
>
> Regarding the flatten() output, this is expected because doing a 'SELECT
> flatten(a), flatten(b) FROM T' is equivalent to doing a cross-product of
> the 2 arrays.
>
> In your example, both arrays are the same length, but what would you expect
> the output to be if they were different ? I don't see a direct SQL way of
> doing it but
> even with UDFs the semantics should be defined.
>
> Aman
>
> On Thu, Apr 11, 2019 at 6:37 AM Charles Givre <cg...@gmail.com> wrote:
>
> > That’s a good idea. I’ll work on a equivalent ZIP() function and submit
> > as a separate PR.
> > — C
> >
> > > On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID>
> > wrote:
> > >
> > > Hi Charles,
> > >
> > > In Python [1], the "zip" function does this task:
> > >
> > >
> > > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> > >
> > >
> > > When you gathered the list of functions for the Drill book, did you
> come
> > across anything like this in Drill? I presume you didn't, hence the
> > question. I did a quick (incomplete) check and didn't see any likely
> > candidates.
> > >
> > > Perhaps you could create such a function.
> > >
> > > Once you have the zipped result, you could flatten to get the pairs as
> > rows.
> > >
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > > On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <
> > cgivre@gmail.com> wrote:
> > >
> > > Hello Drillers,
> > > I have a query question for you. I have some really ugly data that has
> > a field like this:
> > >
> > > compound_field : { “field_1”: [1,2,3],
> > > “field_2”:[4,5,6]
> > > )
> > >
> > > I would like to map fields 1 and 2 to columns so that the end result
> is:
> > >
> > > field1 | field2
> > > 1 | 4
> > > 2 | 5
> > > 3 | 5
> > >
> > > I thought flatten() would be the answer, however, if I flatten the
> > columns, I get the following result:
> > >
> > > field1 | field2
> > > 1 | 4
> > > 1 | 5
> > > 1 | 6
> > >
> > > Does anyone have any suggestions?
> > > Thanks,
> > > —C
> >
> >
>
Re: Query Question
Posted by Aman Sinha <am...@gmail.com>.
> I thought flatten() would be the answer, however, if I flatten the
columns, I get the following result:
Regarding the flatten() output, this is expected because doing a 'SELECT
flatten(a), flatten(b) FROM T' is equivalent to doing a cross-product of
the 2 arrays.
In your example, both arrays are the same length, but what would you expect
the output to be if they were different ? I don't see a direct SQL way of
doing it but
even with UDFs the semantics should be defined.
Aman
On Thu, Apr 11, 2019 at 6:37 AM Charles Givre <cg...@gmail.com> wrote:
> That’s a good idea. I’ll work on a equivalent ZIP() function and submit
> as a separate PR.
> — C
>
> > On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID>
> wrote:
> >
> > Hi Charles,
> >
> > In Python [1], the "zip" function does this task:
> >
> >
> > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> >
> >
> > When you gathered the list of functions for the Drill book, did you come
> across anything like this in Drill? I presume you didn't, hence the
> question. I did a quick (incomplete) check and didn't see any likely
> candidates.
> >
> > Perhaps you could create such a function.
> >
> > Once you have the zipped result, you could flatten to get the pairs as
> rows.
> >
> >
> > Thanks,
> > - Paul
> >
> >
> >
> > On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <
> cgivre@gmail.com> wrote:
> >
> > Hello Drillers,
> > I have a query question for you. I have some really ugly data that has
> a field like this:
> >
> > compound_field : { “field_1”: [1,2,3],
> > “field_2”:[4,5,6]
> > )
> >
> > I would like to map fields 1 and 2 to columns so that the end result is:
> >
> > field1 | field2
> > 1 | 4
> > 2 | 5
> > 3 | 5
> >
> > I thought flatten() would be the answer, however, if I flatten the
> columns, I get the following result:
> >
> > field1 | field2
> > 1 | 4
> > 1 | 5
> > 1 | 6
> >
> > Does anyone have any suggestions?
> > Thanks,
> > —C
>
>
Re: Query Question
Posted by Aman Sinha <am...@gmail.com>.
> I thought flatten() would be the answer, however, if I flatten the
columns, I get the following result:
Regarding the flatten() output, this is expected because doing a 'SELECT
flatten(a), flatten(b) FROM T' is equivalent to doing a cross-product of
the 2 arrays.
In your example, both arrays are the same length, but what would you expect
the output to be if they were different ? I don't see a direct SQL way of
doing it but
even with UDFs the semantics should be defined.
Aman
On Thu, Apr 11, 2019 at 6:37 AM Charles Givre <cg...@gmail.com> wrote:
> That’s a good idea. I’ll work on a equivalent ZIP() function and submit
> as a separate PR.
> — C
>
> > On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID>
> wrote:
> >
> > Hi Charles,
> >
> > In Python [1], the "zip" function does this task:
> >
> >
> > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
> >
> >
> > When you gathered the list of functions for the Drill book, did you come
> across anything like this in Drill? I presume you didn't, hence the
> question. I did a quick (incomplete) check and didn't see any likely
> candidates.
> >
> > Perhaps you could create such a function.
> >
> > Once you have the zipped result, you could flatten to get the pairs as
> rows.
> >
> >
> > Thanks,
> > - Paul
> >
> >
> >
> > On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <
> cgivre@gmail.com> wrote:
> >
> > Hello Drillers,
> > I have a query question for you. I have some really ugly data that has
> a field like this:
> >
> > compound_field : { “field_1”: [1,2,3],
> > “field_2”:[4,5,6]
> > )
> >
> > I would like to map fields 1 and 2 to columns so that the end result is:
> >
> > field1 | field2
> > 1 | 4
> > 2 | 5
> > 3 | 5
> >
> > I thought flatten() would be the answer, however, if I flatten the
> columns, I get the following result:
> >
> > field1 | field2
> > 1 | 4
> > 1 | 5
> > 1 | 6
> >
> > Does anyone have any suggestions?
> > Thanks,
> > —C
>
>
Re: Query Question
Posted by Charles Givre <cg...@gmail.com>.
That’s a good idea. I’ll work on a equivalent ZIP() function and submit as a separate PR.
— C
> On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID> wrote:
>
> Hi Charles,
>
> In Python [1], the "zip" function does this task:
>
>
> zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
>
>
> When you gathered the list of functions for the Drill book, did you come across anything like this in Drill? I presume you didn't, hence the question. I did a quick (incomplete) check and didn't see any likely candidates.
>
> Perhaps you could create such a function.
>
> Once you have the zipped result, you could flatten to get the pairs as rows.
>
>
> Thanks,
> - Paul
>
>
>
> On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <cg...@gmail.com> wrote:
>
> Hello Drillers,
> I have a query question for you. I have some really ugly data that has a field like this:
>
> compound_field : { “field_1”: [1,2,3],
> “field_2”:[4,5,6]
> )
>
> I would like to map fields 1 and 2 to columns so that the end result is:
>
> field1 | field2
> 1 | 4
> 2 | 5
> 3 | 5
>
> I thought flatten() would be the answer, however, if I flatten the columns, I get the following result:
>
> field1 | field2
> 1 | 4
> 1 | 5
> 1 | 6
>
> Does anyone have any suggestions?
> Thanks,
> —C
Re: Query Question
Posted by Charles Givre <cg...@gmail.com>.
That’s a good idea. I’ll work on a equivalent ZIP() function and submit as a separate PR.
— C
> On Apr 10, 2019, at 20:44, Paul Rogers <pa...@yahoo.com.INVALID> wrote:
>
> Hi Charles,
>
> In Python [1], the "zip" function does this task:
>
>
> zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
>
>
> When you gathered the list of functions for the Drill book, did you come across anything like this in Drill? I presume you didn't, hence the question. I did a quick (incomplete) check and didn't see any likely candidates.
>
> Perhaps you could create such a function.
>
> Once you have the zipped result, you could flatten to get the pairs as rows.
>
>
> Thanks,
> - Paul
>
>
>
> On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <cg...@gmail.com> wrote:
>
> Hello Drillers,
> I have a query question for you. I have some really ugly data that has a field like this:
>
> compound_field : { “field_1”: [1,2,3],
> “field_2”:[4,5,6]
> )
>
> I would like to map fields 1 and 2 to columns so that the end result is:
>
> field1 | field2
> 1 | 4
> 2 | 5
> 3 | 5
>
> I thought flatten() would be the answer, however, if I flatten the columns, I get the following result:
>
> field1 | field2
> 1 | 4
> 1 | 5
> 1 | 6
>
> Does anyone have any suggestions?
> Thanks,
> —C
Re: Query Question
Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Charles,
In Python [1], the "zip" function does this task:
zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
When you gathered the list of functions for the Drill book, did you come across anything like this in Drill? I presume you didn't, hence the question. I did a quick (incomplete) check and didn't see any likely candidates.
Perhaps you could create such a function.
Once you have the zipped result, you could flatten to get the pairs as rows.
Thanks,
- Paul
On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <cg...@gmail.com> wrote:
Hello Drillers,
I have a query question for you. I have some really ugly data that has a field like this:
compound_field : { “field_1”: [1,2,3],
“field_2”:[4,5,6]
)
I would like to map fields 1 and 2 to columns so that the end result is:
field1 | field2
1 | 4
2 | 5
3 | 5
I thought flatten() would be the answer, however, if I flatten the columns, I get the following result:
field1 | field2
1 | 4
1 | 5
1 | 6
Does anyone have any suggestions?
Thanks,
—C
Re: Query Question
Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Charles,
In Python [1], the "zip" function does this task:
zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)]
When you gathered the list of functions for the Drill book, did you come across anything like this in Drill? I presume you didn't, hence the question. I did a quick (incomplete) check and didn't see any likely candidates.
Perhaps you could create such a function.
Once you have the zipped result, you could flatten to get the pairs as rows.
Thanks,
- Paul
On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <cg...@gmail.com> wrote:
Hello Drillers,
I have a query question for you. I have some really ugly data that has a field like this:
compound_field : { “field_1”: [1,2,3],
“field_2”:[4,5,6]
)
I would like to map fields 1 and 2 to columns so that the end result is:
field1 | field2
1 | 4
2 | 5
3 | 5
I thought flatten() would be the answer, however, if I flatten the columns, I get the following result:
field1 | field2
1 | 4
1 | 5
1 | 6
Does anyone have any suggestions?
Thanks,
—C