You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Kathiresan S <ka...@gmail.com> on 2016/08/19 22:15:05 UTC

Array data type in Where Clause

Hi,

We have a json where one of the columns is of type array (of Strings) and
we would like to run a query to get the rows where any of the values in the
array matches the given value.

*(Eg): cities.json*
{"id":"1","city":["nyc","la"]}
{"id":"2","city":["la"]}
{"id":"3","city":["nyc"]}
{"id":"4","city":["la","nyc"]}

*Query we would like to run (something like below)*


*select id from cp.`cities.json` where any[city] = 'la'*

Or

*select id from cp.`cities.json` where city contains 'la'*

which is expected to return rows 1,2 & 4

Is there any way we can get this done without using FLATTEN?

Drill version that we use : 1.6.0

Thanks,
Kathir

Re: Array data type in Where Clause

Posted by Kathiresan S <ka...@gmail.com>.
Thanks Andries.

On Saturday, August 20, 2016, Andries Engelbrecht <ae...@maprtech.com>
wrote:

> UDFs should perform similar to native functions, pending on how well they
> are written.
>
> You will likely need a UDF for this requirement, maybe someone else has a
> different solution using the existing functions.
>
> --Andries
>
>
>
>
> > On Aug 19, 2016, at 4:16 PM, Kathiresan S <kathiresanselvaraj@gmail.com
> <javascript:;>> wrote:
> >
> > Awesome, thanks! This solves my exact match (and wildcard matches)
> > requirements.
> >
> > I also have a requirement to apply standard conditions like <,<=,>,>=,
> etc. on
> > numbers in an array
> >
> > For eg.
> > I need something like *repeated_contains(array, 5, '>')* and this should
> > return true if the array has any element greater than 5 and false
> otherwise.
> >
> > So should I be creating custom functions, for this?
> >
> > If yes, are the custom functions somehow perform slower than the ones
> > provided by drill out of the box?
> >
> > Thanks,
> > Kathir
> >
> > On Friday, August 19, 2016, Andries Engelbrecht <
> aengelbrecht@maprtech.com <javascript:;>>
> > wrote:
> >
> >> Use REPEATED_CONTAINS.
> >> http://drill.apache.org/docs/repeated-contains/ <
> >> http://drill.apache.org/docs/repeated-contains/>
> >>
> >>
> >> --Andries
> >>
> >>
> >>
> >>> On Aug 19, 2016, at 3:15 PM, Kathiresan S <
> kathiresanselvaraj@gmail.com <javascript:;>
> >> <javascript:;>> wrote:
> >>>
> >>> Hi,
> >>>
> >>> We have a json where one of the columns is of type array (of Strings)
> and
> >>> we would like to run a query to get the rows where any of the values in
> >> the
> >>> array matches the given value.
> >>>
> >>> *(Eg): cities.json*
> >>> {"id":"1","city":["nyc","la"]}
> >>> {"id":"2","city":["la"]}
> >>> {"id":"3","city":["nyc"]}
> >>> {"id":"4","city":["la","nyc"]}
> >>>
> >>> *Query we would like to run (something like below)*
> >>>
> >>>
> >>> *select id from cp.`cities.json` where any[city] = 'la'*
> >>>
> >>> Or
> >>>
> >>> *select id from cp.`cities.json` where city contains 'la'*
> >>>
> >>> which is expected to return rows 1,2 & 4
> >>>
> >>> Is there any way we can get this done without using FLATTEN?
> >>>
> >>> Drill version that we use : 1.6.0
> >>>
> >>> Thanks,
> >>> Kathir
> >>
> >>
>
>

Re: Array data type in Where Clause

Posted by Andries Engelbrecht <ae...@maprtech.com>.
UDFs should perform similar to native functions, pending on how well they are written.

You will likely need a UDF for this requirement, maybe someone else has a different solution using the existing functions.

--Andries




> On Aug 19, 2016, at 4:16 PM, Kathiresan S <ka...@gmail.com> wrote:
> 
> Awesome, thanks! This solves my exact match (and wildcard matches)
> requirements.
> 
> I also have a requirement to apply standard conditions like <,<=,>,>=, etc. on
> numbers in an array
> 
> For eg.
> I need something like *repeated_contains(array, 5, '>')* and this should
> return true if the array has any element greater than 5 and false otherwise.
> 
> So should I be creating custom functions, for this?
> 
> If yes, are the custom functions somehow perform slower than the ones
> provided by drill out of the box?
> 
> Thanks,
> Kathir
> 
> On Friday, August 19, 2016, Andries Engelbrecht <ae...@maprtech.com>
> wrote:
> 
>> Use REPEATED_CONTAINS.
>> http://drill.apache.org/docs/repeated-contains/ <
>> http://drill.apache.org/docs/repeated-contains/>
>> 
>> 
>> --Andries
>> 
>> 
>> 
>>> On Aug 19, 2016, at 3:15 PM, Kathiresan S <kathiresanselvaraj@gmail.com
>> <javascript:;>> wrote:
>>> 
>>> Hi,
>>> 
>>> We have a json where one of the columns is of type array (of Strings) and
>>> we would like to run a query to get the rows where any of the values in
>> the
>>> array matches the given value.
>>> 
>>> *(Eg): cities.json*
>>> {"id":"1","city":["nyc","la"]}
>>> {"id":"2","city":["la"]}
>>> {"id":"3","city":["nyc"]}
>>> {"id":"4","city":["la","nyc"]}
>>> 
>>> *Query we would like to run (something like below)*
>>> 
>>> 
>>> *select id from cp.`cities.json` where any[city] = 'la'*
>>> 
>>> Or
>>> 
>>> *select id from cp.`cities.json` where city contains 'la'*
>>> 
>>> which is expected to return rows 1,2 & 4
>>> 
>>> Is there any way we can get this done without using FLATTEN?
>>> 
>>> Drill version that we use : 1.6.0
>>> 
>>> Thanks,
>>> Kathir
>> 
>> 


Re: Array data type in Where Clause

Posted by Kathiresan S <ka...@gmail.com>.
Awesome, thanks! This solves my exact match (and wildcard matches)
requirements.

I also have a requirement to apply standard conditions like <,<=,>,>=, etc. on
numbers in an array

For eg.
I need something like *repeated_contains(array, 5, '>')* and this should
return true if the array has any element greater than 5 and false otherwise.

So should I be creating custom functions, for this?

If yes, are the custom functions somehow perform slower than the ones
provided by drill out of the box?

Thanks,
Kathir

On Friday, August 19, 2016, Andries Engelbrecht <ae...@maprtech.com>
wrote:

> Use REPEATED_CONTAINS.
> http://drill.apache.org/docs/repeated-contains/ <
> http://drill.apache.org/docs/repeated-contains/>
>
>
> --Andries
>
>
>
> > On Aug 19, 2016, at 3:15 PM, Kathiresan S <kathiresanselvaraj@gmail.com
> <javascript:;>> wrote:
> >
> > Hi,
> >
> > We have a json where one of the columns is of type array (of Strings) and
> > we would like to run a query to get the rows where any of the values in
> the
> > array matches the given value.
> >
> > *(Eg): cities.json*
> > {"id":"1","city":["nyc","la"]}
> > {"id":"2","city":["la"]}
> > {"id":"3","city":["nyc"]}
> > {"id":"4","city":["la","nyc"]}
> >
> > *Query we would like to run (something like below)*
> >
> >
> > *select id from cp.`cities.json` where any[city] = 'la'*
> >
> > Or
> >
> > *select id from cp.`cities.json` where city contains 'la'*
> >
> > which is expected to return rows 1,2 & 4
> >
> > Is there any way we can get this done without using FLATTEN?
> >
> > Drill version that we use : 1.6.0
> >
> > Thanks,
> > Kathir
>
>

Re: Array data type in Where Clause

Posted by Andries Engelbrecht <ae...@maprtech.com>.
Use REPEATED_CONTAINS.
http://drill.apache.org/docs/repeated-contains/ <http://drill.apache.org/docs/repeated-contains/>


--Andries



> On Aug 19, 2016, at 3:15 PM, Kathiresan S <ka...@gmail.com> wrote:
> 
> Hi,
> 
> We have a json where one of the columns is of type array (of Strings) and
> we would like to run a query to get the rows where any of the values in the
> array matches the given value.
> 
> *(Eg): cities.json*
> {"id":"1","city":["nyc","la"]}
> {"id":"2","city":["la"]}
> {"id":"3","city":["nyc"]}
> {"id":"4","city":["la","nyc"]}
> 
> *Query we would like to run (something like below)*
> 
> 
> *select id from cp.`cities.json` where any[city] = 'la'*
> 
> Or
> 
> *select id from cp.`cities.json` where city contains 'la'*
> 
> which is expected to return rows 1,2 & 4
> 
> Is there any way we can get this done without using FLATTEN?
> 
> Drill version that we use : 1.6.0
> 
> Thanks,
> Kathir