You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by John Schneider <js...@apixio.com> on 2015/07/31 00:42:04 UTC
Drill making wrong type decision on comparison in where clause
select t.app.hcc.event_name as en
from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
where en in ('logout');
this yields the error:
Error: SYSTEM ERROR: NumberFormatException: logout
ok, so let's explicitly cast
select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
en
from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
where en in ('logout');
now, just to humor drill
select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
en
from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
where en in ('123');
runs, but returns no results - as would be expected because we don't use #s
as event names
Am I misunderstanding how drill types data in a schema less record?
I would have thought the explicit cast would have been enough
P.S. I ran another query like this one on a months worth of logs (a lot of
json in HDFS) and it chewed through it in less time than it takes my
current Hive query to actually start, and all of this on a single aws
m3.xlarge - this drill sucker is fast, we really want to use it.
john o schneider
jos@apixio.com
408-203-7891
Re: Drill making wrong type decision on comparison in where clause
Posted by John Schneider <js...@apixio.com>.
thanks Aman - that worked well... so many sql dialects
_____________
john o schneider
jos@apixio.com
408-203-7891
On Thu, Jul 30, 2015 at 3:48 PM, Aman Sinha <as...@maprtech.com> wrote:
> Hi John,
> you cannot use aliases in the WHERE condition. Drill is not unique in this
> restriction...since the WHERE condition is evaluated before the alias is
> done in the SELECT clause. Did you try WHERE t.app.hcc.event_name IN
> ('logout') ?
>
> Aman
>
> On Thu, Jul 30, 2015 at 3:42 PM, John Schneider <js...@apixio.com>
> wrote:
>
> > select t.app.hcc.event_name as en
> > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > where en in ('logout');
> >
> > this yields the error:
> > Error: SYSTEM ERROR: NumberFormatException: logout
> >
> > ok, so let's explicitly cast
> >
> > select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
> > en
> > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > where en in ('logout');
> >
> > now, just to humor drill
> >
> > select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
> > en
> > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > where en in ('123');
> >
> > runs, but returns no results - as would be expected because we don't use
> #s
> > as event names
> >
> > Am I misunderstanding how drill types data in a schema less record?
> > I would have thought the explicit cast would have been enough
> >
> >
> > P.S. I ran another query like this one on a months worth of logs (a lot
> of
> > json in HDFS) and it chewed through it in less time than it takes my
> > current Hive query to actually start, and all of this on a single aws
> > m3.xlarge - this drill sucker is fast, we really want to use it.
> >
> > john o schneider
> > jos@apixio.com
> > 408-203-7891
> >
>
Re: Drill making wrong type decision on comparison in where clause
Posted by Aman Sinha <as...@maprtech.com>.
BTW, there was a discussion earlier today on the dev list regarding a
related issue: see the thread titled: 'Count where or having clause does
not work as expected'.
Drill does need to improve error messaging for such types of queries...this
is a known usability issue and hopefully we can resolve it in the near
future.
Aman
On Thu, Jul 30, 2015 at 3:48 PM, Aman Sinha <as...@maprtech.com> wrote:
> Hi John,
> you cannot use aliases in the WHERE condition. Drill is not unique in
> this restriction...since the WHERE condition is evaluated before the alias
> is done in the SELECT clause. Did you try WHERE t.app.hcc.event_name IN
> ('logout') ?
>
> Aman
>
> On Thu, Jul 30, 2015 at 3:42 PM, John Schneider <js...@apixio.com>
> wrote:
>
>> select t.app.hcc.event_name as en
>> from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
>> where en in ('logout');
>>
>> this yields the error:
>> Error: SYSTEM ERROR: NumberFormatException: logout
>>
>> ok, so let's explicitly cast
>>
>> select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
>> en
>> from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
>> where en in ('logout');
>>
>> now, just to humor drill
>>
>> select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
>> en
>> from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
>> where en in ('123');
>>
>> runs, but returns no results - as would be expected because we don't use
>> #s
>> as event names
>>
>> Am I misunderstanding how drill types data in a schema less record?
>> I would have thought the explicit cast would have been enough
>>
>>
>> P.S. I ran another query like this one on a months worth of logs (a lot of
>> json in HDFS) and it chewed through it in less time than it takes my
>> current Hive query to actually start, and all of this on a single aws
>> m3.xlarge - this drill sucker is fast, we really want to use it.
>>
>> john o schneider
>> jos@apixio.com
>> 408-203-7891
>>
>
>
Re: Drill making wrong type decision on comparison in where clause
Posted by John Schneider <js...@apixio.com>.
thanks mehant, I'm starting to grok this.
_____________
john o schneider
jos@apixio.com
408-203-7891
On Thu, Jul 30, 2015 at 4:05 PM, mehant baid <ba...@gmail.com> wrote:
> On a side note, convert_from is used to convert a set of bytes encoded in a
> given format to a Drill type (mostly it is used while reading from HBase).
> You shouldn't have to use this with JSON because Drill knows how to read
> JSON and convert it to a Drill type. However you may have to use cast to go
> from one Drill type to another.
>
> Thanks
> Mehant
>
> On Thu, Jul 30, 2015 at 3:48 PM, Aman Sinha <as...@maprtech.com> wrote:
>
> > Hi John,
> > you cannot use aliases in the WHERE condition. Drill is not unique in
> this
> > restriction...since the WHERE condition is evaluated before the alias is
> > done in the SELECT clause. Did you try WHERE t.app.hcc.event_name IN
> > ('logout') ?
> >
> > Aman
> >
> > On Thu, Jul 30, 2015 at 3:42 PM, John Schneider <js...@apixio.com>
> > wrote:
> >
> > > select t.app.hcc.event_name as en
> > > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > > where en in ('logout');
> > >
> > > this yields the error:
> > > Error: SYSTEM ERROR: NumberFormatException: logout
> > >
> > > ok, so let's explicitly cast
> > >
> > > select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30))
> as
> > > en
> > > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > > where en in ('logout');
> > >
> > > now, just to humor drill
> > >
> > > select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30))
> as
> > > en
> > > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > > where en in ('123');
> > >
> > > runs, but returns no results - as would be expected because we don't
> use
> > #s
> > > as event names
> > >
> > > Am I misunderstanding how drill types data in a schema less record?
> > > I would have thought the explicit cast would have been enough
> > >
> > >
> > > P.S. I ran another query like this one on a months worth of logs (a lot
> > of
> > > json in HDFS) and it chewed through it in less time than it takes my
> > > current Hive query to actually start, and all of this on a single aws
> > > m3.xlarge - this drill sucker is fast, we really want to use it.
> > >
> > > john o schneider
> > > jos@apixio.com
> > > 408-203-7891
> > >
> >
>
Re: Drill making wrong type decision on comparison in where clause
Posted by mehant baid <ba...@gmail.com>.
On a side note, convert_from is used to convert a set of bytes encoded in a
given format to a Drill type (mostly it is used while reading from HBase).
You shouldn't have to use this with JSON because Drill knows how to read
JSON and convert it to a Drill type. However you may have to use cast to go
from one Drill type to another.
Thanks
Mehant
On Thu, Jul 30, 2015 at 3:48 PM, Aman Sinha <as...@maprtech.com> wrote:
> Hi John,
> you cannot use aliases in the WHERE condition. Drill is not unique in this
> restriction...since the WHERE condition is evaluated before the alias is
> done in the SELECT clause. Did you try WHERE t.app.hcc.event_name IN
> ('logout') ?
>
> Aman
>
> On Thu, Jul 30, 2015 at 3:42 PM, John Schneider <js...@apixio.com>
> wrote:
>
> > select t.app.hcc.event_name as en
> > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > where en in ('logout');
> >
> > this yields the error:
> > Error: SYSTEM ERROR: NumberFormatException: logout
> >
> > ok, so let's explicitly cast
> >
> > select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
> > en
> > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > where en in ('logout');
> >
> > now, just to humor drill
> >
> > select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
> > en
> > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > where en in ('123');
> >
> > runs, but returns no results - as would be expected because we don't use
> #s
> > as event names
> >
> > Am I misunderstanding how drill types data in a schema less record?
> > I would have thought the explicit cast would have been enough
> >
> >
> > P.S. I ran another query like this one on a months worth of logs (a lot
> of
> > json in HDFS) and it chewed through it in less time than it takes my
> > current Hive query to actually start, and all of this on a single aws
> > m3.xlarge - this drill sucker is fast, we really want to use it.
> >
> > john o schneider
> > jos@apixio.com
> > 408-203-7891
> >
>
Re: Drill making wrong type decision on comparison in where clause
Posted by Aman Sinha <as...@maprtech.com>.
Hi John,
you cannot use aliases in the WHERE condition. Drill is not unique in this
restriction...since the WHERE condition is evaluated before the alias is
done in the SELECT clause. Did you try WHERE t.app.hcc.event_name IN
('logout') ?
Aman
On Thu, Jul 30, 2015 at 3:42 PM, John Schneider <js...@apixio.com>
wrote:
> select t.app.hcc.event_name as en
> from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> where en in ('logout');
>
> this yields the error:
> Error: SYSTEM ERROR: NumberFormatException: logout
>
> ok, so let's explicitly cast
>
> select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
> en
> from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> where en in ('logout');
>
> now, just to humor drill
>
> select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
> en
> from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> where en in ('123');
>
> runs, but returns no results - as would be expected because we don't use #s
> as event names
>
> Am I misunderstanding how drill types data in a schema less record?
> I would have thought the explicit cast would have been enough
>
>
> P.S. I ran another query like this one on a months worth of logs (a lot of
> json in HDFS) and it chewed through it in less time than it takes my
> current Hive query to actually start, and all of this on a single aws
> m3.xlarge - this drill sucker is fast, we really want to use it.
>
> john o schneider
> jos@apixio.com
> 408-203-7891
>