You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by John Schneider <js...@apixio.com> on 2015/07/31 00:42:04 UTC

Drill making wrong type decision on comparison in where clause

select t.app.hcc.event_name as en
from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
where en in ('logout');

this yields the error:
  Error: SYSTEM ERROR: NumberFormatException: logout

ok, so let's explicitly cast

select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
en
from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
where en in ('logout');

now, just to humor drill

select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
en
from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
where en in ('123');

runs, but returns no results - as would be expected because we don't use #s
as event names

Am I misunderstanding how drill types data in a schema less record?
I would have thought the explicit cast would have been enough


P.S. I ran another query like this one on a months worth of logs (a lot of
json in HDFS) and it chewed through it in less time than it takes my
current Hive query to actually start, and all of this on a single aws
m3.xlarge - this drill sucker is fast, we really want to use it.

john o schneider
jos@apixio.com
408-203-7891

Re: Drill making wrong type decision on comparison in where clause

Posted by John Schneider <js...@apixio.com>.
thanks Aman - that worked well... so many sql dialects

_____________

john o schneider
jos@apixio.com
408-203-7891


On Thu, Jul 30, 2015 at 3:48 PM, Aman Sinha <as...@maprtech.com> wrote:

> Hi John,
> you cannot use aliases in the WHERE condition.  Drill is not unique in this
> restriction...since the WHERE condition is evaluated before the alias is
> done in the SELECT clause.   Did you try WHERE t.app.hcc.event_name IN
> ('logout') ?
>
> Aman
>
> On Thu, Jul 30, 2015 at 3:42 PM, John Schneider <js...@apixio.com>
> wrote:
>
> > select t.app.hcc.event_name as en
> > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > where en in ('logout');
> >
> > this yields the error:
> >   Error: SYSTEM ERROR: NumberFormatException: logout
> >
> > ok, so let's explicitly cast
> >
> > select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
> > en
> > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > where en in ('logout');
> >
> > now, just to humor drill
> >
> > select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
> > en
> > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > where en in ('123');
> >
> > runs, but returns no results - as would be expected because we don't use
> #s
> > as event names
> >
> > Am I misunderstanding how drill types data in a schema less record?
> > I would have thought the explicit cast would have been enough
> >
> >
> > P.S. I ran another query like this one on a months worth of logs (a lot
> of
> > json in HDFS) and it chewed through it in less time than it takes my
> > current Hive query to actually start, and all of this on a single aws
> > m3.xlarge - this drill sucker is fast, we really want to use it.
> >
> > john o schneider
> > jos@apixio.com
> > 408-203-7891
> >
>

Re: Drill making wrong type decision on comparison in where clause

Posted by Aman Sinha <as...@maprtech.com>.
BTW, there was a discussion earlier today on the dev list regarding a
related issue: see the thread titled:  'Count where or having clause does
not work as expected'.
Drill does need to improve error messaging for such types of queries...this
is a known usability issue and hopefully we can resolve it in the near
future.

Aman

On Thu, Jul 30, 2015 at 3:48 PM, Aman Sinha <as...@maprtech.com> wrote:

> Hi John,
> you cannot use aliases in the WHERE condition.  Drill is not unique in
> this restriction...since the WHERE condition is evaluated before the alias
> is done in the SELECT clause.   Did you try WHERE t.app.hcc.event_name IN
> ('logout') ?
>
> Aman
>
> On Thu, Jul 30, 2015 at 3:42 PM, John Schneider <js...@apixio.com>
> wrote:
>
>> select t.app.hcc.event_name as en
>> from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
>> where en in ('logout');
>>
>> this yields the error:
>>   Error: SYSTEM ERROR: NumberFormatException: logout
>>
>> ok, so let's explicitly cast
>>
>> select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
>> en
>> from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
>> where en in ('logout');
>>
>> now, just to humor drill
>>
>> select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
>> en
>> from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
>> where en in ('123');
>>
>> runs, but returns no results - as would be expected because we don't use
>> #s
>> as event names
>>
>> Am I misunderstanding how drill types data in a schema less record?
>> I would have thought the explicit cast would have been enough
>>
>>
>> P.S. I ran another query like this one on a months worth of logs (a lot of
>> json in HDFS) and it chewed through it in less time than it takes my
>> current Hive query to actually start, and all of this on a single aws
>> m3.xlarge - this drill sucker is fast, we really want to use it.
>>
>> john o schneider
>> jos@apixio.com
>> 408-203-7891
>>
>
>

Re: Drill making wrong type decision on comparison in where clause

Posted by John Schneider <js...@apixio.com>.
thanks mehant, I'm starting to grok this.

_____________

john o schneider
jos@apixio.com
408-203-7891


On Thu, Jul 30, 2015 at 4:05 PM, mehant baid <ba...@gmail.com> wrote:

> On a side note, convert_from is used to convert a set of bytes encoded in a
> given format to a Drill type (mostly it is used while reading from HBase).
> You shouldn't have to use this with JSON because Drill knows how to read
> JSON and convert it to a Drill type. However you may have to use cast to go
> from one Drill type to another.
>
> Thanks
> Mehant
>
> On Thu, Jul 30, 2015 at 3:48 PM, Aman Sinha <as...@maprtech.com> wrote:
>
> > Hi John,
> > you cannot use aliases in the WHERE condition.  Drill is not unique in
> this
> > restriction...since the WHERE condition is evaluated before the alias is
> > done in the SELECT clause.   Did you try WHERE t.app.hcc.event_name IN
> > ('logout') ?
> >
> > Aman
> >
> > On Thu, Jul 30, 2015 at 3:42 PM, John Schneider <js...@apixio.com>
> > wrote:
> >
> > > select t.app.hcc.event_name as en
> > > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > > where en in ('logout');
> > >
> > > this yields the error:
> > >   Error: SYSTEM ERROR: NumberFormatException: logout
> > >
> > > ok, so let's explicitly cast
> > >
> > > select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30))
> as
> > > en
> > > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > > where en in ('logout');
> > >
> > > now, just to humor drill
> > >
> > > select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30))
> as
> > > en
> > > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > > where en in ('123');
> > >
> > > runs, but returns no results - as would be expected because we don't
> use
> > #s
> > > as event names
> > >
> > > Am I misunderstanding how drill types data in a schema less record?
> > > I would have thought the explicit cast would have been enough
> > >
> > >
> > > P.S. I ran another query like this one on a months worth of logs (a lot
> > of
> > > json in HDFS) and it chewed through it in less time than it takes my
> > > current Hive query to actually start, and all of this on a single aws
> > > m3.xlarge - this drill sucker is fast, we really want to use it.
> > >
> > > john o schneider
> > > jos@apixio.com
> > > 408-203-7891
> > >
> >
>

Re: Drill making wrong type decision on comparison in where clause

Posted by mehant baid <ba...@gmail.com>.
On a side note, convert_from is used to convert a set of bytes encoded in a
given format to a Drill type (mostly it is used while reading from HBase).
You shouldn't have to use this with JSON because Drill knows how to read
JSON and convert it to a Drill type. However you may have to use cast to go
from one Drill type to another.

Thanks
Mehant

On Thu, Jul 30, 2015 at 3:48 PM, Aman Sinha <as...@maprtech.com> wrote:

> Hi John,
> you cannot use aliases in the WHERE condition.  Drill is not unique in this
> restriction...since the WHERE condition is evaluated before the alias is
> done in the SELECT clause.   Did you try WHERE t.app.hcc.event_name IN
> ('logout') ?
>
> Aman
>
> On Thu, Jul 30, 2015 at 3:42 PM, John Schneider <js...@apixio.com>
> wrote:
>
> > select t.app.hcc.event_name as en
> > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > where en in ('logout');
> >
> > this yields the error:
> >   Error: SYSTEM ERROR: NumberFormatException: logout
> >
> > ok, so let's explicitly cast
> >
> > select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
> > en
> > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > where en in ('logout');
> >
> > now, just to humor drill
> >
> > select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
> > en
> > from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> > where en in ('123');
> >
> > runs, but returns no results - as would be expected because we don't use
> #s
> > as event names
> >
> > Am I misunderstanding how drill types data in a schema less record?
> > I would have thought the explicit cast would have been enough
> >
> >
> > P.S. I ran another query like this one on a months worth of logs (a lot
> of
> > json in HDFS) and it chewed through it in less time than it takes my
> > current Hive query to actually start, and all of this on a single aws
> > m3.xlarge - this drill sucker is fast, we really want to use it.
> >
> > john o schneider
> > jos@apixio.com
> > 408-203-7891
> >
>

Re: Drill making wrong type decision on comparison in where clause

Posted by Aman Sinha <as...@maprtech.com>.
Hi John,
you cannot use aliases in the WHERE condition.  Drill is not unique in this
restriction...since the WHERE condition is evaluated before the alias is
done in the SELECT clause.   Did you try WHERE t.app.hcc.event_name IN
('logout') ?

Aman

On Thu, Jul 30, 2015 at 3:42 PM, John Schneider <js...@apixio.com>
wrote:

> select t.app.hcc.event_name as en
> from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> where en in ('logout');
>
> this yields the error:
>   Error: SYSTEM ERROR: NumberFormatException: logout
>
> ok, so let's explicitly cast
>
> select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
> en
> from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> where en in ('logout');
>
> now, just to humor drill
>
> select cast(convert_from(t.app.hcc.event_name, 'UTF8') as varchar(30)) as
> en
> from dfs.`user`.`/logmaster/production/hcc/2015-07-30/*.json` t
> where en in ('123');
>
> runs, but returns no results - as would be expected because we don't use #s
> as event names
>
> Am I misunderstanding how drill types data in a schema less record?
> I would have thought the explicit cast would have been enough
>
>
> P.S. I ran another query like this one on a months worth of logs (a lot of
> json in HDFS) and it chewed through it in less time than it takes my
> current Hive query to actually start, and all of this on a single aws
> m3.xlarge - this drill sucker is fast, we really want to use it.
>
> john o schneider
> jos@apixio.com
> 408-203-7891
>