You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by John Schneider <js...@apixio.com> on 2015/07/29 21:02:07 UTC

Can I use hive table column as json

I've seen examples from drill queries on HBase (all columns are strings)
access each column as a json object as in:

select t.column_a.xField, t.column_b.yField from some_hbase_table t

I'd like to do the same using my hive tables where I have production logging

My production log tables have the following schema

CREATE EXTERNAL TABLE logging_master_schema(
 line string
) partitioned by (month string, week string, day string);

each "line" is a log line in json, we use the partitions
to narrow down lines we crunch in hive queries where we
use fucntion calls like this get_json_object(line, '$.x.y')
to crack our logs.

A trivial example of a hive query would be:
SELECT get_json_object(line, '$.app.hcc.event_name'),
       get_json_object(line, '$.unixtime'),
       line FROM production_logs_hcc WHERE year='2014' and month='08' and
day='21'
       and get_json_object(line, '$.level') = 'EVENT'
  ORDER BY get_json_object(line, '$.unixtime')
       DESC;

I am trying to figure out now if there's a way we can use
drill to access hive and do the same kind of queries where
drill will treat the 'line' column as json as in this example:

select line.unixtime as ts,
       line.app.hcc.event_name as evt
from hive.production_logs_hcc
where year='2014' and month='08' and day='21'
      and line.level = 'EVENT'
      order by ts desc

so I started with the simplest test:

> select t.line from from hive.production_logs_hcc t --> works fine

> select t.line.level from from hive.production_logs_hcc t --> VARCHAR
assertion error

bummer, so it seams where a column in hbase is implicitly a json, a string
column
in hive is not AND no ammount of attempts to cast the log line to a varchar
and then
access as json makes any difference... (I tried this when I realized that
hive string
had no drill sql analog

Am I just out of luck here, hbase columns are json but hive text columns
aren't

Thanks for any help / explanation
_____________

john o schneider
jos@apixio.com
408-203-7891

Re: Can I use hive table column as json

Posted by John Schneider <js...@apixio.com>.
Digging a little more, I think there must be an error in the convert_from()
function or what ever is passing my line to it
the error

Error: SYSTEM ERROR: IllegalArgumentException: You tried to write a BigInt
> type when you are using a ValueWriter of type NullableVarCharWriterImpl.
> Fragment 0:0


seems to be speaking to the conversion of the json which looks like this:

>
> {"level":"EVENT","app":"app_user_info":"session":null,"user":null,"user_agent":"Pingdom.com_bot_version_1.4_(
> http://www.pingdom.com/)"},"app_name":"hcc","hc":{"event_name":"logout","frontend":{"logout":{"username":null,"message":"User
> is logging
> out"}},"component_name":"frontend"}},"isotime":"2015-07-29T00:00:48.4
> 24267+00:00","source":"account.views","host":"hcc-demo-prd1.apixio.com
> ","client":"184.75.210.186","time":"1438128048424"}


if i read these source log files in directly using

0: jdbc:drill:drillbit=drill.apixio.com> SELECT
distinct(log.app.app_user_info.user_agent) FROM
dfs.`user`.`/logmaster/production/hcc/2015-07-28/*` log ;

I get expected results, no issue parsing the json

Some more verbose tracing would help me debug this issue, does any exist?

/jos



_____________

john o schneider
jos@apixio.com
408-203-7891


On Thu, Jul 30, 2015 at 12:06 AM, John Schneider <js...@apixio.com>
wrote:

>
> Jacques - thanks for your reply. It is very much appreciated
>
> I was able to create a view but when accessing the one column in the view
> I got an IllegalArgumentException. I tried a casting values but keep
> getting back to this error.
>
> One thing I saw though is perhaps not the issue is that when I describe
> the view, the line I converted to JSON is an ANY
> My transcript below:
>
>
> CREATE VIEW logView AS SELECT CONVERT_FROM(line, 'JSON') AS line FROM
> hive.production_logs_hcc_24
>
> 0: jdbc:drill:drillbit=drill.apixio.com> use dfs.`tmp`;
> +-------+--------------------------------------+
> |  ok   |               summary                |
> +-------+--------------------------------------+
> | true  | Default schema changed to [dfs.tmp]  |
> +-------+--------------------------------------+
> 1 row selected (0.353 seconds)
> 0: jdbc:drill:drillbit=drill.apixio.com> CREATE VIEW logView AS SELECT
> CONVERT_FROM(line, 'JSON') AS line FROM hive.production_logs_hcc_24;
> +-------+----------------------------------------------------------+
> |  ok   |                         summary                          |
> +-------+----------------------------------------------------------+
> | true  | View 'logView' created successfully in 'dfs.tmp' schema  |
> +-------+----------------------------------------------------------+
> 1 row selected (0.272 seconds)
>
> 0: jdbc:drill:drillbit=drill.apixio.com> describe logView;
> +--------------+------------+--------------+
> | COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
> +--------------+------------+--------------+
> | line         | ANY        | YES          |
> +--------------+------------+--------------+
> 1 row selected (0.125 seconds)
>
> 0: jdbc:drill:drillbit=drill.apixio.com> select t.line.level from logView
> t limit 10;
> Error: SYSTEM ERROR: IllegalArgumentException: You tried to write a BigInt
> type when you are using a ValueWriter of type NullableVarCharWriterImpl.
>
> Fragment 0:0
>
> [Error Id: 655d0dbd-4d8f-433f-96a6-7ab85efb0b8b on
> hadoop-data-drill:31010] (state=,code=0)
> 0: jdbc:drill:drillbit=drill.apixio.com> select t.line from logView t
> limit 10;
> Error: SYSTEM ERROR: IllegalArgumentException: You tried to write a BigInt
> type when you are using a ValueWriter of type NullableVarCharWriterImpl.
>
> Fragment 0:0
>
> [Error Id: fee663a6-b75c-414c-9d47-39b1e2437558 on
> hadoop-data-drill:31010] (state=,code=0)
>
>
> _____________
>
> john o schneider
> jos@apixio.com
> 408-203-7891
>
>
> On Wed, Jul 29, 2015 at 12:20 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
>
>> Drill doesn't implicitly cast from VarChar -> Map.  However, just create a
>> Drill view:
>>
>> CREATE VIEW newView AS SELECT CONVERT_FROM(line, 'JSON') AS line FROM
>> hive.production_logs_hcc
>>
>> Then just query newView.
>>
>> Then 'select t.line.level from newView' should work just fine.
>>
>>
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>>
>> On Wed, Jul 29, 2015 at 12:02 PM, John Schneider <js...@apixio.com>
>> wrote:
>>
>> > I've seen examples from drill queries on HBase (all columns are strings)
>> > access each column as a json object as in:
>> >
>> > select t.column_a.xField, t.column_b.yField from some_hbase_table t
>> >
>> > I'd like to do the same using my hive tables where I have production
>> > logging
>> >
>> > My production log tables have the following schema
>> >
>> > CREATE EXTERNAL TABLE logging_master_schema(
>> >  line string
>> > ) partitioned by (month string, week string, day string);
>> >
>> > each "line" is a log line in json, we use the partitions
>> > to narrow down lines we crunch in hive queries where we
>> > use fucntion calls like this get_json_object(line, '$.x.y')
>> > to crack our logs.
>> >
>> > A trivial example of a hive query would be:
>> > SELECT get_json_object(line, '$.app.hcc.event_name'),
>> >        get_json_object(line, '$.unixtime'),
>> >        line FROM production_logs_hcc WHERE year='2014' and month='08'
>> and
>> > day='21'
>> >        and get_json_object(line, '$.level') = 'EVENT'
>> >   ORDER BY get_json_object(line, '$.unixtime')
>> >        DESC;
>> >
>> > I am trying to figure out now if there's a way we can use
>> > drill to access hive and do the same kind of queries where
>> > drill will treat the 'line' column as json as in this example:
>> >
>> > select line.unixtime as ts,
>> >        line.app.hcc.event_name as evt
>> > from hive.production_logs_hcc
>> > where year='2014' and month='08' and day='21'
>> >       and line.level = 'EVENT'
>> >       order by ts desc
>> >
>> > so I started with the simplest test:
>> >
>> > > select t.line from from hive.production_logs_hcc t --> works fine
>> >
>> > > select t.line.level from from hive.production_logs_hcc t --> VARCHAR
>> > assertion error
>> >
>> > bummer, so it seams where a column in hbase is implicitly a json, a
>> string
>> > column
>> > in hive is not AND no ammount of attempts to cast the log line to a
>> varchar
>> > and then
>> > access as json makes any difference... (I tried this when I realized
>> that
>> > hive string
>> > had no drill sql analog
>> >
>> > Am I just out of luck here, hbase columns are json but hive text columns
>> > aren't
>> >
>> > Thanks for any help / explanation
>> > _____________
>> >
>> > john o schneider
>> > jos@apixio.com
>> > 408-203-7891
>> >
>>
>
>

Re: Can I use hive table column as json

Posted by John Schneider <js...@apixio.com>.
Jacques - thanks for your reply. It is very much appreciated

I was able to create a view but when accessing the one column in the view I
got an IllegalArgumentException. I tried a casting values but keep getting
back to this error.

One thing I saw though is perhaps not the issue is that when I describe the
view, the line I converted to JSON is an ANY
My transcript below:


CREATE VIEW logView AS SELECT CONVERT_FROM(line, 'JSON') AS line FROM
hive.production_logs_hcc_24

0: jdbc:drill:drillbit=drill.apixio.com> use dfs.`tmp`;
+-------+--------------------------------------+
|  ok   |               summary                |
+-------+--------------------------------------+
| true  | Default schema changed to [dfs.tmp]  |
+-------+--------------------------------------+
1 row selected (0.353 seconds)
0: jdbc:drill:drillbit=drill.apixio.com> CREATE VIEW logView AS SELECT
CONVERT_FROM(line, 'JSON') AS line FROM hive.production_logs_hcc_24;
+-------+----------------------------------------------------------+
|  ok   |                         summary                          |
+-------+----------------------------------------------------------+
| true  | View 'logView' created successfully in 'dfs.tmp' schema  |
+-------+----------------------------------------------------------+
1 row selected (0.272 seconds)

0: jdbc:drill:drillbit=drill.apixio.com> describe logView;
+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
| line         | ANY        | YES          |
+--------------+------------+--------------+
1 row selected (0.125 seconds)

0: jdbc:drill:drillbit=drill.apixio.com> select t.line.level from logView t
limit 10;
Error: SYSTEM ERROR: IllegalArgumentException: You tried to write a BigInt
type when you are using a ValueWriter of type NullableVarCharWriterImpl.

Fragment 0:0

[Error Id: 655d0dbd-4d8f-433f-96a6-7ab85efb0b8b on hadoop-data-drill:31010]
(state=,code=0)
0: jdbc:drill:drillbit=drill.apixio.com> select t.line from logView t limit
10;
Error: SYSTEM ERROR: IllegalArgumentException: You tried to write a BigInt
type when you are using a ValueWriter of type NullableVarCharWriterImpl.

Fragment 0:0

[Error Id: fee663a6-b75c-414c-9d47-39b1e2437558 on hadoop-data-drill:31010]
(state=,code=0)


_____________

john o schneider
jos@apixio.com
408-203-7891


On Wed, Jul 29, 2015 at 12:20 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> Drill doesn't implicitly cast from VarChar -> Map.  However, just create a
> Drill view:
>
> CREATE VIEW newView AS SELECT CONVERT_FROM(line, 'JSON') AS line FROM
> hive.production_logs_hcc
>
> Then just query newView.
>
> Then 'select t.line.level from newView' should work just fine.
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Wed, Jul 29, 2015 at 12:02 PM, John Schneider <js...@apixio.com>
> wrote:
>
> > I've seen examples from drill queries on HBase (all columns are strings)
> > access each column as a json object as in:
> >
> > select t.column_a.xField, t.column_b.yField from some_hbase_table t
> >
> > I'd like to do the same using my hive tables where I have production
> > logging
> >
> > My production log tables have the following schema
> >
> > CREATE EXTERNAL TABLE logging_master_schema(
> >  line string
> > ) partitioned by (month string, week string, day string);
> >
> > each "line" is a log line in json, we use the partitions
> > to narrow down lines we crunch in hive queries where we
> > use fucntion calls like this get_json_object(line, '$.x.y')
> > to crack our logs.
> >
> > A trivial example of a hive query would be:
> > SELECT get_json_object(line, '$.app.hcc.event_name'),
> >        get_json_object(line, '$.unixtime'),
> >        line FROM production_logs_hcc WHERE year='2014' and month='08' and
> > day='21'
> >        and get_json_object(line, '$.level') = 'EVENT'
> >   ORDER BY get_json_object(line, '$.unixtime')
> >        DESC;
> >
> > I am trying to figure out now if there's a way we can use
> > drill to access hive and do the same kind of queries where
> > drill will treat the 'line' column as json as in this example:
> >
> > select line.unixtime as ts,
> >        line.app.hcc.event_name as evt
> > from hive.production_logs_hcc
> > where year='2014' and month='08' and day='21'
> >       and line.level = 'EVENT'
> >       order by ts desc
> >
> > so I started with the simplest test:
> >
> > > select t.line from from hive.production_logs_hcc t --> works fine
> >
> > > select t.line.level from from hive.production_logs_hcc t --> VARCHAR
> > assertion error
> >
> > bummer, so it seams where a column in hbase is implicitly a json, a
> string
> > column
> > in hive is not AND no ammount of attempts to cast the log line to a
> varchar
> > and then
> > access as json makes any difference... (I tried this when I realized that
> > hive string
> > had no drill sql analog
> >
> > Am I just out of luck here, hbase columns are json but hive text columns
> > aren't
> >
> > Thanks for any help / explanation
> > _____________
> >
> > john o schneider
> > jos@apixio.com
> > 408-203-7891
> >
>

Re: Can I use hive table column as json

Posted by Jacques Nadeau <ja...@dremio.com>.
Drill doesn't implicitly cast from VarChar -> Map.  However, just create a
Drill view:

CREATE VIEW newView AS SELECT CONVERT_FROM(line, 'JSON') AS line FROM
hive.production_logs_hcc

Then just query newView.

Then 'select t.line.level from newView' should work just fine.


--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Wed, Jul 29, 2015 at 12:02 PM, John Schneider <js...@apixio.com>
wrote:

> I've seen examples from drill queries on HBase (all columns are strings)
> access each column as a json object as in:
>
> select t.column_a.xField, t.column_b.yField from some_hbase_table t
>
> I'd like to do the same using my hive tables where I have production
> logging
>
> My production log tables have the following schema
>
> CREATE EXTERNAL TABLE logging_master_schema(
>  line string
> ) partitioned by (month string, week string, day string);
>
> each "line" is a log line in json, we use the partitions
> to narrow down lines we crunch in hive queries where we
> use fucntion calls like this get_json_object(line, '$.x.y')
> to crack our logs.
>
> A trivial example of a hive query would be:
> SELECT get_json_object(line, '$.app.hcc.event_name'),
>        get_json_object(line, '$.unixtime'),
>        line FROM production_logs_hcc WHERE year='2014' and month='08' and
> day='21'
>        and get_json_object(line, '$.level') = 'EVENT'
>   ORDER BY get_json_object(line, '$.unixtime')
>        DESC;
>
> I am trying to figure out now if there's a way we can use
> drill to access hive and do the same kind of queries where
> drill will treat the 'line' column as json as in this example:
>
> select line.unixtime as ts,
>        line.app.hcc.event_name as evt
> from hive.production_logs_hcc
> where year='2014' and month='08' and day='21'
>       and line.level = 'EVENT'
>       order by ts desc
>
> so I started with the simplest test:
>
> > select t.line from from hive.production_logs_hcc t --> works fine
>
> > select t.line.level from from hive.production_logs_hcc t --> VARCHAR
> assertion error
>
> bummer, so it seams where a column in hbase is implicitly a json, a string
> column
> in hive is not AND no ammount of attempts to cast the log line to a varchar
> and then
> access as json makes any difference... (I tried this when I realized that
> hive string
> had no drill sql analog
>
> Am I just out of luck here, hbase columns are json but hive text columns
> aren't
>
> Thanks for any help / explanation
> _____________
>
> john o schneider
> jos@apixio.com
> 408-203-7891
>