You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by John Schneider <js...@apixio.com> on 2015/07/29 21:02:07 UTC
Can I use hive table column as json
I've seen examples from drill queries on HBase (all columns are strings)
access each column as a json object as in:
select t.column_a.xField, t.column_b.yField from some_hbase_table t
I'd like to do the same using my hive tables where I have production logging
My production log tables have the following schema
CREATE EXTERNAL TABLE logging_master_schema(
line string
) partitioned by (month string, week string, day string);
each "line" is a log line in json, we use the partitions
to narrow down lines we crunch in hive queries where we
use fucntion calls like this get_json_object(line, '$.x.y')
to crack our logs.
A trivial example of a hive query would be:
SELECT get_json_object(line, '$.app.hcc.event_name'),
get_json_object(line, '$.unixtime'),
line FROM production_logs_hcc WHERE year='2014' and month='08' and
day='21'
and get_json_object(line, '$.level') = 'EVENT'
ORDER BY get_json_object(line, '$.unixtime')
DESC;
I am trying to figure out now if there's a way we can use
drill to access hive and do the same kind of queries where
drill will treat the 'line' column as json as in this example:
select line.unixtime as ts,
line.app.hcc.event_name as evt
from hive.production_logs_hcc
where year='2014' and month='08' and day='21'
and line.level = 'EVENT'
order by ts desc
so I started with the simplest test:
> select t.line from from hive.production_logs_hcc t --> works fine
> select t.line.level from from hive.production_logs_hcc t --> VARCHAR
assertion error
bummer, so it seams where a column in hbase is implicitly a json, a string
column
in hive is not AND no ammount of attempts to cast the log line to a varchar
and then
access as json makes any difference... (I tried this when I realized that
hive string
had no drill sql analog
Am I just out of luck here, hbase columns are json but hive text columns
aren't
Thanks for any help / explanation
_____________
john o schneider
jos@apixio.com
408-203-7891
Re: Can I use hive table column as json
Posted by John Schneider <js...@apixio.com>.
Digging a little more, I think there must be an error in the convert_from()
function or what ever is passing my line to it
the error
Error: SYSTEM ERROR: IllegalArgumentException: You tried to write a BigInt
> type when you are using a ValueWriter of type NullableVarCharWriterImpl.
> Fragment 0:0
seems to be speaking to the conversion of the json which looks like this:
>
> {"level":"EVENT","app":"app_user_info":"session":null,"user":null,"user_agent":"Pingdom.com_bot_version_1.4_(
> http://www.pingdom.com/)"},"app_name":"hcc","hc":{"event_name":"logout","frontend":{"logout":{"username":null,"message":"User
> is logging
> out"}},"component_name":"frontend"}},"isotime":"2015-07-29T00:00:48.4
> 24267+00:00","source":"account.views","host":"hcc-demo-prd1.apixio.com
> ","client":"184.75.210.186","time":"1438128048424"}
if i read these source log files in directly using
0: jdbc:drill:drillbit=drill.apixio.com> SELECT
distinct(log.app.app_user_info.user_agent) FROM
dfs.`user`.`/logmaster/production/hcc/2015-07-28/*` log ;
I get expected results, no issue parsing the json
Some more verbose tracing would help me debug this issue, does any exist?
/jos
_____________
john o schneider
jos@apixio.com
408-203-7891
On Thu, Jul 30, 2015 at 12:06 AM, John Schneider <js...@apixio.com>
wrote:
>
> Jacques - thanks for your reply. It is very much appreciated
>
> I was able to create a view but when accessing the one column in the view
> I got an IllegalArgumentException. I tried a casting values but keep
> getting back to this error.
>
> One thing I saw though is perhaps not the issue is that when I describe
> the view, the line I converted to JSON is an ANY
> My transcript below:
>
>
> CREATE VIEW logView AS SELECT CONVERT_FROM(line, 'JSON') AS line FROM
> hive.production_logs_hcc_24
>
> 0: jdbc:drill:drillbit=drill.apixio.com> use dfs.`tmp`;
> +-------+--------------------------------------+
> | ok | summary |
> +-------+--------------------------------------+
> | true | Default schema changed to [dfs.tmp] |
> +-------+--------------------------------------+
> 1 row selected (0.353 seconds)
> 0: jdbc:drill:drillbit=drill.apixio.com> CREATE VIEW logView AS SELECT
> CONVERT_FROM(line, 'JSON') AS line FROM hive.production_logs_hcc_24;
> +-------+----------------------------------------------------------+
> | ok | summary |
> +-------+----------------------------------------------------------+
> | true | View 'logView' created successfully in 'dfs.tmp' schema |
> +-------+----------------------------------------------------------+
> 1 row selected (0.272 seconds)
>
> 0: jdbc:drill:drillbit=drill.apixio.com> describe logView;
> +--------------+------------+--------------+
> | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
> +--------------+------------+--------------+
> | line | ANY | YES |
> +--------------+------------+--------------+
> 1 row selected (0.125 seconds)
>
> 0: jdbc:drill:drillbit=drill.apixio.com> select t.line.level from logView
> t limit 10;
> Error: SYSTEM ERROR: IllegalArgumentException: You tried to write a BigInt
> type when you are using a ValueWriter of type NullableVarCharWriterImpl.
>
> Fragment 0:0
>
> [Error Id: 655d0dbd-4d8f-433f-96a6-7ab85efb0b8b on
> hadoop-data-drill:31010] (state=,code=0)
> 0: jdbc:drill:drillbit=drill.apixio.com> select t.line from logView t
> limit 10;
> Error: SYSTEM ERROR: IllegalArgumentException: You tried to write a BigInt
> type when you are using a ValueWriter of type NullableVarCharWriterImpl.
>
> Fragment 0:0
>
> [Error Id: fee663a6-b75c-414c-9d47-39b1e2437558 on
> hadoop-data-drill:31010] (state=,code=0)
>
>
> _____________
>
> john o schneider
> jos@apixio.com
> 408-203-7891
>
>
> On Wed, Jul 29, 2015 at 12:20 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
>
>> Drill doesn't implicitly cast from VarChar -> Map. However, just create a
>> Drill view:
>>
>> CREATE VIEW newView AS SELECT CONVERT_FROM(line, 'JSON') AS line FROM
>> hive.production_logs_hcc
>>
>> Then just query newView.
>>
>> Then 'select t.line.level from newView' should work just fine.
>>
>>
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>>
>> On Wed, Jul 29, 2015 at 12:02 PM, John Schneider <js...@apixio.com>
>> wrote:
>>
>> > I've seen examples from drill queries on HBase (all columns are strings)
>> > access each column as a json object as in:
>> >
>> > select t.column_a.xField, t.column_b.yField from some_hbase_table t
>> >
>> > I'd like to do the same using my hive tables where I have production
>> > logging
>> >
>> > My production log tables have the following schema
>> >
>> > CREATE EXTERNAL TABLE logging_master_schema(
>> > line string
>> > ) partitioned by (month string, week string, day string);
>> >
>> > each "line" is a log line in json, we use the partitions
>> > to narrow down lines we crunch in hive queries where we
>> > use fucntion calls like this get_json_object(line, '$.x.y')
>> > to crack our logs.
>> >
>> > A trivial example of a hive query would be:
>> > SELECT get_json_object(line, '$.app.hcc.event_name'),
>> > get_json_object(line, '$.unixtime'),
>> > line FROM production_logs_hcc WHERE year='2014' and month='08'
>> and
>> > day='21'
>> > and get_json_object(line, '$.level') = 'EVENT'
>> > ORDER BY get_json_object(line, '$.unixtime')
>> > DESC;
>> >
>> > I am trying to figure out now if there's a way we can use
>> > drill to access hive and do the same kind of queries where
>> > drill will treat the 'line' column as json as in this example:
>> >
>> > select line.unixtime as ts,
>> > line.app.hcc.event_name as evt
>> > from hive.production_logs_hcc
>> > where year='2014' and month='08' and day='21'
>> > and line.level = 'EVENT'
>> > order by ts desc
>> >
>> > so I started with the simplest test:
>> >
>> > > select t.line from from hive.production_logs_hcc t --> works fine
>> >
>> > > select t.line.level from from hive.production_logs_hcc t --> VARCHAR
>> > assertion error
>> >
>> > bummer, so it seams where a column in hbase is implicitly a json, a
>> string
>> > column
>> > in hive is not AND no ammount of attempts to cast the log line to a
>> varchar
>> > and then
>> > access as json makes any difference... (I tried this when I realized
>> that
>> > hive string
>> > had no drill sql analog
>> >
>> > Am I just out of luck here, hbase columns are json but hive text columns
>> > aren't
>> >
>> > Thanks for any help / explanation
>> > _____________
>> >
>> > john o schneider
>> > jos@apixio.com
>> > 408-203-7891
>> >
>>
>
>
Re: Can I use hive table column as json
Posted by John Schneider <js...@apixio.com>.
Jacques - thanks for your reply. It is very much appreciated
I was able to create a view but when accessing the one column in the view I
got an IllegalArgumentException. I tried a casting values but keep getting
back to this error.
One thing I saw though is perhaps not the issue is that when I describe the
view, the line I converted to JSON is an ANY
My transcript below:
CREATE VIEW logView AS SELECT CONVERT_FROM(line, 'JSON') AS line FROM
hive.production_logs_hcc_24
0: jdbc:drill:drillbit=drill.apixio.com> use dfs.`tmp`;
+-------+--------------------------------------+
| ok | summary |
+-------+--------------------------------------+
| true | Default schema changed to [dfs.tmp] |
+-------+--------------------------------------+
1 row selected (0.353 seconds)
0: jdbc:drill:drillbit=drill.apixio.com> CREATE VIEW logView AS SELECT
CONVERT_FROM(line, 'JSON') AS line FROM hive.production_logs_hcc_24;
+-------+----------------------------------------------------------+
| ok | summary |
+-------+----------------------------------------------------------+
| true | View 'logView' created successfully in 'dfs.tmp' schema |
+-------+----------------------------------------------------------+
1 row selected (0.272 seconds)
0: jdbc:drill:drillbit=drill.apixio.com> describe logView;
+--------------+------------+--------------+
| COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
+--------------+------------+--------------+
| line | ANY | YES |
+--------------+------------+--------------+
1 row selected (0.125 seconds)
0: jdbc:drill:drillbit=drill.apixio.com> select t.line.level from logView t
limit 10;
Error: SYSTEM ERROR: IllegalArgumentException: You tried to write a BigInt
type when you are using a ValueWriter of type NullableVarCharWriterImpl.
Fragment 0:0
[Error Id: 655d0dbd-4d8f-433f-96a6-7ab85efb0b8b on hadoop-data-drill:31010]
(state=,code=0)
0: jdbc:drill:drillbit=drill.apixio.com> select t.line from logView t limit
10;
Error: SYSTEM ERROR: IllegalArgumentException: You tried to write a BigInt
type when you are using a ValueWriter of type NullableVarCharWriterImpl.
Fragment 0:0
[Error Id: fee663a6-b75c-414c-9d47-39b1e2437558 on hadoop-data-drill:31010]
(state=,code=0)
_____________
john o schneider
jos@apixio.com
408-203-7891
On Wed, Jul 29, 2015 at 12:20 PM, Jacques Nadeau <ja...@dremio.com> wrote:
> Drill doesn't implicitly cast from VarChar -> Map. However, just create a
> Drill view:
>
> CREATE VIEW newView AS SELECT CONVERT_FROM(line, 'JSON') AS line FROM
> hive.production_logs_hcc
>
> Then just query newView.
>
> Then 'select t.line.level from newView' should work just fine.
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Wed, Jul 29, 2015 at 12:02 PM, John Schneider <js...@apixio.com>
> wrote:
>
> > I've seen examples from drill queries on HBase (all columns are strings)
> > access each column as a json object as in:
> >
> > select t.column_a.xField, t.column_b.yField from some_hbase_table t
> >
> > I'd like to do the same using my hive tables where I have production
> > logging
> >
> > My production log tables have the following schema
> >
> > CREATE EXTERNAL TABLE logging_master_schema(
> > line string
> > ) partitioned by (month string, week string, day string);
> >
> > each "line" is a log line in json, we use the partitions
> > to narrow down lines we crunch in hive queries where we
> > use fucntion calls like this get_json_object(line, '$.x.y')
> > to crack our logs.
> >
> > A trivial example of a hive query would be:
> > SELECT get_json_object(line, '$.app.hcc.event_name'),
> > get_json_object(line, '$.unixtime'),
> > line FROM production_logs_hcc WHERE year='2014' and month='08' and
> > day='21'
> > and get_json_object(line, '$.level') = 'EVENT'
> > ORDER BY get_json_object(line, '$.unixtime')
> > DESC;
> >
> > I am trying to figure out now if there's a way we can use
> > drill to access hive and do the same kind of queries where
> > drill will treat the 'line' column as json as in this example:
> >
> > select line.unixtime as ts,
> > line.app.hcc.event_name as evt
> > from hive.production_logs_hcc
> > where year='2014' and month='08' and day='21'
> > and line.level = 'EVENT'
> > order by ts desc
> >
> > so I started with the simplest test:
> >
> > > select t.line from from hive.production_logs_hcc t --> works fine
> >
> > > select t.line.level from from hive.production_logs_hcc t --> VARCHAR
> > assertion error
> >
> > bummer, so it seams where a column in hbase is implicitly a json, a
> string
> > column
> > in hive is not AND no ammount of attempts to cast the log line to a
> varchar
> > and then
> > access as json makes any difference... (I tried this when I realized that
> > hive string
> > had no drill sql analog
> >
> > Am I just out of luck here, hbase columns are json but hive text columns
> > aren't
> >
> > Thanks for any help / explanation
> > _____________
> >
> > john o schneider
> > jos@apixio.com
> > 408-203-7891
> >
>
Re: Can I use hive table column as json
Posted by Jacques Nadeau <ja...@dremio.com>.
Drill doesn't implicitly cast from VarChar -> Map. However, just create a
Drill view:
CREATE VIEW newView AS SELECT CONVERT_FROM(line, 'JSON') AS line FROM
hive.production_logs_hcc
Then just query newView.
Then 'select t.line.level from newView' should work just fine.
--
Jacques Nadeau
CTO and Co-Founder, Dremio
On Wed, Jul 29, 2015 at 12:02 PM, John Schneider <js...@apixio.com>
wrote:
> I've seen examples from drill queries on HBase (all columns are strings)
> access each column as a json object as in:
>
> select t.column_a.xField, t.column_b.yField from some_hbase_table t
>
> I'd like to do the same using my hive tables where I have production
> logging
>
> My production log tables have the following schema
>
> CREATE EXTERNAL TABLE logging_master_schema(
> line string
> ) partitioned by (month string, week string, day string);
>
> each "line" is a log line in json, we use the partitions
> to narrow down lines we crunch in hive queries where we
> use fucntion calls like this get_json_object(line, '$.x.y')
> to crack our logs.
>
> A trivial example of a hive query would be:
> SELECT get_json_object(line, '$.app.hcc.event_name'),
> get_json_object(line, '$.unixtime'),
> line FROM production_logs_hcc WHERE year='2014' and month='08' and
> day='21'
> and get_json_object(line, '$.level') = 'EVENT'
> ORDER BY get_json_object(line, '$.unixtime')
> DESC;
>
> I am trying to figure out now if there's a way we can use
> drill to access hive and do the same kind of queries where
> drill will treat the 'line' column as json as in this example:
>
> select line.unixtime as ts,
> line.app.hcc.event_name as evt
> from hive.production_logs_hcc
> where year='2014' and month='08' and day='21'
> and line.level = 'EVENT'
> order by ts desc
>
> so I started with the simplest test:
>
> > select t.line from from hive.production_logs_hcc t --> works fine
>
> > select t.line.level from from hive.production_logs_hcc t --> VARCHAR
> assertion error
>
> bummer, so it seams where a column in hbase is implicitly a json, a string
> column
> in hive is not AND no ammount of attempts to cast the log line to a varchar
> and then
> access as json makes any difference... (I tried this when I realized that
> hive string
> had no drill sql analog
>
> Am I just out of luck here, hbase columns are json but hive text columns
> aren't
>
> Thanks for any help / explanation
> _____________
>
> john o schneider
> jos@apixio.com
> 408-203-7891
>