You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by Subhajit Ghosh <su...@skipjaq.com> on 2015/03/27 23:30:01 UTC

Fetching nested JSON data in HBase using Apache Drill

I am using Apache Drill to run SQL queries on a HBase table. The value in one of the columns is:
0: jdbc:drill:schema:hbase:zk=localhost> select cast(address['street'] as varchar(20)) from hbase.students;
+------------+
|   EXPR$0   |
+------------+
| {"id": 123} |
+------------+
1 row selected (0.507 seconds)I would like to access the id field using a query. Something like:
0: jdbc:drill:schema:hbase:zk=localhost> select tbl.address['street']['id'] from hbase.students as tbl;
+------------+
|   EXPR$0   |
+------------+
| null       |
+------------+As you can see, this does not work. I am run to similar queries on JSON data in a file. My question is can I query JSON data in HBase.


—
Thanks and Regards,
Subhajit

Re: Fetching nested JSON data in HBase using Apache Drill

Posted by Carol McDonald <cm...@maprtech.com>.
you might find this example useful

Converting a String blob in HBase to JSON



*Use convert_from on a JSON blob in HBase*

Take a look at the HBase embedded clicks table. This table has one Column
Family blob, which contains one column json, and the value in the column is
a JSON string bytes.

0: jdbc:drill:> !set maxwidth 10000

> SELECT * FROM maprdb.embeddedclicks t LIMIT 5;

+------------+------------+

|  row_key   |    blob    |

+------------+------------+

| [B@2caf7908 |
{"json":"eyJ0cmFuc19pZCI6MTAwMDAsImRhdGUiOiIyMDE0LTA1LTE3IiwidGltZSI6IjAyOjI0OjU4IiwidXNlcl9pbmZvIjp7ImN1c3RfaWQiOjI4NDEsImRldmljZSI6IklPUzUiLCJzdGF0ZSI6InZhIn0sImFkX2luZm8iOnsiY2FtcF9pZCI6IjEifSwidHJhbnNfaW5mbyI6eyJwcm9kX2lkIjpbXSwicHVyY2hfZmxhZyI6InRydWUifX0="}
|

| [B@34331323 |
{"json":"eyJ0cmFuc19pZCI6MTAwMDEsImRhdGUiOiIyMDE0LTA1LTIyIiwidGltZSI6IjE3OjUxOjE0IiwidXNlcl9pbmZvIjp7ImN1c3RfaWQiOjMyMjIsImRldmljZSI6IklPUzUiLCJzdGF0ZSI6ImZsIn0sImFkX2luZm8iOnsiY2FtcF9pZCI6IjEifSwidHJhbnNfaW5mbyI6eyJwcm9kX2lkIjpbNTEsNCwyLDEsMCwwLDQ3LDEsOSwxLDgzXSwicHVyY2hfZmxhZyI6ImZhbHNlIn19"}
|

| [B@52d799b |
{"json":"eyJ0cmFuc19pZCI6MTAwMDMsImRhdGUiOiIyMDE0LTA1LTA1IiwidGltZSI6IjA4OjEzOjAwIiwidXNlcl9pbmZvIjp7ImN1c3RfaWQiOjM2MjUsImRldmljZSI6IklPUzYiLCJzdGF0ZSI6ImlhIn0sImFkX2luZm8iOnsiY2FtcF9pZCI6IjEifSwidHJhbnNfaW5mbyI6eyJwcm9kX2lkIjpbMCw3N10sInB1cmNoX2ZsYWciOiJmYWxzZSJ9fQ=="}
|

| [B@6c8ab1c4 |
{"json":"eyJ0cmFuc19pZCI6MTAwMDQsImRhdGUiOiIyMDE0LTA1LTA2IiwidGltZSI6IjA0OjAwOjEwIiwidXNlcl9pbmZvIjp7ImN1c3RfaWQiOjE0MjUsImRldmljZSI6IklPUzUiLCJzdGF0ZSI6IndpIn0sImFkX2luZm8iOnsiY2FtcF9pZCI6IjEifSwidHJhbnNfaW5mbyI6eyJwcm9kX2lkIjpbM10sInB1cmNoX2ZsYWciOiJ0cnVlIn19"}
|

*Now use convert_from  to convert the bytes to JSON:*

> SELECT convert_from(row_key, 'UTF8') Row_Key, convert_from(t.`blob`.json,
'JSON') json FROM maprdb.embeddedclicks t LIMIT 10;



+------------+------------+

|  Row_Key   |    json    |

+------------+------------+

| 10000      |
{"trans_id":10000,"date":"2014-05-17","time":"02:24:58","user_info":{"cust_id":2841,"device":"IOS5","state":"va"},"ad_info":{"camp_id":"1"},"trans_info":{"purch_flag":"true","prod_id":[]}}
|

| 10001      |
{"trans_id":10001,"date":"2014-05-22","time":"17:51:14","user_info":{"cust_id":3222,"device":"IOS5","state":"fl"},"ad_info":{"camp_id":"1"},"trans_info":{"purch_flag":"false","prod_id":[51,4,2,1,0,0,47,1,9,1,83]}}
|

| 10003      |
{"trans_id":10003,"date":"2014-05-05","time":"08:13:00","user_info":{"cust_id":3625,"device":"IOS6","state":"ia"},"ad_info":{"camp_id":"1"},"trans_info":{"purch_flag":"false","prod_id":[0,77]}}
|

| 10004      |
{"trans_id":10004,"date":"2014-05-06","time":"04:00:10","user_info":{"cust_id":1425,"device":"IOS5","state":"wi"},"ad_info":{"camp_id":"1"},"trans_info":{"purch_flag":"true","prod_id":[3]}}
|

| 10007      |
{"trans_id":10007,"date":"2014-05-13","time":"13:56:21","user_info":{"cust_id":2102,"device":"IOS7","state":"fl"},"ad_info":{"camp_id":"1"},"trans_info":{"purch_flag":"true","prod_id":[14,60,18]}}
|


Creating a View to Convert a String blob in HBase to JSON



Let’s make it easier to write queries against our example for Mapr-DB. We
will:

●        create a view, which can be used like a table in future queries

●        use comvert_from to convert the blob column to json

●        use column aliases to give the json column a meaningful name

●        note the blob column must have ` back ticks because blob is a
reserved sql word



At the sqlline prompt, enter:





>use dfs.mydata;

> create or replace view emclicksview as SELECT convert_from(row_key,
'UTF8') row_key, convert_from(t.`blob`.json, 'JSON') json FROM
maprdb.embeddedclicks t ;



Now query the view you just created.  You will see that with the view  the
blob is converted to json format.  At the sqlline prompt, enter:





>select * from emclicksview limit 2;



+------------+------------+

|  Row_Key   |    json    |

+------------+------------+

| 10000      |
{"trans_id":10000,"date":"2014-05-17","time":"02:24:58","user_info":{"cust_id":2841,"device":"IOS5","state":"va"},"ad_info":{"camp_id":"1"},"trans_info":{"purch_flag":"true","prod_id":[]}}
|

| 10001      |
{"trans_id":10001,"date":"2014-05-22","time":"17:51:14","user_info":{"cust_id":3222,"device":"IOS5","state":"fl"},"ad_info":{"camp_id":"1"},"trans_info":{"purch_flag":"false","prod_id":[51,4,2,1,0,0,47,1,9,1,83]}}
|



Now with this  view you can use the  table.column.column notation  to
extract nested column data from the json blob column :







> select  row_key, em.json.trans_id as transid, em.json.`date` as edate,
em.json.user_info.cust_id as cust_id , em.json.trans_info.purch_flag as
purch_flag, em.json.trans_info.prod_id as prod_id from emclicksview em
limit 2;



+------------+------------+------------+------------+------------+------------+

|  row_key   |  transid   |   edate    |  cust_id   | purch_flag |
prod_id   |

+------------+------------+------------+------------+------------+------------+

| 10000      | 10000      | 2014-05-17 | 2841       | true       |
[]         |

| 10001      | 10001      | 2014-05-22 | 3222       | false      |
[51,4,2,1,0,0,47,1,9,1,83] |

+------------+------------+------------+------------+------------+------------+



Now with this view you can also  use the  uses the [ n ] notation  on
arrays in the json blob column :



> select  row_key, em.json.trans_id as transid, em.json.`date` as edate,
em.json.user_info.cust_id as cust_id , em.json.trans_info.purch_flag as
purch_flag, em.json.trans_info.prod_id[0] as prod_id from emclicksview em
limit 2;



+------------+------------+------------+------------+------------+------------+

|  row_key   |  transid   |   edate    |  cust_id   | purch_flag |
prod_id   |

+------------+------------+------------+------------+------------+------------+

| 10000      | 10000      | 2014-05-17 | 2841       | true       |
null       |

| 10001      | 10001      | 2014-05-22 | 3222       | false      |
51         |

+------------+------------+------------+------------+------------+------------+

2 rows selected (0.228 seconds)







On Fri, Mar 27, 2015 at 6:30 PM, Subhajit Ghosh <su...@skipjaq.com>
wrote:

> I am using Apache Drill to run SQL queries on a HBase table. The value in
> one of the columns is:
> 0: jdbc:drill:schema:hbase:zk=localhost> select cast(address['street'] as
> varchar(20)) from hbase.students;
> +------------+
> |   EXPR$0   |
> +------------+
> | {"id": 123} |
> +------------+
> 1 row selected (0.507 seconds)I would like to access the id field using a
> query. Something like:
> 0: jdbc:drill:schema:hbase:zk=localhost> select
> tbl.address['street']['id'] from hbase.students as tbl;
> +------------+
> |   EXPR$0   |
> +------------+
> | null       |
> +------------+As you can see, this does not work. I am run to similar
> queries on JSON data in a file. My question is can I query JSON data in
> HBase.
>
>
> —
> Thanks and Regards,
> Subhajit