You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by Subhajit Ghosh <su...@skipjaq.com> on 2015/03/27 23:30:01 UTC
Fetching nested JSON data in HBase using Apache Drill
I am using Apache Drill to run SQL queries on a HBase table. The value in one of the columns is:
0: jdbc:drill:schema:hbase:zk=localhost> select cast(address['street'] as varchar(20)) from hbase.students;
+------------+
| EXPR$0 |
+------------+
| {"id": 123} |
+------------+
1 row selected (0.507 seconds)I would like to access the id field using a query. Something like:
0: jdbc:drill:schema:hbase:zk=localhost> select tbl.address['street']['id'] from hbase.students as tbl;
+------------+
| EXPR$0 |
+------------+
| null |
+------------+As you can see, this does not work. I am run to similar queries on JSON data in a file. My question is can I query JSON data in HBase.
—
Thanks and Regards,
Subhajit
Re: Fetching nested JSON data in HBase using Apache Drill
Posted by Carol McDonald <cm...@maprtech.com>.
you might find this example useful
Converting a String blob in HBase to JSON
*Use convert_from on a JSON blob in HBase*
Take a look at the HBase embedded clicks table. This table has one Column
Family blob, which contains one column json, and the value in the column is
a JSON string bytes.
0: jdbc:drill:> !set maxwidth 10000
> SELECT * FROM maprdb.embeddedclicks t LIMIT 5;
+------------+------------+
| row_key | blob |
+------------+------------+
| [B@2caf7908 |
{"json":"eyJ0cmFuc19pZCI6MTAwMDAsImRhdGUiOiIyMDE0LTA1LTE3IiwidGltZSI6IjAyOjI0OjU4IiwidXNlcl9pbmZvIjp7ImN1c3RfaWQiOjI4NDEsImRldmljZSI6IklPUzUiLCJzdGF0ZSI6InZhIn0sImFkX2luZm8iOnsiY2FtcF9pZCI6IjEifSwidHJhbnNfaW5mbyI6eyJwcm9kX2lkIjpbXSwicHVyY2hfZmxhZyI6InRydWUifX0="}
|
| [B@34331323 |
{"json":"eyJ0cmFuc19pZCI6MTAwMDEsImRhdGUiOiIyMDE0LTA1LTIyIiwidGltZSI6IjE3OjUxOjE0IiwidXNlcl9pbmZvIjp7ImN1c3RfaWQiOjMyMjIsImRldmljZSI6IklPUzUiLCJzdGF0ZSI6ImZsIn0sImFkX2luZm8iOnsiY2FtcF9pZCI6IjEifSwidHJhbnNfaW5mbyI6eyJwcm9kX2lkIjpbNTEsNCwyLDEsMCwwLDQ3LDEsOSwxLDgzXSwicHVyY2hfZmxhZyI6ImZhbHNlIn19"}
|
| [B@52d799b |
{"json":"eyJ0cmFuc19pZCI6MTAwMDMsImRhdGUiOiIyMDE0LTA1LTA1IiwidGltZSI6IjA4OjEzOjAwIiwidXNlcl9pbmZvIjp7ImN1c3RfaWQiOjM2MjUsImRldmljZSI6IklPUzYiLCJzdGF0ZSI6ImlhIn0sImFkX2luZm8iOnsiY2FtcF9pZCI6IjEifSwidHJhbnNfaW5mbyI6eyJwcm9kX2lkIjpbMCw3N10sInB1cmNoX2ZsYWciOiJmYWxzZSJ9fQ=="}
|
| [B@6c8ab1c4 |
{"json":"eyJ0cmFuc19pZCI6MTAwMDQsImRhdGUiOiIyMDE0LTA1LTA2IiwidGltZSI6IjA0OjAwOjEwIiwidXNlcl9pbmZvIjp7ImN1c3RfaWQiOjE0MjUsImRldmljZSI6IklPUzUiLCJzdGF0ZSI6IndpIn0sImFkX2luZm8iOnsiY2FtcF9pZCI6IjEifSwidHJhbnNfaW5mbyI6eyJwcm9kX2lkIjpbM10sInB1cmNoX2ZsYWciOiJ0cnVlIn19"}
|
*Now use convert_from to convert the bytes to JSON:*
> SELECT convert_from(row_key, 'UTF8') Row_Key, convert_from(t.`blob`.json,
'JSON') json FROM maprdb.embeddedclicks t LIMIT 10;
+------------+------------+
| Row_Key | json |
+------------+------------+
| 10000 |
{"trans_id":10000,"date":"2014-05-17","time":"02:24:58","user_info":{"cust_id":2841,"device":"IOS5","state":"va"},"ad_info":{"camp_id":"1"},"trans_info":{"purch_flag":"true","prod_id":[]}}
|
| 10001 |
{"trans_id":10001,"date":"2014-05-22","time":"17:51:14","user_info":{"cust_id":3222,"device":"IOS5","state":"fl"},"ad_info":{"camp_id":"1"},"trans_info":{"purch_flag":"false","prod_id":[51,4,2,1,0,0,47,1,9,1,83]}}
|
| 10003 |
{"trans_id":10003,"date":"2014-05-05","time":"08:13:00","user_info":{"cust_id":3625,"device":"IOS6","state":"ia"},"ad_info":{"camp_id":"1"},"trans_info":{"purch_flag":"false","prod_id":[0,77]}}
|
| 10004 |
{"trans_id":10004,"date":"2014-05-06","time":"04:00:10","user_info":{"cust_id":1425,"device":"IOS5","state":"wi"},"ad_info":{"camp_id":"1"},"trans_info":{"purch_flag":"true","prod_id":[3]}}
|
| 10007 |
{"trans_id":10007,"date":"2014-05-13","time":"13:56:21","user_info":{"cust_id":2102,"device":"IOS7","state":"fl"},"ad_info":{"camp_id":"1"},"trans_info":{"purch_flag":"true","prod_id":[14,60,18]}}
|
Creating a View to Convert a String blob in HBase to JSON
Let’s make it easier to write queries against our example for Mapr-DB. We
will:
● create a view, which can be used like a table in future queries
● use comvert_from to convert the blob column to json
● use column aliases to give the json column a meaningful name
● note the blob column must have ` back ticks because blob is a
reserved sql word
At the sqlline prompt, enter:
>use dfs.mydata;
> create or replace view emclicksview as SELECT convert_from(row_key,
'UTF8') row_key, convert_from(t.`blob`.json, 'JSON') json FROM
maprdb.embeddedclicks t ;
Now query the view you just created. You will see that with the view the
blob is converted to json format. At the sqlline prompt, enter:
>select * from emclicksview limit 2;
+------------+------------+
| Row_Key | json |
+------------+------------+
| 10000 |
{"trans_id":10000,"date":"2014-05-17","time":"02:24:58","user_info":{"cust_id":2841,"device":"IOS5","state":"va"},"ad_info":{"camp_id":"1"},"trans_info":{"purch_flag":"true","prod_id":[]}}
|
| 10001 |
{"trans_id":10001,"date":"2014-05-22","time":"17:51:14","user_info":{"cust_id":3222,"device":"IOS5","state":"fl"},"ad_info":{"camp_id":"1"},"trans_info":{"purch_flag":"false","prod_id":[51,4,2,1,0,0,47,1,9,1,83]}}
|
Now with this view you can use the table.column.column notation to
extract nested column data from the json blob column :
> select row_key, em.json.trans_id as transid, em.json.`date` as edate,
em.json.user_info.cust_id as cust_id , em.json.trans_info.purch_flag as
purch_flag, em.json.trans_info.prod_id as prod_id from emclicksview em
limit 2;
+------------+------------+------------+------------+------------+------------+
| row_key | transid | edate | cust_id | purch_flag |
prod_id |
+------------+------------+------------+------------+------------+------------+
| 10000 | 10000 | 2014-05-17 | 2841 | true |
[] |
| 10001 | 10001 | 2014-05-22 | 3222 | false |
[51,4,2,1,0,0,47,1,9,1,83] |
+------------+------------+------------+------------+------------+------------+
Now with this view you can also use the uses the [ n ] notation on
arrays in the json blob column :
> select row_key, em.json.trans_id as transid, em.json.`date` as edate,
em.json.user_info.cust_id as cust_id , em.json.trans_info.purch_flag as
purch_flag, em.json.trans_info.prod_id[0] as prod_id from emclicksview em
limit 2;
+------------+------------+------------+------------+------------+------------+
| row_key | transid | edate | cust_id | purch_flag |
prod_id |
+------------+------------+------------+------------+------------+------------+
| 10000 | 10000 | 2014-05-17 | 2841 | true |
null |
| 10001 | 10001 | 2014-05-22 | 3222 | false |
51 |
+------------+------------+------------+------------+------------+------------+
2 rows selected (0.228 seconds)
On Fri, Mar 27, 2015 at 6:30 PM, Subhajit Ghosh <su...@skipjaq.com>
wrote:
> I am using Apache Drill to run SQL queries on a HBase table. The value in
> one of the columns is:
> 0: jdbc:drill:schema:hbase:zk=localhost> select cast(address['street'] as
> varchar(20)) from hbase.students;
> +------------+
> | EXPR$0 |
> +------------+
> | {"id": 123} |
> +------------+
> 1 row selected (0.507 seconds)I would like to access the id field using a
> query. Something like:
> 0: jdbc:drill:schema:hbase:zk=localhost> select
> tbl.address['street']['id'] from hbase.students as tbl;
> +------------+
> | EXPR$0 |
> +------------+
> | null |
> +------------+As you can see, this does not work. I am run to similar
> queries on JSON data in a file. My question is can I query JSON data in
> HBase.
>
>
> —
> Thanks and Regards,
> Subhajit