You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Steven Wong <sw...@netflix.com> on 2010/08/19 00:07:34 UTC

get_json_object cannot handle array

I have a string column that contains a JSON array on each row. Calling the function get_json_object on that column always returns NULL:

$ hive -e "select cdnbwdata, get_json_object(cdnbwdata, '$'), get_json_object(cdnbwdata, '$[0].rtt'), get_json_object(cdnbwdata, '$.rtt') from mytable"
[{"id":9,"ip":"173.223.232.50","rtt":71,"bw":41943}] NULL NULL NULL

If I replace the array with its 0th element, then get_json_object works:

$ hive -e "select cdnbwdata, get_json_object(cdnbwdata, '$'), get_json_object(cdnbwdata, '$[0].rtt'), get_json_object(cdnbwdata, '$.rtt') from mytable"
{"id":9,"ip":"173.223.232.32","rtt":71,"bw":4211}     {"id":9,"bw":4211,"rtt":71,"ip":"173.223.232.32"}    NULL 71

Is it a get_json_object bug in handling arrays, or am I doing something wrong? Hive version is 0.5.

Thanks.
Steven


Re: get_json_object cannot handle array

Posted by Ning Zhang <nz...@facebook.com>.
You are right. You need a function to parse JSON arrays to Hive arrays.

On Aug 18, 2010, at 6:37 PM, Steven Wong wrote:

That doesn’t sound like it would work, because explode takes a Hive array, not a string that represents a JSON array. Corect me if I’m wrong.


From: Ning Zhang [mailto:nzhang@facebook.com]
Sent: Wednesday, August 18, 2010 5:41 PM
To: <hi...@hadoop.apache.org>>
Subject: Re: get_json_object cannot handle array

Definitely. You can extend get_json_object() to support arbitrary JSON values.

BTW, for your particular example, if your JSOn value is always an array of JSON object, you can compile get_json_object with the UDTF explode:

select get_json_object(B.jobj, '$...)
from T lateral view explode(T.json_array) B as jobj
...


On Aug 18, 2010, at 5:30 PM, Steven Wong wrote:


Thanks for the explanation. Basically, get_json_object’s json_txt argument must be a JSON object (which is “an unordered set of name/value pairs”), not just any JSON value (string, number, object, array, true, false, null).

I would think that it’ll be more convenient to allow the json_txt argument to be any JSON value, as an enhancement. Yes? No?


From: Ning Zhang [mailto:nzhang@facebook.com]
Sent: Wednesday, August 18, 2010 5:02 PM
To: <hi...@hadoop.apache.org>>
Subject: Re: get_json_object cannot handle array

The get_json_object supports a restricted version of JSON object and path expressions. It supports arrays at non-root level.  You can see the details by hive> describe function extended get_json_object;

On Aug 18, 2010, at 3:07 PM, Steven Wong wrote:



I have a string column that contains a JSON array on each row. Calling the function get_json_object on that column always returns NULL:

$ hive -e "select cdnbwdata, get_json_object(cdnbwdata, '$'), get_json_object(cdnbwdata, '$[0].rtt'), get_json_object(cdnbwdata, '$.rtt') from mytable"
[{"id":9,"ip":"173.223.232.50","rtt":71,"bw":41943}] NULL NULL NULL

If I replace the array with its 0th element, then get_json_object works:

$ hive -e "select cdnbwdata, get_json_object(cdnbwdata, '$'), get_json_object(cdnbwdata, '$[0].rtt'), get_json_object(cdnbwdata, '$.rtt') from mytable"
{"id":9,"ip":"173.223.232.32","rtt":71,"bw":4211}     {"id":9,"bw":4211,"rtt":71,"ip":"173.223.232.32"}    NULL 71

Is it a get_json_object bug in handling arrays, or am I doing something wrong? Hive version is 0.5.

Thanks.
Steven





RE: get_json_object cannot handle array

Posted by Steven Wong <sw...@netflix.com>.
That doesn't sound like it would work, because explode takes a Hive array, not a string that represents a JSON array. Corect me if I'm wrong.


From: Ning Zhang [mailto:nzhang@facebook.com]
Sent: Wednesday, August 18, 2010 5:41 PM
To: <hi...@hadoop.apache.org>
Subject: Re: get_json_object cannot handle array

Definitely. You can extend get_json_object() to support arbitrary JSON values.

BTW, for your particular example, if your JSOn value is always an array of JSON object, you can compile get_json_object with the UDTF explode:

select get_json_object(B.jobj, '$...)
from T lateral view explode(T.json_array) B as jobj
...


On Aug 18, 2010, at 5:30 PM, Steven Wong wrote:


Thanks for the explanation. Basically, get_json_object's json_txt argument must be a JSON object (which is "an unordered set of name/value pairs"), not just any JSON value (string, number, object, array, true, false, null).

I would think that it'll be more convenient to allow the json_txt argument to be any JSON value, as an enhancement. Yes? No?


From: Ning Zhang [mailto:nzhang@facebook.com]
Sent: Wednesday, August 18, 2010 5:02 PM
To: <hi...@hadoop.apache.org>>
Subject: Re: get_json_object cannot handle array

The get_json_object supports a restricted version of JSON object and path expressions. It supports arrays at non-root level.  You can see the details by hive> describe function extended get_json_object;

On Aug 18, 2010, at 3:07 PM, Steven Wong wrote:



I have a string column that contains a JSON array on each row. Calling the function get_json_object on that column always returns NULL:

$ hive -e "select cdnbwdata, get_json_object(cdnbwdata, '$'), get_json_object(cdnbwdata, '$[0].rtt'), get_json_object(cdnbwdata, '$.rtt') from mytable"
[{"id":9,"ip":"173.223.232.50","rtt":71,"bw":41943}] NULL NULL NULL

If I replace the array with its 0th element, then get_json_object works:

$ hive -e "select cdnbwdata, get_json_object(cdnbwdata, '$'), get_json_object(cdnbwdata, '$[0].rtt'), get_json_object(cdnbwdata, '$.rtt') from mytable"
{"id":9,"ip":"173.223.232.32","rtt":71,"bw":4211}     {"id":9,"bw":4211,"rtt":71,"ip":"173.223.232.32"}    NULL 71

Is it a get_json_object bug in handling arrays, or am I doing something wrong? Hive version is 0.5.

Thanks.
Steven




Re: get_json_object cannot handle array

Posted by Ning Zhang <nz...@facebook.com>.
Definitely. You can extend get_json_object() to support arbitrary JSON values.

BTW, for your particular example, if your JSOn value is always an array of JSON object, you can compile get_json_object with the UDTF explode:

select get_json_object(B.jobj, '$...)
from T lateral view explode(T.json_array) B as jobj
...


On Aug 18, 2010, at 5:30 PM, Steven Wong wrote:

Thanks for the explanation. Basically, get_json_object’s json_txt argument must be a JSON object (which is “an unordered set of name/value pairs”), not just any JSON value (string, number, object, array, true, false, null).

I would think that it’ll be more convenient to allow the json_txt argument to be any JSON value, as an enhancement. Yes? No?


From: Ning Zhang [mailto:nzhang@facebook.com]
Sent: Wednesday, August 18, 2010 5:02 PM
To: <hi...@hadoop.apache.org>>
Subject: Re: get_json_object cannot handle array

The get_json_object supports a restricted version of JSON object and path expressions. It supports arrays at non-root level.  You can see the details by hive> describe function extended get_json_object;

On Aug 18, 2010, at 3:07 PM, Steven Wong wrote:


I have a string column that contains a JSON array on each row. Calling the function get_json_object on that column always returns NULL:

$ hive -e "select cdnbwdata, get_json_object(cdnbwdata, '$'), get_json_object(cdnbwdata, '$[0].rtt'), get_json_object(cdnbwdata, '$.rtt') from mytable"
[{"id":9,"ip":"173.223.232.50","rtt":71,"bw":41943}] NULL NULL NULL

If I replace the array with its 0th element, then get_json_object works:

$ hive -e "select cdnbwdata, get_json_object(cdnbwdata, '$'), get_json_object(cdnbwdata, '$[0].rtt'), get_json_object(cdnbwdata, '$.rtt') from mytable"
{"id":9,"ip":"173.223.232.32","rtt":71,"bw":4211}     {"id":9,"bw":4211,"rtt":71,"ip":"173.223.232.32"}    NULL 71

Is it a get_json_object bug in handling arrays, or am I doing something wrong? Hive version is 0.5.

Thanks.
Steven




RE: get_json_object cannot handle array

Posted by Steven Wong <sw...@netflix.com>.
Thanks for the explanation. Basically, get_json_object's json_txt argument must be a JSON object (which is "an unordered set of name/value pairs"), not just any JSON value (string, number, object, array, true, false, null).

I would think that it'll be more convenient to allow the json_txt argument to be any JSON value, as an enhancement. Yes? No?


From: Ning Zhang [mailto:nzhang@facebook.com]
Sent: Wednesday, August 18, 2010 5:02 PM
To: <hi...@hadoop.apache.org>
Subject: Re: get_json_object cannot handle array

The get_json_object supports a restricted version of JSON object and path expressions. It supports arrays at non-root level.  You can see the details by hive> describe function extended get_json_object;

On Aug 18, 2010, at 3:07 PM, Steven Wong wrote:


I have a string column that contains a JSON array on each row. Calling the function get_json_object on that column always returns NULL:

$ hive -e "select cdnbwdata, get_json_object(cdnbwdata, '$'), get_json_object(cdnbwdata, '$[0].rtt'), get_json_object(cdnbwdata, '$.rtt') from mytable"
[{"id":9,"ip":"173.223.232.50","rtt":71,"bw":41943}] NULL NULL NULL

If I replace the array with its 0th element, then get_json_object works:

$ hive -e "select cdnbwdata, get_json_object(cdnbwdata, '$'), get_json_object(cdnbwdata, '$[0].rtt'), get_json_object(cdnbwdata, '$.rtt') from mytable"
{"id":9,"ip":"173.223.232.32","rtt":71,"bw":4211}     {"id":9,"bw":4211,"rtt":71,"ip":"173.223.232.32"}    NULL 71

Is it a get_json_object bug in handling arrays, or am I doing something wrong? Hive version is 0.5.

Thanks.
Steven



Re: get_json_object cannot handle array

Posted by Ning Zhang <nz...@facebook.com>.
The get_json_object supports a restricted version of JSON object and path expressions. It supports arrays at non-root level.  You can see the details by hive> describe function extended get_json_object;

On Aug 18, 2010, at 3:07 PM, Steven Wong wrote:

I have a string column that contains a JSON array on each row. Calling the function get_json_object on that column always returns NULL:

$ hive -e "select cdnbwdata, get_json_object(cdnbwdata, '$'), get_json_object(cdnbwdata, '$[0].rtt'), get_json_object(cdnbwdata, '$.rtt') from mytable"
[{"id":9,"ip":"173.223.232.50","rtt":71,"bw":41943}] NULL NULL NULL

If I replace the array with its 0th element, then get_json_object works:

$ hive -e "select cdnbwdata, get_json_object(cdnbwdata, '$'), get_json_object(cdnbwdata, '$[0].rtt'), get_json_object(cdnbwdata, '$.rtt') from mytable"
{"id":9,"ip":"173.223.232.32","rtt":71,"bw":4211}     {"id":9,"bw":4211,"rtt":71,"ip":"173.223.232.32"}    NULL 71

Is it a get_json_object bug in handling arrays, or am I doing something wrong? Hive version is 0.5.

Thanks.
Steven