You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Anurag Phadke <ap...@mozilla.com> on 2012/02/15 23:28:50 UTC

iterating through JSON

Hive version: 0.7.1
Does hive support extracting data from jsons which begin with @ as part 
of the key?

Sample Data:
{"@fields":{},"timestamp":"2012-02-09T21:58:42.420085","@timestamp":"2012-02-09T21:58:42.472000Z","@source":"unknown","env_version":"0.8","@type":null,"@tags":["output_hdfs","output_text","output_statsd"],"logger":"","payload":"1","severity":6,"type":"counter","fields":{"name":"foo"}}

{"@fields":{},"timestamp":"2012-02-09T22:19:59.261948","@timestamp":"2012-02-09T22:19:59.266000Z","@source":"unknown","env_version":"0.8","@type":null,"@tags":["output_hdfs","output_text","output_statsd"],"logger":"server-core-app","payload":"1","severity":6,"type":"timer","fields":{"rate":1,"name":"syncstorage.controller.get_collections"}}



For eg:
select get_json_object(line, '$.env_version') from apachelog limit 10;
works BUT

select get_json_object(line, '$.env_version'), 
get_json_object(line,'$.@source') from apachelog limit 10;
returns NULL for @source.

Tried escaping as:
select get_json_object(line, '$.env_version'), 
get_json_object(line,'$.\\@source') from apachelog limit 10;
same error.

-anurag

Re: iterating through JSON

Posted by Anurag Phadke <ap...@mozilla.com>.
thx sean, i am thinking along same lines, use regexp_replace and then 
use get_json_object over it.

-anurag


Sean McNamara wrote:
> I replace '.' with '_' for all the keys in a select transform. Definitely
> not ideal, but it works for us for now.
>
> Sean
>
> On 2/15/12 3:53 PM, "Anurag Phadke"<ap...@mozilla.com>  wrote:
>
>> Sean,
>> How did u resolve the issue? replace "." with something else in the
>> source json?
>>
>> -anurag
>>
>>
>> Sean McNamara wrote:
>>> Escaping with \\ doesn't seem to work.  I had a similar issue where the
>>> key contained a '.' and it would do the same thing (always return NULL).
>>>
>>> Sean
>>>
>>> On 2/15/12 3:28 PM, "Anurag Phadke"<ap...@mozilla.com>   wrote:
>>>
>>>> Hive version: 0.7.1
>>>> Does hive support extracting data from jsons which begin with @ as part
>>>> of the key?
>>>>
>>>> Sample Data:
>>>>
>>>> {"@fields":{},"timestamp":"2012-02-09T21:58:42.420085","@timestamp":"201
>>>> 2-
>>>>
>>>> 02-09T21:58:42.472000Z","@source":"unknown","env_version":"0.8","@type":
>>>> nu
>>>>
>>>> ll,"@tags":["output_hdfs","output_text","output_statsd"],"logger":"","pa
>>>> yl
>>>> oad":"1","severity":6,"type":"counter","fields":{"name":"foo"}}
>>>>
>>>>
>>>> {"@fields":{},"timestamp":"2012-02-09T22:19:59.261948","@timestamp":"201
>>>> 2-
>>>>
>>>> 02-09T22:19:59.266000Z","@source":"unknown","env_version":"0.8","@type":
>>>> nu
>>>>
>>>> ll,"@tags":["output_hdfs","output_text","output_statsd"],"logger":"serve
>>>> r-
>>>>
>>>> core-app","payload":"1","severity":6,"type":"timer","fields":{"rate":1,"
>>>> na
>>>> me":"syncstorage.controller.get_collections"}}
>>>>
>>>>
>>>>
>>>> For eg:
>>>> select get_json_object(line, '$.env_version') from apachelog limit 10;
>>>> works BUT
>>>>
>>>> select get_json_object(line, '$.env_version'),
>>>> get_json_object(line,'$.@source') from apachelog limit 10;
>>>> returns NULL for @source.
>>>>
>>>> Tried escaping as:
>>>> select get_json_object(line, '$.env_version'),
>>>> get_json_object(line,'$.\\@source') from apachelog limit 10;
>>>> same error.
>>>>
>>>> -anurag
>>>
>

Re: iterating through JSON

Posted by Sean McNamara <Se...@Webtrends.com>.
I replace '.' with '_' for all the keys in a select transform. Definitely
not ideal, but it works for us for now.

Sean

On 2/15/12 3:53 PM, "Anurag Phadke" <ap...@mozilla.com> wrote:

>Sean,
>How did u resolve the issue? replace "." with something else in the
>source json?
>
>-anurag
>
>
>Sean McNamara wrote:
>> Escaping with \\ doesn't seem to work.  I had a similar issue where the
>> key contained a '.' and it would do the same thing (always return NULL).
>>
>> Sean
>>
>> On 2/15/12 3:28 PM, "Anurag Phadke"<ap...@mozilla.com>  wrote:
>>
>>> Hive version: 0.7.1
>>> Does hive support extracting data from jsons which begin with @ as part
>>> of the key?
>>>
>>> Sample Data:
>>> 
>>>{"@fields":{},"timestamp":"2012-02-09T21:58:42.420085","@timestamp":"201
>>>2-
>>> 
>>>02-09T21:58:42.472000Z","@source":"unknown","env_version":"0.8","@type":
>>>nu
>>> 
>>>ll,"@tags":["output_hdfs","output_text","output_statsd"],"logger":"","pa
>>>yl
>>> oad":"1","severity":6,"type":"counter","fields":{"name":"foo"}}
>>>
>>> 
>>>{"@fields":{},"timestamp":"2012-02-09T22:19:59.261948","@timestamp":"201
>>>2-
>>> 
>>>02-09T22:19:59.266000Z","@source":"unknown","env_version":"0.8","@type":
>>>nu
>>> 
>>>ll,"@tags":["output_hdfs","output_text","output_statsd"],"logger":"serve
>>>r-
>>> 
>>>core-app","payload":"1","severity":6,"type":"timer","fields":{"rate":1,"
>>>na
>>> me":"syncstorage.controller.get_collections"}}
>>>
>>>
>>>
>>> For eg:
>>> select get_json_object(line, '$.env_version') from apachelog limit 10;
>>> works BUT
>>>
>>> select get_json_object(line, '$.env_version'),
>>> get_json_object(line,'$.@source') from apachelog limit 10;
>>> returns NULL for @source.
>>>
>>> Tried escaping as:
>>> select get_json_object(line, '$.env_version'),
>>> get_json_object(line,'$.\\@source') from apachelog limit 10;
>>> same error.
>>>
>>> -anurag
>>
>>


Re: iterating through JSON

Posted by Anurag Phadke <ap...@mozilla.com>.
Sean,
How did u resolve the issue? replace "." with something else in the 
source json?

-anurag


Sean McNamara wrote:
> Escaping with \\ doesn't seem to work.  I had a similar issue where the
> key contained a '.' and it would do the same thing (always return NULL).
>
> Sean
>
> On 2/15/12 3:28 PM, "Anurag Phadke"<ap...@mozilla.com>  wrote:
>
>> Hive version: 0.7.1
>> Does hive support extracting data from jsons which begin with @ as part
>> of the key?
>>
>> Sample Data:
>> {"@fields":{},"timestamp":"2012-02-09T21:58:42.420085","@timestamp":"2012-
>> 02-09T21:58:42.472000Z","@source":"unknown","env_version":"0.8","@type":nu
>> ll,"@tags":["output_hdfs","output_text","output_statsd"],"logger":"","payl
>> oad":"1","severity":6,"type":"counter","fields":{"name":"foo"}}
>>
>> {"@fields":{},"timestamp":"2012-02-09T22:19:59.261948","@timestamp":"2012-
>> 02-09T22:19:59.266000Z","@source":"unknown","env_version":"0.8","@type":nu
>> ll,"@tags":["output_hdfs","output_text","output_statsd"],"logger":"server-
>> core-app","payload":"1","severity":6,"type":"timer","fields":{"rate":1,"na
>> me":"syncstorage.controller.get_collections"}}
>>
>>
>>
>> For eg:
>> select get_json_object(line, '$.env_version') from apachelog limit 10;
>> works BUT
>>
>> select get_json_object(line, '$.env_version'),
>> get_json_object(line,'$.@source') from apachelog limit 10;
>> returns NULL for @source.
>>
>> Tried escaping as:
>> select get_json_object(line, '$.env_version'),
>> get_json_object(line,'$.\\@source') from apachelog limit 10;
>> same error.
>>
>> -anurag
>
>

Re: iterating through JSON

Posted by Sean McNamara <Se...@Webtrends.com>.
Escaping with \\ doesn't seem to work.  I had a similar issue where the
key contained a '.' and it would do the same thing (always return NULL).

Sean

On 2/15/12 3:28 PM, "Anurag Phadke" <ap...@mozilla.com> wrote:

>Hive version: 0.7.1
>Does hive support extracting data from jsons which begin with @ as part
>of the key?
>
>Sample Data:
>{"@fields":{},"timestamp":"2012-02-09T21:58:42.420085","@timestamp":"2012-
>02-09T21:58:42.472000Z","@source":"unknown","env_version":"0.8","@type":nu
>ll,"@tags":["output_hdfs","output_text","output_statsd"],"logger":"","payl
>oad":"1","severity":6,"type":"counter","fields":{"name":"foo"}}
>
>{"@fields":{},"timestamp":"2012-02-09T22:19:59.261948","@timestamp":"2012-
>02-09T22:19:59.266000Z","@source":"unknown","env_version":"0.8","@type":nu
>ll,"@tags":["output_hdfs","output_text","output_statsd"],"logger":"server-
>core-app","payload":"1","severity":6,"type":"timer","fields":{"rate":1,"na
>me":"syncstorage.controller.get_collections"}}
>
>
>
>For eg:
>select get_json_object(line, '$.env_version') from apachelog limit 10;
>works BUT
>
>select get_json_object(line, '$.env_version'),
>get_json_object(line,'$.@source') from apachelog limit 10;
>returns NULL for @source.
>
>Tried escaping as:
>select get_json_object(line, '$.env_version'),
>get_json_object(line,'$.\\@source') from apachelog limit 10;
>same error.
>
>-anurag