You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Chunky Gupta <ch...@vizury.com> on 2013/02/18 13:43:30 UTC

Loading json files into hive table is giving NULL as output(data is in s3 bucket)

Hi,

I have data in s3 bucket, which is in json format and is a zip file. I have
added this jar file in hive console :-
http://code.google.com/p/hive-json-serde/downloads/detail?name=hive-json-serde-0.2.jar&can=2&q=

I tried the following steps to create table and load data :-

1. CREATE EXTERNAL TABLE table_test ( uname STRING ) PARTITIONED BY (dt
STRING ) ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde"
WITH SERDEPROPERTIES ( "uname"="$._u" ) LOCATION
's3://BUCKET_NAME/test_data/'

           I tried this also :-

CREATE EXTERNAL TABLE table_test ( uname STRING ) PARTITIONED BY (dt STRING
) ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde" WITH
SERDEPROPERTIES ( "uname"="_u" ) LOCATION 's3://BUCKET_NAME/test_data/'



2. alter table table_test add partition (dt='13Feb2012') location
's3n://BUCKET_NAME/test_data/13Feb2012';

and json file is like this :-
-------------------------------------
{"_u":"test_name1","_ts":"2012-01-13","_ip":"IP1"}
{"_u":"test_name2","_ts":"2012-01-13","_ip":"IP2"}
{"_u":"test_name3","_ts":"2012-01-13","_ip":"IP3"}


When I query :-
select uname from table_test;

Output :-
NULL 13Feb2012
NULL 13Feb2012
NULL 13Feb2012


Please help me and let me know how to add json data in a table.

Thanks,
Chunky.

Re: Loading json files into hive table is giving NULL as output(data is in s3 bucket)

Posted by Dean Wampler <de...@thinkbiganalytics.com>.
I don't know what changed. I didn't work on this SerDe myself. I'm glad
it's working now for you, though!

dean

On Mon, Feb 18, 2013 at 9:31 AM, Chunky Gupta <ch...@vizury.com>wrote:

> Hi Dean,
>
> I was using *hive-json-serde-0.2.jar* earlier. Now I tried *hive-json-serde-0.3.jar
> *as you suggested and it is working fine, I am getting the output as
> expected.
>
> Can you please tell me that what code change from 0.2 to 0.3 could have
> solved this problem ?
>
>
> Thanks,
> Chunky.
>
>
> On Mon, Feb 18, 2013 at 8:47 PM, Chunky Gupta <ch...@vizury.com>wrote:
>
>> Hi Dean,
>>
>> I tried with removing underscore too, and getting the same output which
>> means problem is not with underscore. Yes, it was an example.
>>
>> Actual json file is like :-
>>
>>
>> {"colnamec":"ColNametest","colnamets":"2013-01-14","colnameip":"10.10.10.10","colnameid":"10","colnameid2":"100","colnamep":0,"colnamecp":0,"colnamep":1,"colnameed":"31509","colnamesw":0,"colnamesu2":3,"colnameqq":"0","colnameppaa":0,"colnameqwe1":0,"colnamerty2":0,"colnameiop":"1000","colnamebnm":"23425253RFDSE","colnamefgh":2,"colnameagl":"","colnameyhgb":["1234","12345","2345","56789"],"colnamepoix":["12","4567","123","5678"],"colnamedswer":["100","567","123","678"],"colnamewerui":["10","10","10","10"]}
>>
>> I tried extracting one column only as I mentioned in last mail.
>>
>> There are values not in double quotes, some are null and some keys are
>> having multiple values.
>> Dean, is this json file correct for HIVE to handle it ?
>>
>> Thanks,
>> Chunky.
>>
>>
>>
>>
>>
>> On Mon, Feb 18, 2013 at 6:23 PM, Dean Wampler <
>> dean.wampler@thinkbiganalytics.com> wrote:
>>
>>> The "uname="$._u" is the correct form. We also hacked on this SerDe at
>>> Think Big Analytics. I don't know if you'll see an improvement though.
>>>
>>> https://github.com/thinkbiganalytics/hive-json-serde
>>>
>>> I wonder if there's a problem handling the leading underscore?
>>>
>>> Also, I know it's just an example, but in case it was taken from a real
>>> situation, the dates in your example are for January.
>>>
>>> dean
>>>
>>> On Mon, Feb 18, 2013 at 6:43 AM, Chunky Gupta <ch...@vizury.com>wrote:
>>>
>>>> Hi,
>>>>
>>>> I have data in s3 bucket, which is in json format and is a zip file. I
>>>> have added this jar file in hive console :-
>>>>
>>>> http://code.google.com/p/hive-json-serde/downloads/detail?name=hive-json-serde-0.2.jar&can=2&q=
>>>>
>>>> I tried the following steps to create table and load data :-
>>>>
>>>> 1. CREATE EXTERNAL TABLE table_test ( uname STRING ) PARTITIONED BY (dt
>>>> STRING ) ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde"
>>>> WITH SERDEPROPERTIES ( "uname"="$._u" ) LOCATION
>>>> 's3://BUCKET_NAME/test_data/'
>>>>
>>>>            I tried this also :-
>>>>
>>>> CREATE EXTERNAL TABLE table_test ( uname STRING ) PARTITIONED BY (dt
>>>> STRING ) ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde"
>>>> WITH SERDEPROPERTIES ( "uname"="_u" ) LOCATION
>>>> 's3://BUCKET_NAME/test_data/'
>>>>
>>>>
>>>>
>>>> 2. alter table table_test add partition (dt='13Feb2012') location
>>>> 's3n://BUCKET_NAME/test_data/13Feb2012';
>>>>
>>>> and json file is like this :-
>>>> -------------------------------------
>>>> {"_u":"test_name1","_ts":"2012-01-13","_ip":"IP1"}
>>>> {"_u":"test_name2","_ts":"2012-01-13","_ip":"IP2"}
>>>> {"_u":"test_name3","_ts":"2012-01-13","_ip":"IP3"}
>>>>
>>>>
>>>> When I query :-
>>>> select uname from table_test;
>>>>
>>>> Output :-
>>>> NULL 13Feb2012
>>>> NULL 13Feb2012
>>>> NULL 13Feb2012
>>>>
>>>>
>>>> Please help me and let me know how to add json data in a table.
>>>>
>>>> Thanks,
>>>> Chunky.
>>>>
>>>
>>>
>>>
>>> --
>>> *Dean Wampler, Ph.D.*
>>> thinkbiganalytics.com
>>> +1-312-339-1330
>>>
>>>
>>
>


-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330

Re: Loading json files into hive table is giving NULL as output(data is in s3 bucket)

Posted by Chunky Gupta <ch...@vizury.com>.
Hi Dean,

I was using *hive-json-serde-0.2.jar* earlier. Now I tried
*hive-json-serde-0.3.jar
*as you suggested and it is working fine, I am getting the output as
expected.

Can you please tell me that what code change from 0.2 to 0.3 could have
solved this problem ?


Thanks,
Chunky.

On Mon, Feb 18, 2013 at 8:47 PM, Chunky Gupta <ch...@vizury.com>wrote:

> Hi Dean,
>
> I tried with removing underscore too, and getting the same output which
> means problem is not with underscore. Yes, it was an example.
>
> Actual json file is like :-
>
>
> {"colnamec":"ColNametest","colnamets":"2013-01-14","colnameip":"10.10.10.10","colnameid":"10","colnameid2":"100","colnamep":0,"colnamecp":0,"colnamep":1,"colnameed":"31509","colnamesw":0,"colnamesu2":3,"colnameqq":"0","colnameppaa":0,"colnameqwe1":0,"colnamerty2":0,"colnameiop":"1000","colnamebnm":"23425253RFDSE","colnamefgh":2,"colnameagl":"","colnameyhgb":["1234","12345","2345","56789"],"colnamepoix":["12","4567","123","5678"],"colnamedswer":["100","567","123","678"],"colnamewerui":["10","10","10","10"]}
>
> I tried extracting one column only as I mentioned in last mail.
>
> There are values not in double quotes, some are null and some keys are
> having multiple values.
> Dean, is this json file correct for HIVE to handle it ?
>
> Thanks,
> Chunky.
>
>
>
>
>
> On Mon, Feb 18, 2013 at 6:23 PM, Dean Wampler <
> dean.wampler@thinkbiganalytics.com> wrote:
>
>> The "uname="$._u" is the correct form. We also hacked on this SerDe at
>> Think Big Analytics. I don't know if you'll see an improvement though.
>>
>> https://github.com/thinkbiganalytics/hive-json-serde
>>
>> I wonder if there's a problem handling the leading underscore?
>>
>> Also, I know it's just an example, but in case it was taken from a real
>> situation, the dates in your example are for January.
>>
>> dean
>>
>> On Mon, Feb 18, 2013 at 6:43 AM, Chunky Gupta <ch...@vizury.com>wrote:
>>
>>> Hi,
>>>
>>> I have data in s3 bucket, which is in json format and is a zip file. I
>>> have added this jar file in hive console :-
>>>
>>> http://code.google.com/p/hive-json-serde/downloads/detail?name=hive-json-serde-0.2.jar&can=2&q=
>>>
>>> I tried the following steps to create table and load data :-
>>>
>>> 1. CREATE EXTERNAL TABLE table_test ( uname STRING ) PARTITIONED BY (dt
>>> STRING ) ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde"
>>> WITH SERDEPROPERTIES ( "uname"="$._u" ) LOCATION
>>> 's3://BUCKET_NAME/test_data/'
>>>
>>>            I tried this also :-
>>>
>>> CREATE EXTERNAL TABLE table_test ( uname STRING ) PARTITIONED BY (dt
>>> STRING ) ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde"
>>> WITH SERDEPROPERTIES ( "uname"="_u" ) LOCATION
>>> 's3://BUCKET_NAME/test_data/'
>>>
>>>
>>>
>>> 2. alter table table_test add partition (dt='13Feb2012') location
>>> 's3n://BUCKET_NAME/test_data/13Feb2012';
>>>
>>> and json file is like this :-
>>> -------------------------------------
>>> {"_u":"test_name1","_ts":"2012-01-13","_ip":"IP1"}
>>> {"_u":"test_name2","_ts":"2012-01-13","_ip":"IP2"}
>>> {"_u":"test_name3","_ts":"2012-01-13","_ip":"IP3"}
>>>
>>>
>>> When I query :-
>>> select uname from table_test;
>>>
>>> Output :-
>>> NULL 13Feb2012
>>> NULL 13Feb2012
>>> NULL 13Feb2012
>>>
>>>
>>> Please help me and let me know how to add json data in a table.
>>>
>>> Thanks,
>>> Chunky.
>>>
>>
>>
>>
>> --
>> *Dean Wampler, Ph.D.*
>> thinkbiganalytics.com
>> +1-312-339-1330
>>
>>
>

Re: Loading json files into hive table is giving NULL as output(data is in s3 bucket)

Posted by Chunky Gupta <ch...@vizury.com>.
Hi Dean,

I tried with removing underscore too, and getting the same output which
means problem is not with underscore. Yes, it was an example.

Actual json file is like :-

{"colnamec":"ColNametest","colnamets":"2013-01-14","colnameip":"10.10.10.10","colnameid":"10","colnameid2":"100","colnamep":0,"colnamecp":0,"colnamep":1,"colnameed":"31509","colnamesw":0,"colnamesu2":3,"colnameqq":"0","colnameppaa":0,"colnameqwe1":0,"colnamerty2":0,"colnameiop":"1000","colnamebnm":"23425253RFDSE","colnamefgh":2,"colnameagl":"","colnameyhgb":["1234","12345","2345","56789"],"colnamepoix":["12","4567","123","5678"],"colnamedswer":["100","567","123","678"],"colnamewerui":["10","10","10","10"]}

I tried extracting one column only as I mentioned in last mail.

There are values not in double quotes, some are null and some keys are
having multiple values.
Dean, is this json file correct for HIVE to handle it ?

Thanks,
Chunky.




On Mon, Feb 18, 2013 at 6:23 PM, Dean Wampler <
dean.wampler@thinkbiganalytics.com> wrote:

> The "uname="$._u" is the correct form. We also hacked on this SerDe at
> Think Big Analytics. I don't know if you'll see an improvement though.
>
> https://github.com/thinkbiganalytics/hive-json-serde
>
> I wonder if there's a problem handling the leading underscore?
>
> Also, I know it's just an example, but in case it was taken from a real
> situation, the dates in your example are for January.
>
> dean
>
> On Mon, Feb 18, 2013 at 6:43 AM, Chunky Gupta <ch...@vizury.com>wrote:
>
>> Hi,
>>
>> I have data in s3 bucket, which is in json format and is a zip file. I
>> have added this jar file in hive console :-
>>
>> http://code.google.com/p/hive-json-serde/downloads/detail?name=hive-json-serde-0.2.jar&can=2&q=
>>
>> I tried the following steps to create table and load data :-
>>
>> 1. CREATE EXTERNAL TABLE table_test ( uname STRING ) PARTITIONED BY (dt
>> STRING ) ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde"
>> WITH SERDEPROPERTIES ( "uname"="$._u" ) LOCATION
>> 's3://BUCKET_NAME/test_data/'
>>
>>            I tried this also :-
>>
>> CREATE EXTERNAL TABLE table_test ( uname STRING ) PARTITIONED BY (dt
>> STRING ) ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde"
>> WITH SERDEPROPERTIES ( "uname"="_u" ) LOCATION
>> 's3://BUCKET_NAME/test_data/'
>>
>>
>>
>> 2. alter table table_test add partition (dt='13Feb2012') location
>> 's3n://BUCKET_NAME/test_data/13Feb2012';
>>
>> and json file is like this :-
>> -------------------------------------
>> {"_u":"test_name1","_ts":"2012-01-13","_ip":"IP1"}
>> {"_u":"test_name2","_ts":"2012-01-13","_ip":"IP2"}
>> {"_u":"test_name3","_ts":"2012-01-13","_ip":"IP3"}
>>
>>
>> When I query :-
>> select uname from table_test;
>>
>> Output :-
>> NULL 13Feb2012
>> NULL 13Feb2012
>> NULL 13Feb2012
>>
>>
>> Please help me and let me know how to add json data in a table.
>>
>> Thanks,
>> Chunky.
>>
>
>
>
> --
> *Dean Wampler, Ph.D.*
> thinkbiganalytics.com
> +1-312-339-1330
>
>

Re: Loading json files into hive table is giving NULL as output(data is in s3 bucket)

Posted by Dean Wampler <de...@thinkbiganalytics.com>.
The "uname="$._u" is the correct form. We also hacked on this SerDe at
Think Big Analytics. I don't know if you'll see an improvement though.

https://github.com/thinkbiganalytics/hive-json-serde

I wonder if there's a problem handling the leading underscore?

Also, I know it's just an example, but in case it was taken from a real
situation, the dates in your example are for January.

dean

On Mon, Feb 18, 2013 at 6:43 AM, Chunky Gupta <ch...@vizury.com>wrote:

> Hi,
>
> I have data in s3 bucket, which is in json format and is a zip file. I
> have added this jar file in hive console :-
>
> http://code.google.com/p/hive-json-serde/downloads/detail?name=hive-json-serde-0.2.jar&can=2&q=
>
> I tried the following steps to create table and load data :-
>
> 1. CREATE EXTERNAL TABLE table_test ( uname STRING ) PARTITIONED BY (dt
> STRING ) ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde"
> WITH SERDEPROPERTIES ( "uname"="$._u" ) LOCATION
> 's3://BUCKET_NAME/test_data/'
>
>            I tried this also :-
>
> CREATE EXTERNAL TABLE table_test ( uname STRING ) PARTITIONED BY (dt
> STRING ) ROW FORMAT SERDE "org.apache.hadoop.hive.contrib.serde2.JsonSerde"
> WITH SERDEPROPERTIES ( "uname"="_u" ) LOCATION
> 's3://BUCKET_NAME/test_data/'
>
>
>
> 2. alter table table_test add partition (dt='13Feb2012') location
> 's3n://BUCKET_NAME/test_data/13Feb2012';
>
> and json file is like this :-
> -------------------------------------
> {"_u":"test_name1","_ts":"2012-01-13","_ip":"IP1"}
> {"_u":"test_name2","_ts":"2012-01-13","_ip":"IP2"}
> {"_u":"test_name3","_ts":"2012-01-13","_ip":"IP3"}
>
>
> When I query :-
> select uname from table_test;
>
> Output :-
> NULL 13Feb2012
> NULL 13Feb2012
> NULL 13Feb2012
>
>
> Please help me and let me know how to add json data in a table.
>
> Thanks,
> Chunky.
>



-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330