Posted to user@hive.apache.org by Igor Kravzov <ig...@gmail.com> on 2016/06/08 18:55:25 UTC

JsonSerDe file format question

I am merging multiple JSON files into a bigger one before saving it to HDFS,
so the merged file looks like this:

{"id":160889136,"url":"http://twitter.com/PatrocinarBRA/statuses/740301352052654080",...}{"id":160889137,"url":"http://twitter.com/tchiagoolimpio/statuses/740301352253825024",...}{"id":160889138,"url":"http://twitter.com/Aztlana/statuses/740301352694255621",...}

The JSON records are concatenated one after another, not each on a new line.
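For comparison, here is a minimal Python sketch (not part of the original message) of a merge step that writes one JSON object per line, the layout that turns out to be required later in this thread. It assumes each input file holds exactly one JSON object; the function names `merge_json_texts` and `merge_json_files` are hypothetical.

```python
import json
from pathlib import Path


def merge_json_texts(json_texts):
    """Return one newline-delimited JSON string built from several JSON documents.

    Each document is parsed and re-serialized with json.dumps, which emits a
    single line (newlines inside string values are escaped as \\n), then a
    newline is appended as the record separator.
    """
    lines = []
    for text in json_texts:
        record = json.loads(text)  # also accepts pretty-printed, multi-line input
        lines.append(json.dumps(record) + "\n")
    return "".join(lines)


def merge_json_files(input_paths, output_path):
    """Hypothetical file wrapper: assumes each input file holds exactly one JSON object."""
    texts = (Path(p).read_text(encoding="utf-8") for p in input_paths)
    Path(output_path).write_text(merge_json_texts(texts), encoding="utf-8")
```

Re-serializing each record, rather than copying the raw text, guarantees that a pretty-printed input file still ends up on a single line.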

I also created a table like this:
CREATE EXTERNAL TABLE testtable
(
  id bigint,
  url string,
...)
PARTITIONED BY (yyyymmdd int)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/mytest/test';

and added a partition:
ALTER TABLE testtable
  ADD IF NOT EXISTS PARTITION (yyyymmdd=20160608)
  LOCATION '/mytest/test/20160608';


There are 3 files with 3 JSON records each. But when I run
select * from testtable; it returns only the first row from each file instead of 9.

What can be the problem?

Re: JsonSerDe file format question

Posted by Igor Kravzov <ig...@gmail.com>.

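Found the issue: the rows need to be separated by newlines. Hive reads the
table's files as text, one line per record, so each merged file was a single
line and JsonSerDe returned only the first JSON object on it.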
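For files that were already merged back to back, a minimal Python sketch (not from the thread) can rewrite them as one record per line before they are saved to HDFS again. It relies on the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and returns that value together with the index where it stopped; the function names `split_concatenated_json` and `to_ndjson` are hypothetical.

```python
import json

JSON_WHITESPACE = " \t\n\r"


def split_concatenated_json(text):
    """Yield each JSON value from text where values are written back to back,
    e.g. '{"id":1}{"id":2}', optionally separated by whitespace."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < end and text[pos] in JSON_WHITESPACE:
            pos += 1
        if pos == end:
            return
        # Parse one value starting at pos; raw_decode returns (value, index after it).
        value, pos = decoder.raw_decode(text, pos)
        yield value


def to_ndjson(text):
    """Rewrite back-to-back JSON values as one compact record per line."""
    return "".join(json.dumps(value) + "\n" for value in split_concatenated_json(text))
```

Unlike splitting the text on "}{", the decoder tracks string boundaries, so a tweet whose text happens to contain "}{" is not cut in half.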

Re: JsonSerDe file format question

Posted by Igor Kravzov <ig...@gmail.com>.

There are 3 files with 3 JSON records each. But when I run
select * from testtable; it returns only the first row from each file instead of 9.