Posted to user@hive.apache.org by Igor Kravzov <ig...@gmail.com> on 2016/06/08 18:55:25 UTC
JsonSerDe file format question
I am merging multiple JSON files into a bigger one before saving it to HDFS,
so the merged file looks like this:
{"id":160889136,"url":"http://twitter.com/PatrocinarBRA/statuses/740301352052654080",...}{"id":160889137,"url":"http://twitter.com/tchiagoolimpio/statuses/740301352253825024",...}{"id":160889138,"url":"http://twitter.com/Aztlana/statuses/740301352694255621",...}
The JSON records are concatenated one after another, not each on its own line.
I also created a table like this:
CREATE EXTERNAL TABLE testtable
(
  id bigint,
  url string,
  ...)
PARTITIONED BY (yyyymmdd int)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/mytest/test';
and added a partition:
ALTER TABLE testtable
  ADD IF NOT EXISTS PARTITION (yyyymmdd=20160608)
  LOCATION '/mytest/test/20160608';
There are 3 files with 3 JSON records each. But when I run
"select * from testtable;" it returns only the first row of each file
(3 rows) instead of all 9.
What could be the problem?
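A rough way to see why only 3 rows come back: with no STORED AS clause, the table uses Hive's default text storage format, which splits each file on newlines and hands the SerDe one line per row. The Python sketch below models that, assuming (consistent with the symptom reported here) that only the first JSON object on a line becomes a row and the rest of the line is ignored. It is an illustration, not Hive's actual code; `simulate_rows` and the sample records are hypothetical.

```python
import json

_decoder = json.JSONDecoder()


def simulate_rows(file_text: str) -> list:
    """Hypothetical model of a line-oriented JSON table: one row per line.

    Assumption: only the first JSON object on each line becomes a row;
    anything after it on the same line is ignored.
    """
    rows = []
    for line in file_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines produce no row in this model
        value, _ = _decoder.raw_decode(stripped)  # parse first object, drop the rest
        rows.append(value)
    return rows


# Hypothetical sample data: 3 files, 3 records each.
concatenated_files = [
    '{"id":1}{"id":2}{"id":3}',
    '{"id":4}{"id":5}{"id":6}',
    '{"id":7}{"id":8}{"id":9}',
]
newline_files = [text.replace("}{", "}\n{") for text in concatenated_files]

print(sum(len(simulate_rows(t)) for t in concatenated_files))  # 3
print(sum(len(simulate_rows(t)) for t in newline_files))       # 9
```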
Re: JsonSerDe file format question
Posted by Igor Kravzov <ig...@gmail.com>.
Found the issue: it looks like rows must be separated by newlines, one JSON
record per line.
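A minimal sketch of the fix on the merging side, assuming the merge step can be done in Python before the upload to HDFS: parse every JSON value found in each input (inputs may already be concatenated back to back) and write one compact record per line. Re-serializing with `json.dumps` also escapes any newlines inside string values, so a record never spans two lines. The helper names `iter_json_values`, `to_json_lines` and `merge_json_files` are hypothetical, not part of Hive.

```python
import json
from typing import Iterable, Iterator

_decoder = json.JSONDecoder()


def iter_json_values(text: str) -> Iterator[object]:
    """Yield each JSON value in text, even when values are concatenated."""
    pos, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        value, pos = _decoder.raw_decode(text, pos)
        yield value


def to_json_lines(chunks: Iterable[str]) -> str:
    """Re-serialize every JSON value in chunks as newline-delimited JSON."""
    lines = []
    for chunk in chunks:
        for value in iter_json_values(chunk):
            # Compact output; embedded newlines in strings are escaped as \n.
            lines.append(json.dumps(value, separators=(",", ":")))
    return ("\n".join(lines) + "\n") if lines else ""


def merge_json_files(paths: Iterable[str], out_path: str) -> None:
    """Hypothetical helper: merge local JSON files into one NDJSON file."""
    texts = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            texts.append(f.read())
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(to_json_lines(texts))


merged = '{"id":1,"url":"a"}{"id":2,"url":"b"}{"id":3,"url":"c"}'
print(to_json_lines([merged]), end="")
```

Because `iter_json_values` also accepts records that are already back to back, the same helper can rewrite an existing merged file before it is copied into the partition directory.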
Re: JsonSerDe file format question
Posted by Igor Kravzov <ig...@gmail.com>.
There are 3 files with 3 JSON records each. But when I run
"select * from testtable;" it returns only the first row of each file instead of all 9.