Posted to user@spark.apache.org by kpeng1 <kp...@gmail.com> on 2015/03/14 00:45:26 UTC

Loading in json with spark sql

Hi All,

I was noodling around with loading a JSON file into Spark SQL's HiveContext,
and I noticed that I get the following message after loading the file:
PhysicalRDD [_corrupt_record#0], MappedRDD[5] at map at JsonRDD.scala:47

I am using the HiveContext's jsonFile command to load the file, which has
one JSON object per line.  Here is a sample of its contents:
{"user_id":"7070","providers":{{"id":"8753","name":"pjfig","behaviors":{"b1":"erwxt","b2":"yjooj"}},{"id":"8329","name":"dfvhh","behaviors":{"b1":"pjjdn","b2":"ooqsh"}}}}
{"user_id":"1615","providers":{{"id":"6105","name":"rsfon","behaviors":{"b1":"whlje","b2":"lpjnq"}},{"id":"6828","name":"pnmrb","behaviors":{"b1":"fjpmz","b2":"dxqxk"}}}}
{"user_id":"5210","providers":{{"id":"9360","name":"xdylm","behaviors":{"b1":"gcdze","b2":"cndcs"}},{"id":"4812","name":"gxboh","behaviors":{"b1":"qsxao","b2":"ixdzq"}}}}




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Loading-in-json-with-spark-sql-tp22044.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Loading in json with spark sql

Posted by Kevin Peng <kp...@gmail.com>.
Yin,

Yup thanks.  I fixed that shortly after I posted and it worked.

Thanks,

Kevin

On Fri, Mar 13, 2015 at 8:28 PM, Yin Huai <yh...@databricks.com> wrote:

> It seems you want to use an array for the "providers" field, like
> "providers":[{"id": ...}, {"id":...}] instead of "providers":{{"id": ...}, {"id":...}}

Re: Loading in json with spark sql

Posted by Yin Huai <yh...@databricks.com>.
It seems you want to use an array for the "providers" field, like
"providers":[{"id": ...}, {"id":...}] instead of "providers":{{"id": ...}, {"id":...}}
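
To make the point above concrete: an opening "{{" is not valid JSON, because
an object has to contain "key": value pairs, so a line shaped like the sample
fails to parse. Spark's JSON loader places lines it cannot parse in the
_corrupt_record column, which would explain the message in the original post.
Below is a minimal sketch in plain Python (no Spark needed) for checking a
file's lines before loading them. The names find_bad_lines, bad_line and
good_line are made up for this illustration and are not part of any Spark API.

```python
import json

# One line in the shape from the original post: "providers" opens with "{{",
# which is not valid JSON because an object must hold "key": value pairs.
bad_line = '{"user_id":"7070","providers":{{"id":"8753","name":"pjfig","behaviors":{"b1":"erwxt","b2":"yjooj"}},{"id":"8329","name":"dfvhh","behaviors":{"b1":"pjjdn","b2":"ooqsh"}}}}'

# The same record with "providers" written as an array of objects.
good_line = '{"user_id":"7070","providers":[{"id":"8753","name":"pjfig","behaviors":{"b1":"erwxt","b2":"yjooj"}},{"id":"8329","name":"dfvhh","behaviors":{"b1":"pjjdn","b2":"ooqsh"}}]}'


def find_bad_lines(lines):
    """Return (line_number, error_message) for each line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, err.msg))
    return problems


if __name__ == "__main__":
    # Report every line that a strict JSON parser rejects.
    for number, message in find_bad_lines([bad_line, good_line]):
        print(f"line {number}: {message}")
```

Running the same function over the lines of the real file (for example,
find_bad_lines(open(path)) with your own path) should point at any records
that still need the array form.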
