You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Jerry Wong <je...@gmail.com> on 2016/06/03 19:31:31 UTC

Spark SQL Nested Array of JSON with empty field

Hi,

I met a problem of empty field in the nested JSON file with Spark SQL. For
instance,
There are two lines of JSON file as follows,

{
"firstname": "Jack",
"lastname": "Nelson",
"address": {
"state": "New York",
"city": "New York"
}
}{
"firstname": "Landy",
"middlename": "Ken",
"lastname": "Yong",
"address": {
"state": "California",
"city": "Los Angles"
}
}

I use Spark SQL to get the files like,
val row = sqlContext.sql("SELECT firstname, middlename, lastname,
address.state, address.city FROM jsontable")
The compile will tell me the error of line1: no "middlename".
How do I handle this case in the SQL sql?

Many thanks in advance!
Jerry

Re: Spark SQL Nested Array of JSON with empty field

Posted by Christian Hellström <ps...@gmail.com>.
If that's your JSON file, then the first problem is that it's incorrectly
formatted.

Apart from that you can just read the JSON into a DataFrame with
sqlContext.read.json() and then select directly on the DataFrame without
having to register a temporary table: jsonDF.select("firstname",
"address.state", ...). Works for me (with a properly formatted JSON
document). To make sure that your JSON is read correctly, check
jsonDF.printSchema. If there is an entry with corrupt records (or similar),
you know there's a problem with the JSON structure.

On 3 June 2016 at 21:31, Jerry Wong <je...@gmail.com> wrote:

> Hi,
>
> I met a problem of empty field in the nested JSON file with Spark SQL. For
> instance,
> There are two lines of JSON file as follows,
>
> {
> "firstname": "Jack",
> "lastname": "Nelson",
> "address": {
> "state": "New York",
> "city": "New York"
> }
> }{
> "firstname": "Landy",
> "middlename": "Ken",
> "lastname": "Yong",
> "address": {
> "state": "California",
> "city": "Los Angles"
> }
> }
>
> I use Spark SQL to get the files like,
> val row = sqlContext.sql("SELECT firstname, middlename, lastname,
> address.state, address.city FROM jsontable")
> The compile will tell me the error of line1: no "middlename".
> How do I handle this case in the SQL sql?
>
> Many thanks in advance!
> Jerry
>
>
>