Posted to user@spark.apache.org by Jean Georges Perrin <jg...@jgp.net> on 2016/10/10 16:57:33 UTC

JSON Arrays and Spark

Hi folks,

I am trying to parse JSON arrays and it’s getting a little crazy (for me at least)…

1)
If my JSON is:
{"vals":[100,500,600,700,800,200,900,300]}

I get:
+--------------------+
|                vals|
+--------------------+
|[100, 500, 600, 7...|
+--------------------+

root
 |-- vals: array (nullable = true)
 |    |-- element: long (containsNull = true)

and I am happy :)

2)
If my JSON is:
[100,500,600,700,800,200,900,300]

I get:
+--------------------+
|     _corrupt_record|
+--------------------+
|[100,500,600,700,...|
+--------------------+

root
 |-- _corrupt_record: string (nullable = true)

Both are legit JSON structures… Do you think that #2 is a bug?

jg
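A minimal sketch of one workaround for case 2, in plain Python rather than Spark: rewrite each line that holds a bare array of scalars into a one-field object, so that it takes the case-1 shape. It assumes one JSON value per line; the field name `vals` is borrowed from case 1 and is otherwise arbitrary.

```python
import json

def wrap_bare_arrays(lines, key="vals"):
    """Rewrite lines holding a top-level array of scalars as {key: [...]}.

    Assumes one JSON value per line. `key` is an arbitrary field name
    (defaulting to "vals" from case 1). Objects, and arrays whose
    elements are all objects, are re-serialized but otherwise unchanged.
    """
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value = json.loads(line)  # raises ValueError on invalid JSON
        if isinstance(value, list) and not all(isinstance(v, dict) for v in value):
            value = {key: value}  # the case-2 shape: wrap it in an object
        # Compact separators keep each record on a single line.
        out.append(json.dumps(value, separators=(",", ":")))
    return out


print(wrap_bare_arrays(["[100,500,600,700,800,200,900,300]"]))
# ['{"vals":[100,500,600,700,800,200,900,300]}']
```

The rewritten lines can be saved to a file and read with Spark's JSON reader, which should then infer a `vals` array column as in case 1.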






Re: JSON Arrays and Spark

Posted by Hyukjin Kwon <gu...@gmail.com>.
FYI, it supports

[{...}, {...}, ...]

or

{...}

as input formats.
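The two accepted shapes can be illustrated with a small plain-Python sketch. It only approximates the behaviour reported in this thread and is not Spark's parser: an object on a line becomes one row, an array of objects on a line becomes one row per element, and anything else is kept as `{"_corrupt_record": line}`, loosely mirroring the output in the original question.

```python
import json

def rows_from_lines(lines):
    """Turn line-delimited JSON into rows, approximating the two accepted
    shapes: an object becomes one row, a non-empty array of objects becomes
    one row per element, and any other line becomes
    {"_corrupt_record": line}."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # blank lines produce no rows in this sketch
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            value = None  # not valid JSON on its own line
        if isinstance(value, dict):
            rows.append(value)
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            rows.extend(value)
        else:
            rows.append({"_corrupt_record": line})
    return rows


for row in rows_from_lines([
    '{"vals":[100,500,600]}',           # case 1: one object
    '[{"red":"#f00"},{"red":"#f01"}]',  # array of objects on one line
    '[100,500,600]',                    # case 2: array of numbers
]):
    print(row)
```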

On 11 Oct 2016 3:19 a.m., "Jean Georges Perrin" <jg...@jgp.net> wrote:


Re: JSON Arrays and Spark

Posted by Jean Georges Perrin <jg...@jgp.net>.
Thanks Luciano - I think this is my issue :(

> On Oct 10, 2016, at 2:08 PM, Luciano Resende <lu...@gmail.com> wrote:

Re: JSON Arrays and Spark

Posted by sujeet jog <su...@gmail.com>.
I generally use the Play Framework APIs for complex JSON structures.

https://www.playframework.com/documentation/2.5.x/ScalaJson#Json

On Wed, Oct 12, 2016 at 11:34 AM, Kappaganthu, Sivaram (ES) <
Sivaram.Kappaganthu@adp.com> wrote:

> Hi,
>
>
>
> Does this mean that handling any Json with kind of below schema  with
> spark is not a good fit?? I have requirement to parse the below Json that
> spans across multiple lines. Whats the best way to parse the jsns of this
> kind?? Please suggest.
>
>
>
> root
>
> |-- maindate: struct (nullable = true)
>
> |    |-- mainidnId: string (nullable = true)
>
> |-- Entity: array (nullable = true)
>
> |    |-- element: struct (containsNull = true)
>
> |    |    |-- Profile: struct (nullable = true)
>
> |    |    |    |-- Kind: string (nullable = true)
>
> |    |    |-- Identifier: string (nullable = true)
>
> |    |    |-- Group: array (nullable = true)
>
> |    |    |    |-- element: struct (containsNull = true)
>
> |    |    |    |    |-- Period: struct (nullable = true)
>
> |    |    |    |    |    |-- pid: string (nullable = true)
>
> |    |    |    |    |    |-- pDate: string (nullable = true)
>
> |    |    |    |    |    |-- quarter: long (nullable = true)
>
> |    |    |    |    |    |-- labour: array (nullable = true)
>
> |    |    |    |    |    |    |-- element: struct (containsNull = true)
>
> |    |    |    |    |    |    |    |-- category: string (nullable = true)
>
> |    |    |    |    |    |    |    |-- id: string (nullable = true)
>
> |    |    |    |    |    |    |    |-- person: struct (nullable = true)
>
> |    |    |    |    |    |    |    |    |-- address: array (nullable =
> true)
>
> |    |    |    |    |    |    |    |    |    |-- element: struct
> (containsNull = true)
>
> |    |    |    |    |    |    |    |    |    |    |-- city: string
> (nullable = true)
>
> |    |    |    |    |    |    |    |    |    |    |-- line1: string
> (nullable = true)
>
> |    |    |    |    |    |    |    |    |    |    |-- line2: string
> (nullable = true)
>
> |    |    |    |    |    |    |    |    |    |    |-- postalCode: string
> (nullable = true)
>
> |    |    |    |    |    |    |    |    |    |    |-- state: string
> (nullable = true)
>
> |    |    |    |    |    |    |    |    |    |    |-- type: string
> (nullable = true)
>
> |    |    |    |    |    |    |    |    |-- familyName: string (nullable =
> true)
>
> |    |    |    |    |    |    |    |-- tax: array (nullable = true)
>
> |    |    |    |    |    |    |    |    |-- element: struct (containsNull
> = true)
>
> |    |    |    |    |    |    |    |    |    |-- code: string (nullable =
> true)
>
> |    |    |    |    |    |    |    |    |    |-- qwage: double (nullable =
> true)
>
> |    |    |    |    |    |    |    |    |    |-- qvalue: double (nullable
> = true)
>
> |    |    |    |    |    |    |    |    |    |-- qSubjectvalue: double
> (nullable = true)
>
> |    |    |    |    |    |    |    |    |    |-- qfinalvalue: double
> (nullable = true)
>
> |    |    |    |    |    |    |    |    |    |-- ywage: double (nullable =
> true)
>
> |    |    |    |    |    |    |    |    |    |-- yalue: double (nullable =
> true)
>
> |    |    |    |    |    |    |    |    |    |-- ySubjectvalue: double
> (nullable = true)
>
> |    |    |    |    |    |    |    |    |    |-- yfinalvalue: double
> (nullable = true)
>
> |    |    |    |    |    |    |    |-- tProfile: array (nullable = true)
>
> |    |    |    |    |    |    |    |    |-- element: struct (containsNull
> = true)
>
> |    |    |    |    |    |    |    |    |    |-- isExempt: boolean
> (nullable = true)
>
> |    |    |    |    |    |    |    |    |    |-- jurisdiction: struct
> (nullable = true)
>
> |    |    |    |    |    |    |    |    |    |    |-- code: string
> (nullable = true)
>
> |    |    |    |    |    |    |    |    |    |-- maritalStatus: string
> (nullable = true)
>
> |    |    |    |    |    |    |    |    |    |-- numberOfDeductions: long
> (nullable = true)
>
> |    |    |    |    |    |    |    |-- wDate: struct (nullable = true)
>
> |    |    |    |    |    |    |    |    |-- originalHireDate: string
> (nullable = true)
>
> |    |    |    |    |    |-- year: long (nullable = true)
>
>
>
>
>
> *From:* Luciano Resende [mailto:luckbr1975@gmail.com]
> *Sent:* Monday, October 10, 2016 11:39 PM
> *To:* Jean Georges Perrin
> *Cc:* user @spark
> *Subject:* Re: JSON Arrays and Spark
>
>
>
> Please take a look at
> http://spark.apache.org/docs/latest/sql-programming-guide.
> html#json-datasets
>
> Particularly the note at the required format :
>
> Note that the file that is offered as *a json file* is not a typical JSON
> file. Each line must contain a separate, self-contained valid JSON object.
> As a consequence, a regular multi-line JSON file will most often fail.
>
>
>
> On Mon, Oct 10, 2016 at 9:57 AM, Jean Georges Perrin <jg...@jgp.net> wrote:
>
> Hi folks,
>
>
>
> I am trying to parse JSON arrays and it’s getting a little crazy (for me
> at least)…
>
>
>
> 1)
>
> If my JSON is:
>
> {"vals":[100,500,600,700,800,200,900,300]}
>
>
>
> I get:
>
> +--------------------+
>
> |                vals|
>
> +--------------------+
>
> |[100, 500, 600, 7...|
>
> +--------------------+
>
>
>
> root
>
>  |-- vals: array (nullable = true)
>
>  |    |-- element: long (containsNull = true)
>
>
>
> and I am :)
>
>
>
> 2)
>
> If my JSON is:
>
> [100,500,600,700,800,200,900,300]
>
>
>
> I get:
>
> +--------------------+
>
> |     _corrupt_record|
>
> +--------------------+
>
> |[100,500,600,700,...|
>
> +--------------------+
>
>
>
> root
>
>  |-- _corrupt_record: string (nullable = true)
>
>
>
> Both are legit JSON structures… Do you think that #2 is a bug?
>
>
>
> jg
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
>
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
> ------------------------------
> This message and any attachments are intended only for the use of the
> addressee and may contain information that is privileged and confidential.
> If the reader of the message is not the intended recipient or an authorized
> representative of the intended recipient, you are hereby notified that any
> dissemination of this communication is strictly prohibited. If you have
> received this communication in error, notify the sender immediately by
> return email and delete the message and any attachments from your system.
>

Re: JSON Arrays and Spark

Posted by Hyukjin Kwon <gu...@gmail.com>.
No, I meant that each record should be on a single line, but an array is also
supported as a root wrapper of JSON objects.

If you need to parse multi-line JSON, I have a reference here:

http://searchdatascience.com/spark-adventures-1-processing-multi-line-json-files/
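The linked article's approach is not reproduced here. As an alternative, hedged sketch in plain Python: when a file holds several pretty-printed JSON documents one after another, `json.JSONDecoder.raw_decode` can peel them off one at a time, and each can be re-emitted on its own line for Spark's line-based reader. (Spark releases after this thread, from 2.2 on, also add a `multiLine` option to the JSON reader, which may make this step unnecessary.)

```python
import json

def iter_concatenated_json(text):
    """Yield each top-level JSON document from text containing several
    documents, possibly spanning multiple lines, one after another."""
    decoder = json.JSONDecoder()
    rest = text.lstrip()
    while rest:
        # raw_decode parses one document and returns it together with
        # the index just past its end, ignoring whatever follows.
        value, end = decoder.raw_decode(rest)
        yield value
        rest = rest[end:].lstrip()  # drop newlines between documents


def to_single_lines(text):
    """Re-emit every document on its own line (JSON Lines)."""
    return "\n".join(json.dumps(doc) for doc in iter_concatenated_json(text))


sample = '{\n  "red": "#f00",\n  "green": "#0f0"\n}\n{\n  "red": "#f01",\n  "green": "#0f1"\n}\n'
print(to_single_lines(sample))
# {"red": "#f00", "green": "#0f0"}
# {"red": "#f01", "green": "#0f1"}
```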

2016-10-12 15:04 GMT+09:00 Kappaganthu, Sivaram (ES) <
Sivaram.Kappaganthu@adp.com>:


RE: JSON Arrays and Spark

Posted by "Kappaganthu, Sivaram (ES)" <Si...@ADP.com>.
Hi,

Does this mean that Spark is not a good fit for handling JSON with a schema like the one below? I need to parse the JSON below, which spans multiple lines. What's the best way to parse JSON of this kind? Please suggest.

root
|-- maindate: struct (nullable = true)
|    |-- mainidnId: string (nullable = true)
|-- Entity: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- Profile: struct (nullable = true)
|    |    |    |-- Kind: string (nullable = true)
|    |    |-- Identifier: string (nullable = true)
|    |    |-- Group: array (nullable = true)
|    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |-- Period: struct (nullable = true)
|    |    |    |    |    |-- pid: string (nullable = true)
|    |    |    |    |    |-- pDate: string (nullable = true)
|    |    |    |    |    |-- quarter: long (nullable = true)
|    |    |    |    |    |-- labour: array (nullable = true)
|    |    |    |    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |    |    |    |-- category: string (nullable = true)
|    |    |    |    |    |    |    |-- id: string (nullable = true)
|    |    |    |    |    |    |    |-- person: struct (nullable = true)
|    |    |    |    |    |    |    |    |-- address: array (nullable = true)
|    |    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |    |    |    |    |    |    |-- city: string (nullable = true)
|    |    |    |    |    |    |    |    |    |    |-- line1: string (nullable = true)
|    |    |    |    |    |    |    |    |    |    |-- line2: string (nullable = true)
|    |    |    |    |    |    |    |    |    |    |-- postalCode: string (nullable = true)
|    |    |    |    |    |    |    |    |    |    |-- state: string (nullable = true)
|    |    |    |    |    |    |    |    |    |    |-- type: string (nullable = true)
|    |    |    |    |    |    |    |    |-- familyName: string (nullable = true)
|    |    |    |    |    |    |    |-- tax: array (nullable = true)
|    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |    |    |    |    |    |-- code: string (nullable = true)
|    |    |    |    |    |    |    |    |    |-- qwage: double (nullable = true)
|    |    |    |    |    |    |    |    |    |-- qvalue: double (nullable = true)
|    |    |    |    |    |    |    |    |    |-- qSubjectvalue: double (nullable = true)
|    |    |    |    |    |    |    |    |    |-- qfinalvalue: double (nullable = true)
|    |    |    |    |    |    |    |    |    |-- ywage: double (nullable = true)
|    |    |    |    |    |    |    |    |    |-- yalue: double (nullable = true)
|    |    |    |    |    |    |    |    |    |-- ySubjectvalue: double (nullable = true)
|    |    |    |    |    |    |    |    |    |-- yfinalvalue: double (nullable = true)
|    |    |    |    |    |    |    |-- tProfile: array (nullable = true)
|    |    |    |    |    |    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |    |    |    |    |    |-- isExempt: boolean (nullable = true)
|    |    |    |    |    |    |    |    |    |-- jurisdiction: struct (nullable = true)
|    |    |    |    |    |    |    |    |    |    |-- code: string (nullable = true)
|    |    |    |    |    |    |    |    |    |-- maritalStatus: string (nullable = true)
|    |    |    |    |    |    |    |    |    |-- numberOfDeductions: long (nullable = true)
|    |    |    |    |    |    |    |-- wDate: struct (nullable = true)
|    |    |    |    |    |    |    |    |-- originalHireDate: string (nullable = true)
|    |    |    |    |    |-- year: long (nullable = true)
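In Spark itself, nested arrays like these are usually flattened with the `explode` function from `org.apache.spark.sql.functions` once the JSON has been loaded. The plain-Python sketch below shows the same flattening on an already-parsed record (for example from `json.loads` on the whole file), producing one row per `labour` entry. The field names come from the schema above; which fields to keep, and output names such as `labourId`, are illustrative choices.

```python
def explode_labour(record):
    """Flatten Entity[] -> Group[] -> Period.labour[] into one flat dict
    per labour entry. Missing fields give None; missing arrays give no rows."""
    main_id = (record.get("maindate") or {}).get("mainidnId")
    rows = []
    for entity in record.get("Entity") or []:
        for group in entity.get("Group") or []:
            period = group.get("Period") or {}
            for labour in period.get("labour") or []:
                person = labour.get("person") or {}
                rows.append({
                    "mainidnId": main_id,
                    "Identifier": entity.get("Identifier"),
                    "pid": period.get("pid"),
                    "year": period.get("year"),
                    "quarter": period.get("quarter"),
                    "labourId": labour.get("id"),
                    "category": labour.get("category"),
                    "familyName": person.get("familyName"),
                })
    return rows
```

Deeper arrays such as `address`, `tax` and `tProfile` would need further levels of the same loop (or further `explode` calls in Spark).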


From: Luciano Resende [mailto:luckbr1975@gmail.com]
Sent: Monday, October 10, 2016 11:39 PM
To: Jean Georges Perrin
Cc: user @spark
Subject: Re: JSON Arrays and Spark


----------------------------------------------------------------------
This message and any attachments are intended only for the use of the addressee and may contain information that is privileged and confidential. If the reader of the message is not the intended recipient or an authorized representative of the intended recipient, you are hereby notified that any dissemination of this communication is strictly prohibited. If you have received this communication in error, notify the sender immediately by return email and delete the message and any attachments from your system.

Re: JSON Arrays and Spark

Posted by Luciano Resende <lu...@gmail.com>.
Please take a look at
http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets

In particular, see the note about the required format:

Note that the file that is offered as *a json file* is not a typical JSON
file. Each line must contain a separate, self-contained valid JSON object.
As a consequence, a regular multi-line JSON file will most often fail.
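Following that note, a regular JSON file can be rewritten into the one-object-per-line layout before it is handed to Spark. A minimal plain-Python sketch, assuming the whole file fits in memory and its top level is an object or an array of objects (such as the pretty-printed colour example later in this thread):

```python
import json

def to_json_lines(text):
    """Convert one regular JSON document (a top-level object, or an array
    of objects, possibly pretty-printed over many lines) into JSON Lines."""
    value = json.loads(text)
    records = value if isinstance(value, list) else [value]
    if not all(isinstance(r, dict) for r in records):
        raise ValueError("expected an object or an array of objects")
    # One compact object per line, newline-terminated.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


colours = '[{\n\t"red": "#f00",\n\t"green": "#0f0"\n},{\n\t"red": "#f01",\n\t"green": "#0f1"\n}]'
print(to_json_lines(colours), end="")
# {"red":"#f00","green":"#0f0"}
# {"red":"#f01","green":"#0f1"}
```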



On Mon, Oct 10, 2016 at 9:57 AM, Jean Georges Perrin <jg...@jgp.net> wrote:


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: JSON Arrays and Spark

Posted by Jean Georges Perrin <jg...@jgp.net>.
Thanks!

I am ok with strict rules (despite being French), but even:
[{
	"red": "#f00", 
	"green": "#0f0"
},{
	"red": "#f01", 
	"green": "#0f1"
}]

is not going through…

Is there a way to see what it does not like?

The JSON parser had been pretty good to me until recently.
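One rough way to see what the reader objects to is to check each line on its own, the way a line-based reader sees it. The plain-Python sketch below is an approximation, not Spark's own diagnostics: it reports the line number, the line and a reason for every non-blank line that is not a self-contained object or array of objects. On the colour example above, all seven lines fail, since none of them is complete JSON by itself.

```python
import json

def find_bad_lines(text):
    """Return (line number, line, reason) for each non-blank line that is
    not, by itself, a JSON object or an array of JSON objects."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored in this sketch
        try:
            value = json.loads(line)
        except json.JSONDecodeError as err:
            # err.msg is the decoder's explanation; err.colno is 1-based.
            problems.append((lineno, line, f"{err.msg} (column {err.colno})"))
            continue
        is_object = isinstance(value, dict)
        is_object_array = isinstance(value, list) and all(isinstance(v, dict) for v in value)
        if not (is_object or is_object_array):
            problems.append((lineno, line, "valid JSON, but not an object or array of objects"))
    return problems


colours = '[{\n\t"red": "#f00", \n\t"green": "#0f0"\n},{\n\t"red": "#f01", \n\t"green": "#0f1"\n}]'
for lineno, line, reason in find_bad_lines(colours):
    print(lineno, repr(line), reason)
```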


> On Oct 10, 2016, at 12:59 PM, Sudhanshu Janghel <> wrote:
> 
> As far as my experience goes, Spark can correctly parse only certain kinds of JSON, not all of them, and it has strict parsing rules, unlike Python.
> Disclaimer: The information in this email is confidential and may be legally privileged. Access to this email by anyone other than the intended addressee is unauthorized. If you are not the intended recipient of this message, any review, disclosure, copying, distribution, retention, or any action taken or omitted to be taken in reliance on it is prohibited and may be unlawful.