Posted to user@spark.apache.org by SparknewUser <me...@gmail.com> on 2015/07/29 15:37:00 UTC

How to read a Json file with a specific format?

I'm trying to read a JSON file that looks like this:
[
{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
]}
,{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
]}
]

I've tried the command:
        val df = sqlContext.read.json("namefile")
        df.show()


But this does not work: my columns are not recognized...

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-read-a-Json-file-with-a-specific-format-tp24061.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: How to read a Json file with a specific format?

Posted by Michael Armbrust <mi...@databricks.com>.
This isn't totally correct. Spark SQL does support JSON arrays and will
implicitly flatten them. However, each complete object or array must sit
on a single line; it cannot be split across newlines.

On Wed, Jul 29, 2015 at 7:55 AM, Young, Matthew T <matthew.t.young@intel.com
> wrote:

> The built-in Spark JSON functionality cannot read normal JSON arrays. The
> format it expects is a bunch of individual JSON objects without any outer
> array syntax, with one complete JSON object per line of the input file.
>
> AFAIK your options are to read the JSON in the driver and parallelize it
> out to the workers or to fix your input file to match the spec.
>
> For one-off conversions I usually use a combination of jq and
> regex-replaces to get the source file in the right format.
>
> ________________________________________
> From: SparknewUser [melanie.gallois92@gmail.com]
> Sent: Wednesday, July 29, 2015 6:37 AM
> To: user@spark.apache.org
> Subject: How to read a Json file with a specific format?
>
> I'm trying to read a Json file which is like :
> [
>
> {"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
> ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
> ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
> ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
> ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
> ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
> ]}
>
> ,{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
> ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
> ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
> ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
> ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
> ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}
> ]}
> ]
>
> I've tried the command:
>         val df = sqlContext.read.json("namefile")
>         df.show()
>
>
> But this does not work : my columns are not recognized...
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-read-a-Json-file-with-a-specific-format-tp24061.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>

RE: How to read a Json file with a specific format?

Posted by "Young, Matthew T" <ma...@intel.com>.
{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}]}
{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}]}
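A short conversion script can produce this one-object-per-line layout from the original array file. This is only a sketch (plain Python, placeholder file paths), roughly equivalent to what jq's `-c '.[]'` filter produces:

```python
import json

def to_json_lines(src_path, dst_path):
    """Read a file holding a single JSON array and rewrite it with one
    compact JSON object per line (the layout Spark's reader expects)."""
    with open(src_path) as src:
        records = json.load(src)          # parse the whole array at once
    with open(dst_path, "w") as dst:
        for record in records:
            dst.write(json.dumps(record, separators=(",", ":")) + "\n")
```

After converting, `sqlContext.read.json("namefile.jsonl")` should then recognize the columns.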




________________________________
From: mélanie gallois [melanie.gallois92@gmail.com]
Sent: Wednesday, July 29, 2015 8:10 AM
To: Young, Matthew T
Cc: user@spark.apache.org
Subject: Re: How to read a Json file with a specific format?

Can you give an example with my extract?

Mélanie Gallois



Re: How to read a Json file with a specific format?

Posted by mélanie gallois <me...@gmail.com>.
Can you give an example with my extract?

Mélanie Gallois



-- 
*Mélanie*

RE: How to read a Json file with a specific format?

Posted by "Young, Matthew T" <ma...@intel.com>.
The built-in Spark JSON functionality cannot read normal JSON arrays. The format it expects is a bunch of individual JSON objects without any outer array syntax, with one complete JSON object per line of the input file.

AFAIK your options are to read the JSON in the driver and parallelize it out to the workers or to fix your input file to match the spec.

For one-off conversions I usually use a combination of jq and regex-replaces to get the source file in the right format.
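The driver-side option can be sketched as follows: parse the whole array in the driver, then hand Spark one JSON string per record. Plain Python stands in for the driver step here; the Spark half would be something along the lines of `sqlContext.read.json(sc.parallelize(strings))`, which is an assumption about your setup rather than tested code:

```python
import json

def array_to_record_strings(path):
    """Parse a whole-file JSON array in the driver and return one compact
    JSON string per record, ready to parallelize out to the workers."""
    with open(path) as f:
        return [json.dumps(rec, separators=(",", ":")) for rec in json.load(f)]
```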
