Posted to user@spark.apache.org by Mahender Sarangam <ma...@outlook.com> on 2018/10/01 08:59:40 UTC

Unable to read multiple JSON.Gz File.

I’m trying to read multiple .json.gz files from a Blob storage path using the Scala code below, but I’m unable to read the data from the files or print the schema. If the files are not gzip-compressed, we are able to read all of them into the DataFrame.
I’ve even tried giving *.gz as the path pattern, but no luck.
 val df = spark.read.json("wasb://XYZ@AzureStorage.blob.core.windows.net/sourcePath/")

Re: Unable to read multiple JSON.Gz File.

Posted by Mahender Sarangam <ma...@outlook.com>.
Hi Jyoti,

We are using HDInsight Spark 2.2. Are there any setting differences in the latest version of the cluster?


/mahender



On 10/2/2018 1:48 PM, Jyoti Ranjan Mahapatra wrote:
Hi Mahendar,
Which versions of Spark and Hadoop are you using?
I tried it on Spark 2.3.1 with Hadoop 2.7.3, and it works for a folder containing multiple .gz files.



RE: Unable to read multiple JSON.Gz File.

Posted by Jyoti Ranjan Mahapatra <jy...@microsoft.com.INVALID>.
Hi Mahendar,
Which versions of Spark and Hadoop are you using?
I tried it on Spark 2.3.1 with Hadoop 2.7.3, and it works for a folder containing multiple .gz files.
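
For anyone who wants to repeat this kind of check, here is a minimal local sketch of it. It is not the exact code from the test above: the object name ReadGzJsonFolder, the file names part-1.json.gz and part-2.json.gz, the sample records, and the local[*] master are all made up for illustration. It writes two gzip-compressed, line-delimited JSON files into a temporary folder and reads the folder back with spark.read.json. Spark picks the gzip codec from the .gz file suffix, so no extra read option is set here.

```scala
import java.io.{FileOutputStream, OutputStreamWriter}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.zip.GZIPOutputStream

import org.apache.spark.sql.SparkSession

// Hypothetical object name; everything here is a local reproduction sketch.
object ReadGzJsonFolder {

  // Write one JSON record per line into a gzip-compressed file.
  def writeGz(path: Path, records: Seq[String]): Unit = {
    val out = new OutputStreamWriter(
      new GZIPOutputStream(new FileOutputStream(path.toFile)),
      StandardCharsets.UTF_8)
    try records.foreach(r => out.write(r + "\n"))
    finally out.close()
  }

  // Create a folder with two .json.gz files, read it, and return the row count.
  def run(): Long = {
    // Assumption: a local session; on a cluster, reuse the existing `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("gz-json-check")
      .getOrCreate()
    try {
      val dir = Files.createTempDirectory("sourcePath")
      writeGz(dir.resolve("part-1.json.gz"),
        Seq("""{"id":1,"name":"a"}""", """{"id":2,"name":"b"}"""))
      writeGz(dir.resolve("part-2.json.gz"),
        Seq("""{"id":3,"name":"c"}"""))

      // Point the reader at the folder; each .gz file is decompressed on read.
      val df = spark.read.json(dir.toUri.toString)
      df.printSchema()
      df.count()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"rows read: ${run()}")
}
```

If this works locally but the same read fails against the wasb:// folder, the difference is more likely in the cluster setup or in the files themselves than in the spark.read.json call.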

