Posted to user@spark.apache.org by Chetan Khatri <ch...@gmail.com> on 2018/11/22 02:29:24 UTC
How to Keep Null values in Parquet
Hello Spark Users,
I have a DataFrame with some null values. When I write it to Parquet,
it fails with the error below:
Caused by: java.lang.RuntimeException: Unsupported data type NullType.
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.org$apache$spark$sql$execution$datasources$parquet$ParquetWriteSupport$$makeWriter(ParquetWriteSupport.scala:206)
at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$init$2.apply(ParquetWriteSupport.scala:93)
at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$init$2.apply(ParquetWriteSupport.scala:93)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:285)
at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:93)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:341)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:302)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:367)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:378)
Thanks
Re: How to Keep Null values in Parquet
Posted by Chetan Khatri <ch...@gmail.com>.
Hello Soumya,
Thanks for the quick response. I haven't tried that yet; I'll try it now and report back.
On Thu, Nov 22, 2018 at 8:13 AM Soumya D. Sanyal <so...@soumyadsanyal.com>
wrote:
> Hi Chetan,
>
> Have you tried casting the null values/columns to a supported type — e.g.
> `StringType`, `IntegerType`, etc?
>
> See also https://issues.apache.org/jira/browse/SPARK-10943.
>
> — Soumya
>
>
> On Nov 21, 2018, at 9:29 PM, Chetan Khatri <ch...@gmail.com>
> wrote:
>
> [original message and stack trace quoted in full above; snipped]
>
Re: How to Keep Null values in Parquet
Posted by "Soumya D. Sanyal" <so...@soumyadsanyal.com>.
Hi Chetan,
Have you tried casting the null values/columns to a supported type — e.g. `StringType`, `IntegerType`, etc?
See also https://issues.apache.org/jira/browse/SPARK-10943.
— Soumya
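
For what it's worth, a minimal sketch of that casting approach in Scala. This is not a tested fix, just an illustration: it walks the DataFrame schema and casts any `NullType` column to `StringType` (the helper name `castNullTypeColumns` and the choice of `StringType` as the target are my own; pick whatever type each column should really have):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{NullType, StringType}

// Cast every NullType column to StringType so the Parquet writer
// accepts the schema. The values stay null; only the declared
// column type changes.
def castNullTypeColumns(df: DataFrame): DataFrame =
  df.schema.fields.foldLeft(df) { (acc, field) =>
    if (field.dataType == NullType)
      acc.withColumn(field.name, acc.col(field.name).cast(StringType))
    else acc
  }

// e.g. castNullTypeColumns(df).write.parquet("/path/to/output")
```

Columns typically end up as `NullType` when they are created from a bare `lit(null)` without a cast, which is why casting at creation time (e.g. `lit(null).cast(StringType)`) also avoids the error.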
> On Nov 21, 2018, at 9:29 PM, Chetan Khatri <ch...@gmail.com> wrote:
>
> [original message and stack trace quoted in full above; snipped]
>