Posted to issues@spark.apache.org by "Chris Horn (JIRA)" <ji...@apache.org> on 2016/09/14 16:17:21 UTC

[jira] [Commented] (SPARK-15835) The read path of json doesn't support write path when schema contains Options

    [ https://issues.apache.org/jira/browse/SPARK-15835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15490836#comment-15490836 ] 

Chris Horn commented on SPARK-15835:
------------------------------------

You can work around this issue by providing the schema (a StructType) to the JSON reader up front.

This also has the added benefit of avoiding an eager scan of the entire JSON data set to infer the schema.

{code}
scala> spark.read.schema(org.apache.spark.sql.Encoders.product[Bug].schema).json(path).as[Bug].collect
res8: Array[Bug] = Array(Bug(abc,None))

scala> spark.read.schema(org.apache.spark.sql.Encoders.product[Bug].schema).json(path).collect
res9: Array[org.apache.spark.sql.Row] = Array([abc,null])
{code}
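
For reference, here is a minimal self-contained sketch of the workaround, assuming a spark-shell session (so the {{spark}} session is predefined); the path is illustrative:

{code}
import org.apache.spark.sql.Encoders
import spark.implicits._

case class Bug(field1: String, field2: Option[String])

val path = "/tmp/sqlBug"  // illustrative location

// Write a record whose optional field is None: the JSON writer
// simply omits field2 from the output.
Seq(Bug("abc", None)).toDS.write.mode("overwrite").json(path)

// Supplying the schema up front lets the reader resolve field2 (as
// null) instead of failing analysis, and skips schema inference.
val bugs = spark.read
  .schema(Encoders.product[Bug].schema)
  .json(path)
  .as[Bug]

bugs.collect()  // Array(Bug(abc,None))
{code}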

> The read path of json doesn't support write path when schema contains Options
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-15835
>                 URL: https://issues.apache.org/jira/browse/SPARK-15835
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Burak Yavuz
>
> My schema contains optional fields. When records are written as JSON and an optional field is None in every record, that field is omitted from the output entirely. On read, the field can't be found, and resolving the Dataset throws an exception.
> Either the fields should be written as `null` during writes, or the Dataset should not require the field to exist in the DataFrame when the field is an Option (which may be the better solution).
> {code}
> case class Bug(field1: String, field2: Option[String])
> Seq(Bug("abc", None)).toDS.write.json("/tmp/sqlBug")
> spark.read.json("/tmp/sqlBug").as[Bug]
> {code}
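> The written file then contains only the populated field, e.g. a single record like:
> {code}
> {"field1":"abc"}
> {code}
> `field2` therefore never appears in the inferred schema, and `as[Bug]` fails to resolve it.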
> Stack trace:
> {code}
> org.apache.spark.sql.AnalysisException: cannot resolve '`field2`' given input columns: [field1]
> 	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:62)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:59)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
> 	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:68)
> {code}


