Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/08/28 03:52:00 UTC

[jira] [Updated] (SPARK-24391) from_json should support arrays of primitives, and more generally all JSON

     [ https://issues.apache.org/jira/browse/SPARK-24391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-24391:
---------------------------------
    Summary: from_json should support arrays of primitives, and more generally all JSON   (was: to_json/from_json should support arrays of primitives, and more generally all JSON )

> from_json should support arrays of primitives, and more generally all JSON 
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-24391
>                 URL: https://issues.apache.org/jira/browse/SPARK-24391
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Sam Kitajima-Kimbrel
>            Assignee: Maxim Gekk
>            Priority: Major
>             Fix For: 2.4.0
>
>
> https://issues.apache.org/jira/browse/SPARK-19849 and https://issues.apache.org/jira/browse/SPARK-21513 added support for more column types in functions.to_json/from_json, but I also have cases where I'd like to simply (de)serialize an array of primitives to/from JSON when writing to certain destinations, and that currently fails:
> {code:java}
> scala> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.functions._
> scala> import org.apache.spark.sql.types._
> import org.apache.spark.sql.types._
> scala> import spark.implicits._
> import spark.implicits._
> scala> val df = Seq("[1, 2, 3]").toDF("a")
> df: org.apache.spark.sql.DataFrame = [a: string]
> scala> val schema = new ArrayType(IntegerType, false)
> schema: org.apache.spark.sql.types.ArrayType = ArrayType(IntegerType,false)
> scala> df.select(from_json($"a", schema))
> org.apache.spark.sql.AnalysisException: cannot resolve 'jsontostructs(`a`)' due to data type mismatch: Input schema array<int> must be a struct or an array of structs.;;
> 'Project [jsontostructs(ArrayType(IntegerType,false), a#3, Some(America/Los_Angeles)) AS jsontostructs(a)#10]
> scala> val arrayDf = Seq(Array(1, 2, 3)).toDF("arr")
> arrayDf: org.apache.spark.sql.DataFrame = [arr: array<int>]
> scala> arrayDf.select(to_json($"arr"))
> org.apache.spark.sql.AnalysisException: cannot resolve 'structstojson(`arr`)' due to data type mismatch: Input type array<int> must be a struct, array of structs or a map or array of map.;;
> 'Project [structstojson(arr#19, Some(America/Los_Angeles)) AS structstojson(arr)#26]
> {code}
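> With the fix (targeted at 2.4.0 per Fix For above), from_json should accept arrays of primitives directly. Below is a minimal sketch of the expected behavior, assuming Spark 2.4.0+ and reusing the df defined above; the column alias "ints" is mine, and the output shown is what I would expect rather than a captured transcript:
> {code:java}
> scala> val parsed = df.select(from_json($"a", ArrayType(IntegerType, false)) as "ints")
> parsed: org.apache.spark.sql.DataFrame = [ints: array<int>]
> scala> parsed.show()
> +---------+
> |     ints|
> +---------+
> |[1, 2, 3]|
> +---------+
> {code}
> On 2.3.x, to_json for an array of primitives can already be approximated by wrapping the array column in a struct, at the cost of an extra object layer in the output (again, the alias "json" and the output shown are illustrative):
> {code:java}
> scala> arrayDf.select(to_json(struct($"arr")) as "json").show(false)
> +---------------+
> |json           |
> +---------------+
> |{"arr":[1,2,3]}|
> +---------------+
> {code}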



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org