You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/09 10:00:43 UTC

[GitHub] [spark] HyukjinKwon commented on a change in pull request #27854: [SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source

HyukjinKwon commented on a change in pull request #27854: [SPARK-31065][SQL] Match schema_of_json to the schema inference of JSON data source
URL: https://github.com/apache/spark/pull/27854#discussion_r389560513
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ##########
 @@ -777,7 +777,18 @@ case class SchemaOfJson(
   override def eval(v: InternalRow): Any = {
     val dt = Utils.tryWithResource(CreateJacksonParser.utf8String(jsonFactory, json)) { parser =>
       parser.nextToken()
-      jsonInferSchema.inferField(parser)
+      // To match with schema inference from JSON datasource.
+      jsonInferSchema.inferField(parser) match {
+        case st: StructType =>
+          jsonInferSchema.canonicalizeType(st, jsonOptions).getOrElse(StructType(Nil))
+        case at: ArrayType if at.elementType.isInstanceOf[StructType] =>
+          jsonInferSchema
+            .canonicalizeType(at.elementType, jsonOptions)
+            .map(ArrayType(_, containsNull = at.containsNull))
+            .getOrElse(ArrayType(StructType(Nil), containsNull = at.containsNull))
+        case other: DataType =>
+          jsonInferSchema.canonicalizeType(other, jsonOptions).getOrElse(StringType)
+      }
 
 Review comment:
   The reason why there are differences compared to JSON datasource is:
   
   1. JSON datasource always respects `StructType`. Array of JSON objects is still inferred as `StructType`. Other types are disallowed as a root type.
   2. `schema_of_json` infers the type as is. Array of JSON objects is inferred as `ArrayType(StructType)`. Other types are allowed as a root type.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org