Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/13 16:49:08 UTC

[GitHub] [spark] MaxGekk opened a new pull request #30032: [SPARK-33134][SQL][3.0] Return partial results only for root JSON objects

MaxGekk opened a new pull request #30032:
URL: https://github.com/apache/spark/pull/30032


   ### What changes were proposed in this pull request?
   In this PR, I propose restricting the partial result feature to root JSON objects only. The JSON datasource as well as `from_json()` will return `null` for malformed nested JSON objects.
   
   ### Why are the changes needed?
   1. To avoid raising exceptions to users in the PERMISSIVE mode.
   2. To fix a regression and restore the behavior of Spark 2.4.x.
   3. The current implementation of partial results is supposed to work only for root (top-level) JSON objects, and it has not been tested for malformed nested complex JSON fields.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Before the changes, the code below:
   ```scala
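       // Assumes a spark-shell session, where spark.implicits._ is already in scope.
       import org.apache.spark.sql.functions.from_json
       import org.apache.spark.sql.types._
       // "cards" holds a number (11) where the schema below expects a struct.
       val pokerhand_raw = Seq("""[{"cards": [11], "playerId": 583651}]""").toDF("events")
       val event = new StructType().add("playerId", LongType).add("cards", ArrayType(new StructType().add("id", LongType).add("rank", StringType)))
       val pokerhand_events = pokerhand_raw.select(from_json($"events", ArrayType(event)).as("event"))
       pokerhand_events.show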
   ```
   throws an exception even in the default **PERMISSIVE** mode:
   ```java
   java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
     at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray(rows.scala:48)
     at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray$(rows.scala:48)
     at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getArray(rows.scala:195)
   ```
   
   After the changes:
   ```
   +-----+
   |event|
   +-----+
   | null|
   +-----+
   ```
   
   ### How was this patch tested?
   Added a test to `JsonFunctionsSuite`.
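
   The test itself is not included in this message. As a rough, hypothetical sketch (not the code added to `JsonFunctionsSuite`), a standalone check of the behavior described above could look like the following. The object name `NestedJsonNullCheck`, the helper `parseEvents`, and the local `SparkSession` setup are assumptions made for illustration; the input, the schema, and the expected `null` come from the example in this description, and the assertion is only expected to hold on a Spark build that includes this change.

   ```scala
   import org.apache.spark.sql.{Row, SparkSession}
   import org.apache.spark.sql.functions.from_json
   import org.apache.spark.sql.types._

   // Hypothetical standalone check; names are illustrative, not from the Spark code base.
   object NestedJsonNullCheck {

     // Parses the example input from this PR with the example schema
     // and returns the collected result rows.
     def parseEvents(spark: SparkSession): Array[Row] = {
       import spark.implicits._

       // "cards" contains a number where the schema expects a struct.
       val raw = Seq("""[{"cards": [11], "playerId": 583651}]""").toDF("events")
       val event = new StructType()
         .add("playerId", LongType)
         .add("cards", ArrayType(new StructType().add("id", LongType).add("rank", StringType)))

       // No options are passed, so the default PERMISSIVE mode is used.
       raw.select(from_json($"events", ArrayType(event)).as("event")).collect()
     }

     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .master("local[1]")
         .appName("SPARK-33134-sketch")
         .getOrCreate()
       try {
         val rows = parseEvents(spark)
         assert(rows.length == 1, "expected exactly one result row")
         // With this change applied, the malformed nested value yields null
         // for the whole column instead of a ClassCastException.
         assert(rows.head.isNullAt(0), "expected a null `event` column")
       } finally {
         spark.stop()
       }
     }
   }
   ```

   Collecting the rows and checking `isNullAt(0)` avoids depending on the textual layout of `show`.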


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #30032: [SPARK-33134][SQL][3.0] Return partial results only for root JSON objects

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #30032:
URL: https://github.com/apache/spark/pull/30032


   




[GitHub] [spark] HyukjinKwon commented on pull request #30032: [SPARK-33134][SQL][3.0] Return partial results only for root JSON objects

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30032:
URL: https://github.com/apache/spark/pull/30032#issuecomment-708129068


   Merged to branch-3.0.

