Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/11 20:39:53 UTC

[GitHub] [spark] dtenedor commented on a diff in pull request #37431: [SPARK-40001][SQL] Add config to make DEFAULT values in JSON tables mutually exclusive with SQLConf.JSON_GENERATOR_IGNORE_NULL_FIELDS

dtenedor commented on code in PR #37431:
URL: https://github.com/apache/spark/pull/37431#discussion_r943921356


##########
sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala:
##########
@@ -1657,6 +1656,28 @@ class InsertSuite extends DataSourceTest with SharedSparkSession {
     }
   }
 
+  test("SPARK-40001 JSON DEFAULT columns require JSON_GENERATOR_IGNORE_NULL_FIELDS off") {
+    val error = "DEFAULT values are not supported for JSON tables"
+    withTable("t") {
+      assert(intercept[AnalysisException] {
+        sql("create table t (a int default 42) using json")

Review Comment:
   Thanks for the useful comment! It made me realize that other writers may also produce JSON files that omit NULL fields from storage. I fixed this by:
   
   1) Reverting the analyzer changes in this PR.
   2) Replacing the new config with `DEFAULT_COLUMN_JSON_GENERATOR_FORCE_NULL_FIELDS`, which overrides any other settings so that target columns with DEFAULT values always get explicit NULLs written to storage.
   3) Updating the `DEFAULT_COLUMN_ALLOWED_PROVIDERS` config to ban `ALTER TABLE ADD COLUMN` commands with `DEFAULT` values for JSON tables, with a descriptive error message.
   
   This guarantees correct results for JSON `DEFAULT` columns: new rows with NULL values always get explicit NULLs written to JSON storage, so subsequent scans can tell those NULLs apart from `DEFAULT` values.
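A minimal Scala sketch of why the explicit NULLs matter. This is a toy model, not Spark's JSON writer or scan code: `JsonDefaultSketch`, `write`, `read`, `ignoreNullFields`, and `forceNullFor` are hypothetical names, and the model assumes that a scan fills any field absent from storage with the column's DEFAULT value while leaving an explicit JSON null as NULL.

```scala
// Hypothetical toy model (not Spark's implementation) of JSON DEFAULT columns.
object JsonDefaultSketch {
  // A stored JSON record: only the fields physically present in the file.
  // A value of None models an explicit JSON null.
  type StoredRow = Map[String, Option[Int]]

  // Writer: with ignoreNullFields = true, null-valued fields are dropped,
  // except for columns listed in forceNullFor, which are always written.
  def write(
      row: Map[String, Option[Int]],
      ignoreNullFields: Boolean,
      forceNullFor: Set[String]): StoredRow =
    row.filter { case (col, v) =>
      v.isDefined || !ignoreNullFields || forceNullFor.contains(col)
    }

  // Reader (assumed behavior): a field missing from storage gets the
  // column's default, if any; an explicit null stays None.
  def read(
      stored: StoredRow,
      schema: Seq[String],
      defaults: Map[String, Int]): Map[String, Option[Int]] =
    schema.map(col => col -> stored.getOrElse(col, defaults.get(col))).toMap

  def main(args: Array[String]): Unit = {
    val schema = Seq("id", "a")
    val defaults = Map("a" -> 42) // models column `a int default 42`
    // A new row where the user explicitly inserts NULL into `a`.
    val newRow: Map[String, Option[Int]] = Map("id" -> Some(1), "a" -> None)

    val omitted = write(newRow, ignoreNullFields = true, forceNullFor = Set.empty)
    val forced = write(newRow, ignoreNullFields = true, forceNullFor = defaults.keySet)

    println(read(omitted, schema, defaults)("a")) // Some(42): the NULL is lost
    println(read(forced, schema, defaults)("a"))  // None: the NULL survives
  }
}
```

When the null field is dropped, the stored record looks the same as one written before the column existed, so the reader substitutes 42; forcing the field to be written keeps the NULL distinguishable.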



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org