You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "stevomitric (via GitHub)" <gi...@apache.org> on 2024/03/28 21:41:25 UTC

[PR] [SPARK-47634] Add legacy support for disabling map key normalization [spark]

stevomitric opened a new pull request, #45760:
URL: https://github.com/apache/spark/pull/45760

   ### What changes were proposed in this pull request?
   Added a `DISABLE_MAP_KEY_NORMALIZATION` option in `SQLConf` to allow for legacy creation of a map without key normalization (keys `0.0` and `-0.0`) in `ArrayBasedMapBuilder`.
   
   
   ### Why are the changes needed?
   As a legacy fallback option.
   
   
   ### Does this PR introduce _any_ user-facing change?
   New `DISABLE_MAP_KEY_NORMALIZATION` config option.
   
   
   ### How was this patch tested?
   New UT proposed in this PR
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47634][SQL] Add legacy support for disabling map key normalization [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #45760:
URL: https://github.com/apache/spark/pull/45760#discussion_r1543977704


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala:
##########
@@ -64,7 +64,7 @@ class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Seria
       throw QueryExecutionErrors.nullAsMapKeyNotAllowedError()
     }
 
-    val keyNormalized = keyNormalizer(key)
+    val keyNormalized = if (SQLConf.get.disableMapKeyNormalization) key else keyNormalizer(key)

Review Comment:
   can we check this config when creating `keyNormalizer`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47634][SQL] Add legacy support for disabling map key normalization [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #45760:
URL: https://github.com/apache/spark/pull/45760#issuecomment-2031382893

   thanks, merging to master! (the last commit only touches migration guide which doesn't need to go through CI)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47634][SQL] Add legacy support for disabling map key normalization [spark]

Posted by "stevomitric (via GitHub)" <gi...@apache.org>.
stevomitric commented on code in PR #45760:
URL: https://github.com/apache/spark/pull/45760#discussion_r1547337031


##########
docs/sql-migration-guide.md:
##########
@@ -24,6 +24,7 @@ license: |
 
 ## Upgrading from Spark SQL 3.5 to 4.0
 
+- Since Spark 4.0, the default behaviour when inserting elements in a map is changed to first apply key normalization. The affected SQL functions are `create_map`, `map_from_arrays`, `map_from_entries`, and `map_concat`. To restore the previous behaviour, set `spark.sql.legacy.disableMapKeyNormalization` to `true`.

Review Comment:
   Changed to: 
   - Since Spark 4.0, the default behaviour when inserting elements in a map is changed to first normalize keys -0.0 to 0.0. The affected SQL functions are `create_map`, `map_from_arrays`, `map_from_entries`, and `map_concat`. To restore the previous behaviour, set `spark.sql.legacy.disableMapKeyNormalization` to `true`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47634][SQL] Add legacy support for disabling map key normalization [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #45760:
URL: https://github.com/apache/spark/pull/45760#discussion_r1543985843


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3247,6 +3247,13 @@ object SQLConf {
       .stringConf
       .createWithDefault("")
 
+  val DISABLE_MAP_KEY_NORMALIZATION =
+    buildConf("spark.sql.legacy.disableMapKeyNormalization")
+      .doc("Disables key normalization when creating a map with `ArrayBasedMapBuilder`.")

Review Comment:
   Can we add some more description? What does this mean to the end users?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47634][SQL] Add legacy support for disabling map key normalization [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan closed pull request #45760: [SPARK-47634][SQL] Add legacy support for disabling map key normalization
URL: https://github.com/apache/spark/pull/45760


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47634][SQL] Add legacy support for disabling map key normalization [spark]

Posted by "stevomitric (via GitHub)" <gi...@apache.org>.
stevomitric commented on PR #45760:
URL: https://github.com/apache/spark/pull/45760#issuecomment-2026901257

   Changed `DISABLE_MAP_KEY_NORMALIZATION ` value to `internal` since all configs in legacy namespace need to be so.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47634][SQL] Add legacy support for disabling map key normalization [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #45760:
URL: https://github.com/apache/spark/pull/45760#discussion_r1543978123


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -5454,6 +5461,8 @@ class SQLConf extends Serializable with Logging with SqlApiConf {
   def disabledV2StreamingMicroBatchReaders: String =
     getConf(DISABLED_V2_STREAMING_MICROBATCH_READERS)
 
+  def disableMapKeyNormalization: Boolean = getConf(DISABLE_MAP_KEY_NORMALIZATION)

Review Comment:
   If it's only called by one place, we can just inline it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47634][SQL] Add legacy support for disabling map key normalization [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on PR #45760:
URL: https://github.com/apache/spark/pull/45760#issuecomment-2026506170

   let's also mention this change in `docs/sql-migration-guide.md`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47634][SQL] Add legacy support for disabling map key normalization [spark]

Posted by "stevomitric (via GitHub)" <gi...@apache.org>.
stevomitric commented on code in PR #45760:
URL: https://github.com/apache/spark/pull/45760#discussion_r1544253822


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -5454,6 +5461,8 @@ class SQLConf extends Serializable with Logging with SqlApiConf {
   def disabledV2StreamingMicroBatchReaders: String =
     getConf(DISABLED_V2_STREAMING_MICROBATCH_READERS)
 
+  def disableMapKeyNormalization: Boolean = getConf(DISABLE_MAP_KEY_NORMALIZATION)

Review Comment:
   added `@inline` - or do we just inline it manually where its called?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47634][SQL] Add legacy support for disabling map key normalization [spark]

Posted by "stevomitric (via GitHub)" <gi...@apache.org>.
stevomitric commented on code in PR #45760:
URL: https://github.com/apache/spark/pull/45760#discussion_r1544253456


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala:
##########
@@ -64,7 +64,7 @@ class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Seria
       throw QueryExecutionErrors.nullAsMapKeyNotAllowedError()
     }
 
-    val keyNormalized = keyNormalizer(key)
+    val keyNormalized = if (SQLConf.get.disableMapKeyNormalization) key else keyNormalizer(key)

Review Comment:
   Refactored `keyNormalizer` to match the config as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47634][SQL] Add legacy support for disabling map key normalization [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #45760:
URL: https://github.com/apache/spark/pull/45760#discussion_r1546044904


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -5454,6 +5461,8 @@ class SQLConf extends Serializable with Logging with SqlApiConf {
   def disabledV2StreamingMicroBatchReaders: String =
     getConf(DISABLED_V2_STREAMING_MICROBATCH_READERS)
 
+  def disableMapKeyNormalization: Boolean = getConf(DISABLE_MAP_KEY_NORMALIZATION)

Review Comment:
   I mean manually inline it. Do not create this method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47634][SQL] Add legacy support for disabling map key normalization [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #45760:
URL: https://github.com/apache/spark/pull/45760#discussion_r1546046412


##########
docs/sql-migration-guide.md:
##########
@@ -24,6 +24,7 @@ license: |
 
 ## Upgrading from Spark SQL 3.5 to 4.0
 
+- Since Spark 4.0, the default behaviour when inserting map elements in `ArrayBasedMapBuilder` is changed to first apply key normalization. To restore the previous behaviour, set `spark.sql.legacy.disableMapKeyNormalization` to `true`.

Review Comment:
   let's be more user-facing. The affected SQL functions are `create_map`, `map_from_arrays`, `map_from_entries`, and `map_concat`. `ArrayBasedMapBuilder` is an internal thing and we should not mention it to the end users.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47634][SQL] Add legacy support for disabling map key normalization [spark]

Posted by "stevomitric (via GitHub)" <gi...@apache.org>.
stevomitric commented on code in PR #45760:
URL: https://github.com/apache/spark/pull/45760#discussion_r1546053807


##########
docs/sql-migration-guide.md:
##########
@@ -24,6 +24,7 @@ license: |
 
 ## Upgrading from Spark SQL 3.5 to 4.0
 
+- Since Spark 4.0, the default behaviour when inserting map elements in `ArrayBasedMapBuilder` is changed to first apply key normalization. To restore the previous behaviour, set `spark.sql.legacy.disableMapKeyNormalization` to `true`.

Review Comment:
   Added: 
   
   > The affected SQL functions are create_map, map_from_arrays, map_from_entries, and map_concat.
   
   And removed `ArrayBasedMapBuilder`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-47634][SQL] Add legacy support for disabling map key normalization [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #45760:
URL: https://github.com/apache/spark/pull/45760#discussion_r1546405837


##########
docs/sql-migration-guide.md:
##########
@@ -24,6 +24,7 @@ license: |
 
 ## Upgrading from Spark SQL 3.5 to 4.0
 
+- Since Spark 4.0, the default behaviour when inserting elements in a map is changed to first apply key normalization. The affected SQL functions are `create_map`, `map_from_arrays`, `map_from_entries`, and `map_concat`. To restore the previous behaviour, set `spark.sql.legacy.disableMapKeyNormalization` to `true`.

Review Comment:
   let's be specific about `key normalization`. `... is changed to first normalize -0.0 to 0.0.`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org