Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/06 14:36:06 UTC
[GitHub] [spark] xuanyuanking opened a new pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
xuanyuanking opened a new pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478
### What changes were proposed in this pull request?
This is a follow-up to #23124. It adds a new config, `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled`, to control how built-in functions handle duplicated map keys. With the default value `false`, Spark throws a RuntimeException when duplicated keys are found. With `true`, duplicated keys are removed using the last-wins policy.
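The two policies can be sketched in plain Python. This is a hypothetical model, not Spark code: `build_map` and its `last_wins` flag stand in for a built-in map function and the `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` config, and Python's `RuntimeError` stands in for the JVM `RuntimeException`.

```python
from typing import Any, Dict, Iterable, Tuple


def build_map(pairs: Iterable[Tuple[Any, Any]], last_wins: bool = False) -> Dict[Any, Any]:
    """Hypothetical model of a built-in map function under the proposed config.

    last_wins=False models the proposed default (config `false`): fail on a duplicate key.
    last_wins=True models the opt-in behavior (config `true`): the last value wins.
    """
    result: Dict[Any, Any] = {}
    for key, value in pairs:
        if key in result and not last_wins:
            # Stand-in for the RuntimeException Spark throws by default.
            raise RuntimeError(f"Duplicate map key {key!r} was found")
        # Either a new key, or a duplicate under last-wins: later values overwrite earlier ones.
        result[key] = value
    return result
```

With `last_wins=True`, a later value replaces the earlier one while the key keeps its original position, so `build_map([(1, "a"), (2, "b"), (1, "c")], last_wins=True)` gives `{1: "c", 2: "b"}`; with the default, the same input raises.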
### Why are the changes needed?
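Prevent silent behavior changes. Without this, built-in functions in Spark 3.0 would silently remove duplicated map keys with the last-wins policy, whereas Spark 2.4 and earlier kept them.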
### Does this PR introduce any user-facing change?
Yes. A new config is added, and the default behavior for duplicated map keys changes to throwing a RuntimeException.
### How was this patch tested?
Modified existing unit tests.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586568447
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23226/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583705939
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583685459
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22818/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583050788
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117998/
----------------------------------------------------------------
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376213630
##########
File path: docs/sql-migration-guide.md
##########
@@ -49,7 +49,7 @@ license: |
- In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but -0.0 and 0.0 are considered as different values when used in aggregate grouping keys, window partition keys and join keys. Since Spark 3.0, this bug is fixed. For example, `Seq(-0.0, 0.0).toDF("d").groupBy("d").count()` returns `[(0.0, 2)]` in Spark 3.0, and `[(0.0, 1), (-0.0, 1)]` in Spark 2.4 and earlier.
- - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
+ - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, new config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` was added, with the default value `false`, Spark will throw RuntimeException while duplicated keys are found. If set to `true`, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
Review comment:
Hi, @cloud-fan , @gatorsmile , @rxin , @marmbrus .
So, is throwing a `RuntimeException` by default the better behavior for Apache Spark 3.0.0?
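The undefined Spark 2.4 behavior listed in the migration-guide text above can be illustrated with a hypothetical model of a map stored as parallel key and value arrays with no deduplication. `LegacyArrayMap` and its methods are illustrative names, not Spark APIs; each method mimics one listed symptom: first-wins lookup, last-wins `Dataset.collect`, and duplicated `MapKeys` output.

```python
from typing import Any, Dict, List, Optional


class LegacyArrayMap:
    """Hypothetical model: a map kept as parallel key/value arrays, never deduplicated."""

    def __init__(self, keys: List[Any], values: List[Any]) -> None:
        if len(keys) != len(values):
            raise ValueError("keys and values must have the same length")
        self.keys = keys
        self.values = values

    def lookup(self, key: Any) -> Optional[Any]:
        # A linear scan returns the value of the FIRST matching key.
        for k, v in zip(self.keys, self.values):
            if k == key:
                return v
        return None

    def collect(self) -> Dict[Any, Any]:
        # Converting to a hash map overwrites earlier entries, so the LAST duplicate survives.
        return dict(zip(self.keys, self.values))

    def map_keys(self) -> List[Any]:
        # Returning the raw key array exposes the duplicates.
        return list(self.keys)
```

For `LegacyArrayMap([1, 2, 1], ["a", "b", "c"])`, `lookup(1)` returns `"a"`, `collect()` returns `{1: "c", 2: "b"}`, and `map_keys()` returns `[1, 2, 1]`: three different answers for the same map, which is the inconsistency the new default is meant to surface.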
----------------------------------------------------------------
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376163252
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2766,13 +2766,15 @@ def map_concat(*cols):
:param cols: list of column names (string) or list of :class:`Column` expressions
>>> from pyspark.sql.functions import map_concat
+ >>> spark.conf.set("spark.sql.deduplicateMapKey.lastWinsPolicy.enabled", "true")
Review comment:
I would not set this configuration, and would instead change:
```diff
- >>> df = spark.sql("SELECT map(1, 'a', 2, 'b') as map1, map(3, 'c', 1, 'd') as map2")
+ >>> df = spark.sql("SELECT map(1, 'a', 2, 'b') as map1, map(3, 'c') as map2")
```
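Why the suggested doctest change removes the need for the config can be sketched with a hypothetical pure-Python model of `map_concat` (`map_concat_model` is not a PySpark function). Under the default policy, the original inputs share key `1` and fail; the suggested inputs have disjoint keys and concatenate cleanly.

```python
from typing import Any, Dict


def map_concat_model(*maps: Dict[Any, Any], last_wins: bool = False) -> Dict[Any, Any]:
    """Hypothetical model of concatenating maps under the fail / last-wins policies."""
    result: Dict[Any, Any] = {}
    for m in maps:
        for key, value in m.items():
            if key in result and not last_wins:
                # Default policy: a key repeated across input maps is an error.
                raise RuntimeError(f"Duplicate map key {key!r} was found")
            result[key] = value
    return result


# Suggested doctest input: map(1, 'a', 2, 'b') and map(3, 'c') share no keys.
print(map_concat_model({1: "a", 2: "b"}, {3: "c"}))
```

The original input pair, `{1: "a", 2: "b"}` and `{3: "c", 1: "d"}`, only succeeds with `last_wins=True`, which is exactly the config the reviewer prefers not to set in a doctest.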
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the
default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583050397
**[Test build #117998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117998/testReport)** for PR 27478 at commit [`09cfb67`](https://github.com/apache/spark/commit/09cfb674f835015c9de15b24ea958f5fb47fb915).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the
default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583557917
**[Test build #118040 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118040/testReport)** for PR 27478 at commit [`fe1fb0f`](https://github.com/apache/spark/commit/fe1fb0f28bf7a448051b46517dc6ceed4fd78288).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583050774
Merged build finished. Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586568447
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23226/
----------------------------------------------------------------
[GitHub] [spark] xuanyuanking commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586908624
retest this please
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-587120130
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118578/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586241007
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23181/
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r379831096
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala
##########
@@ -63,6 +68,11 @@ class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Seria
keys.append(key)
values.append(value)
} else {
+ if (!allowDuplicatedMapKey) {
+ throw new RuntimeException(s"Duplicate map key $key was founded, please set " +
Review comment:
We shouldn't recommend that users set a legacy config to fix a problem. We should ask them to check the input data to see why there are duplicated keys. If they want to remove the duplicated keys, we can tell them that they can set the legacy config, which uses the last-wins policy.
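A hypothetical Python model of the duplicate-key check in `ArrayBasedMapBuilder`, with an error message along the lines suggested above: it first asks users to check their input data and mentions the legacy config only as an opt-in for last-wins deduplication. The class name, the `allow_duplicated_map_key` flag (mirroring `allowDuplicatedMapKey` in the diff) and the exact message wording are assumptions; the config name is taken from the PR title.

```python
from typing import Any, Dict, List

LEGACY_CONF = "spark.sql.legacy.allowDuplicatedMapKeys"  # name taken from the PR title


class MapBuilderModel:
    """Hypothetical model of a map builder with a duplicate-key check (not Spark code)."""

    def __init__(self, allow_duplicated_map_key: bool = False) -> None:
        self.allow_duplicated_map_key = allow_duplicated_map_key
        self.keys: List[Any] = []
        self.values: List[Any] = []
        self._index: Dict[Any, int] = {}  # key -> position in keys/values

    def put(self, key: Any, value: Any) -> None:
        index = self._index.get(key)
        if index is None:
            # New key: append to both arrays.
            self._index[key] = len(self.keys)
            self.keys.append(key)
            self.values.append(value)
            return
        if not self.allow_duplicated_map_key:
            # Point users at their data first; mention the legacy config only as an opt-in.
            raise RuntimeError(
                f"Duplicate map key {key!r} was found, please check the input data. "
                f"If you want to remove the duplicated keys, you can set {LEGACY_CONF} "
                "to true so that the key inserted last takes precedence."
            )
        # Last-wins policy: overwrite the value in place, keep the original key position.
        self.values[index] = value
```

Keeping the first key position and overwriting only the value is an assumption of this sketch; it matches the "last wins" wording in the migration guide without claiming how Spark's builder orders its arrays.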
----------------------------------------------------------------
[GitHub] [spark] cloud-fan closed pull request #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586976283
**[Test build #118578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118578/testReport)** for PR 27478 at commit [`feb8c0a`](https://github.com/apache/spark/commit/feb8c0aa28034c166f6413bd1eec58a885848192).
----------------------------------------------------------------
[GitHub] [spark] viirya commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376253512
##########
File path: docs/sql-migration-guide.md
##########
@@ -49,7 +49,7 @@ license: |
- In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but -0.0 and 0.0 are considered as different values when used in aggregate grouping keys, window partition keys and join keys. Since Spark 3.0, this bug is fixed. For example, `Seq(-0.0, 0.0).toDF("d").groupBy("d").count()` returns `[(0.0, 2)]` in Spark 3.0, and `[(0.0, 1), (-0.0, 1)]` in Spark 2.4 and earlier.
- - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
+ - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, new config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` was added, with the default value `false`, Spark will throw RuntimeException while duplicated keys are found. If set to `true`, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
Review comment:
Should we mention that this `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` config is only useful for migration and could be removed in a future version?
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583050788
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117998/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586864334
Merged build finished. Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583155463
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22770/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586587027
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118468/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583453786
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22805/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586587025
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] xuanyuanking commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376678023
##########
File path: docs/sql-migration-guide.md
##########
@@ -49,7 +49,7 @@ license: |
- In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but -0.0 and 0.0 are considered as different values when used in aggregate grouping keys, window partition keys and join keys. Since Spark 3.0, this bug is fixed. For example, `Seq(-0.0, 0.0).toDF("d").groupBy("d").count()` returns `[(0.0, 2)]` in Spark 3.0, and `[(0.0, 1), (-0.0, 1)]` in Spark 2.4 and earlier.
- - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
+ - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, new config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` was added, with the default value `false`, Spark will throw RuntimeException while duplicated keys are found. If set to `true`, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
Review comment:
Agreed on having a guideline if we need the new rule, but `correctness bug fixes` might not be the issue we are discussing here? If it's a bug, why would we need a legacy fallback config?
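The `-0.0` entry in the diff context above is the kind of correctness bug fix this comment contrasts with the duplicate-key behavior change. The plain-Scala sketch below (no Spark dependency) shows why the two zeros can end up in different groups: they compare equal with `==` but have different bit patterns, so grouping by the binary representation splits them unless the value is normalized first. `NegativeZeroSketch`, `bits`, `normalizeZero` and `groupCount` are hypothetical names; the sketch only models the idea and is not Spark's actual implementation.

```scala
// Hypothetical illustration, not Spark code: -0.0 and 0.0 are equal as numbers
// but differ when keys are compared by their raw binary representation.
object NegativeZeroSketch {
  // Raw bit pattern of a double, as a binary-comparing grouping key would see it.
  def bits(d: Double): Long = java.lang.Double.doubleToRawLongBits(d)

  // Map -0.0 to 0.0 and leave every other value (including NaN) unchanged.
  def normalizeZero(d: Double): Double = if (d == 0.0) 0.0 else d

  // Number of distinct groups when grouping by raw bits, optionally normalizing first.
  def groupCount(values: Seq[Double], normalize: Boolean): Int = {
    val keyed = if (normalize) values.map(normalizeZero) else values
    keyed.groupBy(d => bits(d)).size
  }
}
```

Without normalization `Seq(-0.0, 0.0)` yields two groups, matching the Spark 2.4 result quoted in the diff; with normalization it yields one.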
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586976283
**[Test build #118578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118578/testReport)** for PR 27478 at commit [`feb8c0a`](https://github.com/apache/spark/commit/feb8c0aa28034c166f6413bd1eec58a885848192).
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586854920
**[Test build #118547 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118547/testReport)** for PR 27478 at commit [`feb8c0a`](https://github.com/apache/spark/commit/feb8c0aa28034c166f6413bd1eec58a885848192).
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583155458
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the
default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583452915
**[Test build #118040 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118040/testReport)** for PR 27478 at commit [`fe1fb0f`](https://github.com/apache/spark/commit/fe1fb0f28bf7a448051b46517dc6ceed4fd78288).
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the
default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586235313
**[Test build #118423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118423/testReport)** for PR 27478 at commit [`f622180`](https://github.com/apache/spark/commit/f622180e40ed984d593bc95ad2a1b7ad96f2af1c).
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586864352
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118537/
Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586235313
**[Test build #118423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118423/testReport)** for PR 27478 at commit [`f622180`](https://github.com/apache/spark/commit/f622180e40ed984d593bc95ad2a1b7ad96f2af1c).
----------------------------------------------------------------
[GitHub] [spark] xuanyuanking commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r379999008
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala
##########
@@ -63,6 +68,11 @@ class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Seria
keys.append(key)
values.append(value)
} else {
+ if (!allowDuplicatedMapKey) {
+ throw new RuntimeException(s"Duplicate map key $key was founded, please set " +
Review comment:
Thanks, done in 905ae96.
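The diff above adds a guard to the duplicate-key branch of `ArrayBasedMapBuilder`. Below is a simplified, Spark-free sketch of that branch under stated assumptions: keys and values live in parallel buffers, a hash map records each key's first position, and a boolean flag (named `allowDuplicatedMapKey` after the diff) chooses between throwing and overwriting. `SimpleMapBuilder` is hypothetical, not the real class, and its exception message is illustrative only.

```scala
import scala.collection.mutable

// Simplified sketch modelled on the diff above; not the real ArrayBasedMapBuilder.
class SimpleMapBuilder[K, V](allowDuplicatedMapKey: Boolean) {
  private val keyToIndex = mutable.HashMap.empty[K, Int]
  private val keys = mutable.ArrayBuffer.empty[K]
  private val values = mutable.ArrayBuffer.empty[V]

  def put(key: K, value: V): Unit = {
    keyToIndex.get(key) match {
      case None =>
        // First occurrence: remember its position and append the pair.
        keyToIndex(key) = keys.length
        keys.append(key)
        values.append(value)
      case Some(index) =>
        // Default in the PR: reject the duplicate instead of changing results silently.
        if (!allowDuplicatedMapKey) {
          throw new RuntimeException(s"Duplicate map key $key was found.")
        }
        // Last-wins policy in this sketch: keep the key's first position, overwrite its value.
        values(index) = value
    }
  }

  def result(): (Seq[K], Seq[V]) = (keys.toList, values.toList)
}
```

With the flag enabled, putting `1 -> "a"`, `2 -> "b"`, `1 -> "c"` produces keys `[1, 2]` and values `["c", "b"]`; with it disabled, the second `put` of key `1` throws.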
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586834168
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586970413
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118567/
Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] xuanyuanking commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376438874
##########
File path: docs/sql-migration-guide.md
##########
@@ -49,7 +49,7 @@ license: |
- In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but -0.0 and 0.0 are considered as different values when used in aggregate grouping keys, window partition keys and join keys. Since Spark 3.0, this bug is fixed. For example, `Seq(-0.0, 0.0).toDF("d").groupBy("d").count()` returns `[(0.0, 2)]` in Spark 3.0, and `[(0.0, 1), (-0.0, 1)]` in Spark 2.4 and earlier.
- - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
+ - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, new config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` was added, with the default value `false`, Spark will throw RuntimeException while duplicated keys are found. If set to `true`, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
Review comment:
+1 for the idea of highlighting that this config is for migration only.
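The migration note lists three inconsistent readings of a map with duplicated keys in Spark 2.4: lookup returns the first match, `Dataset.collect` keeps the last, and `MapKeys` returns every key. The plain-Scala sketch below reproduces those three symptoms over one pair of parallel arrays, assuming (as a simplification) that the map is stored as a key array and a value array. `DuplicateKeyReadings` and its members are hypothetical; the sketch models the described behavior, not Spark's code.

```scala
// Hypothetical model of a map stored as parallel key/value arrays with a duplicated key.
object DuplicateKeyReadings {
  val keys: Vector[String] = Vector("k", "k")
  val values: Vector[Int] = Vector(1, 2)

  // Lookup that scans from the front and stops at the first match (first wins).
  def lookup(key: String): Option[Int] = {
    val i = keys.indexOf(key)
    if (i >= 0) Some(values(i)) else None
  }

  // Converting to a Scala Map: later pairs overwrite earlier ones (last wins).
  def collected: Map[String, Int] = keys.zip(values).toMap

  // Returning the key array as-is exposes the duplicate.
  def mapKeys: Seq[String] = keys
}
```

Here `lookup("k")` gives `Some(1)`, `collected` gives `Map("k" -> 2)`, and `mapKeys` gives both copies of `"k"`. Under the PR's default such a map would be rejected when built; with the flag enabled it would hold only `k -> 2`.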
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586859653
Merged build finished. Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586859653
Merged build finished. Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] gengliangwang edited a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
gengliangwang edited a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-585039004
I searched "deduplicate map key" and there is no matching result.
Since a runtime exception will be thrown on duplicated keys, how about renaming the config to `spark.sql.allowDuplicatedMapKeys.enabled`?
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586568444
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586970149
**[Test build #118567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118567/testReport)** for PR 27478 at commit [`feb8c0a`](https://github.com/apache/spark/commit/feb8c0aa28034c166f6413bd1eec58a885848192).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] xuanyuanking commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376678023
##########
File path: docs/sql-migration-guide.md
##########
@@ -49,7 +49,7 @@ license: |
- In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but -0.0 and 0.0 are considered as different values when used in aggregate grouping keys, window partition keys and join keys. Since Spark 3.0, this bug is fixed. For example, `Seq(-0.0, 0.0).toDF("d").groupBy("d").count()` returns `[(0.0, 2)]` in Spark 3.0, and `[(0.0, 1), (-0.0, 1)]` in Spark 2.4 and earlier.
- - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
+ - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, new config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` was added, with the default value `false`, Spark will throw RuntimeException while duplicated keys are found. If set to `true`, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
Review comment:
Agreed on having a guideline if we need the new rule, to make clear that the silent result change here is caused by a behavior change, not by a bug fix.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-582937112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22763/
Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586241007
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23181/
Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586909169
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586973268
retest it please
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586912213
**[Test build #118567 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118567/testReport)** for PR 27478 at commit [`feb8c0a`](https://github.com/apache/spark/commit/feb8c0aa28034c166f6413bd1eec58a885848192).
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583685270
**[Test build #118052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118052/testReport)** for PR 27478 at commit [`66dc51f`](https://github.com/apache/spark/commit/66dc51f0bc0fb45f7c44ef72f26e1918e01d0212).
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r378054659
##########
File path: docs/sql-migration-guide.md
##########
@@ -49,7 +49,7 @@ license: |
- In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but -0.0 and 0.0 are considered as different values when used in aggregate grouping keys, window partition keys and join keys. Since Spark 3.0, this bug is fixed. For example, `Seq(-0.0, 0.0).toDF("d").groupBy("d").count()` returns `[(0.0, 2)]` in Spark 3.0, and `[(0.0, 1), (-0.0, 1)]` in Spark 2.4 and earlier.
- - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
+ - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, new config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` was added, with the default value `false`, Spark will throw RuntimeException while duplicated keys are found. If set to `true`, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
Review comment:
What I have in mind is:
- if it's a bug fix (the previous result is definitely wrong), then no config is needed. If the impact is big, we can add a legacy config which is false by default.
- if it makes the behavior better, we should either add a config and use the old behavior by default, or fail by default and ask users to set the config explicitly to pick the desired behavior.
I'm trying to think of more cases and will send an email to the dev list soon.
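The revised migration-guide bullet in this hunk describes two behaviors: with `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` left at `false`, a duplicated key raises a RuntimeException; with `true`, the built-in functions keep the last value for each key. The sketch below is a plain-Python model of those two policies, not Spark code; `build_map` and its `last_wins` parameter are hypothetical names, and the error message is illustrative only.

```python
from typing import Dict, Hashable, Iterable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def build_map(pairs: Iterable[Tuple[K, V]], last_wins: bool = False) -> Dict[K, V]:
    """Hypothetical model of the two duplicate-key policies (not Spark code).

    last_wins=False: fail as soon as a duplicated key is seen (the PR's default).
    last_wins=True:  keep the value from the last occurrence of each key.
    """
    result: Dict[K, V] = {}
    for key, value in pairs:
        if key in result and not last_wins:
            # Illustrative message; Spark's actual wording may differ.
            raise RuntimeError(f"Duplicate map key {key!r} was found")
        # Re-assigning overwrites the earlier value, i.e. the last one wins.
        result[key] = value
    return result


print(build_map([(1, "a"), (2, "x"), (1, "b")], last_wins=True))  # {1: 'b', 2: 'x'}
```

Python's dict assignment already overwrites, so the only extra logic is the duplicate check that the default (`false`) policy adds.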
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583452915
**[Test build #118040 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118040/testReport)** for PR 27478 at commit [`fe1fb0f`](https://github.com/apache/spark/commit/fe1fb0f28bf7a448051b46517dc6ceed4fd78288).
----------------------------------------------------------------
[GitHub] [spark] xuanyuanking commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376449589
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2766,13 +2766,15 @@ def map_concat(*cols):
:param cols: list of column names (string) or list of :class:`Column` expressions
>>> from pyspark.sql.functions import map_concat
+ >>> spark.conf.set("spark.sql.deduplicateMapKey.lastWinsPolicy.enabled", "true")
Review comment:
Thanks, done in fe1fb0f
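The doctest line added in this hunk turns on the last-wins policy before the `map_concat` example runs. As a rough plain-Python model (not PySpark; `map_concat_model` is a hypothetical helper), concatenating maps under that policy amounts to merging them left to right, with later maps overriding keys from earlier ones, while the default policy rejects any key that appears in more than one input. Because a Python dict cannot hold a repeated key, this only models duplicates across inputs.

```python
from typing import Dict, Hashable, Mapping, Set, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def map_concat_model(*maps: Mapping[K, V], last_wins: bool = False) -> Dict[K, V]:
    """Hypothetical plain-Python model of map_concat under the two policies."""
    if not last_wins:
        seen: Set[K] = set()
        for m in maps:
            overlap = seen.intersection(m)  # iterating a Mapping yields its keys
            if overlap:
                raise RuntimeError(f"Duplicate map keys {overlap!r} were found")
            seen.update(m)
    merged: Dict[K, V] = {}
    for m in maps:
        merged.update(m)  # later maps override keys from earlier maps
    return merged


print(map_concat_model({1: "a", 2: "b"}, {3: "c", 1: "d"}, last_wins=True))
# {1: 'd', 2: 'b', 3: 'c'}
```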
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the
default behavior
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586253212
LGTM. Can you update the PR title as well?
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586257473
Merged build finished. Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586859630
**[Test build #118547 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118547/testReport)** for PR 27478 at commit [`feb8c0a`](https://github.com/apache/spark/commit/feb8c0aa28034c166f6413bd1eec58a885848192).
* This patch **fails to generate documentation**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586853035
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-582937097
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586833788
**[Test build #118537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118537/testReport)** for PR 27478 at commit [`905ae96`](https://github.com/apache/spark/commit/905ae96e60f43788fded374e9e7a597c2d973f40).
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586871501
retest it please
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586853035
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586909181
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23320/
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586864090
**[Test build #118537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118537/testReport)** for PR 27478 at commit [`905ae96`](https://github.com/apache/spark/commit/905ae96e60f43788fded374e9e7a597c2d973f40).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586257479
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118423/
----------------------------------------------------------------
[GitHub] [spark] dongjoon-hyun commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583153477
Retest this please.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583209717
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118005/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586587025
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the
default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-582936565
**[Test build #117998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117998/testReport)** for PR 27478 at commit [`09cfb67`](https://github.com/apache/spark/commit/09cfb674f835015c9de15b24ea958f5fb47fb915).
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586586867
**[Test build #118468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118468/testReport)** for PR 27478 at commit [`b102c36`](https://github.com/apache/spark/commit/b102c361bd800f3183db029da42a069201b9f39d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] maropu commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376262398
##########
File path: docs/sql-migration-guide.md
##########
@@ -49,7 +49,7 @@ license: |
- In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but -0.0 and 0.0 are considered as different values when used in aggregate grouping keys, window partition keys and join keys. Since Spark 3.0, this bug is fixed. For example, `Seq(-0.0, 0.0).toDF("d").groupBy("d").count()` returns `[(0.0, 2)]` in Spark 3.0, and `[(0.0, 1), (-0.0, 1)]` in Spark 2.4 and earlier.
- - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
+ - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, new config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` was added, with the default value `false`, Spark will throw RuntimeException while duplicated keys are found. If set to `true`, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
Review comment:
I agree with the idea of avoiding "silent result changing". Btw, couldn't we keep the old (Spark 2.4) behaviour for duplicate keys by using a legacy option? If we can't for some reason, the proposed one (the runtime exception) looks reasonable to me, too.
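The question here is whether a legacy option could keep the Spark 2.4 behavior. The migration note in this hunk spells out why that behavior is called undefined: map lookup respects the first duplicated entry, `Dataset.collect` keeps the last one, and `MapKeys` returns the duplicates. The sketch below models a map as a plain list of key/value entries to show the three views disagreeing; it is only an illustration of the note, not Spark's implementation, and the helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

# A map modeled as an ordered list of (key, value) entries that may repeat keys.
Entries = List[Tuple[Any, Any]]


def lookup(entries: Entries, key: Any) -> Optional[Any]:
    """Map lookup as the note describes it: the first matching entry wins."""
    for k, v in entries:
        if k == key:
            return v
    return None


def collect_as_dict(entries: Entries) -> Dict[Any, Any]:
    """Collecting into a dict: later entries overwrite earlier ones."""
    return dict(entries)


def map_keys(entries: Entries) -> List[Any]:
    """MapKeys as the note describes it: duplicated keys are returned as-is."""
    return [k for k, _ in entries]


m = [("a", 1), ("b", 2), ("a", 3)]
print(lookup(m, "a"), collect_as_dict(m), map_keys(m))  # 1 {'a': 3, 'b': 2} ['a', 'b', 'a']
```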
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586973691
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23332/
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583155092
**[Test build #118005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118005/testReport)** for PR 27478 at commit [`09cfb67`](https://github.com/apache/spark/commit/09cfb674f835015c9de15b24ea958f5fb47fb915).
----------------------------------------------------------------
[GitHub] [spark] viirya commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376254342
##########
File path: docs/sql-migration-guide.md
##########
@@ -49,7 +49,7 @@ license: |
- In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but -0.0 and 0.0 are considered as different values when used in aggregate grouping keys, window partition keys and join keys. Since Spark 3.0, this bug is fixed. For example, `Seq(-0.0, 0.0).toDF("d").groupBy("d").count()` returns `[(0.0, 2)]` in Spark 3.0, and `[(0.0, 1), (-0.0, 1)]` in Spark 2.4 and earlier.
- - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
+ - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, new config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` was added, with the default value `false`, Spark will throw RuntimeException while duplicated keys are found. If set to `true`, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
Review comment:
This idea sounds reasonable. If we think a behavior change is correct but, for migration reasons, we don't want it to happen silently, a config like this one is an option we can use to notify users of the change.
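The two behaviors described in the migration-guide change can be modeled outside Spark. The sketch below is a hypothetical pure-Python illustration, not Spark's implementation: `build_map` and its `last_wins` flag are illustrative stand-ins for a built-in map function and the `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` config, and the error message is made up.

```python
# Hypothetical model of the two duplicate-key policies; NOT Spark code.
# `last_wins` stands in for spark.sql.deduplicateMapKey.lastWinsPolicy.enabled.


def build_map(pairs, last_wins=False):
    """Build a dict from (key, value) pairs, handling duplicate keys by policy."""
    result = {}
    for key, value in pairs:
        if key in result and not last_wins:
            # Default (false): fail loudly instead of silently dropping a value.
            raise RuntimeError(f"Duplicate map key {key!r} was found")
        # Last wins: a later value overwrites the earlier one for the same key.
        result[key] = value
    return result


if __name__ == "__main__":
    pairs = [(1, "a"), (2, "b"), (1, "c")]
    print(build_map(pairs, last_wins=True))  # {1: 'c', 2: 'b'}
    try:
        build_map(pairs)
    except RuntimeError as err:
        print("error:", err)
```

Failing by default means a query that used to return a silently deduplicated map now surfaces the problem, which is the "no silent behavior change" goal discussed above.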
----------------------------------------------------------------
[GitHub] [spark] xuanyuanking commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376441716
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
##########
@@ -651,8 +651,10 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
Row(null)
)
- checkAnswer(df1.selectExpr("map_concat(map1, map2)"), expected1a)
- checkAnswer(df1.select(map_concat($"map1", $"map2")), expected1a)
+ withSQLConf(SQLConf.DEDUPLICATE_MAP_KEY_WITH_LAST_WINS_POLICY.key -> "true") {
Review comment:
Thanks, added in `ArrayBasedMapBuilderSuite`.
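The updated test wraps its assertions in `withSQLConf(...)`, so the last-wins policy is enabled only inside that block. A rough Python analogue of that scoped-override pattern follows; `with_sql_conf` and the plain dict standing in for the session configuration are assumptions for illustration, not Spark APIs.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a session's SQL configuration.
session_conf = {}

_MISSING = object()


@contextmanager
def with_sql_conf(settings, conf=session_conf):
    """Apply `settings` to `conf` for the duration of the block, then restore."""
    previous = {key: conf.get(key, _MISSING) for key in settings}
    conf.update(settings)
    try:
        yield conf
    finally:
        # Restore old values and drop keys that were not set before the block.
        for key, old in previous.items():
            if old is _MISSING:
                conf.pop(key, None)
            else:
                conf[key] = old


if __name__ == "__main__":
    key = "spark.sql.deduplicateMapKey.lastWinsPolicy.enabled"
    with with_sql_conf({key: "true"}):
        print(session_conf[key])  # true
    print(key in session_conf)  # False
```

Restoring in `finally` keeps the override from leaking into later tests even when an assertion inside the block fails.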
----------------------------------------------------------------
[GitHub] [spark] viirya commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376254710
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2167,6 +2167,14 @@ object SQLConf {
.booleanConf
.createWithDefault(false)
+ val DEDUPLICATE_MAP_KEY_WITH_LAST_WINS_POLICY =
+ buildConf("spark.sql.deduplicateMapKey.lastWinsPolicy.enabled")
+ .doc("When true, use last wins policy to remove duplicated map keys in built-in functions, " +
+ "this config takes effect in below build-in functions: CreateMap, MapFromArrays, " +
+ "MapFromEntries, StringToMap, MapConcat and TransformKeys.")
Review comment:
We should mention that an exception will be thrown if this is set to false.
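`StringToMap` (SQL `str_to_map`) is one of the listed built-ins where duplicates can come from the input text itself. The sketch below is a hypothetical Python model, not Spark's implementation: it splits on plain-string delimiters (which may differ from Spark's own delimiter handling) and, by its own choice, maps a pair without a key-value delimiter to `None`. It only shows where the `true` and `false` settings diverge.

```python
def str_to_map(text, pair_delim=",", kv_delim=":", last_wins=False):
    """Hypothetical str_to_map-style parser with a duplicate-key policy."""
    result = {}
    for pair in text.split(pair_delim):
        key, found, value = pair.partition(kv_delim)
        if key in result and not last_wins:
            # Config false: reject the input instead of picking a value.
            raise RuntimeError(f"Duplicate map key {key!r} was found")
        # In this sketch, a pair without the key-value delimiter gets None.
        result[key] = value if found else None
    return result


if __name__ == "__main__":
    print(str_to_map("a:1,b:2,a:3", last_wins=True))  # {'a': '3', 'b': '2'}
```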
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586834168
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376251516
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2766,13 +2766,15 @@ def map_concat(*cols):
:param cols: list of column names (string) or list of :class:`Column` expressions
>>> from pyspark.sql.functions import map_concat
+ >>> spark.conf.set("spark.sql.deduplicateMapKey.lastWinsPolicy.enabled", "true")
Review comment:
+1
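The doctest change sets the config to `"true"` before calling `map_concat`, presumably because the example's input maps share a key and would otherwise fail under the new default. A hypothetical pure-Python model of `map_concat` under both settings follows; dicts stand in for Spark map columns and nothing here calls PySpark.

```python
def map_concat(*maps, last_wins=False):
    """Merge maps left to right, applying a duplicate-key policy (sketch only)."""
    result = {}
    for mapping in maps:
        for key, value in mapping.items():
            if key in result and not last_wins:
                # Config false: a key shared by two inputs is an error.
                raise RuntimeError(f"Duplicate map key {key!r} was found")
            # Config true: the value from the later map replaces the earlier one.
            result[key] = value
    return result


if __name__ == "__main__":
    print(map_concat({1: "a", 2: "b"}, {3: "c", 1: "d"}, last_wins=True))
```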
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583558598
Merged build finished. Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] xuanyuanking commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586224976
Thanks, Gengliang. As the config is still related to legacy behavior, I renamed it to `spark.sql.legacy.allowDuplicatedMapKeys`.
----------------------------------------------------------------
[GitHub] [spark] gengliangwang edited a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
gengliangwang edited a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-585039004
I searched Google for "deduplicate map key" and found no matching results.
Since a runtime exception will be thrown on duplicated keys, how about renaming the config to `spark.sql.allowDuplicatedMapKeys.enabled`?
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-582937097
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583453786
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22805/
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586854920
**[Test build #118547 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118547/testReport)** for PR 27478 at commit [`feb8c0a`](https://github.com/apache/spark/commit/feb8c0aa28034c166f6413bd1eec58a885848192).
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586568228
**[Test build #118468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118468/testReport)** for PR 27478 at commit [`b102c36`](https://github.com/apache/spark/commit/b102c361bd800f3183db029da42a069201b9f39d).
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586240967
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586863059
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583685454
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583558598
Merged build finished. Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583705941
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118052/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-582937112
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22763/
----------------------------------------------------------------
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376508834
##########
File path: docs/sql-migration-guide.md
##########
@@ -49,7 +49,7 @@ license: |
- In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but -0.0 and 0.0 are considered as different values when used in aggregate grouping keys, window partition keys and join keys. Since Spark 3.0, this bug is fixed. For example, `Seq(-0.0, 0.0).toDF("d").groupBy("d").count()` returns `[(0.0, 2)]` in Spark 3.0, and `[(0.0, 1), (-0.0, 1)]` in Spark 2.4 and earlier.
- - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
+ - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, new config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` was added, with the default value `false`, Spark will throw RuntimeException while duplicated keys are found. If set to `true`, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
Review comment:
I'm okay with the rule, @cloud-fan. This time, could you add a guideline to our website? Actually, many correctness bug fixes in the migration guide have been `silent result changing` with a legacy fallback config.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the
default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583155092
**[Test build #118005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118005/testReport)** for PR 27478 at commit [`09cfb67`](https://github.com/apache/spark/commit/09cfb674f835015c9de15b24ea958f5fb47fb915).
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583685459
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22818/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586863071
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23305/
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-587010019
thanks, merging to master/3.0!
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586853038
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23301/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586864334
Merged build finished. Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583705941
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118052/
----------------------------------------------------------------
[GitHub] [spark] xuanyuanking commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376449824
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2167,6 +2167,14 @@ object SQLConf {
.booleanConf
.createWithDefault(false)
+ val DEDUPLICATE_MAP_KEY_WITH_LAST_WINS_POLICY =
+ buildConf("spark.sql.deduplicateMapKey.lastWinsPolicy.enabled")
+ .doc("When true, use last wins policy to remove duplicated map keys in built-in functions, " +
+ "this config takes effect in below build-in functions: CreateMap, MapFromArrays, " +
+ "MapFromEntries, StringToMap, MapConcat and TransformKeys.")
Review comment:
Thanks, done in fe1fb0f.
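Editorial illustration, not code from this PR: the behavior described in the config doc above can be modeled as a small standalone Scala builder. `DedupMapBuilder` and its `lastWinsPolicy` flag are hypothetical stand-ins for `ArrayBasedMapBuilder` and the config being added here, and the `RuntimeException` follows the PR description. It is a sketch under those assumptions, not Spark's implementation.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's ArrayBasedMapBuilder (not the real class).
// `lastWinsPolicy` models the boolean config discussed in this PR.
class DedupMapBuilder[K, V](lastWinsPolicy: Boolean) {
  // Position at which each key was first inserted.
  private val keyToIndex = mutable.HashMap.empty[K, Int]
  private val keys = mutable.ArrayBuffer.empty[K]
  private val values = mutable.ArrayBuffer.empty[V]

  def put(key: K, value: V): Unit = {
    keyToIndex.get(key) match {
      case None =>
        // First occurrence of this key: append a new entry.
        keyToIndex(key) = keys.length
        keys += key
        values += value
      case Some(index) =>
        if (!lastWinsPolicy) {
          // Flag off (the proposed default): fail instead of silently picking a value.
          throw new RuntimeException(s"Duplicate map key $key was found")
        }
        // Flag on: overwrite the earlier value in place (last wins).
        values(index) = value
    }
  }

  // Entries in the order their keys were first inserted.
  def build(): Seq[(K, V)] = keys.zip(values).toSeq
}
```

With the flag on, a repeated key keeps its original position but takes the later value; with the flag off, the second `put` for the same key throws.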
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586257479
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118423/
Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583209717
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118005/
Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586973680
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] xuanyuanking commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-587239996
Thanks all for the review.
----------------------------------------------------------------
[GitHub] [spark] xuanyuanking commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376449743
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala
##########
@@ -63,6 +65,11 @@ class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Seria
keys.append(key)
values.append(value)
} else {
+ if (!SQLConf.get.getConf(DEDUPLICATE_MAP_KEY_WITH_LAST_WINS_POLICY)) {
Review comment:
Thanks, done in fe1fb0f
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-587000309
**[Test build #118551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118551/testReport)** for PR 27478 at commit [`feb8c0a`](https://github.com/apache/spark/commit/feb8c0aa28034c166f6413bd1eec58a885848192).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376239887
##########
File path: docs/sql-migration-guide.md
##########
@@ -49,7 +49,7 @@ license: |
- In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but -0.0 and 0.0 are considered as different values when used in aggregate grouping keys, window partition keys and join keys. Since Spark 3.0, this bug is fixed. For example, `Seq(-0.0, 0.0).toDF("d").groupBy("d").count()` returns `[(0.0, 2)]` in Spark 3.0, and `[(0.0, 1), (-0.0, 1)]` in Spark 2.4 and earlier.
- - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
+ - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, new config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` was added, with the default value `false`, Spark will throw RuntimeException while duplicated keys are found. If set to `true`, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
Review comment:
The principle is still "no failure by default", but a version upgrade is a different case. We should try our best to avoid silent result changes.
We may want to introduce a new category of configs that are for migration only and should be removed in the next major version.
Thoughts? @srowen @viirya @maropu @bart-samwel
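Editorial illustration of the "undefined" 2.4 behavior described in the migration-guide text quoted above: the same duplicated-key map can answer differently depending on the operation. This plain-Scala model assumes a map stored as parallel key/value arrays; `DuplicateKeyAmbiguity` and its methods are hypothetical and are not Spark APIs.

```scala
// Hypothetical model of a map held as parallel key/value arrays that
// contains a duplicated key, as Spark 2.4 built-in functions could produce.
object DuplicateKeyAmbiguity {
  val keys: Array[Int] = Array(1, 1)
  val values: Array[String] = Array("a", "b")

  // A linear lookup stops at the first matching key, so it returns "a".
  def lookupFirst(key: Int): Option[String] = {
    val i = keys.indexOf(key)
    if (i >= 0) Some(values(i)) else None
  }

  // Building a Scala Map keeps the last binding for a key, so it returns "b".
  def toScalaMap: Map[Int, String] = keys.zip(values).toMap
}
```

Two reasonable readings of one stored value disagree, which is why the PR makes the default fail fast rather than pick one silently.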
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586834171
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23292/
Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] xuanyuanking commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376678023
##########
File path: docs/sql-migration-guide.md
##########
@@ -49,7 +49,7 @@ license: |
- In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but -0.0 and 0.0 are considered as different values when used in aggregate grouping keys, window partition keys and join keys. Since Spark 3.0, this bug is fixed. For example, `Seq(-0.0, 0.0).toDF("d").groupBy("d").count()` returns `[(0.0, 2)]` in Spark 3.0, and `[(0.0, 1), (-0.0, 1)]` in Spark 2.4 and earlier.
- - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
+ - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, new config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` was added, with the default value `false`, Spark will throw RuntimeException while duplicated keys are found. If set to `true`, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
Review comment:
Agreed on having a guideline if we adopt the new rule, so that we can distinguish silent result changes caused by behavior changes from those caused by bug fixes.
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376252435
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala
##########
@@ -63,6 +65,11 @@ class ArrayBasedMapBuilder(keyType: DataType, valueType: DataType) extends Seria
keys.append(key)
values.append(value)
} else {
+ if (!SQLConf.get.getConf(DEDUPLICATE_MAP_KEY_WITH_LAST_WINS_POLICY)) {
Review comment:
Can we read the config once, when `ArrayBasedMapBuilder` is constructed? e.g.
```
private val xxx = SQLConf.get.getConf(DEDUPLICATE_MAP_KEY_WITH_LAST_WINS_POLICY)
```
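Editorial sketch of the suggestion above: read the flag once, in a `private val` initialized at construction, rather than on every duplicate key. `CachedFlagBuilder` is hypothetical, and `readConf` is an assumed stand-in for `SQLConf.get.getConf(DEDUPLICATE_MAP_KEY_WITH_LAST_WINS_POLICY)`. Neither is a Spark API.

```scala
import scala.collection.mutable

// Hypothetical builder that caches the policy flag at construction time.
// `readConf` stands in for the SQLConf lookup shown in the diff.
class CachedFlagBuilder(readConf: () => Boolean) {
  // Evaluated exactly once, when the builder is created.
  private val lastWins: Boolean = readConf()
  // LinkedHashMap keeps a key's original position when its value is updated.
  private val entries = mutable.LinkedHashMap.empty[String, Int]

  def put(key: String, value: Int): Unit = {
    if (entries.contains(key) && !lastWins) {
      throw new RuntimeException(s"Duplicate map key $key was found")
    }
    entries(key) = value
  }

  def result: Seq[(String, Int)] = entries.toSeq
}
```

Repeated `put` calls for the same key reuse the cached flag instead of repeating the config lookup each time. The trade-off is that the policy is fixed for the builder's lifetime.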
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586862667
**[Test build #118551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118551/testReport)** for PR 27478 at commit [`feb8c0a`](https://github.com/apache/spark/commit/feb8c0aa28034c166f6413bd1eec58a885848192).
----------------------------------------------------------------
[GitHub] [spark] xuanyuanking commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r379802575
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2167,6 +2167,16 @@ object SQLConf {
.booleanConf
.createWithDefault(false)
+ val LEGACY_ALLOW_DUPLICATED_MAP_KEY =
+ buildConf("spark.sql.legacy.allowDuplicatedMapKeys")
+ .doc("When true, use last wins policy to remove duplicated map keys in built-in functions, " +
+ "this config takes effect in below build-in functions: CreateMap, MapFromArrays, " +
+ "MapFromEntries, StringToMap, MapConcat and TransformKeys. Otherwise, if this is false, " +
+ "which is the default, Spark will throw an exception while duplicated map keys are " +
Review comment:
Thanks, done in b102c36.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586240967
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586587027
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118468/
Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583558604
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118040/
Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r380008658
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##########
@@ -188,11 +188,13 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
checkExampleSyntax(example)
example.split(" > ").toList.foreach(_ match {
case exampleRe(sql, output) =>
- val df = clonedSpark.sql(sql)
- val actual = unindentAndTrim(
- hiveResultString(df.queryExecution.executedPlan).mkString("\n"))
- val expected = unindentAndTrim(output)
- assert(actual === expected)
+ withSQLConf(SQLConf.LEGACY_ALLOW_DUPLICATED_MAP_KEY.key -> "true") {
Review comment:
Can you check the map-related built-in expressions? It seems some of their examples wrongly use duplicated keys.
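Editorial sketch of the scoping pattern the diff above relies on: a test helper sets a config only for the duration of a block and then restores the previous state, even if the block throws. `ConfScope.withConf` is a hypothetical, simplified model backed by a plain mutable map. It is not Spark's `withSQLConf` implementation.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of a "with config" test helper.
object ConfScope {
  val conf: mutable.Map[String, String] = mutable.Map.empty

  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    // Remember the previous value (or absence) of every key we set.
    val saved = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally saved.foreach {
      case (k, Some(old)) => conf(k) = old // restore the earlier value
      case (k, None)      => conf.remove(k) // the key was unset before
    }
  }
}
```

Wrapping every example in the legacy flag this way hides examples that depend on duplicated keys, which is why the comment above suggests fixing the examples themselves.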
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r379389636
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2167,6 +2167,16 @@ object SQLConf {
.booleanConf
.createWithDefault(false)
+ val LEGACY_ALLOW_DUPLICATED_MAP_KEY =
+ buildConf("spark.sql.legacy.allowDuplicatedMapKeys")
+ .doc("When true, use last wins policy to remove duplicated map keys in built-in functions, " +
+ "this config takes effect in below build-in functions: CreateMap, MapFromArrays, " +
+ "MapFromEntries, StringToMap, MapConcat and TransformKeys. Otherwise, if this is false, " +
+ "which is the default, Spark will throw an exception while duplicated map keys are " +
Review comment:
while -> when
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586973691
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23332/
Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583705939
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the
default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583685270
**[Test build #118052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118052/testReport)** for PR 27478 at commit [`66dc51f`](https://github.com/apache/spark/commit/66dc51f0bc0fb45f7c44ef72f26e1918e01d0212).
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the
default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583209556
**[Test build #118005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118005/testReport)** for PR 27478 at commit [`09cfb67`](https://github.com/apache/spark/commit/09cfb674f835015c9de15b24ea958f5fb47fb915).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586970404
Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-587001486
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118551/
Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-587001477
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586970413
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118567/
Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586863059
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586864352
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118537/
Test FAILed.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376347541
##########
File path: docs/sql-migration-guide.md
##########
@@ -49,7 +49,7 @@ license: |
- In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but -0.0 and 0.0 are considered as different values when used in aggregate grouping keys, window partition keys and join keys. Since Spark 3.0, this bug is fixed. For example, `Seq(-0.0, 0.0).toDF("d").groupBy("d").count()` returns `[(0.0, 2)]` in Spark 3.0, and `[(0.0, 1), (-0.0, 1)]` in Spark 2.4 and earlier.
- - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
+ - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, new config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` was added, with the default value `false`, Spark will throw RuntimeException while duplicated keys are found. If set to `true`, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values with duplicated keys from data sources which do not enforce it (e.g. Parquet), the behavior will be undefined.
Review comment:
The idea sounds fine.
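The migration-guide wording in this diff puts two behaviors behind one flag. Below is a minimal plain-Scala model of those two behaviors. It is a sketch, not Spark's implementation: `MapKeyDedupSketch` and `buildMap` are hypothetical names, and the exception message is invented for illustration. With the flag off (the new default), the first repeated key raises a `RuntimeException`. With it on, a later value for the same key overwrites the earlier one, which is the "last wins" policy.

```scala
// Hypothetical model of the duplicate-key policy described in the migration-guide diff.
// NOT Spark's implementation; it only mimics the documented behavior.
object MapKeyDedupSketch {
  // Builds a map from key/value pairs, processed in input order.
  // lastWins = false: throw on the first duplicated key (the default described in the diff).
  // lastWins = true:  a later value for an existing key replaces the earlier value.
  def buildMap[K, V](pairs: Seq[(K, V)], lastWins: Boolean): Map[K, V] =
    pairs.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
      if (acc.contains(k) && !lastWins)
        // Hypothetical message text, chosen for this sketch only.
        throw new RuntimeException(s"Duplicate map key $k was found")
      acc.updated(k, v)
    }

  def main(args: Array[String]): Unit = {
    val pairs = Seq(1 -> "a", 2 -> "b", 1 -> "c")

    // Flag on: the second value for key 1 wins.
    println(buildMap(pairs, lastWins = true)) // Map(1 -> c, 2 -> b)

    // Flag off: the duplicated key 1 is rejected.
    try {
      buildMap(pairs, lastWins = false)
      println("no exception")
    } catch {
      case e: RuntimeException => println(e.getMessage) // Duplicate map key 1 was found
    }
  }
}
```

Inputs without duplicates behave the same under either setting. The flag only matters once a key repeats.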
[GitHub] [spark] cloud-fan commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586973336
retest this please
[GitHub] [spark] cloud-fan commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586861735
retest this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583685454
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583155458
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583453774
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586909169
Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376163612
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
##########
@@ -651,8 +651,10 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
Row(null)
)
- checkAnswer(df1.selectExpr("map_concat(map1, map2)"), expected1a)
- checkAnswer(df1.select(map_concat($"map1", $"map2")), expected1a)
+ withSQLConf(SQLConf.DEDUPLICATE_MAP_KEY_WITH_LAST_WINS_POLICY.key -> "true") {
Review comment:
Can we have one test that checks the exception when this configuration is false?
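The requested test has a simple shape: run the same expression once with the flag off, expect an exception, then run it with the flag on and check the last-wins result. Below is a dependency-free sketch of that shape. `concatMaps` is a hypothetical stand-in for `map_concat` that takes the flag as a parameter, and `intercept` is a small local analogue of ScalaTest's `intercept[T]`.

Inside `DataFrameFunctionsSuite` itself, the check would presumably wrap the query in `withSQLConf(SQLConf.DEDUPLICATE_MAP_KEY_WITH_LAST_WINS_POLICY.key -> "false")`, as the diff does for `"true"`, and use ScalaTest's own `intercept`. That arrangement is an assumption, not code from this PR.

```scala
import scala.reflect.ClassTag

// Plain-Scala sketch of the test shape requested above; no Spark or ScalaTest needed.
object DuplicateKeyConfigTestSketch {
  // Hypothetical stand-in for map_concat: concatenates two maps' entries in order,
  // then applies the duplicate-key policy selected by `lastWins`.
  def concatMaps[K, V](m1: Seq[(K, V)], m2: Seq[(K, V)], lastWins: Boolean): Map[K, V] =
    (m1 ++ m2).foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
      if (acc.contains(k) && !lastWins)
        throw new RuntimeException(s"Duplicate map key $k")
      acc.updated(k, v)
    }

  // Minimal analogue of ScalaTest's intercept[T]: evaluates `body`, returns the thrown T,
  // and fails if nothing, or an exception of another type, is thrown.
  def intercept[T <: Throwable](body: => Any)(implicit ct: ClassTag[T]): T = {
    val thrown: Option[Throwable] =
      try { body; None } catch { case t: Throwable => Some(t) }
    thrown match {
      case Some(t: T) => t
      case Some(other) =>
        throw new AssertionError(s"Expected ${ct.runtimeClass.getName} but got $other")
      case None =>
        throw new AssertionError(s"Expected ${ct.runtimeClass.getName} to be thrown")
    }
  }

  def main(args: Array[String]): Unit = {
    val m1 = Seq(1 -> "a", 2 -> "b")
    val m2 = Seq(2 -> "c", 3 -> "d")

    // Flag off (default): the duplicated key 2 must raise an exception.
    val e = intercept[RuntimeException](concatMaps(m1, m2, lastWins = false))
    assert(e.getMessage.contains("Duplicate map key 2"))

    // Flag on: last wins, so key 2 maps to "c".
    assert(concatMaps(m1, m2, lastWins = true) == Map(1 -> "a", 2 -> "c", 3 -> "d"))
    println("ok")
  }
}
```

Keeping both assertions in one test ties the exception path and the last-wins path to the same input. That is the pairing the reviewer asked for.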
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-587120130
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118578/
Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583558604
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118040/
Test FAILed.
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the
default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583705844
**[Test build #118052 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118052/testReport)** for PR 27478 at commit [`66dc51f`](https://github.com/apache/spark/commit/66dc51f0bc0fb45f7c44ef72f26e1918e01d0212).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586853038
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23301/
Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583453774
Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586833788
**[Test build #118537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118537/testReport)** for PR 27478 at commit [`905ae96`](https://github.com/apache/spark/commit/905ae96e60f43788fded374e9e7a597c2d973f40).
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586257473
Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the
default behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586257413
**[Test build #118423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118423/testReport)** for PR 27478 at commit [`f622180`](https://github.com/apache/spark/commit/f622180e40ed984d593bc95ad2a1b7ad96f2af1c).
* This patch **fails to generate documentation**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586568228
**[Test build #118468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118468/testReport)** for PR 27478 at commit [`b102c36`](https://github.com/apache/spark/commit/b102c361bd800f3183db029da42a069201b9f39d).
[GitHub] [spark] SparkQA removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-582936565
**[Test build #117998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/117998/testReport)** for PR 27478 at commit [`09cfb67`](https://github.com/apache/spark/commit/09cfb674f835015c9de15b24ea958f5fb47fb915).
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586568444
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586909181
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23320/
Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-587119453
**[Test build #118578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118578/testReport)** for PR 27478 at commit [`feb8c0a`](https://github.com/apache/spark/commit/feb8c0aa28034c166f6413bd1eec58a885848192).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583155463
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/22770/
----------------------------------------------------------------
[GitHub] [spark] gengliangwang commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
gengliangwang commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-585039004
I searched for "deduplicate map key" and found no results.
Since duplicated keys will now cause a runtime exception, how about renaming the config to `spark.sql.allowDuplicatedMapKeys.enabled`?
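To make the two behaviours under discussion concrete, here is a minimal, Spark-free sketch in Scala. The object `MapKeyDedup` and its `allowDuplicates` flag are hypothetical stand-ins for the proposed config; this is an illustration of the semantics described in the PR (exception by default, last value wins when enabled), not Spark's actual map-building code.

```scala
// Hypothetical illustration only: not Spark's implementation.
object MapKeyDedup {
  /** Builds a map from key/value pairs.
    * allowDuplicates = false: a repeated key throws a RuntimeException
    *                          (the new default behaviour proposed in this PR).
    * allowDuplicates = true:  a repeated key keeps the last value seen
    *                          (the previous "last wins" behaviour). */
  def build[K, V](pairs: Seq[(K, V)], allowDuplicates: Boolean): Map[K, V] =
    pairs.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
      if (!allowDuplicates && acc.contains(k))
        throw new RuntimeException(s"Duplicate map key $k was found")
      acc + (k -> v) // later value overwrites any earlier one for the same key
    }
}
```

For example, `MapKeyDedup.build(Seq(1 -> "a", 1 -> "b"), allowDuplicates = true)` yields `Map(1 -> "b")`, while the same call with `allowDuplicates = false` throws.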
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-587120123
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586973680
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-587001486
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118551/
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586912213
**[Test build #118567 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118567/testReport)** for PR 27478 at commit [`feb8c0a`](https://github.com/apache/spark/commit/feb8c0aa28034c166f6413bd1eec58a885848192).
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586859661
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118547/
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #27478: [SPARK-25829][SQL] Add
config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586862667
**[Test build #118551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118551/testReport)** for PR 27478 at commit [`feb8c0a`](https://github.com/apache/spark/commit/feb8c0aa28034c166f6413bd1eec58a885848192).
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583209707
Merged build finished. Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change
the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583209707
Merged build finished. Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on a change in pull request #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#discussion_r376252692
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
##########
@@ -651,8 +651,10 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
Row(null)
)
- checkAnswer(df1.selectExpr("map_concat(map1, map2)"), expected1a)
- checkAnswer(df1.select(map_concat($"map1", $"map2")), expected1a)
+ withSQLConf(SQLConf.DEDUPLICATE_MAP_KEY_WITH_LAST_WINS_POLICY.key -> "true") {
Review comment:
+1
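The `withSQLConf(...) { ... }` wrapper in this diff scopes a config override to the enclosed assertions, so the existing "last wins" expectations for `map_concat` keep running while the default changes. A minimal sketch of that set-then-restore (loan) pattern follows, assuming a hypothetical in-memory `FakeConf` store in place of Spark's real `SQLConf`; Spark's own test helper is more involved, and this only illustrates the idea.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session config store; not Spark's SQLConf.
object FakeConf {
  val settings: mutable.Map[String, String] = mutable.Map.empty[String, String]

  /** Runs `body` with the given key/value pairs set, then restores each key's
    * previous value (or removes it if it was unset), even if `body` throws. */
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    // Capture the old values before overriding anything.
    val previous = pairs.map { case (k, _) => k -> settings.get(k) }
    pairs.foreach { case (k, v) => settings(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => settings(k) = old
      case (k, None)      => settings.remove(k)
    }
  }
}
```

Because the old values are restored in a `finally`, nested calls compose: an inner override is visible only inside its block, and the outer value is back in place once the inner block returns or throws.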
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586859661
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118547/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586834171
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23292/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586863071
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23305/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-587001477
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #27478: [SPARK-25829][SQL]
Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-587120123
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config
`spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default
behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.deduplicateMapKey.lastWinsPolicy.enabled` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-583050774
Merged build finished. Test FAILed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #27478:
[SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and
change the default behavior
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27478: [SPARK-25829][SQL] Add config `spark.sql.legacy.allowDuplicatedMapKeys` and change the default behavior
URL: https://github.com/apache/spark/pull/27478#issuecomment-586970404
Merged build finished. Test FAILed.
----------------------------------------------------------------