Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/10 09:01:20 UTC
[GitHub] [spark] AngersZhuuuu opened a new pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
AngersZhuuuu opened a new pull request #33955:
URL: https://github.com/apache/spark/pull/33955
### What changes were proposed in this pull request?
For the query
```
select array_union(array(cast('nan' as double), cast('nan' as double)), array())
```
This returns [NaN, NaN], but it should return [NaN].
This issue is caused by `OpenHashSet` not being able to handle `Double.NaN` and `Float.NaN` either.
In this PR, we add a wrapper around `OpenHashSet` that handles `null`, `Double.NaN`, and `Float.NaN` together.
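The approach described above can be illustrated with a minimal, self-contained Scala sketch. It is not Spark's `SQLOpenHashSet`: the class `NullNaNAwareSet` and the object `ArrayUnionSketch` are hypothetical names, elements are modeled as nullable `java.lang.Double`, and only the null/NaN bookkeeping is shown. Ordinary values go through a regular hash set, while `null` and `NaN` are each tracked with a flag, so at most one of each reaches the result.

```scala
import scala.collection.mutable

// Hypothetical, simplified illustration; NOT Spark's SQLOpenHashSet.
// Ordinary doubles live in a regular hash set, while null and NaN are
// tracked with separate boolean flags.
final class NullNaNAwareSet {
  private val values = mutable.HashSet.empty[Double]
  private var containsNull = false
  private var containsNaN = false

  /** Records `x` and returns true if an equal element had not been seen yet. */
  def add(x: java.lang.Double): Boolean =
    if (x == null) {
      val isNew = !containsNull
      containsNull = true
      isNew
    } else if (x.isNaN) {
      // NaN is never equal to itself under IEEE 754 `==`, so a flag is used
      // instead of relying on the hash set's equality check.
      val isNew = !containsNaN
      containsNaN = true
      isNew
    } else {
      values.add(x.doubleValue)
    }
}

object ArrayUnionSketch {
  /** Union that keeps the first occurrence of each element, in input order. */
  def union(left: Seq[java.lang.Double], right: Seq[java.lang.Double]): Seq[java.lang.Double] = {
    val seen = new NullNaNAwareSet
    val out = mutable.ArrayBuffer.empty[java.lang.Double]
    (left ++ right).foreach { x => if (seen.add(x)) out += x }
    out.toList
  }
}
```

For example, `ArrayUnionSketch.union(List(nan, nan), Nil)`, with `nan` bound to a boxed `Double.NaN`, returns a single-element list, mirroring the expected `[NaN]` for the query above.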
### Why are the changes needed?
To fix the bug.
### Does this PR introduce _any_ user-facing change?
Yes. `ArrayUnion` no longer returns duplicated `NaN` values.
### How was this patch tested?
Added unit tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r709284232
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3575,24 +3576,31 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArrayBinaryLi
if (TypeUtils.typeWithProperEquals(elementType)) {
(array1, array2) =>
val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
- val hs = new OpenHashSet[Any]
- var foundNullElement = false
+ val hs = new SQLOpenHashSet[Any]()
+ val isNaN = SQLOpenHashSet.isNaN(elementType)
Seq(array1, array2).foreach { array =>
var i = 0
while (i < array.numElements()) {
if (array.isNullAt(i)) {
- if (!foundNullElement) {
+ if (!hs.containsNull) {
+ hs.addNull
arrayBuffer += null
- foundNullElement = true
}
} else {
val elem = array.get(i, elementType)
- if (!hs.contains(elem)) {
- if (arrayBuffer.size > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
- ArrayBinaryLike.throwUnionLengthOverflowException(arrayBuffer.size)
+ if (isNaN(elem)) {
+ if (!hs.containsNaN) {
+ arrayBuffer += elem
Review comment:
Thanks, @cloud-fan and @AngersZhuuuu .
[GitHub] [spark] cloud-fan commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-920111484
I fetched the latest master and the test passed on my side.
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918797641
**[Test build #143239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143239/testReport)** for PR 33955 at commit [`fe407c9`](https://github.com/apache/spark/commit/fe407c9325716f6ba8fd637e05e54dd208c8ab69).
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918795424
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47737/
[GitHub] [spark] cloud-fan closed pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #33955:
URL: https://github.com/apache/spark/pull/33955
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917791275
**[Test build #143182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143182/testReport)** for PR 33955 at commit [`119679c`](https://github.com/apache/spark/commit/119679cfc5884928d9fa368f683689a214d01912).
[GitHub] [spark] HyukjinKwon commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-921518353
I still see this test failure, see https://github.com/apache/spark/runs/3628995384. Shall we revert this PR?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917969856
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47695/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917415029
Kubernetes integration test unable to build dist.
exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47669/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918374932
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47715/
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917993869
**[Test build #143199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143199/testReport)** for PR 33955 at commit [`991fddd`](https://github.com/apache/spark/commit/991fddd22d80a9e7e946ba679c9582fc14a33ba6).
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917189917
**[Test build #143152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143152/testReport)** for PR 33955 at commit [`1857988`](https://github.com/apache/spark/commit/18579884948898f9a9f6e15046fd807a2d294f7e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r708815457
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3575,24 +3576,31 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArrayBinaryLi
if (TypeUtils.typeWithProperEquals(elementType)) {
(array1, array2) =>
val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
- val hs = new OpenHashSet[Any]
- var foundNullElement = false
+ val hs = new SQLOpenHashSet[Any]()
+ val isNaN = SQLOpenHashSet.isNaN(elementType)
Seq(array1, array2).foreach { array =>
var i = 0
while (i < array.numElements()) {
if (array.isNullAt(i)) {
- if (!foundNullElement) {
+ if (!hs.containsNull) {
+ hs.addNull
arrayBuffer += null
- foundNullElement = true
}
} else {
val elem = array.get(i, elementType)
- if (!hs.contains(elem)) {
- if (arrayBuffer.size > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
- ArrayBinaryLike.throwUnionLengthOverflowException(arrayBuffer.size)
+ if (isNaN(elem)) {
+ if (!hs.containsNaN) {
+ arrayBuffer += elem
Review comment:
LGTM
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917814704
Kubernetes integration test unable to build dist.
exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47688/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917814737
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47688/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918221136
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143199/
[GitHub] [spark] HyukjinKwon commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-921519621
Actually there are some more: https://github.com/apache/spark/runs/3619357249
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917969856
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47695/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918340039
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47714/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918016405
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r708570515
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3575,24 +3576,31 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArrayBinaryLi
if (TypeUtils.typeWithProperEquals(elementType)) {
(array1, array2) =>
val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
- val hs = new OpenHashSet[Any]
- var foundNullElement = false
+ val hs = new SQLOpenHashSet[Any]()
+ val isNaN = SQLOpenHashSet.isNaN(elementType)
Seq(array1, array2).foreach { array =>
var i = 0
while (i < array.numElements()) {
if (array.isNullAt(i)) {
- if (!foundNullElement) {
+ if (!hs.containsNull) {
+ hs.addNull
arrayBuffer += null
- foundNullElement = true
}
} else {
val elem = array.get(i, elementType)
- if (!hs.contains(elem)) {
- if (arrayBuffer.size > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
- ArrayBinaryLike.throwUnionLengthOverflowException(arrayBuffer.size)
+ if (isNaN(elem)) {
+ if (!hs.containsNaN) {
+ arrayBuffer += elem
Review comment:
By the way, there are multiple `NaN` values whose bit patterns differ from `Double.NaN`. So with these new semantics, the first `NaN` value encountered is added to the result, right?
@cloud-fan and @AngersZhuuuu, do we need to normalize the NaN value by always adding `Double.NaN` or `Float.NaN`?
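The distinction raised here can be seen with the standard JDK bit-conversion methods: `java.lang.Double.longBitsToDouble(0x7ff8000000000001L)` is a `NaN` whose raw bits typically differ from those of `Double.NaN`. The sketch below shows one hypothetical way to normalize, mapping every `NaN` to the canonical constant before it is added to a result; the object `NaNNormalization` and its `normalize` methods are assumptions for illustration, not what this PR implements.

```scala
// Hypothetical helpers (not part of Spark): map every NaN bit pattern to the
// canonical Double.NaN / Float.NaN constant and leave other values untouched.
object NaNNormalization {
  def normalize(d: Double): Double = if (d.isNaN) Double.NaN else d

  def normalize(f: Float): Float = if (f.isNaN) Float.NaN else f
}
```

For example, `NaNNormalization.normalize(java.lang.Double.longBitsToDouble(0x7ff8000000000001L))` yields a value whose raw bits equal those of `Double.NaN`. Note that `java.lang.Double.doubleToLongBits` already collapses every `NaN` to `0x7ff8000000000000L`, whereas `doubleToRawLongBits` preserves the payload, which is why byte-level comparisons can see distinct `NaN` values.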
[GitHub] [spark] cloud-fan commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r708796039
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3575,24 +3576,31 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArrayBinaryLi
if (TypeUtils.typeWithProperEquals(elementType)) {
(array1, array2) =>
val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
- val hs = new OpenHashSet[Any]
- var foundNullElement = false
+ val hs = new SQLOpenHashSet[Any]()
+ val isNaN = SQLOpenHashSet.isNaN(elementType)
Seq(array1, array2).foreach { array =>
var i = 0
while (i < array.numElements()) {
if (array.isNullAt(i)) {
- if (!foundNullElement) {
+ if (!hs.containsNull) {
+ hs.addNull
arrayBuffer += null
- foundNullElement = true
}
} else {
val elem = array.get(i, elementType)
- if (!hs.contains(elem)) {
- if (arrayBuffer.size > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
- ArrayBinaryLike.throwUnionLengthOverflowException(arrayBuffer.size)
+ if (isNaN(elem)) {
+ if (!hs.containsNaN) {
+ arrayBuffer += elem
Review comment:
good point!
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-916998634
**[Test build #143152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143152/testReport)** for PR 33955 at commit [`1857988`](https://github.com/apache/spark/commit/18579884948898f9a9f6e15046fd807a2d294f7e).
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917792072
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47685/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917835861
**[Test build #143188 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143188/testReport)** for PR 33955 at commit [`8d0e4a9`](https://github.com/apache/spark/commit/8d0e4a9cbf51cebdaebd5f52303e785dd69da31b).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918828436
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47742/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918824327
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47741/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918893073
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47749/
[GitHub] [spark] cloud-fan commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r707139228
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3679,22 +3686,44 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArrayBinaryLi
body
}
+ def withNaNCheck(body: String): String = {
+ (elementType match {
+ case DoubleType => Some(s"java.lang.Double.isNaN((double)$value)")
+ case FloatType => Some(s"java.lang.Float.isNaN((float)$value)")
+ case _ => None
+ }).map { isNaN =>
+ s"""
+ |if ($isNaN) {
+ | if (!$hashSet.containsNaN()) {
+ | $size++;
+ | $hashSet.addNaN();
+ | $builder.$$plus$$eq($value);
+ | }
+ |} else {
+ | $body
+ |}
+ """.stripMargin
+ }
+ }.getOrElse(body)
+
val processArray = withArrayNullAssignment(
Review comment:
Probably a better code style:
```
val body = ...
val processArray = withArrayNullAssignment(
s"""
|$jt $value = ${genGetValue(array, i)};
|${withNaNCheck(body)}
""".stripMargin)
```
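The generated template quoted above sends NaN elements down a separate branch: the first NaN is appended and recorded with a flag, later NaNs are skipped, and every other value goes through the hash set. Below is a minimal Java sketch of that control flow. It assumes, for illustration, a set whose membership test uses primitive `==` (modeled by a linear scan), because under `==` a NaN never matches itself. `NaNAwareUnion` and its methods are hypothetical names, not Spark code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a stand-in for a primitive-keyed set whose lookup uses `==`,
// which is what makes every NaN look like a new element.
class NaNAwareUnion {

    // Linear-scan membership test using primitive `==` (NaN == NaN is false).
    static boolean containsByPrimitiveEq(List<Double> seen, double v) {
        for (double d : seen) {
            if (d == v) {
                return true;
            }
        }
        return false;
    }

    // Without a NaN branch: each NaN is treated as unseen and appended again.
    static List<Double> unionNaive(double[] left, double[] right) {
        List<Double> out = new ArrayList<>();
        for (double[] arr : new double[][] {left, right}) {
            for (double v : arr) {
                if (!containsByPrimitiveEq(out, v)) {
                    out.add(v);
                }
            }
        }
        return out;
    }

    // With a NaN branch: a single flag plays the role of containsNaN()/addNaN().
    static List<Double> unionWithNaNCheck(double[] left, double[] right) {
        List<Double> out = new ArrayList<>();
        List<Double> seen = new ArrayList<>();
        boolean containsNaN = false;
        for (double[] arr : new double[][] {left, right}) {
            for (double v : arr) {
                if (Double.isNaN(v)) {
                    if (!containsNaN) {
                        containsNaN = true;
                        out.add(v);
                    }
                } else if (!containsByPrimitiveEq(seen, v)) {
                    seen.add(v);
                    out.add(v);
                }
            }
        }
        return out;
    }
}
```

With `{NaN, NaN}` on the left and an empty right side, `unionNaive` keeps both NaNs while `unionWithNaNCheck` keeps one, matching the `[NaN]` result the PR description asks for. The size and builder bookkeeping of the real template is omitted.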
[GitHub] [spark] HyukjinKwon commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-920084616
The test added here fails:
```
sbt.ForkMain$ForkError: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Double
at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112)
at scala.collection.mutable.ArrayBuilder$ofDouble.addOne(ArrayBuilder.scala:402)
at scala.collection.mutable.Growable.$plus$eq(Growable.scala:36)
at scala.collection.mutable.Growable.$plus$eq$(Growable.scala:36)
at scala.collection.mutable.ArrayBuilder.$plus$eq(ArrayBuilder.scala:23)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.ArrayUnion_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.evaluateWithMutableProjection(ExpressionEvalHelper.scala:238)
at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.evaluateWithMutableProjection$(ExpressionEvalHelper.scala:232)
at org.apache.spark.sql.catalyst.expressions.CollectionExpressionsSuite.evaluateWithMutableProjection(CollectionExpressionsSuite.scala:39)
at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.$anonfun$checkEvaluationWithMutableProjection$2(ExpressionEvalHelper.scala:222)
at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54)
at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38)
at org.apache.spark.sql.catalyst.expressions.CollectionExpressionsSuite.withSQLConf(CollectionExpressionsSuite.scala:39)
at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.$anonfun$checkEvaluationWithMutableProjection$1(ExpressionEvalHelper.scala:221)
at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.$anonfun$checkEvaluationWithMutableProjection$1$adapted(ExpressionEvalHelper.scala:220)
at scala.collection.immutable.List.foreach(List.scala:333)
at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithMutableProjection(ExpressionEvalHelper.scala:220)
at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithMutableProjection$(ExpressionEvalHelper.scala:215)
at org.apache.spark.sql.catalyst.expressions.CollectionExpressionsSuite.checkEvaluationWithMutableProjection(CollectionExpressionsSuite.scala:39)
at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation(ExpressionEvalHelper.scala:88)
at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation$(ExpressionEvalHelper.scala:82)
```
https://github.com/apache/spark/runs/3606700233
I wonder how it passed in the PR tests.
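The top frames show `BoxesRunTime.unboxToDouble` receiving a boxed `java.lang.Integer` while feeding `ArrayBuilder$ofDouble`. Unboxing to `double` this way amounts to a checked cast to `java.lang.Double`, so any other boxed numeric type fails. The Java sketch below reproduces only that JVM failure mode; it does not claim to show what value the added test actually passed in. `UnboxDemo` and its methods are hypothetical, with `strictUnbox` modeled as a null-tolerant cast.

```java
// Hypothetical sketch of the failure mode in the stack trace above.
class UnboxDemo {

    // Strict unbox: checked cast to Double, with null mapped to 0.0.
    static double strictUnbox(Object o) {
        return o == null ? 0.0d : ((Double) o).doubleValue();
    }

    // Lenient variant for contrast: accepts any boxed numeric type.
    static double lenientUnbox(Object o) {
        return o == null ? 0.0d : ((Number) o).doubleValue();
    }

    // True if the strict unbox throws ClassCastException for this input.
    static boolean strictUnboxFails(Object o) {
        try {
            strictUnbox(o);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

The lenient variant is shown only for contrast; the trace alone does not settle whether the fix belongs in the test data or in the generated code.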
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918797641
**[Test build #143239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143239/testReport)** for PR 33955 at commit [`fe407c9`](https://github.com/apache/spark/commit/fe407c9325716f6ba8fd637e05e54dd208c8ab69).
[GitHub] [spark] cloud-fan commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-919023006
thanks, merging to master/3.2/3.1
[GitHub] [spark] AngersZhuuuu commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-916757280
ping @cloud-fan
[GitHub] [spark] cloud-fan commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r707139812
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.util
+
+import scala.reflect._
+
+import org.apache.spark.annotation.Private
+import org.apache.spark.sql.types.{DataType, DoubleType, FloatType}
+import org.apache.spark.util.collection.OpenHashSet
+
+/**
+ * A wrap of [[OpenHashSet]] that can handle null, Double.NaN and Float.NaN w.r.t. the SQL semantic.
+ */
+@Private
+class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
Review comment:
can we add a UT suite for it?
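As a rough idea of what such a suite could check, here is a Java analogue of the wrapper. It assumes the Scala class keeps `containNull`/`containNaN` flags beside the inner set, as a later revision of the file in this thread does, and it uses `java.util.HashSet` in place of `OpenHashSet`. `SqlSetSketch` and `SqlSetSketchSuite` are hypothetical names, not Spark API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical Java analogue of a null/NaN-aware set wrapper.
class SqlSetSketch<T> {
    private final Set<T> inner = new HashSet<>();
    private boolean containNull = false;
    private boolean containNaN = false;

    void addNull() { containNull = true; }
    void addNaN() { containNaN = true; }
    void add(T k) { inner.add(k); }

    boolean containsNull() { return containNull; }
    boolean containsNaN() { return containNaN; }
    boolean contains(T k) { return inner.contains(k); }
}

// Hypothetical checks in the spirit of a small unit suite.
class SqlSetSketchSuite {

    static void runSuite() {
        SqlSetSketch<Double> set = new SqlSetSketch<>();
        check(!set.containsNull() && !set.containsNaN(), "new set tracks nothing");
        set.add(1.0);
        check(set.contains(1.0) && !set.contains(2.0), "plain values go to the inner set");
        set.addNull();
        check(set.containsNull(), "null is tracked by a flag");
        check(!set.containsNaN(), "null does not imply NaN");
        set.addNaN();
        check(set.containsNaN(), "NaN is tracked by a flag");
    }

    private static void check(boolean cond, String msg) {
        if (!cond) {
            throw new AssertionError(msg);
        }
    }
}
```

A real suite would presumably also cover the specialized `Long`/`Int`/`Double`/`Float` instantiations, which this Java version cannot exercise.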
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917988334
**[Test build #143198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143198/testReport)** for PR 33955 at commit [`3059ea1`](https://github.com/apache/spark/commit/3059ea1d526731c0635a66c48c9154ab259f51da).
[GitHub] [spark] AngersZhuuuu commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-919004198
ping @cloud-fan
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918859959
**[Test build #143246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143246/testReport)** for PR 33955 at commit [`4e5e085`](https://github.com/apache/spark/commit/4e5e08526ffe96eaa5add069aef9467948730755).
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r707102491
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3679,22 +3686,38 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArrayBinaryLi
body
}
+ val isNaN = elementType match {
Review comment:
Done
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918900036
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47749/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918523381
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143213/
[GitHub] [spark] cloud-fan commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r707093906
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3679,22 +3686,38 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArrayBinaryLi
body
}
+ val isNaN = elementType match {
Review comment:
```
def withNaNCheck(body: String): String = {
(elementType match {
case DoubleType => Some(...)
case FloatType => Some(...)
case _ => None
}).map { isNaN =>
s"""
| if ($isNaN) ... else $body
""".stripMargin
}
}.getOrElse(body)
```
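The suggestion above builds an optional NaN test per element type and wraps the body only when that test exists, falling back to the unchanged body otherwise. A Java analogue of that `Option.map(...).getOrElse(body)` pattern, using `java.util.Optional`, might look like the following. `NaNCheckCodegen`, its parameters, and the string-typed element type are hypothetical, and the size/builder bookkeeping of the real template is left out.

```java
import java.util.Optional;

// Hypothetical Java analogue of the Option.map(...).getOrElse(body) codegen pattern.
class NaNCheckCodegen {

    // Returns a Java expression testing `value` for NaN, if the element type has NaN.
    static Optional<String> isNaNExpr(String elementType, String value) {
        switch (elementType) {
            case "double":
                return Optional.of("java.lang.Double.isNaN((double)" + value + ")");
            case "float":
                return Optional.of("java.lang.Float.isNaN((float)" + value + ")");
            default:
                return Optional.empty();
        }
    }

    // Wraps `body` in a NaN branch only for floating-point element types.
    static String withNaNCheck(String elementType, String value, String hashSet, String body) {
        return isNaNExpr(elementType, value)
            .map(isNaN ->
                "if (" + isNaN + ") {\n"
                + "  if (!" + hashSet + ".containsNaN()) {\n"
                + "    " + hashSet + ".addNaN();\n"
                + "  }\n"
                + "} else {\n"
                + "  " + body + "\n"
                + "}")
            .orElse(body);
    }
}
```

For an `int` element type the body comes back untouched, so non-floating-point arrays pay nothing for the extra branch.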
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918775974
**[Test build #143235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143235/testReport)** for PR 33955 at commit [`f59c0a8`](https://github.com/apache/spark/commit/f59c0a87c8792e6551f719f78b5049bfc5a2f917).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918900036
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47749/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918978532
**[Test build #143239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143239/testReport)** for PR 33955 at commit [`fe407c9`](https://github.com/apache/spark/commit/fe407c9325716f6ba8fd637e05e54dd208c8ab69).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r707040485
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3575,15 +3576,15 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArrayBinaryLi
if (TypeUtils.typeWithProperEquals(elementType)) {
(array1, array2) =>
val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
- val hs = new OpenHashSet[Any]
+ val hs = new SQLOpenHashSet[Any]
var foundNullElement = false
Review comment:
we can remove this now
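Once the set itself records whether a null was seen, a separate `foundNullElement` variable in the interpreted path becomes redundant. A minimal Java sketch of that idea, assuming a small null-aware set (the hypothetical `InterpretedUnion.NullAwareSet`) in place of `SQLOpenHashSet[Any]`:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the set tracks null itself, so no separate null flag is needed.
class InterpretedUnion {

    // Minimal null-aware set with only what this sketch uses.
    static final class NullAwareSet {
        private final Set<Object> inner = new HashSet<>();
        private boolean containsNull = false;

        boolean containsNull() { return containsNull; }
        void addNull() { containsNull = true; }
        boolean contains(Object k) { return inner.contains(k); }
        void add(Object k) { inner.add(k); }
    }

    // Order-preserving union of two arrays, keeping at most one null.
    static List<Object> union(Object[] left, Object[] right) {
        List<Object> out = new ArrayList<>();
        NullAwareSet hs = new NullAwareSet();
        for (Object[] arr : new Object[][] {left, right}) {
            for (Object elem : arr) {
                if (elem == null) {
                    if (!hs.containsNull()) {
                        hs.addNull();
                        out.add(null);
                    }
                } else if (!hs.contains(elem)) {
                    hs.add(elem);
                    out.add(elem);
                }
            }
        }
        return out;
    }
}
```

Because `java.util.HashSet` relies on `Double.equals`, which treats two NaNs as equal, this sketch would also deduplicate boxed NaN; that is a property of the Java collection, not of the Scala set used in the PR.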
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918798632
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47737/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918798604
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47737/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918340039
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47714/
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917928352
**[Test build #143193 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143193/testReport)** for PR 33955 at commit [`89a4263`](https://github.com/apache/spark/commit/89a426374c0873c48e738d96d0f46f99b6e39f6d).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918040810
[GitHub] [spark] cloud-fan commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r707042392
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.util
+
+import scala.reflect._
+
+import org.apache.spark.annotation.Private
+import org.apache.spark.util.collection.OpenHashSet
+
+/**
+ * A wrap of [[OpenHashSet]] that can handle null, Double.NaN and Float.NaN w.r.t. the SQL semantic.
+ */
+@Private
+class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
+ initialCapacity: Int,
+ loadFactor: Double) {
+
+ def this(initialCapacity: Int) = this(initialCapacity, 0.7)
+
+ def this() = this(64)
+
+ private val hashSet = new OpenHashSet[T](initialCapacity, loadFactor)
+
+ private var containNull = false
+ private var containNaN = false
+
+ def addNull(): Unit = {
+ containNull = true
+ }
+
+ def addNaN(): Unit = {
+ containNaN = true
+ }
+
+ def add(k: T): Unit = {
+ hashSet.add(k)
+ }
+
+ def contains(k: T): Boolean = {
Review comment:
shall we add a method `containsNaN`? checking NaN by reflection is pretty slow
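To make the cost argument concrete: a generic `contains(k)` has to discover at run time whether `k` is a floating-point NaN, while a dedicated `containsNaN()` is a plain field read that the caller invokes only on the NaN branch it already knows about. The Java sketch below contrasts the two shapes, with `instanceof` checks standing in for whatever runtime type inspection a generic check would need. `ContainsShapes` is a hypothetical name, and the sketch is illustrative only, not benchmarked.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical contrast between a generic NaN-aware lookup and a dedicated NaN flag.
class ContainsShapes {
    private final Set<Object> inner = new HashSet<>();
    private boolean containNaN = false;

    void add(Object k) {
        if (isNaNAtRuntime(k)) {
            containNaN = true;
        } else {
            inner.add(k);
        }
    }

    // Generic path: every call pays for a runtime type inspection.
    static boolean isNaNAtRuntime(Object k) {
        if (k instanceof Double) {
            return ((Double) k).isNaN();
        }
        if (k instanceof Float) {
            return ((Float) k).isNaN();
        }
        return false;
    }

    boolean containsGeneric(Object k) {
        return isNaNAtRuntime(k) ? containNaN : inner.contains(k);
    }

    // Dedicated path: the caller already knows it holds a NaN, so this is a field read.
    boolean containsNaN() {
        return containNaN;
    }
}
```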
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r706276419
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.util
+
+import scala.reflect._
+
+import org.apache.spark.annotation.Private
+import org.apache.spark.util.collection.OpenHashSet
+
+/**
+ * A wrap of [[OpenHashSet]] that can handle null, Double.NaN and Float.NaN.
Review comment:
Done
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-916750680
**[Test build #143142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143142/testReport)** for PR 33955 at commit [`8579c97`](https://github.com/apache/spark/commit/8579c9769df6bfe4f59dda612661d402938867a3).
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918824358
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47741/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918058441
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47702/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918529528
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143212/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917791275
**[Test build #143182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143182/testReport)** for PR 33955 at commit [`119679c`](https://github.com/apache/spark/commit/119679cfc5884928d9fa368f683689a214d01912).
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918016407
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918031394
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47700/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-916804017
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47646/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918953496
**[Test build #143238 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143238/testReport)** for PR 33955 at commit [`f27c4e1`](https://github.com/apache/spark/commit/f27c4e12530e7d98eefb49bd9631f5d19785d9c2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918798632
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47737/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918213551
**[Test build #143198 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143198/testReport)** for PR 33955 at commit [`3059ea1`](https://github.com/apache/spark/commit/3059ea1d526731c0635a66c48c9154ab259f51da).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918216168
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143198/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917417677
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47669/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918824140
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47742/
[GitHub] [spark] cloud-fan commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r707041873
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.util
+
+import scala.reflect._
+
+import org.apache.spark.annotation.Private
+import org.apache.spark.util.collection.OpenHashSet
+
+/**
+ * A wrap of [[OpenHashSet]] that can handle null, Double.NaN and Float.NaN w.r.t. the SQL semantic.
+ */
+@Private
+class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
+ initialCapacity: Int,
+ loadFactor: Double) {
+
+ def this(initialCapacity: Int) = this(initialCapacity, 0.7)
+
+ def this() = this(64)
+
+ private val hashSet = new OpenHashSet[T](initialCapacity, loadFactor)
+
+ private var containNull = false
+ private var containNaN = false
+
+ def addNull(): Unit = {
+ containNull = true
+ }
+
+ def addNaN(): Unit = {
+ containNaN = true
+ }
+
+ def add(k: T): Unit = {
+ hashSet.add(k)
+ }
+
+ def contains(k: T): Boolean = {
+ if (SQLOpenHashSet.isNaN(k)) {
+ containNaN
+ } else {
+ hashSet.contains(k)
+ }
+ }
+
+ def containsNull(): Boolean = containNull
+}
+
+object SQLOpenHashSet {
+ def isNaN(value: Any): Boolean = {
+ (value.isInstanceOf[java.lang.Double] &&
Review comment:
this looks very slow. At least in codegen, we can write `java.lang.Float/Double.isNaN` based on the data type
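One reading of this suggestion is to choose the NaN check once, based on the element data type, instead of running `isInstanceOf` tests on every value. In generated code, that check then becomes a direct `java.lang.Double.isNaN` or `java.lang.Float.isNaN` call. The minimal sketch below assumes a hypothetical `SqlType` ADT as a stand-in for Spark's `DataType`/`DoubleType`/`FloatType`, so it compiles without Spark. `isNaNFunc` and `genIsNaN` are illustrative helpers, not Spark codegen APIs.

```scala
// Hypothetical stand-ins for Spark's DoubleType / FloatType / other types.
sealed trait SqlType
case object SqlDouble extends SqlType
case object SqlFloat extends SqlType
case object SqlInt extends SqlType

object NaNChecks {
  // Interpreted path: resolve the check once per data type, not once per value.
  def isNaNFunc(t: SqlType): Any => Boolean = t match {
    case SqlDouble => v => java.lang.Double.isNaN(v.asInstanceOf[Double])
    case SqlFloat  => v => java.lang.Float.isNaN(v.asInstanceOf[Float])
    case _         => _ => false // non-floating types can never be NaN
  }

  // Codegen path: emit a direct primitive call into the generated Java source.
  def genIsNaN(t: SqlType, value: String): String = t match {
    case SqlDouble => s"java.lang.Double.isNaN($value)"
    case SqlFloat  => s"java.lang.Float.isNaN($value)"
    case _         => "false"
  }
}
```

Because the type is known when the expression is built, a non-floating element type compiles down to a constant `false`, and no per-value type test remains.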
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918003002
**[Test build #143188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143188/testReport)** for PR 33955 at commit [`8d0e4a9`](https://github.com/apache/spark/commit/8d0e4a9cbf51cebdaebd5f52303e785dd69da31b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917835861
**[Test build #143188 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143188/testReport)** for PR 33955 at commit [`8d0e4a9`](https://github.com/apache/spark/commit/8d0e4a9cbf51cebdaebd5f52303e785dd69da31b).
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917409055
**[Test build #143165 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143165/testReport)** for PR 33955 at commit [`4e533fd`](https://github.com/apache/spark/commit/4e533fdabcae676560f9396442df4f4993cc2f67).
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917084869
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143142/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-916998634
**[Test build #143152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143152/testReport)** for PR 33955 at commit [`1857988`](https://github.com/apache/spark/commit/18579884948898f9a9f6e15046fd807a2d294f7e).
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918783434
**[Test build #143238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143238/testReport)** for PR 33955 at commit [`f27c4e1`](https://github.com/apache/spark/commit/f27c4e12530e7d98eefb49bd9631f5d19785d9c2).
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918064003
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47702/
[GitHub] [spark] cloud-fan commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r707139812
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.util
+
+import scala.reflect._
+
+import org.apache.spark.annotation.Private
+import org.apache.spark.sql.types.{DataType, DoubleType, FloatType}
+import org.apache.spark.util.collection.OpenHashSet
+
+/**
+ * A wrap of [[OpenHashSet]] that can handle null, Double.NaN and Float.NaN w.r.t. the SQL semantic.
+ */
+@Private
+class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
Review comment:
can we add a UT suite for it?
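The sketch below illustrates the kind of behaviour such a suite might pin down. It models the `array_union` case from the PR description, where `array_union(array(NaN, NaN), array())` should give `[NaN]`, and checks it with plain assertions. It is self-contained and does not exercise `SQLOpenHashSet` itself. `ArrayUnionSketch` is hypothetical, `None` stands in for SQL NULL, and first-occurrence order is assumed. NaN and null are tracked by flags, mirroring the wrapper's `addNaN()`/`addNull()` idea.

```scala
import scala.collection.mutable

// Hypothetical model of NaN- and null-aware union; None stands in for SQL NULL.
object ArrayUnionSketch {
  def union(left: Seq[Option[Double]], right: Seq[Option[Double]]): Seq[Option[Double]] = {
    val seen = mutable.HashSet.empty[Double] // only ever holds non-NaN values
    var seenNull = false
    var seenNaN = false
    val out = mutable.ArrayBuffer.empty[Option[Double]]
    for (elem <- left ++ right) elem match {
      case None =>
        if (!seenNull) { seenNull = true; out += None }
      case Some(d) if java.lang.Double.isNaN(d) =>
        if (!seenNaN) { seenNaN = true; out += elem }
      case Some(d) =>
        if (seen.add(d)) out += elem // add returns false for duplicates
    }
    out.toList
  }

  def main(args: Array[String]): Unit = {
    // Two NaNs on the left, empty right side: exactly one NaN survives.
    val r1 = union(Seq(Some(Double.NaN), Some(Double.NaN)), Seq.empty)
    assert(r1.size == 1 && r1.head.exists(_.isNaN))

    // Duplicate NULL, NaN and 1.0 across both sides collapse to one each.
    val r2 = union(Seq(Some(1.0), None, Some(Double.NaN)),
                   Seq(None, Some(1.0), Some(2.0), Some(Double.NaN)))
    assert(r2.size == 4 && r2(0) == Some(1.0) && r2(1).isEmpty && r2(3) == Some(2.0))
    println("ok")
  }
}
```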
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918900014
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47749/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917858174
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47690/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918288358
**[Test build #143212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143212/testReport)** for PR 33955 at commit [`db8159e`](https://github.com/apache/spark/commit/db8159e3676ee5d137e5ebcbc94d92576f7ca0aa).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918824358
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47741/
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918288358
**[Test build #143212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143212/testReport)** for PR 33955 at commit [`db8159e`](https://github.com/apache/spark/commit/db8159e3676ee5d137e5ebcbc94d92576f7ca0aa).
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918528199
**[Test build #143212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143212/testReport)** for PR 33955 at commit [`db8159e`](https://github.com/apache/spark/commit/db8159e3676ee5d137e5ebcbc94d92576f7ca0aa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918218723
**[Test build #143199 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143199/testReport)** for PR 33955 at commit [`991fddd`](https://github.com/apache/spark/commit/991fddd22d80a9e7e946ba679c9582fc14a33ba6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918202888
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143196/
[GitHub] [spark] cloud-fan commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r706177022
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.util
+
+import scala.reflect._
+
+import org.apache.spark.annotation.Private
+import org.apache.spark.util.collection.OpenHashSet
+
+/**
+ * A wrap of [[OpenHashSet]] that can handle null, Double.NaN and Float.NaN.
+ */
+@Private
+class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
+ initialCapacity: Int,
+ loadFactor: Double) {
+
+ def this(initialCapacity: Int) = this(initialCapacity, 0.7)
+
+ def this() = this(64)
+
+ private val hashSet = new OpenHashSet[T](initialCapacity, loadFactor)
+
+ private var containsNull = false
+ private var containsDoubleNaN = false
+ private var containsFloatNaN = false
Review comment:
Maybe we should do the null/NaN check on the caller side:
```
class SQLOpenHashSet ... {
def add(k: T)
def addNull()
def addNaN()
}
// caller side
if (row.isNullAt...) {
set.addNull()
} else {
...
if (java.lang.Double.isNaN(value)) {
set.addNaN()
}
}
```
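Below is a self-contained sketch of the caller-side approach suggested above. Spark's `OpenHashSet` is internal, so this sketch substitutes a plain `scala.collection.mutable.HashSet`. The class name `NullNaNAwareSet` and the `union` helper are hypothetical and are not the code that landed in the PR. Elements are boxed as `java.lang.Double` so a SQL NULL can be represented as a Scala `null`. Because null and NaN are routed to flags before they reach the hash set, the fact that `NaN != NaN` can no longer produce duplicate entries.

```scala
import scala.collection.mutable

// Hypothetical stand-in for SQLOpenHashSet: ordinary values live in a hash set,
// while null and NaN are only recorded as flags that the caller sets explicitly.
class NullNaNAwareSet[T] {
  private val hashSet = mutable.HashSet.empty[T]
  private var containsNull = false
  private var containsNaN = false

  def add(k: T): Unit = { hashSet += k }
  def contains(k: T): Boolean = hashSet.contains(k)
  def addNull(): Unit = { containsNull = true }
  def addNaN(): Unit = { containsNaN = true }
  def hasNull: Boolean = containsNull
  def hasNaN: Boolean = containsNaN
}

// Caller-side dispatch: check for null first, then NaN, and only then consult
// the hash set. Output keeps first-occurrence order, like array_union.
def union(left: Array[java.lang.Double],
          right: Array[java.lang.Double]): Array[java.lang.Double] = {
  val set = new NullNaNAwareSet[Double]
  val out = mutable.ArrayBuffer.empty[java.lang.Double]
  for (elem <- left.iterator ++ right.iterator) {
    if (elem == null) {
      if (!set.hasNull) { set.addNull(); out += null }
    } else {
      val value: Double = elem.doubleValue
      if (java.lang.Double.isNaN(value)) {
        if (!set.hasNaN) { set.addNaN(); out += elem }
      } else if (!set.contains(value)) {
        set.add(value)
        out += elem
      }
    }
  }
  out.toArray
}
```

With this dispatch, the union of `[NaN, NaN]` and `[]` contains a single NaN. The hash set itself never needs to know how NaN compares.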
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917042176
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47656/
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r706276531
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.util
+
+import scala.reflect._
+
+import org.apache.spark.annotation.Private
+import org.apache.spark.util.collection.OpenHashSet
+
+/**
+ * A wrap of [[OpenHashSet]] that can handle null, Double.NaN and Float.NaN.
+ */
+@Private
+class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
+ initialCapacity: Int,
+ loadFactor: Double) {
+
+ def this(initialCapacity: Int) = this(initialCapacity, 0.7)
+
+ def this() = this(64)
+
+ private val hashSet = new OpenHashSet[T](initialCapacity, loadFactor)
+
+ private var containsNull = false
+ private var containsDoubleNaN = false
+ private var containsFloatNaN = false
Review comment:
How about the current version?
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3649,61 +3643,37 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArrayBinaryLi
val ptName = CodeGenerator.primitiveTypeName(jt)
nullSafeCodeGen(ctx, ev, (array1, array2) => {
- val foundNullElement = ctx.freshName("foundNullElement")
val nullElementIndex = ctx.freshName("nullElementIndex")
val builder = ctx.freshName("builder")
val array = ctx.freshName("array")
val arrays = ctx.freshName("arrays")
val arrayDataIdx = ctx.freshName("arrayDataIdx")
- val openHashSet = classOf[OpenHashSet[_]].getName
+ val openHashSet = classOf[SQLOpenHashSet[_]].getName
val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$hsTypeName()"
val hashSet = ctx.freshName("hashSet")
val arrayBuilder = classOf[mutable.ArrayBuilder[_]].getName
val arrayBuilderClass = s"$arrayBuilder$$of$ptName"
- def withArrayNullAssignment(body: String) =
- if (dataType.asInstanceOf[ArrayType].containsNull) {
- s"""
- |if ($array.isNullAt($i)) {
- | if (!$foundNullElement) {
- | $nullElementIndex = $size;
- | $foundNullElement = true;
- | $size++;
- | $builder.$$plus$$eq($nullValueHolder);
- | }
- |} else {
- | $body
- |}
- """.stripMargin
- } else {
- body
- }
-
- val processArray = withArrayNullAssignment(
+ val processArray =
s"""
|$jt $value = ${genGetValue(array, i)};
Review comment:
Done
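For comparison with the diff above: the removed `withArrayNullAssignment` helper emitted Java that appended the first null once and recorded its position in `nullElementIndex`, guarded by `foundNullElement`. The following is a rough Scala analogue of that per-element logic. It is not the generated code. The function name `unionWithNullIndex` and the use of `mutable.HashSet[Int]` in place of `OpenHashSet` are assumptions made for illustration.

```scala
import scala.collection.mutable

// Rough analogue of the pre-change codegen: nulls are handled inline with a
// foundNullElement flag; non-null values are deduplicated through a hash set.
def unionWithNullIndex(
    arrays: Seq[Array[java.lang.Integer]]): (Array[java.lang.Integer], Int) = {
  val hashSet = mutable.HashSet.empty[Int]
  val builder = mutable.ArrayBuffer.empty[java.lang.Integer]
  var foundNullElement = false
  var nullElementIndex = -1 // stays -1 if no null is ever seen
  var size = 0
  for (array <- arrays; i <- array.indices) {
    val elem = array(i)
    if (elem == null) {
      if (!foundNullElement) {
        nullElementIndex = size // position of the single null in the output
        foundNullElement = true
        size += 1
        builder += null
      }
    } else if (hashSet.add(elem.intValue)) { // add returns false on duplicates
      size += 1
      builder += elem
    }
  }
  (builder.toArray, nullElementIndex)
}
```

Moving this bookkeeping into a set wrapper lets the generated loop treat null (and NaN) the same way as every other element.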
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917988334
**[Test build #143198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143198/testReport)** for PR 33955 at commit [`3059ea1`](https://github.com/apache/spark/commit/3059ea1d526731c0635a66c48c9154ab259f51da).
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r706155804
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.util
+
+import scala.reflect._
+
+import org.apache.spark.annotation.Private
+import org.apache.spark.util.collection.OpenHashSet
+
+/**
+ * A wrap of [[OpenHashSet]] that can handle null, Double.NaN and Float.NaN.
+ */
+@Private
+class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
+ initialCapacity: Int,
+ loadFactor: Double) {
+
+ def this(initialCapacity: Int) = this(initialCapacity, 0.7)
+
+ def this() = this(64)
+
+ private val hashSet = new OpenHashSet[T](initialCapacity, loadFactor)
+
+ private var containsNull = false
+ private var containsDoubleNaN = false
+ private var containsFloatNaN = false
Review comment:
> The data added to this set will always be the same data type. I think we can just have a single `containsNaN` flag.
I thought about this too, but since the set can hold any type, isn't it better to keep both flags?
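The point quoted above is that any one set instance only ever holds a single element type. The following is a minimal sketch of the single-flag alternative under that assumption: one `containsNaN` flag, with the type-specific NaN test supplied once at construction. `SingleFlagSet` and its constructor parameter are hypothetical, and `mutable.HashSet` stands in for Spark's internal `OpenHashSet`.

```scala
import scala.collection.mutable

// Hypothetical sketch: one NaN flag per set, since a given set only stores
// one element type; the caller supplies the matching NaN predicate.
class SingleFlagSet[T](isNaN: T => Boolean) {
  private val values = mutable.HashSet.empty[T]
  private var containsNaN = false

  // Returns true if k was newly added; all NaNs are treated as one value.
  def add(k: T): Boolean =
    if (isNaN(k)) {
      val added = !containsNaN
      containsNaN = true
      added
    } else {
      values.add(k)
    }

  def size: Int = values.size + (if (containsNaN) 1 else 0)
}

val doubles = new SingleFlagSet[Double](d => java.lang.Double.isNaN(d))
val floats = new SingleFlagSet[Float](f => java.lang.Float.isNaN(f))
```

The trade-off is that the caller must pick the right predicate, whereas separate `containsDoubleNaN`/`containsFloatNaN` flags let the set decide by itself at the cost of one unused flag per instance.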
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918374932
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47715/
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917812261
**[Test build #143186 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143186/testReport)** for PR 33955 at commit [`45d1fee`](https://github.com/apache/spark/commit/45d1feebbbafb59bc5acf9524b3d4c33761060bf).
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917851170
Kubernetes integration test unable to build dist.
exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47690/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917792860
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143182/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918343433
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47715/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917077242
**[Test build #143142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143142/testReport)** for PR 33955 at commit [`8579c97`](https://github.com/apache/spark/commit/8579c9769df6bfe4f59dda612661d402938867a3).
* This patch **fails from timeout after a configured wait of `500m`**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](`
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918523381
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143213/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918955109
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143238/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918040810
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47700/
[GitHub] [spark] cloud-fan commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-919023223
@AngersZhuuuu can you open a backport PR for 3.0?
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918775974
**[Test build #143235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143235/testReport)** for PR 33955 at commit [`f59c0a8`](https://github.com/apache/spark/commit/f59c0a87c8792e6551f719f78b5049bfc5a2f917).
[GitHub] [spark] dongjoon-hyun commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-919326129
cc @sunchao and @viirya
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-919071645
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143246/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917194384
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143152/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-916792864
Kubernetes integration test unable to build dist.
exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47646/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917417677
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47669/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917792845
**[Test build #143182 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143182/testReport)** for PR 33955 at commit [`119679c`](https://github.com/apache/spark/commit/119679cfc5884928d9fa368f683689a214d01912).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917814737
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47688/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917973756
**[Test build #143196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143196/testReport)** for PR 33955 at commit [`08da413`](https://github.com/apache/spark/commit/08da4130599d35b2b4a2af1a40a00550617e447f).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917858174
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47690/
[GitHub] [spark] AngersZhuuuu commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917789034
ping @cloud-fan
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918828436
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47742/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918989931
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143239/
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917973756
**[Test build #143196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143196/testReport)** for PR 33955 at commit [`08da413`](https://github.com/apache/spark/commit/08da4130599d35b2b4a2af1a40a00550617e447f).
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r707071732
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3575,15 +3576,15 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArrayBinaryLi
if (TypeUtils.typeWithProperEquals(elementType)) {
(array1, array2) =>
val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
- val hs = new OpenHashSet[Any]
+ val hs = new SQLOpenHashSet[Any]
var foundNullElement = false
Review comment:
Done
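The hunk above swaps `OpenHashSet` for `SQLOpenHashSet` in the interpreted path of `ArrayUnion`. A minimal, self-contained sketch of the deduplication idea follows; it is not Spark's implementation. `NaNAwareSet` and `ArrayUnionSketch` are hypothetical names, and a Scala `mutable.HashSet` stands in for Spark's `OpenHashSet`. Null and NaN are tracked with boolean flags, mirroring the `containNull`/`containNaN` fields in the PR's class, so the backing set never has to decide whether NaN equals itself.

```scala
import scala.collection.mutable

// Hypothetical stand-in for SQLOpenHashSet: null and NaN are tracked with
// boolean flags, so the backing set only ever stores ordinary values.
final class NaNAwareSet[T] {
  private val backing = mutable.HashSet.empty[T]
  private var containNull = false
  private var containNaN = false

  // True only for boxed Double/Float NaN values; null matches neither pattern.
  private def isNaN(v: Any): Boolean = v match {
    case d: java.lang.Double => d.isNaN
    case f: java.lang.Float  => f.isNaN
    case _                   => false
  }

  // Adds v and returns true only if no equal value (in the SQL sense) was present.
  def addIfAbsent(v: T): Boolean =
    if (v == null) {
      val fresh = !containNull
      containNull = true
      fresh
    } else if (isNaN(v)) {
      val fresh = !containNaN
      containNaN = true
      fresh
    } else {
      backing.add(v)
    }
}

object ArrayUnionSketch {
  // Order-preserving union that keeps the first occurrence of each value,
  // treating all nulls as one value and all NaNs as one value.
  def union[T](left: Seq[T], right: Seq[T]): Seq[T] = {
    val seen = new NaNAwareSet[T]
    (left ++ right).filter(seen.addIfAbsent)
  }

  def main(args: Array[String]): Unit = {
    // Mirrors array_union(array(NaN, NaN), array()) from the PR description.
    println(union(Seq(Double.NaN, Double.NaN), Seq.empty[Double])) // List(NaN)
  }
}
```

Keeping NaN and null outside the backing set means the set's own equality rules only ever see ordinary values, which is the same design choice the PR makes by wrapping `OpenHashSet` rather than changing it.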
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.util
+
+import scala.reflect._
+
+import org.apache.spark.annotation.Private
+import org.apache.spark.util.collection.OpenHashSet
+
+/**
+ * A wrap of [[OpenHashSet]] that can handle null, Double.NaN and Float.NaN w.r.t. the SQL semantic.
+ */
+@Private
+class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
+ initialCapacity: Int,
+ loadFactor: Double) {
+
+ def this(initialCapacity: Int) = this(initialCapacity, 0.7)
+
+ def this() = this(64)
+
+ private val hashSet = new OpenHashSet[T](initialCapacity, loadFactor)
+
+ private var containNull = false
+ private var containNaN = false
+
+ def addNull(): Unit = {
+ containNull = true
+ }
+
+ def addNaN(): Unit = {
+ containNaN = true
+ }
+
+ def add(k: T): Unit = {
+ hashSet.add(k)
+ }
+
+ def contains(k: T): Boolean = {
+ if (SQLOpenHashSet.isNaN(k)) {
+ containNaN
+ } else {
+ hashSet.contains(k)
+ }
+ }
+
+ def containsNull(): Boolean = containNull
+}
+
+object SQLOpenHashSet {
+ def isNaN(value: Any): Boolean = {
+ (value.isInstanceOf[java.lang.Double] &&
Review comment:
How about the current version?
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.util
+
+import scala.reflect._
+
+import org.apache.spark.annotation.Private
+import org.apache.spark.util.collection.OpenHashSet
+
+/**
+ * A wrap of [[OpenHashSet]] that can handle null, Double.NaN and Float.NaN w.r.t. the SQL semantic.
+ */
+@Private
+class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
+ initialCapacity: Int,
+ loadFactor: Double) {
+
+ def this(initialCapacity: Int) = this(initialCapacity, 0.7)
+
+ def this() = this(64)
+
+ private val hashSet = new OpenHashSet[T](initialCapacity, loadFactor)
+
+ private var containNull = false
+ private var containNaN = false
+
+ def addNull(): Unit = {
+ containNull = true
+ }
+
+ def addNaN(): Unit = {
+ containNaN = true
+ }
+
+ def add(k: T): Unit = {
+ hashSet.add(k)
+ }
+
+ def contains(k: T): Boolean = {
Review comment:
How about the current version?
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918820137
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47741/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918859959
**[Test build #143246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143246/testReport)** for PR 33955 at commit [`4e5e085`](https://github.com/apache/spark/commit/4e5e08526ffe96eaa5add069aef9467948730755).
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917812261
**[Test build #143186 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143186/testReport)** for PR 33955 at commit [`45d1fee`](https://github.com/apache/spark/commit/45d1feebbbafb59bc5acf9524b3d4c33761060bf).
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918354595
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47715/
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918783434
**[Test build #143238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143238/testReport)** for PR 33955 at commit [`f27c4e1`](https://github.com/apache/spark/commit/f27c4e12530e7d98eefb49bd9631f5d19785d9c2).
[GitHub] [spark] cloud-fan commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r707145253
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3679,22 +3686,42 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArrayBinaryLi
body
}
- val processArray = withArrayNullAssignment(
+ def withNaNCheck(body: String): String = {
+ (elementType match {
+ case DoubleType => Some(s"java.lang.Double.isNaN((double)$value)")
+ case FloatType => Some(s"java.lang.Float.isNaN((float)$value)")
+ case _ => None
+ }).map { isNaN =>
+ s"""
+ |if ($isNaN) {
+ | if (!$hashSet.containsNaN()) {
+ | $size++;
+ | $hashSet.addNaN();
+ | $builder.$$plus$$eq($value);
+ | }
+ |} else {
+ | $body
+ |}
+ """.stripMargin
+ }
+ }.getOrElse(body)
+
+ val body =
s"""
- |$jt $value = ${genGetValue(array, i)};
|if (!$hashSet.contains($hsValueCast$value)) {
| if (++$size > ${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}) {
| break;
| }
| $hashSet.add$hsPostFix($hsValueCast$value);
| $builder.$$plus$$eq($value);
|}
- """.stripMargin)
+ """.stripMargin
+ val processArray =
+ withArrayNullAssignment(s"$jt $value = ${genGetValue(array, i)};" ++ withNaNCheck(body))
Review comment:
```suggestion
withArrayNullAssignment(s"$jt $value = ${genGetValue(array, i)};" + withNaNCheck(body))
```
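The `withNaNCheck` helper in this hunk decides at code-generation time whether the emitted Java needs a NaN branch at all. A minimal sketch of the same wrapping pattern, under simplifying assumptions: `ElemType` and its cases are hypothetical stand-ins for Spark's `DoubleType`/`FloatType`, and the generated NaN branch omits the size counter and builder append present in the real template.

```scala
object NaNCheckCodegenSketch {
  // Hypothetical element-type tags standing in for Spark's DataType values.
  sealed trait ElemType
  case object DoubleElem extends ElemType
  case object FloatElem extends ElemType
  case object IntElem extends ElemType

  // Wraps a generated Java snippet in a NaN branch for floating-point element
  // types; for any other type the body is returned unchanged.
  def withNaNCheck(elementType: ElemType, value: String, hashSet: String, body: String): String = {
    val isNaNExpr: Option[String] = elementType match {
      case DoubleElem => Some(s"java.lang.Double.isNaN((double)$value)")
      case FloatElem  => Some(s"java.lang.Float.isNaN((float)$value)")
      case IntElem    => None
    }
    isNaNExpr.map { isNaN =>
      s"""if ($isNaN) {
         |  if (!$hashSet.containsNaN()) {
         |    $hashSet.addNaN();
         |  }
         |} else {
         |  $body
         |}""".stripMargin
    }.getOrElse(body)
  }

  def main(args: Array[String]): Unit = {
    // Prefix the value declaration, as processArray does, using String `+`.
    val code = "double v = arr[i];\n" + withNaNCheck(DoubleElem, "v", "hs", "hs.add(v);")
    println(code)
  }
}
```

On the suggestion itself: for two `String` operands, `++` and `+` yield the same concatenation in Scala, so switching to `+` is a stylistic change toward the conventional string operator.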
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-919071645
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143246/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917993869
**[Test build #143199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143199/testReport)** for PR 33955 at commit [`991fddd`](https://github.com/apache/spark/commit/991fddd22d80a9e7e946ba679c9582fc14a33ba6).
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918151004
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143193/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918942410
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143235/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917792860
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143182/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918221136
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143199/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-919069304
**[Test build #143246 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143246/testReport)** for PR 33955 at commit [`4e5e085`](https://github.com/apache/spark/commit/4e5e08526ffe96eaa5add069aef9467948730755).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918192845
**[Test build #143196 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143196/testReport)** for PR 33955 at commit [`08da413`](https://github.com/apache/spark/commit/08da4130599d35b2b4a2af1a40a00550617e447f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917451396
**[Test build #143165 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143165/testReport)** for PR 33955 at commit [`4e533fd`](https://github.com/apache/spark/commit/4e533fdabcae676560f9396442df4f4993cc2f67).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r706725649
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3550,6 +3551,10 @@ object ArrayBinaryLike {
def throwUnionLengthOverflowException(length: Int): Unit = {
throw QueryExecutionErrors.unionArrayWithElementsExceedLimitError(length)
}
+
+ def isNaN(value: Any): Boolean = {
+ Double.NaN.equals(value) || Float.NaN.equals(value)
Review comment:
Ok, done
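This earlier version of the helper relies on `equals` rather than `==`. A small sketch of why that works, using only the JDK and the Scala standard library: primitive IEEE 754 comparison never treats NaN as equal to itself, while boxed `java.lang.Double.equals` compares canonical bit patterns (via `doubleToLongBits`) and so treats NaN as equal to NaN. `equals` also returns `false` for an argument of a different boxed type, which is why `Double.NaN` and `Float.NaN` are checked separately. `NaNEqualitySketch` is a hypothetical name, and the sketch boxes explicitly for clarity.

```scala
object NaNEqualitySketch {
  // Hypothetical helper in the spirit of the reviewed isNaN: boxed equals
  // matches any NaN of the same boxed type, and never matches null.
  def isNaNViaEquals(value: Any): Boolean =
    java.lang.Double.valueOf(Double.NaN).equals(value) ||
      java.lang.Float.valueOf(Float.NaN).equals(value)

  def main(args: Array[String]): Unit = {
    val a = Double.NaN
    val b = Double.NaN
    println(a == b) // false: primitive IEEE 754 comparison
    println(java.lang.Double.valueOf(a).equals(java.lang.Double.valueOf(b))) // true: bit-pattern comparison
    // false: java.lang.Double.equals rejects a java.lang.Float argument
    println(java.lang.Double.valueOf(Double.NaN).equals(java.lang.Float.valueOf(Float.NaN)))
    println(isNaNViaEquals(Float.NaN)) // true
  }
}
```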
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917194384
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143152/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917453739
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143165/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917813739
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143186/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918216168
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143198/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918955109
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143238/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-916750680
**[Test build #143142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143142/testReport)** for PR 33955 at commit [`8579c97`](https://github.com/apache/spark/commit/8579c9769df6bfe4f59dda612661d402938867a3).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917084869
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143142/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917409055
**[Test build #143165 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143165/testReport)** for PR 33955 at commit [`4e533fd`](https://github.com/apache/spark/commit/4e533fdabcae676560f9396442df4f4993cc2f67).
[GitHub] [spark] cfmcgrady commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r706304078
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3550,6 +3551,10 @@ object ArrayBinaryLike {
def throwUnionLengthOverflowException(length: Int): Unit = {
throw QueryExecutionErrors.unionArrayWithElementsExceedLimitError(length)
}
+
+ def isNaN(value: Any): Boolean = {
+ Double.NaN.equals(value) || Float.NaN.equals(value)
Review comment:
It seems `Double.NaN.equals(value)` doesn't work with Scala 2.13; we need to use `java.lang.Double.isNaN()` instead.
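A minimal sketch of the helper the reviewer suggests, assuming it receives boxed values typed as `Any` (as in the diff above). The object name `NaNCheck` is hypothetical, not part of Spark. Pattern matching routes each boxed value to the type-appropriate `isNaN` check, so the result does not rely on `equals` being called on a primitive `Double.NaN` receiver.

```scala
object NaNCheck {
  // Hypothetical helper mirroring the intent of ArrayBinaryLike.isNaN in the diff.
  // java.lang.Double.isNaN / java.lang.Float.isNaN inspect the value itself
  // instead of comparing it against a NaN constant via equals.
  def isNaN(value: Any): Boolean = value match {
    case d: Double => java.lang.Double.isNaN(d)
    case f: Float  => java.lang.Float.isNaN(f)
    case _         => false // null and every other type are never NaN
  }
}
```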
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-916804017
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47646/
[GitHub] [spark] cloud-fan commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r706154056
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.util
+
+import scala.reflect._
+
+import org.apache.spark.annotation.Private
+import org.apache.spark.util.collection.OpenHashSet
+
+/**
+ * A wrap of [[OpenHashSet]] that can handle null, Double.NaN and Float.NaN.
Review comment:
```
A wrap of [[OpenHashSet]] that can handle null, Double.NaN and Float.NaN w.r.t. the SQL semantic.
```
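For context on "the SQL semantic": Spark SQL's documented NaN semantics treat `NaN = NaN` as true, whereas IEEE 754 comparison (Scala's primitive `==` on `Double`) says NaN never equals anything, itself included. The illustrative snippet below, not taken from the PR, contrasts the JVM comparisons; the object name `NaNSemantics` is hypothetical.

```scala
object NaNSemantics {
  val nan: Double = Double.NaN

  // IEEE 754 comparison on primitives: NaN is not equal to itself.
  val primitiveEq: Boolean = nan == nan

  // java.lang.Double.equals is specified in terms of doubleToLongBits,
  // which maps every NaN to one canonical bit pattern.
  val sameBits: Boolean =
    java.lang.Double.doubleToLongBits(nan) == java.lang.Double.doubleToLongBits(nan)

  // java.lang.Double.compare defines a total order in which NaN equals NaN.
  val compareIsZero: Boolean = java.lang.Double.compare(nan, nan) == 0
}
```

Because these comparisons disagree, a hash set that compares primitives with `==` can fail to recognise a second NaN as a duplicate, which is why a SQL-aware wrapper handles NaN explicitly.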
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917453739
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143165/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918989931
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143239/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917965039
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47695/
[GitHub] [spark] cloud-fan commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-921524320
This is so weird. There is no randomness in the test. How frequently do we see the test failure?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917813739
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143186/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918065896
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47702/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918828405
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47742/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918040743
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47700/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918151004
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143193/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918003584
Kubernetes integration test unable to build dist.
exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47698/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918339982
Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47714/
[GitHub] [spark] HyukjinKwon commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-920472900
Thanks, guys. This is possibly flaky. I'll keep an eye on the build.
[GitHub] [spark] AngersZhuuuu commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-920095421
> The test added here fails:
>
> ```
> sbt.ForkMain$ForkError: java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Double
> at scala.runtime.BoxesRunTime.unboxToDouble(BoxesRunTime.java:112)
> at scala.collection.mutable.ArrayBuilder$ofDouble.addOne(ArrayBuilder.scala:402)
> at scala.collection.mutable.Growable.$plus$eq(Growable.scala:36)
> at scala.collection.mutable.Growable.$plus$eq$(Growable.scala:36)
> at scala.collection.mutable.ArrayBuilder.$plus$eq(ArrayBuilder.scala:23)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.ArrayUnion_0$(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.evaluateWithMutableProjection(ExpressionEvalHelper.scala:238)
> at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.evaluateWithMutableProjection$(ExpressionEvalHelper.scala:232)
> at org.apache.spark.sql.catalyst.expressions.CollectionExpressionsSuite.evaluateWithMutableProjection(CollectionExpressionsSuite.scala:39)
> at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.$anonfun$checkEvaluationWithMutableProjection$2(ExpressionEvalHelper.scala:222)
> at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54)
> at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38)
> at org.apache.spark.sql.catalyst.expressions.CollectionExpressionsSuite.withSQLConf(CollectionExpressionsSuite.scala:39)
> at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.$anonfun$checkEvaluationWithMutableProjection$1(ExpressionEvalHelper.scala:221)
> at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.$anonfun$checkEvaluationWithMutableProjection$1$adapted(ExpressionEvalHelper.scala:220)
> at scala.collection.immutable.List.foreach(List.scala:333)
> at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithMutableProjection(ExpressionEvalHelper.scala:220)
> at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluationWithMutableProjection$(ExpressionEvalHelper.scala:215)
> at org.apache.spark.sql.catalyst.expressions.CollectionExpressionsSuite.checkEvaluationWithMutableProjection(CollectionExpressionsSuite.scala:39)
> at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation(ExpressionEvalHelper.scala:88)
> at org.apache.spark.sql.catalyst.expressions.ExpressionEvalHelper.checkEvaluation$(ExpressionEvalHelper.scala:82)
> ```
>
> https://github.com/apache/spark/runs/3606700233
>
> I wonder how it passed in the PR tests.
Let me check.
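The trace shows `ArrayBuilder$ofDouble.addOne` receiving a boxed `java.lang.Integer`, which `BoxesRunTime.unboxToDouble` cannot cast to `java.lang.Double`. The sketch below reproduces only that JVM-level failure mode outside Spark; what the generated code actually passed is not known from this thread. The object `UnboxRepro` is hypothetical, and widening through `java.lang.Number.doubleValue` is shown as one type-safe alternative, not as the fix that was applied.

```scala
import scala.collection.mutable

object UnboxRepro {
  // Casting a boxed Integer to Double compiles to BoxesRunTime.unboxToDouble,
  // which casts to java.lang.Double and therefore throws ClassCastException.
  def unboxIntegerAsDoubleFails(): Boolean = {
    val boxed: Any = Int.box(1) // a java.lang.Integer
    try {
      val b = new mutable.ArrayBuilder.ofDouble
      b += boxed.asInstanceOf[Double]
      false
    } catch {
      case _: ClassCastException => true
    }
  }

  // Widening through java.lang.Number works for any boxed numeric type.
  def widened(): Array[Double] = {
    val boxed: Any = Int.box(1)
    val b = new mutable.ArrayBuilder.ofDouble
    b += boxed.asInstanceOf[java.lang.Number].doubleValue()
    b.result()
  }
}
```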
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917928352
**[Test build #143193 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143193/testReport)** for PR 33955 at commit [`89a4263`](https://github.com/apache/spark/commit/89a426374c0873c48e738d96d0f46f99b6e39f6d).
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917813715
**[Test build #143186 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143186/testReport)** for PR 33955 at commit [`45d1fee`](https://github.com/apache/spark/commit/45d1feebbbafb59bc5acf9524b3d4c33761060bf).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918522029
**[Test build #143213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143213/testReport)** for PR 33955 at commit [`eb1f028`](https://github.com/apache/spark/commit/eb1f02819db9861604945a522dfbb85daa6cca43).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918294963
**[Test build #143213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143213/testReport)** for PR 33955 at commit [`eb1f028`](https://github.com/apache/spark/commit/eb1f02819db9861604945a522dfbb85daa6cca43).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918202888
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143196/
[GitHub] [spark] cloud-fan commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r706154695
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.util
+
+import scala.reflect._
+
+import org.apache.spark.annotation.Private
+import org.apache.spark.util.collection.OpenHashSet
+
+/**
+ * A wrap of [[OpenHashSet]] that can handle null, Double.NaN and Float.NaN.
+ */
+@Private
+class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
+ initialCapacity: Int,
+ loadFactor: Double) {
+
+ def this(initialCapacity: Int) = this(initialCapacity, 0.7)
+
+ def this() = this(64)
+
+ private val hashSet = new OpenHashSet[T](initialCapacity, loadFactor)
+
+ private var containsNull = false
+ private var containsDoubleNaN = false
+ private var containsFloatNaN = false
Review comment:
The data added to this set will always be of the same data type, so I think we can just have a single `containsNaN` flag.
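A minimal sketch of the wrapper with the single `containsNaN` flag proposed above. It is not Spark's `SQLOpenHashSet`: to stay self-contained it wraps `scala.collection.mutable.HashSet` instead of Spark's internal `OpenHashSet`, and the names `SqlSetSketch`, `SqlSemanticSet` and `arrayUnion` are hypothetical. Nulls and NaNs are intercepted before they reach the underlying set, so at most one of each is kept.

```scala
import scala.collection.mutable

object SqlSetSketch {
  // Hypothetical stand-in for SQLOpenHashSet. Each set holds a single element
  // type, so one NaN flag covers both Double and Float.
  final class SqlSemanticSet {
    private val underlying = mutable.HashSet.empty[Any]
    private var containsNull = false
    private var containsNaN = false

    private def isNaN(v: Any): Boolean = v match {
      case d: Double => java.lang.Double.isNaN(d)
      case f: Float  => java.lang.Float.isNaN(f)
      case _         => false
    }

    /** Adds `v`; returns true only if it was not already present. */
    def add(v: Any): Boolean =
      if (v == null) { val isNew = !containsNull; containsNull = true; isNew }
      else if (isNaN(v)) { val isNew = !containsNaN; containsNaN = true; isNew }
      else underlying.add(v)

    def contains(v: Any): Boolean =
      if (v == null) containsNull
      else if (isNaN(v)) containsNaN
      else underlying.contains(v)
  }

  // Union keeping the first occurrence of each element, left array first.
  def arrayUnion(left: Seq[Any], right: Seq[Any]): Seq[Any] = {
    val seen = new SqlSemanticSet
    val out = mutable.ArrayBuffer.empty[Any]
    (left.iterator ++ right.iterator).foreach { v => if (seen.add(v)) out += v }
    out.toList
  }
}
```

With this sketch, `arrayUnion(Seq(Double.NaN, Double.NaN), Seq())` keeps a single NaN, the result the PR description asks for.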
[GitHub] [spark] cloud-fan commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r706158509
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3649,61 +3643,37 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArrayBinaryLi
val ptName = CodeGenerator.primitiveTypeName(jt)
nullSafeCodeGen(ctx, ev, (array1, array2) => {
- val foundNullElement = ctx.freshName("foundNullElement")
val nullElementIndex = ctx.freshName("nullElementIndex")
val builder = ctx.freshName("builder")
val array = ctx.freshName("array")
val arrays = ctx.freshName("arrays")
val arrayDataIdx = ctx.freshName("arrayDataIdx")
- val openHashSet = classOf[OpenHashSet[_]].getName
+ val openHashSet = classOf[SQLOpenHashSet[_]].getName
val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$hsTypeName()"
val hashSet = ctx.freshName("hashSet")
val arrayBuilder = classOf[mutable.ArrayBuilder[_]].getName
val arrayBuilderClass = s"$arrayBuilder$$of$ptName"
- def withArrayNullAssignment(body: String) =
- if (dataType.asInstanceOf[ArrayType].containsNull) {
- s"""
- |if ($array.isNullAt($i)) {
- | if (!$foundNullElement) {
- | $nullElementIndex = $size;
- | $foundNullElement = true;
- | $size++;
- | $builder.$$plus$$eq($nullValueHolder);
- | }
- |} else {
- | $body
- |}
- """.stripMargin
- } else {
- body
- }
-
- val processArray = withArrayNullAssignment(
+ val processArray =
s"""
|$jt $value = ${genGetValue(array, i)};
Review comment:
The value can be of a primitive type, so we should make sure it's not null before calling `SQLOpenHashSet.add/contains`. We should still follow the previous code style.
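The reviewer's point is about ordering in the generated loop: with a primitive element type, the slot of a null element holds some placeholder value rather than null, so nullness has to be checked with `isNullAt` before the value is read and handed to the set. A minimal sketch of that control flow, assuming a hypothetical `NullableDoubles` type (a primitive `Array[Double]` plus a null mask) in place of Spark's `ArrayData`; `NullFirstLoop` and `union` are likewise hypothetical names:

```scala
import scala.collection.mutable

object NullFirstLoop {
  // Hypothetical stand-in for ArrayData: primitive values plus a null mask.
  final case class NullableDoubles(values: Array[Double], isNull: Array[Boolean]) {
    def isNullAt(i: Int): Boolean = isNull(i)
    def getDouble(i: Int): Double = values(i)
  }

  // Distinct elements across all arrays (None = SQL NULL), first occurrence kept.
  def union(arrays: Seq[NullableDoubles]): List[Option[Double]] = {
    val seen = mutable.HashSet.empty[Double]
    val out = mutable.ListBuffer.empty[Option[Double]]
    var foundNull = false
    var foundNaN = false
    for (arr <- arrays; i <- arr.values.indices) {
      if (arr.isNullAt(i)) {
        // Null is handled before the primitive slot is ever read.
        if (!foundNull) { foundNull = true; out += None }
      } else {
        val v = arr.getDouble(i)
        if (java.lang.Double.isNaN(v)) {
          if (!foundNaN) { foundNaN = true; out += Some(v) }
        } else if (seen.add(v)) {
          out += Some(v)
        }
      }
    }
    out.toList
  }
}
```

In a run where null slots store `0.0`, that placeholder never reaches the set; skipping the `isNullAt` check would add a spurious `0.0` to the result.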
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917027368
Kubernetes integration test unable to build dist.
exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47656/
[GitHub] [spark] cfmcgrady commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cfmcgrady commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r706325696
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.util
+
+import scala.reflect._
+
+import org.apache.spark.annotation.Private
+import org.apache.spark.util.collection.OpenHashSet
+
+/**
+ * A wrap of [[OpenHashSet]] that can handle null, Double.NaN and Float.NaN w.r.t. the SQL semantic.
+ */
+@Private
+class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
+ initialCapacity: Int,
+ loadFactor: Double) {
+
+ def this(initialCapacity: Int) = this(initialCapacity, 0.7)
+
+ def this() = this(64)
+
+ private val hashSet = new OpenHashSet[T](initialCapacity, loadFactor)
+
+ private var containNull = false
+ private var containNaN = false
+
+ def addNull(): Unit = {
+ containNull = true
+ }
+
+ def addNaN(): Unit = {
+ containNaN = true
+ }
+
+ def add(k: T): Unit = {
+ hashSet.add(k)
+ }
+
+ def contains(k: T): Boolean = {
+ if (Double.NaN.equals(k)) {
Review comment:
ditto.
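The `contains` check above tests `Double.NaN.equals(k)` rather than `k == Double.NaN`. The snippet below illustrates why that matters, assuming Scala 2.12/2.13 semantics.

- Primitive `==` follows IEEE 754, where NaN is never equal to itself.
- `equals` boxes both sides and delegates to `java.lang.Double.equals` / `java.lang.Float.equals`. These compare canonicalized bit patterns, so NaN equals NaN.
- A boxed Double NaN and a boxed Float NaN are different classes, so both have to be tested.

The helper names `isNaNValue` and `primitiveSelfEqual` are hypothetical.

```scala
object NaNEquality {
  // Hypothetical helper with the same shape as the check in SQLOpenHashSet.contains.
  // `equals` on a Scala Double/Float boxes the receiver and delegates to
  // java.lang.Double.equals / java.lang.Float.equals, which compare
  // doubleToLongBits / floatToIntBits, so NaN.equals(NaN) is true.
  def isNaNValue[T](k: T): Boolean =
    Double.NaN.equals(k) || Float.NaN.equals(k)

  // IEEE 754 comparison on primitives: NaN is unequal to everything, itself included.
  // A hash set that matches keys with primitive == therefore never finds an existing NaN.
  def primitiveSelfEqual(d: Double): Boolean = d == d
}
```

Tracking NaN with a separate flag (`addNaN` / `containNaN`) sidesteps the primitive comparison entirely instead of relying on the underlying set's key equality.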
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917042176
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47656/
[GitHub] [spark] AngersZhuuuu commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-920100427
@HyukjinKwon This commit passes the check: ac8bce83e7abb01fcea9e53a67a695e31aef7b6a (https://github.com/apache/spark/pull/34006/commits)
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918938930
**[Test build #143235 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143235/testReport)** for PR 33955 at commit [`f59c0a8`](https://github.com/apache/spark/commit/f59c0a87c8792e6551f719f78b5049bfc5a2f917).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
* `public class NettyLogger `
* `public final class AlwaysFalse extends Filter `
* `public final class AlwaysTrue extends Filter `
* `public final class And extends BinaryFilter `
* `abstract class BinaryComparison extends Filter `
* `abstract class BinaryFilter extends Filter `
* `public final class EqualNullSafe extends BinaryComparison `
* `public final class EqualTo extends BinaryComparison `
* `public abstract class Filter implements Expression `
* `public final class GreaterThan extends BinaryComparison `
* `public final class GreaterThanOrEqual extends BinaryComparison `
* `public final class In extends Filter `
* `public final class IsNotNull extends Filter `
* `public final class IsNull extends Filter `
* `public final class LessThan extends BinaryComparison `
* `public final class LessThanOrEqual extends BinaryComparison `
* `public final class Not extends Filter `
* `public final class Or extends BinaryFilter `
* `public final class StringContains extends StringPredicate `
* `public final class StringEndsWith extends StringPredicate `
* `abstract class StringPredicate extends Filter `
* `public final class StringStartsWith extends StringPredicate `
* `case class OptimizeSkewedJoin(`
* `case class SkewJoinAwareCost(`
* `case class SimpleCostEvaluator(forceOptimizeSkewedJoin: Boolean) extends CostEvaluator `
* `case class EnsureRequirements(`
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918942410
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143235/
[GitHub] [spark] AmplabJenkins commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918529528
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143212/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918332492
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47714/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917958250
Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47695/
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917792054
Kubernetes integration test unable to build dist.
exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47685/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-917792072
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47685/
[GitHub] [spark] cloud-fan commented on a change in pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33955:
URL: https://github.com/apache/spark/pull/33955#discussion_r707137974
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3679,22 +3686,44 @@ case class ArrayUnion(left: Expression, right: Expression) extends ArrayBinaryLi
body
}
+ def withNaNCheck(body: String): String = {
+ (elementType match {
+ case DoubleType => Some(s"java.lang.Double.isNaN((double)$value)")
+ case FloatType => Some(s"java.lang.Float.isNaN((float)$value)")
+ case _ => None
+ }).map { isNaN =>
+ s"""
+ |if ($isNaN) {
+ | if (!$hashSet.containsNaN()) {
+ | $size++;
+ | $hashSet.addNaN();
+ | $builder.$$plus$$eq($value);
+ | }
+ |} else {
+ | $body
+ |}
+ """.stripMargin
+ }
+ }.getOrElse(body)
+
val processArray = withArrayNullAssignment(
s"""
|$jt $value = ${genGetValue(array, i)};
Review comment:
Now that it's one line, we don't need the multi-line string syntax here.
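This is a hypothetical sketch of the two template patterns under discussion. First, a single-line interpolated string for the value read, as the reviewer suggests. Second, the `Option`-then-`getOrElse` dispatch that `withNaNCheck` uses to wrap a body in a NaN guard only for floating-point element types.

- `ElemType` and its cases stand in for Spark's `DataType`.
- The object and parameter names are illustrative.
- Unlike the PR's template, this sketch omits the `$builder` / `$size` bookkeeping inside the guard.

```scala
object NaNGuardSketch {
  // Hypothetical stand-in for Spark's DataType; only the cases used here.
  sealed trait ElemType
  case object DoubleT extends ElemType
  case object FloatT extends ElemType
  case object IntT extends ElemType

  // A one-line template needs no s"""...""".stripMargin.
  def readValue(jt: String, value: String, getter: String): String =
    s"$jt $value = $getter;"

  // Wraps `body` in a NaN guard for floating-point types; other types get `body` unchanged.
  def withNaNCheck(elemType: ElemType, value: String, hashSet: String, body: String): String = {
    val isNaNExpr: Option[String] = elemType match {
      case DoubleT => Some(s"java.lang.Double.isNaN((double)$value)")
      case FloatT => Some(s"java.lang.Float.isNaN((float)$value)")
      case _ => None
    }
    isNaNExpr.map { isNaN =>
      s"""
         |if ($isNaN) {
         |  if (!$hashSet.containsNaN()) {
         |    $hashSet.addNaN();
         |  }
         |} else {
         |  $body
         |}
       """.stripMargin
    }.getOrElse(body)
  }
}
```

Keeping the `match` result in a named `Option` before calling `map` / `getOrElse` makes it clear which expression `getOrElse(body)` applies to.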
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918294963
**[Test build #143213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143213/testReport)** for PR 33955 at commit [`eb1f028`](https://github.com/apache/spark/commit/eb1f02819db9861604945a522dfbb85daa6cca43).
[GitHub] [spark] SparkQA commented on pull request #33955: [SPARK-36702][SQL] ArrayUnion handle duplicated Double.NaN and Float.Nan
Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33955:
URL: https://github.com/apache/spark/pull/33955#issuecomment-918139222
**[Test build #143193 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143193/testReport)** for PR 33955 at commit [`89a4263`](https://github.com/apache/spark/commit/89a426374c0873c48e738d96d0f46f99b6e39f6d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.