Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/14 13:26:14 UTC

[GitHub] [spark] AngersZhuuuu opened a new pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

AngersZhuuuu opened a new pull request #33993:
URL: https://github.com/apache/spark/pull/33993


   ### What changes were proposed in this pull request?
   For the query
   ```
   select array_distinct(array(cast('nan' as double), cast('nan' as double)))
   ```
   This currently returns `[NaN, NaN]`, but it should return `[NaN]`.
   The issue is caused by `OpenHashSet` not being able to handle `Double.NaN` and `Float.NaN` either.
   This PR fixes it based on https://github.com/apache/spark/pull/33955
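
For readers unfamiliar with the underlying problem, here is a minimal, Spark-free sketch of why a distinct operation can keep two `NaN` values. It assumes the lookup relies on primitive `==`, under which `NaN` never equals itself. The object and method names (`NaNDistinctSketch`, `naiveDistinct`, `nanAwareDistinct`) are hypothetical and are not part of Spark or of this PR.

```scala
// Hypothetical illustration only; none of these names exist in Spark.
object NaNDistinctSketch {
  // Primitive comparison follows IEEE 754: NaN == NaN is false.
  def primitiveEquals(a: Double, b: Double): Boolean = a == b

  // Order-preserving distinct that looks up values with primitive ==.
  // A NaN is never "found", so every NaN in the input is kept.
  // (Linear lookup is used for brevity; a real implementation would hash.)
  def naiveDistinct(values: Seq[Double]): Seq[Double] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[Double]
    values.foreach { v =>
      if (!out.exists(seen => seen == v)) out += v
    }
    out.toSeq
  }

  // NaN-aware variant: NaN is tracked with a separate flag instead of
  // being looked up, so it is emitted at most once.
  def nanAwareDistinct(values: Seq[Double]): Seq[Double] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[Double]
    var seenNaN = false
    values.foreach { v =>
      if (java.lang.Double.isNaN(v)) {
        if (!seenNaN) {
          seenNaN = true
          out += v
        }
      } else if (!out.exists(seen => seen == v)) {
        out += v
      }
    }
    out.toSeq
  }
}
```

Calling `NaNDistinctSketch.naiveDistinct(Seq(Double.NaN, Double.NaN))` keeps both elements, while `nanAwareDistinct` keeps one. The actual fix in this PR lives in `SQLOpenHashSet` and the `ArrayDistinct` expression; the sketch only mirrors the idea of handling `NaN` on a separate path.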
   
   
   ### Why are the changes needed?
   Fix a bug: `array_distinct` returns duplicated `NaN` values.
   
   ### Does this PR introduce _any_ user-facing change?
   `ArrayDistinct` no longer returns duplicated `NaN` values.
   
   
   ### How was this patch tested?
   Added unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919206586


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47765/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919420666


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143262/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920793709


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47851/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #33993:
URL: https://github.com/apache/spark/pull/33993#discussion_r708454521



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
##########
@@ -2326,4 +2326,13 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
       Literal.create(Seq(Float.NaN, null, 1f), ArrayType(FloatType))),
       Seq(Float.NaN, null, 1f))
   }
+
+  test("SPARK-36740: ArrayDistinct should handle duplicated Double.NaN and Float.Nan") {

Review comment:
       SPARK-36741 instead of SPARK-36740?






[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920683880


   **[Test build #143342 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143342/testReport)** for PR 33993 at commit [`2478eb4`](https://github.com/apache/spark/commit/2478eb446f8a47bc983661dbfd7801c7f6fe2230).




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921475884


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47895/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921420738


   **[Test build #143385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143385/testReport)** for PR 33993 at commit [`d8e80fc`](https://github.com/apache/spark/commit/d8e80fc567f2524075cd6ee23787e47bca480c4f).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921544035


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47902/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921085976


   **[Test build #143360 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143360/testReport)** for PR 33993 at commit [`389c9fd`](https://github.com/apache/spark/commit/389c9fd70b6cdff1df09aa90735844558ddef35b).




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920797913


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47852/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920331170


   **[Test build #143307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143307/testReport)** for PR 33993 at commit [`f5c5452`](https://github.com/apache/spark/commit/f5c54527905343d599d760e76a03a435962f8d1a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920803835


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47852/
   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #33993:
URL: https://github.com/apache/spark/pull/33993#discussion_r710753893



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3410,32 +3410,59 @@ case class ArrayDistinct(child: Expression)
   }
 
   override def nullSafeEval(array: Any): Any = {
-    val data = array.asInstanceOf[ArrayData].toArray[AnyRef](elementType)
+    val data = array.asInstanceOf[ArrayData]
     doEvaluation(data)
   }
 
   @transient private lazy val doEvaluation = if (TypeUtils.typeWithProperEquals(elementType)) {
-    (data: Array[AnyRef]) => new GenericArrayData(data.distinct.asInstanceOf[Array[Any]])
+    (array: ArrayData) =>
+      val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+      val hs = new SQLOpenHashSet[Any]()
+      val withNaNCheckFunc = SQLOpenHashSet.withNaNCheckFunc(elementType, hs,
+        (value: Any) =>
+          if (!hs.contains(value)) {
+            if (arrayBuffer.size > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+              ArrayBinaryLike.throwUnionLengthOverflowException(arrayBuffer.size)
+            }
+            arrayBuffer += value
+            hs.add(value)
+          },
+        (value: Any) => arrayBuffer += value)

Review comment:
       Done






[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921672405


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143391/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921497259


   **[Test build #143391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143391/testReport)** for PR 33993 at commit [`434a2be`](https://github.com/apache/spark/commit/434a2beb096025db1e9363d73b1ff4a7ffab7c61).




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921441449


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47892/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921085976


   **[Test build #143360 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143360/testReport)** for PR 33993 at commit [`389c9fd`](https://github.com/apache/spark/commit/389c9fd70b6cdff1df09aa90735844558ddef35b).




[GitHub] [spark] cloud-fan commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921773222


   thanks, merging to master/3.2/3.1/3.0!




[GitHub] [spark] AngersZhuuuu commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920069452


   ping @cloud-fan @dongjoon-hyun 




[GitHub] [spark] SparkQA removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920757770


   **[Test build #143346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143346/testReport)** for PR 33993 at commit [`5546a55`](https://github.com/apache/spark/commit/5546a55bdba6855c3a229a2b44778c1240b71bc1).




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #33993:
URL: https://github.com/apache/spark/pull/33993#discussion_r709947109



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -60,21 +60,52 @@ class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
 }
 
 object SQLOpenHashSet {
-  def isNaN(dataType: DataType): Any => Boolean = {
+  def isNaNFuncAndValueNaN(dataType: DataType): (Any => Boolean, Any) = {
     dataType match {
       case DoubleType =>
-        (value: Any) => java.lang.Double.isNaN(value.asInstanceOf[java.lang.Double])
+        ((value: Any) => java.lang.Double.isNaN(value.asInstanceOf[java.lang.Double]),
+          java.lang.Double.NaN)
       case FloatType =>
-        (value: Any) => java.lang.Float.isNaN(value.asInstanceOf[java.lang.Float])
-      case _ => (_: Any) => false
+        ((value: Any) => java.lang.Float.isNaN(value.asInstanceOf[java.lang.Float]),
+          java.lang.Float.NaN)
+      case _ => ((_: Any) => false, null)
     }
   }
 
-  def valueNaN(dataType: DataType): Any = {
+  def isNaNFuncAndValueNaN(dataType: DataType, valueName: String): Option[(String, String)] = {

Review comment:
       Yea, Done






[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920971148


   **[Test build #143347 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143347/testReport)** for PR 33993 at commit [`72870f6`](https://github.com/apache/spark/commit/72870f6350d870b4c2eb71c70be962affd5a237a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921112877


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47867/
   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #33993:
URL: https://github.com/apache/spark/pull/33993#discussion_r710700103



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3410,32 +3410,60 @@ case class ArrayDistinct(child: Expression)
   }
 
   override def nullSafeEval(array: Any): Any = {
-    val data = array.asInstanceOf[ArrayData].toArray[AnyRef](elementType)
+    val data = array.asInstanceOf[ArrayData]
     doEvaluation(data)
   }
 
   @transient private lazy val doEvaluation = if (TypeUtils.typeWithProperEquals(elementType)) {
-    (data: Array[AnyRef]) => new GenericArrayData(data.distinct.asInstanceOf[Array[Any]])
+    (array: ArrayData) =>
+      val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+      val hs = new SQLOpenHashSet[Any]()
+      val (isNaN, valueNaN) = SQLOpenHashSet.isNaNFuncAndValueNaN(elementType)

Review comment:
       > `isNaN` and `valueNaN` are only used by the next function call `withNaNCheckFunc`, we can calculate them in `withNaNCheckFunc`
   
   Yes, after making it return a partial function, the values are calculated only once. Done.
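
The "calculated once" point can be sketched as follows: a factory resolves the type-specific `NaN` predicate a single time and returns a per-element function that reuses it. This is a simplified, Spark-free illustration. `NaNCheckFactorySketch`, `ElemType`, `withNaNCheck` and `dispatchCount` are hypothetical names, and the real `SQLOpenHashSet.withNaNCheckFunc` signature in the PR takes different arguments, as the diff above shows.

```scala
// Hypothetical illustration only; not the Spark implementation.
object NaNCheckFactorySketch {
  // Stand-in for Spark's DataType, reduced to the cases that matter here.
  sealed trait ElemType
  case object DoubleElem extends ElemType
  case object FloatElem extends ElemType
  case object OtherElem extends ElemType

  // Counts how often the type dispatch runs, only to make "once" observable.
  var dispatchCount = 0

  // Resolves the NaN predicate once, then returns a function that is
  // applied to every element without repeating the type match.
  def withNaNCheck(
      elemType: ElemType,
      handleNotNaN: Any => Unit,
      handleNaN: Any => Unit): Any => Unit = {
    dispatchCount += 1
    val isNaN: Any => Boolean = elemType match {
      case DoubleElem => (v: Any) => java.lang.Double.isNaN(v.asInstanceOf[Double])
      case FloatElem => (v: Any) => java.lang.Float.isNaN(v.asInstanceOf[Float])
      case OtherElem => (_: Any) => false
    }
    // Remembers whether a NaN was already emitted for this evaluation.
    var seenNaN = false
    (value: Any) =>
      if (isNaN(value)) {
        if (!seenNaN) {
          seenNaN = true
          handleNaN(value)
        }
      } else {
        handleNotNaN(value)
      }
  }
}
```

A caller would build the function once per array and then apply it to each element, so the `match` on the element type is not repeated inside the loop.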






[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921588012


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143385/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33993:
URL: https://github.com/apache/spark/pull/33993#discussion_r709247445



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3491,17 +3521,41 @@ case class ArrayDistinct(child: Expression)
             body
           }
 
-        val processArray = withArrayNullAssignment(
+        def withNaNCheck(body: String): String = {

Review comment:
       shall we move these codegen utils to `SQLOpenHashSet` as well to reduce duplicated code?






[GitHub] [spark] cloud-fan closed pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #33993:
URL: https://github.com/apache/spark/pull/33993


   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920972806


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143347/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920771530


   **[Test build #143347 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143347/testReport)** for PR 33993 at commit [`72870f6`](https://github.com/apache/spark/commit/72870f6350d870b4c2eb71c70be962affd5a237a).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920837653


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143346/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920139835


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47809/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919166595


   **[Test build #143262 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143262/testReport)** for PR 33993 at commit [`0ac9924`](https://github.com/apache/spark/commit/0ac9924723944acf1b0e320e6bda5be264e98651).




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921670874


   **[Test build #143391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143391/testReport)** for PR 33993 at commit [`434a2be`](https://github.com/apache/spark/commit/434a2beb096025db1e9363d73b1ff4a7ffab7c61).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920836860


   **[Test build #143346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143346/testReport)** for PR 33993 at commit [`5546a55`](https://github.com/apache/spark/commit/5546a55bdba6855c3a229a2b44778c1240b71bc1).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920332893


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143307/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920734541


   **[Test build #143345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143345/testReport)** for PR 33993 at commit [`202cf4e`](https://github.com/apache/spark/commit/202cf4e966d24b063e2fe2d7c9179c42eb6257c3).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921588012


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143385/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33993:
URL: https://github.com/apache/spark/pull/33993#discussion_r710697090



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3410,32 +3410,60 @@ case class ArrayDistinct(child: Expression)
   }
 
   override def nullSafeEval(array: Any): Any = {
-    val data = array.asInstanceOf[ArrayData].toArray[AnyRef](elementType)
+    val data = array.asInstanceOf[ArrayData]
     doEvaluation(data)
   }
 
   @transient private lazy val doEvaluation = if (TypeUtils.typeWithProperEquals(elementType)) {
-    (data: Array[AnyRef]) => new GenericArrayData(data.distinct.asInstanceOf[Array[Any]])
+    (array: ArrayData) =>
+      val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+      val hs = new SQLOpenHashSet[Any]()
+      val (isNaN, valueNaN) = SQLOpenHashSet.isNaNFuncAndValueNaN(elementType)

Review comment:
       `isNaN` and `valueNaN` are only used by the next function call, `withNaNCheckFunc`, so we can compute them inside `withNaNCheckFunc` instead.
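
A minimal sketch of that shape, for illustration only: the NaN predicate is resolved inside the factory, so the caller passes just the two handlers. Everything here is an assumption rather than the PR's code. `NaNAwareDistinctSketch` is a hypothetical name, a plain `scala.collection.mutable.HashSet` stands in for `SQLOpenHashSet`, and the signature of `withNaNCheckFunc` is invented for the example.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the interpreted (non-codegen) path.
object NaNAwareDistinctSketch {
  // Builds a per-element function. The NaN test is computed here, inside the
  // factory, so callers do not need to obtain `isNaN` themselves.
  def withNaNCheckFunc(
      handleNotNaN: Any => Unit,
      handleNaN: Any => Unit): Any => Unit = {
    val isNaN: Any => Boolean = {
      case d: Double => d.isNaN
      case f: Float => f.isNaN
      case _ => false
    }
    (value: Any) => if (isNaN(value)) handleNaN(value) else handleNotNaN(value)
  }

  // Keeps the first occurrence of every element, and at most one NaN.
  def distinct(values: Seq[Any]): Seq[Any] = {
    val out = mutable.ArrayBuffer.empty[Any]
    val seen = mutable.HashSet.empty[Any]
    var seenNaN = false
    val process = withNaNCheckFunc(
      v => if (seen.add(v)) { out += v },
      v => if (!seenNaN) { seenNaN = true; out += v })
    values.foreach(process)
    out.toSeq
  }
}
```

Tracking NaN with a separate flag keeps it out of the hash set entirely, so the result does not depend on how the set compares NaN values.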






[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921628160


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143387/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919420666


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143262/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920771959


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47850/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921445437


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47892/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920874071


   **[Test build #143342 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143342/testReport)** for PR 33993 at commit [`2478eb4`](https://github.com/apache/spark/commit/2478eb446f8a47bc983661dbfd7801c7f6fe2230).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920147834


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47809/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920723076


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47848/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920730752


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47848/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920773099


   **[Test build #143345 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143345/testReport)** for PR 33993 at commit [`202cf4e`](https://github.com/apache/spark/commit/202cf4e966d24b063e2fe2d7c9179c42eb6257c3).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920773210


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143345/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921123904


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47867/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921626552


   **[Test build #143387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143387/testReport)** for PR 33993 at commit [`21e9422`](https://github.com/apache/spark/commit/21e9422d9cd83590ba3b9935ef6596b82d5f9b4e).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921123855


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47867/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921282041


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143360/
   




[GitHub] [spark] AngersZhuuuu commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921675421


   ping @cloud-fan 




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921461082


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47895/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921534626


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47902/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920332893


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143307/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921475961


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47895/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921544035


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47902/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921497259


   **[Test build #143391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143391/testReport)** for PR 33993 at commit [`434a2be`](https://github.com/apache/spark/commit/434a2beb096025db1e9363d73b1ff4a7ffab7c61).




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921451536


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47894/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921434154


   **[Test build #143387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143387/testReport)** for PR 33993 at commit [`21e9422`](https://github.com/apache/spark/commit/21e9422d9cd83590ba3b9935ef6596b82d5f9b4e).




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921628160


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143387/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919535649


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143267/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920788615


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47851/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921280951


   **[Test build #143360 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143360/testReport)** for PR 33993 at commit [`389c9fd`](https://github.com/apache/spark/commit/389c9fd70b6cdff1df09aa90735844558ddef35b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921123904


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47867/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921475961


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47895/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920875649


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143342/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921445098


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47894/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921467329


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47894/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33993:
URL: https://github.com/apache/spark/pull/33993#discussion_r710752639



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3410,32 +3410,59 @@ case class ArrayDistinct(child: Expression)
   }
 
   override def nullSafeEval(array: Any): Any = {
-    val data = array.asInstanceOf[ArrayData].toArray[AnyRef](elementType)
+    val data = array.asInstanceOf[ArrayData]
     doEvaluation(data)
   }
 
   @transient private lazy val doEvaluation = if (TypeUtils.typeWithProperEquals(elementType)) {
-    (data: Array[AnyRef]) => new GenericArrayData(data.distinct.asInstanceOf[Array[Any]])
+    (array: ArrayData) =>
+      val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+      val hs = new SQLOpenHashSet[Any]()
+      val withNaNCheckFunc = SQLOpenHashSet.withNaNCheckFunc(elementType, hs,
+        (value: Any) =>
+          if (!hs.contains(value)) {
+            if (arrayBuffer.size > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+              ArrayBinaryLike.throwUnionLengthOverflowException(arrayBuffer.size)
+            }
+            arrayBuffer += value
+            hs.add(value)
+          },
+        (value: Any) => arrayBuffer += value)

Review comment:
       nit: name it `valueNaN` to be clear






[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921434154


   **[Test build #143387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143387/testReport)** for PR 33993 at commit [`21e9422`](https://github.com/apache/spark/commit/21e9422d9cd83590ba3b9935ef6596b82d5f9b4e).




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919366104


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47770/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919374713


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47770/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920765589


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47850/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920972806


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143347/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920837653


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143346/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920757770


   **[Test build #143346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143346/testReport)** for PR 33993 at commit [`5546a55`](https://github.com/apache/spark/commit/5546a55bdba6855c3a229a2b44778c1240b71bc1).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920793747


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47851/
   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #33993:
URL: https://github.com/apache/spark/pull/33993#discussion_r709250978



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3491,17 +3521,41 @@ case class ArrayDistinct(child: Expression)
             body
           }
 
-        val processArray = withArrayNullAssignment(
+        def withNaNCheck(body: String): String = {

Review comment:
       > shall we move these codegen utils to `SQLOpenHashSet` as well to reduce duplicated code?
   
   How about doing this after everything else is done? This is not the only place that can be refactored.
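
   To make the refactoring question concrete, a shared codegen helper could look roughly like the sketch below. Everything in it is an assumption made for illustration: the object name, the parameter names, and the shape of the generated Java are hypothetical and are not taken from this PR or from `SQLOpenHashSet`.

```scala
// Hypothetical illustration only; not part of Spark.
object NaNCheckCodegenSketch {
  // Wraps `body` (Java source handling a non-NaN value) in a NaN check
  // when the element's Java type is floating point; otherwise returns
  // `body` unchanged.
  def withNaNCheck(
      javaType: String,      // e.g. "double", "float", "int"
      value: String,         // name of the Java variable holding the element
      seenNaNFlag: String,   // name of a Java boolean tracking whether NaN was seen
      onFirstNaN: String)(   // Java source to run for the first NaN only
      body: String): String = {
    val boxedClass = javaType match {
      case "double" => Some("Double")
      case "float" => Some("Float")
      case _ => None
    }
    boxedClass match {
      case Some(cls) =>
        Seq(
          s"if ($cls.isNaN($value)) {",
          s"  if (!$seenNaNFlag) {",
          s"    $seenNaNFlag = true;",
          s"    $onFirstNaN",
          "  }",
          "} else {",
          s"  $body",
          "}"
        ).mkString("\n")
      case None => body
    }
  }
}
```

   Keeping a single helper of this kind in one place would let each array expression pass in only its own per-element body, which is the duplication the quoted question refers to.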






[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920088524


   **[Test build #143307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143307/testReport)** for PR 33993 at commit [`f5c5452`](https://github.com/apache/spark/commit/f5c54527905343d599d760e76a03a435962f8d1a).




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920773210


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143345/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920734541


   **[Test build #143345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143345/testReport)** for PR 33993 at commit [`202cf4e`](https://github.com/apache/spark/commit/202cf4e966d24b063e2fe2d7c9179c42eb6257c3).




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36742][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919166595


   **[Test build #143262 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143262/testReport)** for PR 33993 at commit [`0ac9924`](https://github.com/apache/spark/commit/0ac9924723944acf1b0e320e6bda5be264e98651).




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919535649


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143267/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921445457


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47892/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920803835


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47852/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920875649


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143342/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920771917


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47850/
   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #33993:
URL: https://github.com/apache/spark/pull/33993#discussion_r709243579



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3410,32 +3410,63 @@ case class ArrayDistinct(child: Expression)
   }
 
   override def nullSafeEval(array: Any): Any = {
-    val data = array.asInstanceOf[ArrayData].toArray[AnyRef](elementType)
+    val data = array.asInstanceOf[ArrayData]
     doEvaluation(data)
   }
 
   @transient private lazy val doEvaluation = if (TypeUtils.typeWithProperEquals(elementType)) {
-    (data: Array[AnyRef]) => new GenericArrayData(data.distinct.asInstanceOf[Array[Any]])
+    (array: ArrayData) =>
+      val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+      val hs = new SQLOpenHashSet[Any]()
+      val isNaN = SQLOpenHashSet.isNaN(elementType)
+      var i = 0
+      while (i < array.numElements()) {
+        if (array.isNullAt(i)) {
+          if (!hs.containsNull) {
+            hs.addNull
+            arrayBuffer += null
+          }
+        } else {
+          val elem = array.get(i, elementType)
+          if (isNaN(elem)) {
+            if (!hs.containsNaN) {
+              arrayBuffer += elem

Review comment:
       > For this, let's wait for the decision at the first PR.
   > 
   > * https://github.com/apache/spark/pull/33955/files#r708570515
   
   Done






[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921282041


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143360/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #33993:
URL: https://github.com/apache/spark/pull/33993#discussion_r708571227



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3410,32 +3410,63 @@ case class ArrayDistinct(child: Expression)
   }
 
   override def nullSafeEval(array: Any): Any = {
-    val data = array.asInstanceOf[ArrayData].toArray[AnyRef](elementType)
+    val data = array.asInstanceOf[ArrayData]
     doEvaluation(data)
   }
 
   @transient private lazy val doEvaluation = if (TypeUtils.typeWithProperEquals(elementType)) {
-    (data: Array[AnyRef]) => new GenericArrayData(data.distinct.asInstanceOf[Array[Any]])
+    (array: ArrayData) =>
+      val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+      val hs = new SQLOpenHashSet[Any]()
+      val isNaN = SQLOpenHashSet.isNaN(elementType)
+      var i = 0
+      while (i < array.numElements()) {
+        if (array.isNullAt(i)) {
+          if (!hs.containsNull) {
+            hs.addNull
+            arrayBuffer += null
+          }
+        } else {
+          val elem = array.get(i, elementType)
+          if (isNaN(elem)) {
+            if (!hs.containsNaN) {
+              arrayBuffer += elem

Review comment:
       For this, let's wait for the decision at the first PR.
   - https://github.com/apache/spark/pull/33955/files#r708570515
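
For readers following the diff above, the evaluation loop it introduces can be sketched without Spark. This is a minimal illustration under stated assumptions, not the PR's implementation: `ArrayDistinctSketch` and `distinctWithNaN` are hypothetical names, a plain `Seq[Any]` stands in for `ArrayData`, and two Boolean flags plus a Scala `HashSet` stand in for `SQLOpenHashSet` and its `containsNull` / `containsNaN` state.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the ArrayDistinct evaluation loop; it uses plain
// Scala collections instead of Spark's ArrayData and SQLOpenHashSet.
object ArrayDistinctSketch {
  // True only for boxed Double or Float values that are NaN.
  private def isNaN(value: Any): Boolean = value match {
    case d: Double => d.isNaN
    case f: Float => f.isNaN
    case _ => false
  }

  // Keeps the first occurrence of each element, treating null and NaN as
  // special values that are each emitted at most once.
  def distinctWithNaN(input: Seq[Any]): Seq[Any] = {
    val out = mutable.ArrayBuffer.empty[Any]
    val seen = mutable.HashSet.empty[Any]
    var containsNull = false
    var containsNaN = false
    input.foreach { elem =>
      if (elem == null) {
        if (!containsNull) { containsNull = true; out += null }
      } else if (isNaN(elem)) {
        if (!containsNaN) { containsNaN = true; out += elem }
      } else if (seen.add(elem)) {
        // HashSet.add returns true only when the element was not present yet.
        out += elem
      }
    }
    out.toSeq
  }
}
```

Tracking NaN with a flag rather than inserting it into the set avoids relying on how the set compares NaN values, which is the behavior the PR description identifies as the source of the duplicates.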






[GitHub] [spark] cloud-fan commented on a change in pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33993:
URL: https://github.com/apache/spark/pull/33993#discussion_r709942031



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -60,21 +60,52 @@ class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
 }
 
 object SQLOpenHashSet {
-  def isNaN(dataType: DataType): Any => Boolean = {
+  def isNaNFuncAndValueNaN(dataType: DataType): (Any => Boolean, Any) = {
     dataType match {
       case DoubleType =>
-        (value: Any) => java.lang.Double.isNaN(value.asInstanceOf[java.lang.Double])
+        ((value: Any) => java.lang.Double.isNaN(value.asInstanceOf[java.lang.Double]),
+          java.lang.Double.NaN)
       case FloatType =>
-        (value: Any) => java.lang.Float.isNaN(value.asInstanceOf[java.lang.Float])
-      case _ => (_: Any) => false
+        ((value: Any) => java.lang.Float.isNaN(value.asInstanceOf[java.lang.Float]),
+          java.lang.Float.NaN)
+      case _ => ((_: Any) => false, null)
     }
   }
 
-  def valueNaN(dataType: DataType): Any = {
+  def isNaNFuncAndValueNaN(dataType: DataType, valueName: String): Option[(String, String)] = {

Review comment:
       It seems we can remove this.






[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920771530


   **[Test build #143347 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143347/testReport)** for PR 33993 at commit [`72870f6`](https://github.com/apache/spark/commit/72870f6350d870b4c2eb71c70be962affd5a237a).




[GitHub] [spark] cloud-fan commented on a change in pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #33993:
URL: https://github.com/apache/spark/pull/33993#discussion_r710270899



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -60,21 +60,59 @@ class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
 }
 
 object SQLOpenHashSet {
-  def isNaN(dataType: DataType): Any => Boolean = {
+  def isNaNFuncAndValueNaN(dataType: DataType): (Any => Boolean, Any) = {
     dataType match {
       case DoubleType =>
-        (value: Any) => java.lang.Double.isNaN(value.asInstanceOf[java.lang.Double])
+        ((value: Any) => java.lang.Double.isNaN(value.asInstanceOf[java.lang.Double]),
+          java.lang.Double.NaN)
       case FloatType =>
-        (value: Any) => java.lang.Float.isNaN(value.asInstanceOf[java.lang.Float])
-      case _ => (_: Any) => false
+        ((value: Any) => java.lang.Float.isNaN(value.asInstanceOf[java.lang.Float]),
+          java.lang.Float.NaN)
+      case _ => ((_: Any) => false, null)
     }
   }
 
-  def valueNaN(dataType: DataType): Any = {
-    dataType match {
-      case DoubleType => java.lang.Double.NaN
-      case FloatType => java.lang.Float.NaN
-      case _ => null
+  def withNaNCheckFunc(
+      isNaN: Any => Boolean,
+      valueNaN: Any,
+      value: Any,
+      hashSet: SQLOpenHashSet[Any],
+      handleNotNaN: () => Unit,
+      handleNaN: Any => Unit): Unit = {

Review comment:
       Can we make it return a function?
   ```
   def withNaNCheckFunc(
       dataType: DataType,
       hashSet: SQLOpenHashSet[Any],
       handleNotNaN: Any => Unit,
       handleNaN: Any => Unit): Any => Unit = {
     (dataType match {
       case FloatType => ...
       case DoubleType => ...
     }).map { case (isNaN, valueNaN) =>
       (value: Any) => ...
     }.getOrElse {
       (value: Any) => handleNotNaN(value)
     }
   }
   ```
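
The shape suggested above, resolving the type-specific NaN logic once and returning a per-element function, can be sketched in a self-contained form. This is only an illustration under assumptions: `NaNCheckSketch`, `ElemType` and its cases, and `NaNTracker` are hypothetical stand-ins for Spark's `DataType` and `SQLOpenHashSet`, and the elided branches of the suggestion are filled with one plausible body rather than the code that was eventually merged.

```scala
// Hypothetical, Spark-free sketch of a helper that returns a per-element function.
object NaNCheckSketch {
  // Stand-in for the relevant DataType cases.
  sealed trait ElemType
  case object DoubleElem extends ElemType
  case object FloatElem extends ElemType
  case object OtherElem extends ElemType

  // Minimal stand-in for the NaN bookkeeping of the hash set.
  final class NaNTracker { var containsNaN: Boolean = false }

  def withNaNCheckFunc(
      elemType: ElemType,
      tracker: NaNTracker,
      handleNotNaN: Any => Unit,
      handleNaN: Any => Unit): Any => Unit = {
    // Resolve the NaN predicate and the canonical NaN value once, up front.
    val nanSupport: Option[(Any => Boolean, Any)] = elemType match {
      case DoubleElem =>
        Some(((v: Any) => v.asInstanceOf[Double].isNaN, Double.NaN))
      case FloatElem =>
        Some(((v: Any) => v.asInstanceOf[Float].isNaN, Float.NaN))
      case OtherElem => None
    }
    nanSupport.map { case (isNaN, valueNaN) =>
      (value: Any) =>
        if (isNaN(value)) {
          // Forward NaN to the caller only the first time it is seen.
          if (!tracker.containsNaN) {
            tracker.containsNaN = true
            handleNaN(valueNaN)
          }
        } else {
          handleNotNaN(value)
        }
    }.getOrElse(handleNotNaN) // Types without NaN skip the check entirely.
  }
}
```

A caller would build the function once per array and apply it to every non-null element, so the match on the element type is not repeated inside the loop.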






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #33993:
URL: https://github.com/apache/spark/pull/33993#discussion_r710317852



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/SQLOpenHashSet.scala
##########
@@ -60,21 +60,59 @@ class SQLOpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
 }
 
 object SQLOpenHashSet {
-  def isNaN(dataType: DataType): Any => Boolean = {
+  def isNaNFuncAndValueNaN(dataType: DataType): (Any => Boolean, Any) = {
     dataType match {
       case DoubleType =>
-        (value: Any) => java.lang.Double.isNaN(value.asInstanceOf[java.lang.Double])
+        ((value: Any) => java.lang.Double.isNaN(value.asInstanceOf[java.lang.Double]),
+          java.lang.Double.NaN)
       case FloatType =>
-        (value: Any) => java.lang.Float.isNaN(value.asInstanceOf[java.lang.Float])
-      case _ => (_: Any) => false
+        ((value: Any) => java.lang.Float.isNaN(value.asInstanceOf[java.lang.Float]),
+          java.lang.Float.NaN)
+      case _ => ((_: Any) => false, null)
     }
   }
 
-  def valueNaN(dataType: DataType): Any = {
-    dataType match {
-      case DoubleType => java.lang.Double.NaN
-      case FloatType => java.lang.Float.NaN
-      case _ => null
+  def withNaNCheckFunc(
+      isNaN: Any => Boolean,
+      valueNaN: Any,
+      value: Any,
+      hashSet: SQLOpenHashSet[Any],
+      handleNotNaN: () => Unit,
+      handleNaN: Any => Unit): Unit = {

Review comment:
       Good suggestion. Done






[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #33993:
URL: https://github.com/apache/spark/pull/33993#discussion_r709883506



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3491,17 +3521,41 @@ case class ArrayDistinct(child: Expression)
             body
           }
 
-        val processArray = withArrayNullAssignment(
+        def withNaNCheck(body: String): String = {

Review comment:
       @cloud-fan Updated. How does the current version look?






[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919220064


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47765/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920683880


   **[Test build #143342 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143342/testReport)** for PR 33993 at commit [`2478eb4`](https://github.com/apache/spark/commit/2478eb446f8a47bc983661dbfd7801c7f6fe2230).




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921586672


   **[Test build #143385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143385/testReport)** for PR 33993 at commit [`d8e80fc`](https://github.com/apache/spark/commit/d8e80fc567f2524075cd6ee23787e47bca480c4f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920730752


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47848/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920803802


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47852/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920793747


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47851/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919418628


   **[Test build #143262 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143262/testReport)** for PR 33993 at commit [`0ac9924`](https://github.com/apache/spark/commit/0ac9924723944acf1b0e320e6bda5be264e98651).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921420738


   **[Test build #143385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143385/testReport)** for PR 33993 at commit [`d8e80fc`](https://github.com/apache/spark/commit/d8e80fc567f2524075cd6ee23787e47bca480c4f).




[GitHub] [spark] SparkQA removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919338900


   **[Test build #143267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143267/testReport)** for PR 33993 at commit [`63763df`](https://github.com/apache/spark/commit/63763dfa8518d4cdd3b90755f2502354363a2493).




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921530458


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47902/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919374713


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47770/
   




[GitHub] [spark] AmplabJenkins commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920147939


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47809/
   




[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on a change in pull request #33993:
URL: https://github.com/apache/spark/pull/33993#discussion_r708461715



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
##########
@@ -2326,4 +2326,13 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
       Literal.create(Seq(Float.NaN, null, 1f), ArrayType(FloatType))),
       Seq(Float.NaN, null, 1f))
   }
+
+  test("SPARK-36740: ArrayDistinct should handle duplicated Double.NaN and Float.Nan") {

Review comment:
       Done...






[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919534589


   **[Test build #143267 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143267/testReport)** for PR 33993 at commit [`63763df`](https://github.com/apache/spark/commit/63763dfa8518d4cdd3b90755f2502354363a2493).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919338900


   **[Test build #143267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143267/testReport)** for PR 33993 at commit [`63763df`](https://github.com/apache/spark/commit/63763dfa8518d4cdd3b90755f2502354363a2493).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-919220064


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47765/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921672405


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143391/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920771959


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47850/
   




[GitHub] [spark] SparkQA removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920088524


   **[Test build #143307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143307/testReport)** for PR 33993 at commit [`f5c5452`](https://github.com/apache/spark/commit/f5c54527905343d599d760e76a03a435962f8d1a).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921467329


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47894/
   




[GitHub] [spark] SparkQA commented on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920730673


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47848/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-920147939


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47809/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #33993: [SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33993:
URL: https://github.com/apache/spark/pull/33993#issuecomment-921445457


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47892/
   

