Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/14 14:41:41 UTC

[GitHub] [spark] AngersZhuuuu opened a new pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

AngersZhuuuu opened a new pull request #33995:
URL: https://github.com/apache/spark/pull/33995


   ### What changes were proposed in this pull request?
   For the query
   ```
   select array_intersect(array(cast('nan' as double), 1d), array(cast('nan' as double)))
   ```
   Before this PR, this returned `[]`, but it should return `[NaN]`.
   This issue is also caused by `OpenHashSet` being unable to handle `Double.NaN` and `Float.NaN`.
   This PR fixes it, building on https://github.com/apache/spark/pull/33955.
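To illustrate why a hash set keyed on primitive doubles can mishandle `NaN`, here is a minimal Java sketch. `NaNHashSetDemo`, `DoubleSet` and the `nanAware` flag are hypothetical names; this is not Spark's `OpenHashSet`. It only assumes, as a simplification, that keys are compared with primitive `==`. Because `NaN == NaN` is `false` under IEEE 754, a stored `NaN` is never matched on lookup (and every insertion of `NaN` takes a fresh slot). The NaN-aware variant tracks `NaN` with a separate flag; that is one possible fix, not necessarily the approach taken by this PR.

```java
import java.util.Arrays;

// Hypothetical demo, NOT Spark's OpenHashSet: a tiny open-addressing set of
// primitive doubles that (by assumption) compares keys with primitive ==.
public class NaNHashSetDemo {
    static final class DoubleSet {
        private final double[] slots = new double[16]; // fixed capacity, no resizing
        private final boolean[] used = new boolean[16];
        private final boolean nanAware;
        private boolean hasNaN = false; // only consulted when nanAware is true

        DoubleSet(boolean nanAware) {
            this.nanAware = nanAware;
        }

        // Returns the slot holding k, or the first empty slot on k's probe path.
        private int slotOf(double k) {
            int mask = slots.length - 1;
            long bits = Double.doubleToLongBits(k);
            int pos = (int) (bits ^ (bits >>> 32)) & mask;
            for (int i = 0; i < slots.length; i++) {
                // Primitive ==: NaN == NaN is false, so a stored NaN never matches.
                if (!used[pos] || slots[pos] == k) {
                    return pos;
                }
                pos = (pos + 1) & mask; // linear probing
            }
            throw new IllegalStateException("demo set is full");
        }

        void add(double k) {
            if (nanAware && Double.isNaN(k)) {
                hasNaN = true; // track NaN with a flag instead of a slot
                return;
            }
            int pos = slotOf(k);
            slots[pos] = k;
            used[pos] = true;
        }

        boolean contains(double k) {
            if (nanAware && Double.isNaN(k)) {
                return hasNaN;
            }
            return used[slotOf(k)];
        }
    }

    // array_intersect-like: elements of `left` found in `right`, without
    // duplicates, in the order of `left`.
    static double[] intersect(double[] left, double[] right, boolean nanAware) {
        DoubleSet rightSet = new DoubleSet(nanAware);
        for (double x : right) {
            rightSet.add(x);
        }
        DoubleSet seen = new DoubleSet(nanAware);
        double[] out = new double[left.length];
        int n = 0;
        for (double x : left) {
            if (rightSet.contains(x) && !seen.contains(x)) {
                seen.add(x);
                out[n++] = x;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        double[] left = {Double.NaN, 1d};
        double[] right = {Double.NaN};
        System.out.println(Arrays.toString(intersect(left, right, false))); // prints []
        System.out.println(Arrays.toString(intersect(left, right, true)));  // prints [NaN]
    }
}
```

With `nanAware = false` the sketch reproduces the old `[]` result for the query above; with the flag it yields `[NaN]`, and repeated `NaN` inputs collapse to a single element.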
   
   
   ### Why are the changes needed?
   To fix a correctness bug in `array_intersect` with `NaN` inputs.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. `array_intersect` now returns a `NaN` value that appears in both input arrays instead of dropping it.
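For comparison, a reference model of the intended behavior can use boxed `Double` keys: `Double.equals` compares `Double.doubleToLongBits`, so every `NaN` equals every other `NaN`, even though primitive `NaN == NaN` is `false` (`Float.equals` behaves the same way for `Float.NaN`). The class and method names below are hypothetical, this is not Spark's implementation, and keeping the first array's order is an assumption for illustration. Boxed equality also treats `0.0` and `-0.0` as different, which this sketch does not try to reconcile.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical reference model of array_intersect over boxed doubles (not Spark code).
public class ArrayIntersectModel {
    // Double.equals compares doubleToLongBits, so all NaNs are equal to each other.
    static List<Double> arrayIntersect(List<Double> left, List<Double> right) {
        Set<Double> rightSet = new LinkedHashSet<>(right);
        Set<Double> result = new LinkedHashSet<>(); // drops duplicates, keeps left's order
        for (Double x : left) {
            if (rightSet.contains(x)) {
                result.add(x);
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        System.out.println(Double.NaN == Double.NaN);                      // prints false
        System.out.println(Double.valueOf(Double.NaN).equals(Double.NaN)); // prints true
        System.out.println(arrayIntersect(
            Arrays.asList(Double.NaN, 1d), Arrays.asList(Double.NaN)));    // prints [NaN]
        System.out.println(arrayIntersect(
            Arrays.asList(Double.NaN, Double.NaN, 1d),
            Arrays.asList(Double.NaN, 1d)));                               // prints [NaN, 1.0]
    }
}
```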
   
   
   ### How was this patch tested?
   Added unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] karenfeng commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
karenfeng commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-928263469


   For clarification @AngersZhuuuu: the PR description says:
   
   > For query
   > ```
   > select array_intersect(array(cast('nan' as double), 1d), array(cast('nan' as double)))
   > ```
   > This returns [NaN], but it should return [].
   
   Is this the right way around? It seems like we now correctly return `[NaN]`, but previously incorrectly returned `[]`.


[GitHub] [spark] AmplabJenkins commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-921984203


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143412/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-919460993


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143265/
   


[GitHub] [spark] AmplabJenkins commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-921520816


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47904/
   


[GitHub] [spark] SparkQA commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-920324098


   **[Test build #143308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143308/testReport)** for PR 33995 at commit [`64afef9`](https://github.com/apache/spark/commit/64afef9c444ccb68e54991afba17f5a7bc59b631).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] cloud-fan closed pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #33995:
URL: https://github.com/apache/spark/pull/33995


   


[GitHub] [spark] AngersZhuuuu commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-921776994


   ping @cloud-fan 


[GitHub] [spark] AmplabJenkins commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-920326547


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143308/
   


[GitHub] [spark] SparkQA commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-921520789


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47904/
   


[GitHub] [spark] AngersZhuuuu commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-928951346


   > For clarification @AngersZhuuuu: the PR description says:
   > 
   > > For query
   > > ```
   > > select array_intersect(array(cast('nan' as double), 1d), array(cast('nan' as double)))
   > > ```
   > > This returns [NaN], but it should return [].
   > 
   > Is this the right way around? It seems like we now correctly return `[NaN]`, but previously incorrectly returned `[]`.
   
   Oh, sorry for the mistake. The correct result is `[NaN]`.


[GitHub] [spark] cloud-fan commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-922740969


   thanks, merging to master/3.2/3.1/3.0!


[GitHub] [spark] SparkQA commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-919261621


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47768/
   


[GitHub] [spark] AngersZhuuuu commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
AngersZhuuuu commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-920605753


   ping @cloud-fan 


[GitHub] [spark] SparkQA commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-919226400


   **[Test build #143265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143265/testReport)** for PR 33995 at commit [`4189d71`](https://github.com/apache/spark/commit/4189d713d88faec170469cd97fde35c4e3cdcdf7).


[GitHub] [spark] SparkQA commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-920148715


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47811/
   


[GitHub] [spark] SparkQA removed a comment on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-921518420


   **[Test build #143396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143396/testReport)** for PR 33995 at commit [`a9e6205`](https://github.com/apache/spark/commit/a9e620556fcbc61088735581d52645b021881a00).


[GitHub] [spark] SparkQA commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-921825059


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47919/
   


[GitHub] [spark] SparkQA removed a comment on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-921777299


   **[Test build #143412 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143412/testReport)** for PR 33995 at commit [`85f9d9d`](https://github.com/apache/spark/commit/85f9d9d3c3d594b7d68875fcad2be0e5c60a1780).


[GitHub] [spark] SparkQA commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-921833265


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47919/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-920148760


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47811/
   


[GitHub] [spark] AmplabJenkins commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-919275297


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47768/
   


[GitHub] [spark] SparkQA commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-919458748


   **[Test build #143265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143265/testReport)** for PR 33995 at commit [`4189d71`](https://github.com/apache/spark/commit/4189d713d88faec170469cd97fde35c4e3cdcdf7).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] SparkQA commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-921582901


   **[Test build #143396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143396/testReport)** for PR 33995 at commit [`a9e6205`](https://github.com/apache/spark/commit/a9e620556fcbc61088735581d52645b021881a00).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-919275297


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47768/
   


[GitHub] [spark] AmplabJenkins commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-921833299


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47919/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-921584966


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143396/
   


[GitHub] [spark] AmplabJenkins commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-920148760


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47811/
   


[GitHub] [spark] SparkQA removed a comment on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-920099215


   **[Test build #143308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143308/testReport)** for PR 33995 at commit [`64afef9`](https://github.com/apache/spark/commit/64afef9c444ccb68e54991afba17f5a7bc59b631).


[GitHub] [spark] SparkQA commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-921982775


   **[Test build #143412 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143412/testReport)** for PR 33995 at commit [`85f9d9d`](https://github.com/apache/spark/commit/85f9d9d3c3d594b7d68875fcad2be0e5c60a1780).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class WriterBucketSpec(`


[GitHub] [spark] SparkQA commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-920099215


   **[Test build #143308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143308/testReport)** for PR 33995 at commit [`64afef9`](https://github.com/apache/spark/commit/64afef9c444ccb68e54991afba17f5a7bc59b631).


[GitHub] [spark] AmplabJenkins commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-921584966


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143396/
   


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #33995:
URL: https://github.com/apache/spark/pull/33995#discussion_r708572365



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
##########
@@ -3826,32 +3826,43 @@ case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBina
     if (TypeUtils.typeWithProperEquals(elementType)) {
       (array1, array2) =>
         if (array1.numElements() != 0 && array2.numElements() != 0) {
-          val hs = new OpenHashSet[Any]
-          val hsResult = new OpenHashSet[Any]
-          var foundNullElement = false
+          val hs = new SQLOpenHashSet[Any]
+          val hsResult = new SQLOpenHashSet[Any]
+          val isNaN = SQLOpenHashSet.isNaN(elementType)
           var i = 0
           while (i < array2.numElements()) {
             if (array2.isNullAt(i)) {
-              foundNullElement = true
+              hs.addNull()
             } else {
               val elem = array2.get(i, elementType)
-              hs.add(elem)
+              if (isNaN(elem)) {
+                hs.addNaN()
+              } else {
+                hs.add(elem)
+              }
             }
             i += 1
           }
           val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
           i = 0
           while (i < array1.numElements()) {
             if (array1.isNullAt(i)) {
-              if (foundNullElement) {
+              if (hs.containsNull() && !hsResult.containsNull()) {
                 arrayBuffer += null
-                foundNullElement = false
+                hsResult.addNull()
               }
             } else {
               val elem = array1.get(i, elementType)
-              if (hs.contains(elem) && !hsResult.contains(elem)) {
-                arrayBuffer += elem
-                hsResult.add(elem)
+              if (isNaN(elem)) {
+                if (hs.containsNaN() && !hsResult.containsNaN()) {
+                  arrayBuffer += elem

Review comment:
       For this, let's wait a little bit for the decision at the first PR.
   - https://github.com/apache/spark/pull/33955/files#r708570515
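   The NaN handling in the diff above can be illustrated with a small standalone Scala sketch that does not depend on Spark. Several pieces are assumptions made for illustration and are not Spark's `SQLOpenHashSet` API:
   - the `NaNAwareIntersect` object,
   - the `Option[Double]` encoding, with `None` standing in for SQL NULL,
   - the use of `scala.collection.mutable.HashSet`.

   The sketch mirrors the branches visible in the diff. NULL and NaN are tracked with separate flags, much as `hs.addNull()`/`hs.addNaN()` and `containsNull()`/`containsNaN()` are used there. This is needed because `NaN == NaN` is false under primitive `==`, so an equality-based hash lookup cannot be relied on to find NaN. In this sketch every NaN counts as the same value, so NaN appears at most once in the output.

   ```scala
   import scala.collection.mutable

   // Hypothetical helper for illustration only; not Spark's implementation.
   object NaNAwareIntersect {
     // None plays the role of SQL NULL.
     def intersect(left: Seq[Option[Double]], right: Seq[Option[Double]]): Seq[Option[Double]] = {
       // Ordinary values from `right`. NULL and NaN are kept out of the set and
       // tracked with flags, because NaN is never `==` to itself.
       val seen = mutable.HashSet.empty[Double]
       var rightHasNull = false
       var rightHasNaN = false
       right.foreach {
         case None                 => rightHasNull = true
         case Some(d) if d.isNaN   => rightHasNaN = true
         case Some(d)              => seen += d
       }

       // Walk `left` in order, emitting each common value at most once.
       val result = mutable.ArrayBuffer.empty[Option[Double]]
       val emitted = mutable.HashSet.empty[Double]
       var emittedNull = false
       var emittedNaN = false
       left.foreach {
         case None =>
           if (rightHasNull && !emittedNull) { result += None; emittedNull = true }
         case Some(d) if d.isNaN =>
           if (rightHasNaN && !emittedNaN) { result += Some(d); emittedNaN = true }
         case Some(d) =>
           if (seen.contains(d) && !emitted.contains(d)) { result += Some(d); emitted += d }
       }
       result.toSeq
     }
   }
   ```

   With these semantics, `intersect(Seq(Some(Double.NaN), Some(1.0)), Seq(Some(Double.NaN)))` yields a single NaN, which matches what the `hs.containsNaN() && !hsResult.containsNaN()` branch in the diff appears to do. Whether NaN should appear in the result at all was still open at this point, pending the decision on #33955 referenced in the review comment.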




[GitHub] [spark] SparkQA commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-921518420


   **[Test build #143396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143396/testReport)** for PR 33995 at commit [`a9e6205`](https://github.com/apache/spark/commit/a9e620556fcbc61088735581d52645b021881a00).


[GitHub] [spark] SparkQA commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-921777299


   **[Test build #143412 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143412/testReport)** for PR 33995 at commit [`85f9d9d`](https://github.com/apache/spark/commit/85f9d9d3c3d594b7d68875fcad2be0e5c60a1780).


[GitHub] [spark] AmplabJenkins commented on pull request #33995: [SPARK-36754][SQL] ArrayIntersect handle duplicated Double.NaN and Float.NaN

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #33995:
URL: https://github.com/apache/spark/pull/33995#issuecomment-919460993


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143265/
   

