You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/12/05 15:58:33 UTC

[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/23239

    [SPARK-26021][SQL][followup] only deal with NaN and -0.0 in UnsafeWriter

    ## What changes were proposed in this pull request?
    
    A followup of https://github.com/apache/spark/pull/23043
    
    There are 4 places we need to deal with NaN and -0.0:
    1. Range partitioner(the sorter). `-0.0` and `0.0` should be assigned to the same partition. Different NaNs should be assigned to the same partition.
    2. Join keys. `-0.0` and `0.0` should be treated as same. Different NaNs should be treated as same.
    3. grouping keys. `-0.0` and `0.0` should be assigned to the same group. Different NaNs should be assigned to the same group.
    4. comparison expressions. `-0.0` and `0.0` should be treated as same. Different NaNs should be treated as same.
    
    The case 4 is OK. Our comparison already handles NaN and -0.0, and for struct/array/map, we will recursively compare the fields/elements.
    
    Case 1, 2 and 3 are problematic, as they compare `UnsafeRow` binary directly, and different NaNs have different binary representation, and the same thing happens for -0.0 and 0.0.
    
    To fix it, a simple solution is: let `UnsafeProjection` always produce `UnsafeRow`s with NaN and -0.0 normalized(use the standard NaN and replace -0.0 with 0.0). The `UnsafeRow`s in case 1, 2 and 3 are all created by `UnsafeProjection`.
    
    Following this direction, this PR moves the handling of NaN and -0.0 from `Platform` to `UnsafeWriter`, so that places like `UnsafeRow.setFloat` will not handle them, which reduces the perf overhead. It's also easier to add comments explaining why we do it in `UnsafeWriter`.
    
    ## How was this patch tested?
    
    existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23239.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23239
    
----
commit 797ade3eb175c41866efbffa3cb4c30f90e49ca7
Author: Wenchen Fan <we...@...>
Date:   2018-12-05T15:05:39Z

    only deal with NaN and -0.0 in UnsafeWriter

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    **[Test build #99819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99819/testReport)** for PR 23239 at commit [`b9371a6`](https://github.com/apache/spark/commit/b9371a6672b601bc630b6eb8b7c8f7d8a8d509c9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    cc @adoron @kiszk @viirya @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23239#discussion_r239507673
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java ---
    @@ -198,11 +198,45 @@ protected final void writeLong(long offset, long value) {
         Platform.putLong(getBuffer(), offset, value);
       }
     
    +  // We need to take care of NaN and -0.0 in several places:
    +  //   1. When compare values, different NaNs should be treated as same, `-0.0` and `0.0` should be
    +  //      treated as same.
    +  //   2. In range partitioner, different NaNs should belong to the same partition, -0.0 and 0.0
    --- End diff --
    
    It turns out this is not a problem. The doc of `RangePartitioning` is misleading. I'm updating the doc at https://github.com/apache/spark/pull/23249


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99820/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    The change looks fine.
    Do we already have tests for cases 2 and 4?  We know test for case 3 is [here](https://github.com/apache/spark/pull/23043).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5847/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99737/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5838/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    **[Test build #99737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99737/testReport)** for PR 23239 at commit [`797ade3`](https://github.com/apache/spark/commit/797ade3eb175c41866efbffa3cb4c30f90e49ca7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5850/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    The migration guide has changed by another followup https://github.com/apache/spark/pull/23141:
    
    > In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but users can still distinguish them via `Dataset.show`, `Dataset.collect` etc. Since Spark 3.0, float/double -0.0 is replaced by 0.0 internally, and users can't distinguish them any more.
    
    Is above still correct after this change?
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    **[Test build #99819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99819/testReport)** for PR 23239 at commit [`b9371a6`](https://github.com/apache/spark/commit/b9371a6672b601bc630b6eb8b7c8f7d8a8d509c9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    **[Test build #99801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99801/testReport)** for PR 23239 at commit [`84e3989`](https://github.com/apache/spark/commit/84e3989329da1e7bb8f26dc2ded7558ce6fd9b23).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Hi, @cloud-fan . Please make another PR for `branch-2.4`. There is a conflict on `branch-2.4`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23239#discussion_r239690045
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java ---
    @@ -198,11 +198,45 @@ protected final void writeLong(long offset, long value) {
         Platform.putLong(getBuffer(), offset, value);
       }
     
    +  // We need to take care of NaN and -0.0 in several places:
    +  //   1. When compare values, different NaNs should be treated as same, `-0.0` and `0.0` should be
    +  //      treated as same.
    +  //   2. In range partitioner, different NaNs should belong to the same partition, -0.0 and 0.0
    --- End diff --
    
    As this is not a problem, we should update the PR description too.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    **[Test build #99820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99820/testReport)** for PR 23239 at commit [`b7a5497`](https://github.com/apache/spark/commit/b7a54979dbbb905bcda2e073697bb4ec82c3f93d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    **[Test build #99820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99820/testReport)** for PR 23239 at commit [`b7a5497`](https://github.com/apache/spark/commit/b7a54979dbbb905bcda2e073697bb4ec82c3f93d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    **[Test build #99738 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99738/testReport)** for PR 23239 at commit [`7d5ff06`](https://github.com/apache/spark/commit/7d5ff06969cd1fe8d05adcf8c79d6b56d0865f44).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5851/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99780/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Yes it is. `UnsafeProjection` always normalize NaN and -0.0, and Spark uses `UnsafeProjection` to produce output. So users can't distinguish them.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    **[Test build #99780 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99780/testReport)** for PR 23239 at commit [`84e3989`](https://github.com/apache/spark/commit/84e3989329da1e7bb8f26dc2ded7558ce6fd9b23).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/23239


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    **[Test build #99816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99816/testReport)** for PR 23239 at commit [`c9dfe67`](https://github.com/apache/spark/commit/c9dfe67ee2ff6a2ac5478e072c2709c1b0d0e5f4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5817/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    **[Test build #99737 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99737/testReport)** for PR 23239 at commit [`797ade3`](https://github.com/apache/spark/commit/797ade3eb175c41866efbffa3cb4c30f90e49ca7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23239#discussion_r239302780
  
    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java ---
    @@ -198,11 +198,45 @@ protected final void writeLong(long offset, long value) {
         Platform.putLong(getBuffer(), offset, value);
       }
     
    +  // We need to take care of NaN and -0.0 in several places:
    +  //   1. When compare values, different NaNs should be treated as same, `-0.0` and `0.0` should be
    +  //      treated as same.
    +  //   2. In range partitioner, different NaNs should belong to the same partition, -0.0 and 0.0
    --- End diff --
    
    Do we already have related test for case 2?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99816/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by adoron <gi...@git.apache.org>.
Github user adoron commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    @cloud-fan what about UnsafeRow::setDouble/Float? It doesn't go through the same flow. Is it not used?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99801/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    **[Test build #99738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99738/testReport)** for PR 23239 at commit [`7d5ff06`](https://github.com/apache/spark/commit/7d5ff06969cd1fe8d05adcf8c79d6b56d0865f44).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5778/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    **[Test build #99816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99816/testReport)** for PR 23239 at commit [`c9dfe67`](https://github.com/apache/spark/commit/c9dfe67ee2ff6a2ac5478e072c2709c1b0d0e5f4).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99738/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    I checked the original PR that handles NaN: https://github.com/apache/spark/commit/c032b0bf92130dc4facb003f0deaeb1228aefded
    
    It didn't add end-to-end tests, so I added 2 new tests.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23239#discussion_r239256853
  
    --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java ---
    @@ -174,11 +174,6 @@ public static float getFloat(Object object, long offset) {
       }
     
       public static void putFloat(Object object, long offset, float value) {
    -    if (Float.isNaN(value)) {
    -      value = Float.NaN;
    -    } else if (value == -0.0f) {
    -      value = 0.0f;
    -    }
    --- End diff --
    
    These change are expected to cause the following test case failure in `PlatformUtilSuite`, but it seems to be missed. Could you fix the test case or remove together, @cloud-fan ?
    - https://github.com/apache/spark/blob/master/common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java#L162-L163


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    **[Test build #99801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99801/testReport)** for PR 23239 at commit [`84e3989`](https://github.com/apache/spark/commit/84e3989329da1e7bb8f26dc2ded7558ce6fd9b23).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Yes, the 3 cases I pointed that need to handle NaN and -0.0 do not change the value in `UnsafeRow`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5777/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    **[Test build #99780 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99780/testReport)** for PR 23239 at commit [`84e3989`](https://github.com/apache/spark/commit/84e3989329da1e7bb8f26dc2ded7558ce6fd9b23).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23239
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org