You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/04 09:04:58 UTC

[GitHub] [spark] cloud-fan opened a new pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

cloud-fan opened a new pull request #29647:
URL: https://github.com/apache/spark/pull/29647


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   This is a Spark 3.0 regression introduced by https://github.com/apache/spark/pull/26761. We missed a corner case that `java.lang.Double.compare` treats 0.0 and -0.0 as different, which breaks SQL semantic.
   
   This PR adds back the `OrderingUtil`, to provide custom compare methods that take care of 0.0 vs -0.0
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   Fix a correctness bug.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   Yes, now `SELECT  0.0 > -0.0` returns false correctly as Spark 2.x.
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   new tests


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-687027752






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29647:
URL: https://github.com/apache/spark/pull/29647#discussion_r484475154



##########
File path: sql/core/src/test/resources/sql-tests/inputs/operators.sql
##########
@@ -95,3 +95,9 @@ select width_bucket(5.35, 0.024, 10.06, null);
 select width_bucket(5.35, 0.024, 10.06, -5);
 select width_bucket(5.35, 0.024, 10.06, 9223372036854775807L); -- long max value
 select width_bucket(5.35, 0.024, 10.06, 9223372036854775807L - 1);
+
+-- special floating values
+select double('NaN') = double('NaN'), double('NaN') >= double('NaN'), double('NaN') <= double('NaN');
+select double('NaN') > double("+Infinity"), double("+Infinity") < double('NaN'), double('NaN') > double('NaN');
+select 0.0 = -0.0, 0.0 >= -0.0, 0.0 <= -0.0;
+select 0.0 > -0.0, 0.0 < -0.0;

Review comment:
       I'm removing them. I can't reproduce the bug with temp view as there are optimizations around literals. I'll add an end-to-end test in `DataFrameSuite`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-687213825


   **[Test build #128291 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128291/testReport)** for PR 29647 at commit [`77b1e44`](https://github.com/apache/spark/commit/77b1e4421b66d0285a9fa80e35bb6cf336977ac2).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-688494587






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on a change in pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #29647:
URL: https://github.com/apache/spark/pull/29647#discussion_r483641822



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/OrderingUtil.scala
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+object OrderingUtil {

Review comment:
       Nit: How about rename this as `SQLOrderingUtil` and rename `compareDoublesSQL` as `compareDoubles`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-688397061






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-688397061






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-688494022


   **[Test build #128359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128359/testReport)** for PR 29647 at commit [`869856c`](https://github.com/apache/spark/commit/869856ce318ed6e2994ca48dab565fbbd73c9ad0).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-687027182


   **[Test build #128291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128291/testReport)** for PR 29647 at commit [`77b1e44`](https://github.com/apache/spark/commit/77b1e4421b66d0285a9fa80e35bb6cf336977ac2).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-687215185






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29647:
URL: https://github.com/apache/spark/pull/29647#discussion_r483724531



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/OrderingUtil.scala
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+object OrderingUtil {

Review comment:
       +1




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-687201604


   @gengliangwang wait, so `0.0 > -0.0` is true in those systems?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-688494587






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-687249226


   Thank you, @cloud-fan !


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #29647:
URL: https://github.com/apache/spark/pull/29647#discussion_r484220676



##########
File path: sql/core/src/test/resources/sql-tests/inputs/operators.sql
##########
@@ -95,3 +95,9 @@ select width_bucket(5.35, 0.024, 10.06, null);
 select width_bucket(5.35, 0.024, 10.06, -5);
 select width_bucket(5.35, 0.024, 10.06, 9223372036854775807L); -- long max value
 select width_bucket(5.35, 0.024, 10.06, 9223372036854775807L - 1);
+
+-- special floating values
+select double('NaN') = double('NaN'), double('NaN') >= double('NaN'), double('NaN') <= double('NaN');
+select double('NaN') > double("+Infinity"), double("+Infinity") < double('NaN'), double('NaN') > double('NaN');
+select 0.0 = -0.0, 0.0 >= -0.0, 0.0 <= -0.0;
+select 0.0 > -0.0, 0.0 < -0.0;

Review comment:
       literals are OK. I'll update the tests to use columns.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-687209454


   @srowen  it's false.
   I tried with 
   ```
   create table foo(i int);
   insert into foo values(1);
   select * from foo where 0.0>-0.0;
   ```
   in http://sqlfiddle.com/


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #29647:
URL: https://github.com/apache/spark/pull/29647#discussion_r483819954



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/OrderingUtil.scala
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+object OrderingUtil {
+
+  /**
+   * A special version of double comparison that follows SQL semantic:
+   *  1. NaN == NaN
+   *  2. NaN is greater than any non-NaN double
+   *  3. -0.0 == 0.0
+   */
+  def compareDoublesSQL(x: Double, y: Double): Int = {
+    if (x == y) 0 else java.lang.Double.compare(x, y)
+  }
+
+  /**
+   * A special version of float comparison that follows SQL semantic:
+   *  1. NaN == NaN
+   *  2. NaN is greater than any non-NaN double

Review comment:
       NaN is greater than any non-NaN double -> NaN is greater than any non-NaN float




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-687116478


   `0.0 > -0.0` should be false in SQL? then I agree. That's true in the JVM languages too.
   
   However that's not true in other contexts, like sorting things, where you do want a total ordering on doubles. I think we had to confront that a while ago in dealing with Scala 2.13 or something. If this doesn't affect sorts then I think that's fine (and it doesn't seem to)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-688396359


   **[Test build #128359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128359/testReport)** for PR 29647 at commit [`869856c`](https://github.com/apache/spark/commit/869856ce318ed6e2994ca48dab565fbbd73c9ad0).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #29647:
URL: https://github.com/apache/spark/pull/29647


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-687024636


   cc @srowen @maropu @viirya 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29647:
URL: https://github.com/apache/spark/pull/29647#discussion_r483741668



##########
File path: sql/core/src/test/resources/sql-tests/inputs/operators.sql
##########
@@ -95,3 +95,9 @@ select width_bucket(5.35, 0.024, 10.06, null);
 select width_bucket(5.35, 0.024, 10.06, -5);
 select width_bucket(5.35, 0.024, 10.06, 9223372036854775807L); -- long max value
 select width_bucket(5.35, 0.024, 10.06, 9223372036854775807L - 1);
+
+-- special floating values
+select double('NaN') = double('NaN'), double('NaN') >= double('NaN'), double('NaN') <= double('NaN');
+select double('NaN') > double("+Infinity"), double("+Infinity") < double('NaN'), double('NaN') > double('NaN');
+select 0.0 = -0.0, 0.0 >= -0.0, 0.0 <= -0.0;
+select 0.0 > -0.0, 0.0 < -0.0;

Review comment:
       This doesn't add a new test coverage because this has the same result with Spark 3.0.0.
   ```
   spark-sql> select version();
   3.0.0 3fdfce3120f307147244e5eaf46d61419a723d50
   
   spark-sql> select double('NaN') = double('NaN'), double('NaN') >= double('NaN'), double('NaN') <= double('NaN');
   true	true	true
   
   spark-sql> select double('NaN') > double("+Infinity"), double("+Infinity") < double('NaN'), double('NaN') > double('NaN');
   true	true	false
   
   spark-sql> select 0.0 = -0.0, 0.0 >= -0.0, 0.0 <= -0.0;
   true	true	true
   
   spark-sql> select 0.0 > -0.0, 0.0 < -0.0;
   false	false
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-687215185






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29647:
URL: https://github.com/apache/spark/pull/29647#discussion_r483742704



##########
File path: sql/core/src/test/resources/sql-tests/inputs/operators.sql
##########
@@ -95,3 +95,9 @@ select width_bucket(5.35, 0.024, 10.06, null);
 select width_bucket(5.35, 0.024, 10.06, -5);
 select width_bucket(5.35, 0.024, 10.06, 9223372036854775807L); -- long max value
 select width_bucket(5.35, 0.024, 10.06, 9223372036854775807L - 1);
+
+-- special floating values
+select double('NaN') = double('NaN'), double('NaN') >= double('NaN'), double('NaN') <= double('NaN');
+select double('NaN') > double("+Infinity"), double("+Infinity") < double('NaN'), double('NaN') > double('NaN');
+select 0.0 = -0.0, 0.0 >= -0.0, 0.0 <= -0.0;
+select 0.0 > -0.0, 0.0 < -0.0;

Review comment:
       We can remove these if we cannot reproduce the regression in SQL layer. Otherwise, this may be misleading.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29647:
URL: https://github.com/apache/spark/pull/29647#discussion_r484632927



##########
File path: sql/core/src/test/resources/sql-tests/inputs/operators.sql
##########
@@ -95,3 +95,9 @@ select width_bucket(5.35, 0.024, 10.06, null);
 select width_bucket(5.35, 0.024, 10.06, -5);
 select width_bucket(5.35, 0.024, 10.06, 9223372036854775807L); -- long max value
 select width_bucket(5.35, 0.024, 10.06, 9223372036854775807L - 1);
+
+-- special floating values
+select double('NaN') = double('NaN'), double('NaN') >= double('NaN'), double('NaN') <= double('NaN');
+select double('NaN') > double("+Infinity"), double("+Infinity") < double('NaN'), double('NaN') > double('NaN');
+select 0.0 = -0.0, 0.0 >= -0.0, 0.0 <= -0.0;
+select 0.0 > -0.0, 0.0 < -0.0;

Review comment:
       Thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-687027182


   **[Test build #128291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128291/testReport)** for PR 29647 at commit [`77b1e44`](https://github.com/apache/spark/commit/77b1e4421b66d0285a9fa80e35bb6cf336977ac2).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-688396359


   **[Test build #128359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128359/testReport)** for PR 29647 at commit [`869856c`](https://github.com/apache/spark/commit/869856ce318ed6e2994ca48dab565fbbd73c9ad0).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-687170627


   @srowen I just verified on PostgreSQL/Oracle/Mysql  and the results of `0.0 = -0.0` are all `true`. It's a very corner case but still a correctness bug.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29647:
URL: https://github.com/apache/spark/pull/29647#discussion_r483744099



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/OrderingUtilSuite.scala
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import java.lang.{Double => JDouble, Float => JFloat}
+
+import org.apache.spark.SparkFunSuite
+
+class OrderingUtilSuite extends SparkFunSuite {
+
+  test("compareDoublesSQL") {
+    def shouldMatchDefaultOrder(a: Double, b: Double): Unit = {
+      assert(OrderingUtil.compareDoublesSQL(a, b) === JDouble.compare(a, b))
+      assert(OrderingUtil.compareDoublesSQL(b, a) === JDouble.compare(b, a))
+    }
+    shouldMatchDefaultOrder(0d, 0d)
+    shouldMatchDefaultOrder(0d, 1d)
+    shouldMatchDefaultOrder(-1d, 1d)
+    shouldMatchDefaultOrder(Double.MinValue, Double.MaxValue)
+    assert(OrderingUtil.compareDoublesSQL(Double.NaN, Double.NaN) === 0)
+    assert(OrderingUtil.compareDoublesSQL(Double.NaN, Double.PositiveInfinity) > 0)
+    assert(OrderingUtil.compareDoublesSQL(Double.NaN, Double.NegativeInfinity) > 0)
+    assert(OrderingUtil.compareDoublesSQL(Double.PositiveInfinity, Double.NaN) < 0)
+    assert(OrderingUtil.compareDoublesSQL(Double.NegativeInfinity, Double.NaN) < 0)
+    assert(OrderingUtil.compareDoublesSQL(0.0d, -0.0d) === 0)
+    assert(OrderingUtil.compareDoublesSQL(-0.0d, 0.0d) === 0)
+  }
+
+  test("compareFloatsSQL") {
+    def shouldMatchDefaultOrder(a: Float, b: Float): Unit = {
+      assert(OrderingUtil.compareFloatsSQL(a, b) === JFloat.compare(a, b))
+      assert(OrderingUtil.compareFloatsSQL(b, a) === JFloat.compare(b, a))
+    }
+    shouldMatchDefaultOrder(0f, 0f)
+    shouldMatchDefaultOrder(0f, 1f)
+    shouldMatchDefaultOrder(-1f, 1f)
+    shouldMatchDefaultOrder(Float.MinValue, Float.MaxValue)
+    assert(OrderingUtil.compareDoublesSQL(Float.NaN, Float.NaN) === 0)

Review comment:
       `NaN` is not a unique value. Since there are multiple NaN values, could you add a test coverage for different NaN values too?
   ```scala
   scala> java.nio.ByteBuffer.allocate(4).putFloat(Float.NaN).array
   res0: Array[Byte] = Array(127, -64, 0, 0)
   
   scala> val x = java.lang.Float.intBitsToFloat(-6966608)
   x: Float = NaN
   
   scala> java.nio.ByteBuffer.allocate(4).putFloat(x).array
   res1: Array[Byte] = Array(-1, -107, -78, -80)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen edited a comment on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
srowen edited a comment on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-687116478


   `0.0 > -0.0` should be false in SQL? then I agree. That's the answer in the JVM languages too.
   
   However that's not true in other contexts, like sorting things, where you do want a total ordering on doubles. I think we had to confront that a while ago in dealing with Scala 2.13 or something. If this doesn't affect sorts then I think that's fine (and it doesn't seem to)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29647:
URL: https://github.com/apache/spark/pull/29647#issuecomment-687027752






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #29647:
URL: https://github.com/apache/spark/pull/29647#discussion_r483818327



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/OrderingUtil.scala
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+object OrderingUtil {

Review comment:
       sounds good.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29647: [SPARK-32764][SQL] -0.0 should be equal to 0.0

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29647:
URL: https://github.com/apache/spark/pull/29647#discussion_r484633284



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/SQLOrderingUtilSuite.scala
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import java.lang.{Double => JDouble, Float => JFloat}
+
+import org.apache.spark.SparkFunSuite
+
+class SQLOrderingUtilSuite extends SparkFunSuite {
+
+  test("compareDoublesSQL") {
+    def shouldMatchDefaultOrder(a: Double, b: Double): Unit = {
+      assert(SQLOrderingUtil.compareDoubles(a, b) === JDouble.compare(a, b))
+      assert(SQLOrderingUtil.compareDoubles(b, a) === JDouble.compare(b, a))
+    }
+    shouldMatchDefaultOrder(0d, 0d)
+    shouldMatchDefaultOrder(0d, 1d)
+    shouldMatchDefaultOrder(-1d, 1d)
+    shouldMatchDefaultOrder(Double.MinValue, Double.MaxValue)
+
+    val specialNaN = JDouble.longBitsToDouble(0x7ff1234512345678L)
+    assert(JDouble.isNaN(specialNaN))
+    assert(JDouble.doubleToRawLongBits(Double.NaN) != JDouble.doubleToRawLongBits(specialNaN))
+
+    assert(SQLOrderingUtil.compareDoubles(Double.NaN, Double.NaN) === 0)
+    assert(SQLOrderingUtil.compareDoubles(Double.NaN, specialNaN) === 0)

Review comment:
       Thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org