Posted to issues@spark.apache.org by "Tanel Kiis (Jira)" <ji...@apache.org> on 2020/08/24 09:40:00 UTC

[jira] [Commented] (SPARK-32110) -0.0 vs 0.0 is inconsistent

    [ https://issues.apache.org/jira/browse/SPARK-32110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183107#comment-17183107 ] 

Tanel Kiis commented on SPARK-32110:
------------------------------------

There is also an inconsistency between the codegen and interpreted code paths for EqualTo and EqualNullSafe:
{code:java}
org.scalatest.exceptions.GeneratorDrivenPropertyCheckFailedException: TestFailedException was thrown during property evaluation.
  Message: Incorrect evaluation: (-0.0 = 0.0), interpret: false, codegen: true
  Location: (ExpressionEvalHelper.scala:454)
  Occurred when passed generated values (
    arg0 = -0.0,
    arg1 = 0.0
  )
{code}
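This looks consistent with Java having two notions of double equality: generated code compares unboxed primitives with {{==}} (IEEE 754 semantics, where the two zeros are equal), while a boxed comparison via {{Double.equals}} or {{Double.compare}} uses a total order that separates them. A minimal plain-Java illustration (class name is just for the sketch):
{code:java}
public class ZeroEquality {
    public static void main(String[] args) {
        double a = -0.0, b = 0.0;
        // Primitive comparison follows IEEE 754: the two zeros are equal.
        System.out.println(a == b);                                      // true
        // Boxed equality and Double.compare use the total order,
        // which places -0.0 strictly below 0.0.
        System.out.println(Double.valueOf(a).equals(Double.valueOf(b))); // false
        System.out.println(Double.compare(a, b));                        // -1
    }
}
{code}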

> -0.0 vs 0.0 is inconsistent
> ---------------------------
>
>                 Key: SPARK-32110
>                 URL: https://issues.apache.org/jira/browse/SPARK-32110
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Robert Joseph Evans
>            Priority: Major
>
> This is related to SPARK-26021 where some things were fixed but there is still a lot that is not consistent.
> When parsing SQL, {{-0.0}} is turned into {{0.0}}. This can produce results that appear to be correct but are inconsistent for the same operators.
> {code:java}
> scala> import spark.implicits._
> import spark.implicits._
> scala> spark.sql("SELECT 0.0 = -0.0").collect
> res0: Array[org.apache.spark.sql.Row] = Array([true])
> scala> Seq((0.0, -0.0)).toDF("a", "b").selectExpr("a = b").collect
> res1: Array[org.apache.spark.sql.Row] = Array([false])
> {code}
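> As a side note for anyone reproducing this: unlike the SQL parser above, Java's own parser preserves the sign, and since the two zeros compare equal as primitives, the sign has to be checked via the bit pattern or by dividing. A plain-Java sketch (class name is hypothetical):
> {code:java}
> public class DetectNegativeZero {
>     public static void main(String[] args) {
>         double d = Double.parseDouble("-0.0"); // Java parsing preserves the sign
>         System.out.println(d == 0.0);                            // true: IEEE equality
>         System.out.println(Double.doubleToRawLongBits(d) != 0L); // true: sign bit is set
>         System.out.println(1.0 / d);                             // -Infinity
>     }
> }
> {code}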
> This also shows up in sorts:
> {code:java}
> scala> Seq((0.0, -100.0), (-0.0, 100.0), (0.0, 100.0), (-0.0, -100.0)).toDF("a", "b").orderBy("a", "b").collect
> res2: Array[org.apache.spark.sql.Row] = Array([-0.0,-100.0], [-0.0,100.0], [0.0,-100.0], [0.0,100.0])
> {code}
> But not for an equi-join or an aggregate:
> {code:java}
> scala> Seq((0.0, -0.0)).toDF("a", "b").join(Seq((-0.0, 0.0)).toDF("r_a", "r_b"), $"a" === $"r_a").collect
> res3: Array[org.apache.spark.sql.Row] = Array([0.0,-0.0,-0.0,0.0])
> scala> Seq((0.0, 1.0), (-0.0, 1.0)).toDF("a", "b").groupBy("a").count.collect
> res6: Array[org.apache.spark.sql.Row] = Array([0.0,2])
> {code}
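> For comparison, plain Java hash-based grouping does distinguish the zeros, since {{Double.equals}} and {{Double.hashCode}} both separate them; any engine that relies on boxed hashing without normalizing first would see two groups. A sketch:
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> public class ZeroGrouping {
>     public static void main(String[] args) {
>         Map<Double, Integer> counts = new HashMap<>();
>         for (double d : new double[]{0.0, 1.0, -0.0, 1.0}) {
>             counts.merge(d, 1, Integer::sum);
>         }
>         // Boxed Double equals/hashCode distinguish -0.0 from 0.0,
>         // so the two zeros land in separate buckets.
>         System.out.println(counts.size()); // 3
>     }
> }
> {code}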
> This can lead to some very odd results, such as an equi-join with a filter that logically should do nothing, but ends up filtering the result down to nothing.
> {code:java}
> scala> Seq((0.0, -0.0)).toDF("a", "b").join(Seq((-0.0, 0.0)).toDF("r_a", "r_b"), $"a" === $"r_a" && $"a" <= $"r_a").collect
> res8: Array[org.apache.spark.sql.Row] = Array()
> scala> Seq((0.0, -0.0)).toDF("a", "b").join(Seq((-0.0, 0.0)).toDF("r_a", "r_b"), $"a" === $"r_a").collect
> res9: Array[org.apache.spark.sql.Row] = Array([0.0,-0.0,-0.0,0.0])
> {code}
> Hive never normalizes {{-0.0}} to {{0.0}}, so it exhibits non-IEEE-compliant behavior everywhere, but at least it is consistently odd.
> MySQL, Oracle, Postgres, and SQLite all appear to normalize the {{-0.0}} to {{0.0}}.
> The root cause appears to be that the OpenJDK implementation of {{Double.compare}} and {{Float.compare}} places {{-0.0}} < {{0.0}}.
> This is not documented in the Javadoc, but it is clearly documented in the code, so it is not a "bug" that Java is going to fix.
> [https://github.com/openjdk/jdk/blob/a0a0539b0d3f9b6809c9759e697bfafd7b138ec1/src/java.base/share/classes/java/lang/Double.java#L1022-L1035]
> It is also consistent with the Javadoc for {{Double.equals}}:
>  [https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#equals-java.lang.Object-]
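> That total order is also what boxed sorting uses, which matches the sort output above where the {{-0.0}} rows come first:
> {code:java}
> import java.util.Arrays;
>
> public class ZeroOrdering {
>     public static void main(String[] args) {
>         Double[] xs = {0.0, 1.0, -0.0, -1.0};
>         Arrays.sort(xs); // uses Double.compareTo: -1.0 < -0.0 < 0.0 < 1.0
>         System.out.println(Arrays.toString(xs)); // [-1.0, -0.0, 0.0, 1.0]
>     }
> }
> {code}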
> To be clear, I am filing this mostly to document the current state, not because I think it needs to be fixed ASAP. It is a rare corner case, but it was really frustrating to debug what was happening.


