Posted to issues@spark.apache.org by "David Vogelbacher (Jira)" <ji...@apache.org> on 2022/07/26 22:38:00 UTC

[jira] [Created] (SPARK-39885) Behavior differs between array_overlap and array_contains for negative 0.0

David Vogelbacher created SPARK-39885:
-----------------------------------------

             Summary: Behavior differs between array_overlap and array_contains for negative 0.0
                 Key: SPARK-39885
                 URL: https://issues.apache.org/jira/browse/SPARK-39885
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.2
            Reporter: David Vogelbacher


{{array_contains([0.0], -0.0)}} returns true, but {{arrays_overlap([0.0], [-0.0])}} returns false. I think we generally want to treat -0.0 and 0.0 as equal (see https://github.com/apache/spark/blob/e9eb28e27d10497c8b36774609823f4bbd2c8500/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/SQLOrderingUtil.scala#L28).
However, the {{Double::equals}} method does not: it treats -0.0 and 0.0 as different values. Therefore, we should either have {{TypeUtils#typeWithProperEquals}} return false for double, or wrap double with our own equals method that handles this case.
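
For reference, the underlying Java behavior can be reproduced without Spark. The following is only a minimal sketch: the class name {{NegativeZeroEquality}} and the helper {{zeroAwareEquals}} are hypothetical and not part of Spark. The helper assumes the desired semantics are "primitive {{==}} first, then {{Double.compare}}", in the spirit of the linked comparison utility.

```java
import java.util.List;

public class NegativeZeroEquality {

    // Hypothetical helper (not a Spark API): treats -0.0 and 0.0 as equal.
    // Primitive == already considers the two zeros equal; Double.compare is
    // the fallback so that NaN still equals NaN under this definition.
    static boolean zeroAwareEquals(double x, double y) {
        return x == y || Double.compare(x, y) == 0;
    }

    // Overlap check built on the helper instead of Double::equals.
    static boolean overlaps(List<Double> left, List<Double> right) {
        for (double l : left) {
            for (double r : right) {
                if (zeroAwareEquals(l, r)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Primitive comparison: the two zeros are equal.
        System.out.println(0.0 == -0.0);                       // true
        // Boxed comparison: Double::equals compares bit patterns.
        System.out.println(Double.valueOf(0.0).equals(-0.0));  // false
        // Collections use equals, so the mismatch shows up in lookups too.
        System.out.println(List.of(0.0).contains(-0.0));       // false
        // The helper-based overlap check treats the zeros as equal.
        System.out.println(overlaps(List.of(0.0), List.of(-0.0))); // true
    }
}
```

Any code path that compares boxed doubles through {{equals}} (or through a hash-based collection) would see the same split between the two zeros, which is consistent with the difference reported here.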

Java code snippets showing the issue:

{code:java}
// Single row whose array<double> column holds [-0.0].
Dataset<Row> dataset = sparkSession.createDataFrame(
        List.of(RowFactory.create(List.of(-0.0))),
        DataTypes.createStructType(ImmutableList.of(DataTypes.createStructField(
                "doubleCol", DataTypes.createArrayType(DataTypes.DoubleType), false))));
// Compare the literal array [+0.0] against the column.
Dataset<Row> df = dataset.withColumn(
        "overlaps", functions.arrays_overlap(functions.array(functions.lit(+0.0)), dataset.col("doubleCol")));
List<Row> result = df.collectAsList(); // [[WrappedArray(-0.0),false]]
{code}

{code:java}
// Single row whose double column holds -0.0.
Dataset<Row> dataset = sparkSession.createDataFrame(
        List.of(RowFactory.create(-0.0)),
        DataTypes.createStructType(
                ImmutableList.of(DataTypes.createStructField("doubleCol", DataTypes.DoubleType, false))));
// Check whether the literal array [+0.0] contains the column value.
Dataset<Row> df = dataset.withColumn(
        "contains", functions.array_contains(functions.array(functions.lit(+0.0)), dataset.col("doubleCol")));
List<Row> result = df.collectAsList(); // [[-0.0,true]]
{code}



