Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2019/05/02 02:37:00 UTC

[jira] [Created] (SPARK-27619) MapType should be prohibited in hash expressions

Josh Rosen created SPARK-27619:
----------------------------------

             Summary: MapType should be prohibited in hash expressions
                 Key: SPARK-27619
                 URL: https://issues.apache.org/jira/browse/SPARK-27619
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Josh Rosen


Spark currently allows MapType expressions to be used as input to hash expressions, but I think that this should be prohibited because Spark SQL does not support map equality. Currently, Spark SQL's map hashcodes are sensitive to the insertion order of map elements:
{code:java}
val a = spark.createDataset(Map(1->1, 2->2) :: Nil)
val b = spark.createDataset(Map(2->2, 1->1) :: Nil)

// Demonstration of how Scala Map equality is unaffected by insertion order:
assert(Map(1->1, 2->2).hashCode() == Map(2->2, 1->1).hashCode())
assert(Map(1->1, 2->2) == Map(2->2, 1->1))
assert(a.first() == b.first())

// In contrast, this will print two different hashcodes:
println(Seq(a, b).map(_.selectExpr("hash(*)").first())){code}
I think there's precedent for banning the use of MapType here because we already prohibit MapType in aggregation / joins (SPARK-9415) and set operations (SPARK-19893).

Alternatively, we could support hashing here if we implemented support for comparable map types (SPARK-18134).
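
To illustrate what order-insensitive hashing could look like, here is a minimal plain-Scala sketch that canonicalizes entry order before hashing. This is not Spark's implementation and not necessarily what SPARK-18134 proposes. The names MapHashSketch, chainedHash and canonicalHash are hypothetical, the seed value is an arbitrary choice for this sketch, and it assumes Int keys and values with the natural ordering on keys:
{code:scala}
import scala.util.hashing.MurmurHash3

object MapHashSketch {
  // Chains each entry into the running hash, so the result depends on
  // the order in which the entries are visited.
  def chainedHash(entries: Seq[(Int, Int)], seed: Int = 42): Int =
    entries.foldLeft(seed) { case (h, (k, v)) =>
      MurmurHash3.mix(MurmurHash3.mix(h, k), v)
    }

  // Sorts entries by key first, so two sequences holding the same
  // entries are visited in the same order and hash identically.
  def canonicalHash(entries: Seq[(Int, Int)], seed: Int = 42): Int =
    chainedHash(entries.sortBy(_._1), seed)

  def main(args: Array[String]): Unit = {
    val x = Seq(1 -> 1, 2 -> 2)
    val y = Seq(2 -> 2, 1 -> 1)
    // Same entries, different insertion order, same canonical hash.
    assert(canonicalHash(x) == canonicalHash(y))
  }
}
{code}
Sorting needs an ordering on the key type, so an approach along these lines would only apply to maps whose keys are orderable.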


