Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/04/18 16:32:25 UTC

[jira] [Commented] (SPARK-14700) PySpark Row equality operator is not overridden

    [ https://issues.apache.org/jira/browse/SPARK-14700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15245752#comment-15245752 ] 

Apache Spark commented on SPARK-14700:
--------------------------------------

User 'JasonMWhite' has created a pull request for this issue:
https://github.com/apache/spark/pull/12470

> PySpark Row equality operator is not overridden
> -----------------------------------------------
>
>                 Key: SPARK-14700
>                 URL: https://issues.apache.org/jira/browse/SPARK-14700
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.6.1
>            Reporter: Jason White
>
> The pyspark.sql.Row class doesn't override the equality operator, so it falls back to the equality of its superclass, `tuple`, which compares values only. This is insufficient, because the order of the elements in the tuple is only meaningful in combination with the private `__fields__` member.
> This makes it difficult to write proper unit tests for PySpark DataFrames, and leads to seemingly illogical results such as:
> Row(a=1) == Row(b=1) # True, since column names aren't considered
> r1 = Row('b', 'a')(2, 1) # Row(b=2, a=1)
> r1 == Row(b=2, a=1) # False, since keyword arguments are sorted alphabetically in the Row constructor
> r1 == Row(a=2, b=1) # True, since the underlying tuple for each is (2, 1)
> Indeed, a few bugs in existing Spark code were exposed when I patched this. PR incoming.
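The issue boils down to making equality consider field names alongside values. A minimal, self-contained sketch of that idea (a hypothetical simplification for illustration, not PySpark's actual Row class and not necessarily the approach taken in the linked PR):

```python
# Hypothetical, simplified Row: a tuple subclass whose equality also
# compares field names, unlike plain tuple equality. Keyword-only, to
# mirror the alphabetical-sorting behavior described above.
class Row(tuple):
    def __new__(cls, **kwargs):
        # PySpark sorts keyword arguments alphabetically in the constructor.
        names = sorted(kwargs)
        row = tuple.__new__(cls, (kwargs[n] for n in names))
        row.__fields__ = names
        return row

    def __eq__(self, other):
        if not isinstance(other, Row):
            return NotImplemented
        # Field names must match as well as the underlying tuple values.
        return self.__fields__ == other.__fields__ and tuple(self) == tuple(other)

    def __ne__(self, other):
        result = self.__eq__(other)
        return NotImplemented if result is NotImplemented else not result

    def __hash__(self):
        # Keep hashing consistent with the redefined equality.
        return hash((tuple(self.__fields__), tuple(self)))


print(Row(a=1) == Row(b=1))  # False once field names are compared
print(Row(a=1) == Row(a=1))  # True
```

With this override, the first illogical case above (`Row(a=1) == Row(b=1)`) correctly evaluates to False, since the values match but the column names do not.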



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org