Posted to dev@spark.apache.org by Kazuaki Ishizaki <IS...@jp.ibm.com> on 2016/06/18 04:43:48 UTC

Question about equality of o.a.s.sql.Row

Dear all,

I have three questions about equality of org.apache.spark.sql.Row.

(1) If a Row has a complex type (e.g. Array), is the following behavior 
expected?
If two Rows hold the same array instance, Row.equals returns true (the 
second assert). If two Rows hold different array instances (a1 and a2) that 
have the same elements, Row.equals returns false (the third 
assert).

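import org.apache.spark.sql.Row

val a1 = Array(3, 4)
val a2 = Array(3, 4)
val r1 = Row(a1)
val r2 = Row(a2)
assert(a1.sameElements(a2))      // SUCCESS: the arrays have the same elements
assert(Row(a1).equals(Row(a1)))  // SUCCESS: both Rows hold the same array instance
assert(r1.equals(r2))            // FAILURE: the Rows hold different array instances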

This is because two objects are compared by "o1 != o2" instead of 
"o1.equals(o2)" at 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala#L408

(2) If (1) is expected, where is this behavior described or defined? I 
cannot find a description in the API documentation.
https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/Row.html
https://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/api/scala/index.html#org.apache.spark.sql.Row

(3) If (1) is expected, is there a recommended way to write an equality 
check for two Rows that contain an Array or other complex types (e.g. Map)?

Best Regards,
Kazuaki Ishizaki, @kiszk


Re: Question about equality of o.a.s.sql.Row

Posted by dhruve ashar <dh...@gmail.com>.
In Scala, "==" and "!=" are not operators but methods, which are defined here
<http://www.scala-lang.org/api/current/#scala.Any> as:

The expression x == that is equivalent to if (x eq null) that eq null else
x.equals(that).
The expression x != that is equivalent to !(x == that).

So it's recommended that you override the equals method but check for equality
using == and !=.
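
To illustrate the point above, here is a minimal sketch in plain Scala (no Spark required). The names EqualsDemo and Point are made up for this example. It suggests that replacing "o1 != o2" with "o1.equals(o2)" would not change the result for arrays, since both end up calling the same equals method.

```scala
// Hypothetical example type, not part of Spark.
final case class Point(x: Int, y: Int) // case classes get a structural equals

object EqualsDemo {
  val p1 = Point(1, 2)
  val p2 = Point(1, 2)

  // == delegates to equals, so both calls give the same answer.
  val caseClassEq: Boolean = p1 == p2
  val caseClassEquals: Boolean = p1.equals(p2)

  // Scala arrays are plain JVM arrays. Their equals is inherited from
  // java.lang.Object, so == and equals both compare references.
  val a1 = Array(3, 4)
  val a2 = Array(3, 4)
  val arrayEq: Boolean = a1 == a2
  val arrayEquals: Boolean = a1.equals(a2)

  // == is null-safe: a null receiver does not throw.
  val nullEq: Boolean = (null: String) == "x"
}
```
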




On Mon, Jun 20, 2016 at 2:03 PM, Michael Armbrust <mi...@databricks.com>
wrote:

>> This is because two objects are compared by "o1 != o2" instead of
>> "o1.equals(o2)" at
>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala#L408
>
>
> Even equals(...) does not do what you want on the JVM:
>
> scala> Array(1,2).equals(Array(1,2))
> res1: Boolean = false
>
>
>> (2) If (1) is expected, where is this behavior described or defined? I
>> cannot find a description in the API documentation.
>> https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/Row.html
>>
>> https://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/api/scala/index.html#org.apache.spark.sql.Row
>
>
> Pull requests for documentation welcome!
>
>
>> (3) If (1) is expected, is there a recommended way to write an equality
>> check for two Rows that contain an Array or other complex types (e.g. Map)?
>
>
> Internally for tests, we usually compare the string representation of the
> Row.
>



-- 
-Dhruve Ashar

Re: Question about equality of o.a.s.sql.Row

Posted by Michael Armbrust <mi...@databricks.com>.
>
> This is because two objects are compared by "o1 != o2" instead of
> "o1.equals(o2)" at
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala#L408


Even equals(...) does not do what you want on the JVM:

scala> Array(1,2).equals(Array(1,2))
res1: Boolean = false
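
For comparison, a small sketch of ways to compare two flat arrays element by element in plain Scala. The object name ArrayCompare is made up for this example; sameElements, toSeq and java.util.Arrays.equals are standard library calls. Note that these are shallow comparisons, so nested arrays would need extra handling.

```scala
object ArrayCompare {
  val a1 = Array(1, 2)
  val a2 = Array(1, 2)

  // Inherited from java.lang.Object: compares references only.
  val byEquals: Boolean = a1.equals(a2)

  // Element-wise comparison from the Scala collections API.
  val bySameElements: Boolean = a1.sameElements(a2)

  // Element-wise comparison from the Java standard library (int[] overload).
  val byJavaArrays: Boolean = java.util.Arrays.equals(a1, a2)

  // Converting to Seq gives a collection with structural equality.
  val bySeq: Boolean = a1.toSeq == a2.toSeq
}
```
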


> (2) If (1) is expected, where is this behavior described or defined? I
> cannot find a description in the API documentation.
> https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/Row.html
>
> https://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/api/scala/index.html#org.apache.spark.sql.Row


Pull requests for documentation welcome!


> (3) If (1) is expected, is there a recommended way to write an equality
> check for two Rows that contain an Array or other complex types (e.g. Map)?


Internally for tests, we usually compare the string representation of the
Row.
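
As an alternative to comparing strings, one possible approach is to normalize the values before comparing them. The following is only a sketch under assumptions: DeepEq, normalize and deepEquals are hypothetical names, not Spark APIs, and the helper works on plain Seq[Any] values such as those returned by Row.toSeq. It does not handle nested Rows or special cases such as NaN.

```scala
object DeepEq {
  // Hypothetical helper: recursively replace arrays with Seq (and normalize
  // Seq elements and Map entries) so that the standard structural == applies.
  def normalize(v: Any): Any = v match {
    case a: Array[_] => a.toSeq.map(normalize)
    case s: Seq[_]   => s.map(normalize)
    case m: scala.collection.Map[_, _] =>
      m.iterator.map { case (k, x) => (normalize(k), normalize(x)) }.toMap
    case other => other
  }

  // Compare two value sequences after normalization.
  def deepEquals(x: Seq[Any], y: Seq[Any]): Boolean =
    normalize(x) == normalize(y)
}
```

With Spark on the classpath, this could presumably be called as DeepEq.deepEquals(r1.toSeq, r2.toSeq) for two Rows r1 and r2.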