Posted to issues@spark.apache.org by "Brett Stime (JIRA)" <ji...@apache.org> on 2016/06/15 20:10:09 UTC

[jira] [Commented] (SPARK-3847) Enum.hashCode is only consistent within the same JVM

    [ https://issues.apache.org/jira/browse/SPARK-3847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332453#comment-15332453 ] 

Brett Stime commented on SPARK-3847:
------------------------------------

Rather than warning about the specifics of enums, it seems like the real fix (as mentioned in the highest-voted answer to the question linked in the description, http://stackoverflow.com/a/4885292/93345) is to stop comparing hashCodes across distinct JVMs. In the worst case, perhaps the underlying keys should be deserialized in the target JVM and have their hashCodes recomputed there. Alternatively, it seems like it should work to implement hashCode and equals over the serialized bytes.
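
To make the serialized-bytes idea concrete, here is a minimal sketch in plain Java. It is not Spark code: SerializedKeyDemo, SerializedKey and the Kind enum are hypothetical names made up for illustration, and it assumes the key's serialized form is deterministic (true for enums, which Java serialization writes by name, but not guaranteed for arbitrary objects).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Spark.
public class SerializedKeyDemo {

    // Stand-in for the enum used in the issue description.
    enum Kind { EVENT, COMMAND }

    // Wraps the serialized form of a key. equals and hashCode depend only on
    // the bytes, so they never consult the key's own (identity-based) hashCode.
    static final class SerializedKey {
        private final byte[] bytes;

        SerializedKey(Serializable key) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(key);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // The stream is closed (and therefore flushed) at this point.
            this.bytes = buf.toByteArray();
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof SerializedKey
                && Arrays.equals(bytes, ((SerializedKey) other).bytes);
        }

        @Override
        public int hashCode() {
            // Arrays.hashCode(byte[]) is computed from the contents alone.
            return Arrays.hashCode(bytes);
        }
    }

    public static void main(String[] args) {
        SerializedKey a = new SerializedKey(Kind.EVENT);
        SerializedKey b = new SerializedKey(Kind.EVENT);
        SerializedKey c = new SerializedKey(Kind.COMMAND);

        System.out.println("same key equal:       " + a.equals(b));
        System.out.println("same key, same hash:  " + (a.hashCode() == b.hashCode()));
        System.out.println("different keys equal: " + a.equals(c));
    }
}
```

The trade-off is that two keys that are equal as objects but serialize differently (for example, collections with a different internal ordering) would be treated as distinct, so this only fits key types with a canonical serialized form.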

> Enum.hashCode is only consistent within the same JVM
> ----------------------------------------------------
>
>                 Key: SPARK-3847
>                 URL: https://issues.apache.org/jira/browse/SPARK-3847
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>         Environment: Oracle JDK 7u51 64bit on Ubuntu 12.04
>            Reporter: Nathan Bijnens
>              Labels: enum
>
> When using Java enums as keys in some operations, the results can be very unexpected. The issue is that Java's Enum.hashCode returns the object's identity hash (effectively its memory position), which differs between JVMs.
> {code}
> messages.filter(_.getHeader.getKind == Kind.EVENT).count
> >> 503650
> val tmp = messages.filter(_.getHeader.getKind == Kind.EVENT)
> tmp.map(_.getHeader.getKind).countByValue
> >> Map(EVENT -> 1389)
> {code}
> Because this is actually a JVM issue, we should either reject enums as keys with an error or implement a workaround.
> A good writeup of the issue (and a workaround) can be found here:
> http://dev.bizo.com/2014/02/beware-enums-in-spark.html
> Some more background on hash codes and enums:
> https://stackoverflow.com/questions/4885095/what-is-the-reason-behind-enum-hashcode
> And some issues (most of them rejected) in the Oracle Java Bug Database:
> - http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8050217
> - http://bugs.java.com/bugdatabase/view_bug.do?bug_id=7190798



