You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by JoshRosen <gi...@git.apache.org> on 2014/12/25 06:42:39 UTC

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/3795

    [SPARK-3847] Raise exception when hashing Java enums

    This patch modifies Spark to throw exceptions when attempting to hash-partition Java Enums.  Java Enums' hashCodes are machine/JVM-dependent, so it is unsafe to compare enum hashCodes generated in different JVMs; this means that we can't partition RDDs with enum keys.
    
    The fix here is based on a similar fix which prevents Java arrays from being used as keys when repartitioning (https://github.com/mesos/spark/pull/348): at RDD definition time, use reflection to check whether the key class is an enumeration and throw a warning if a HashPartitioner is being used.  We do not throw a warning if some other partitioner is used because, in principle, that partitioner could have custom logic for properly handling enums (e.g. by calling `toString()` and using the string's hashcode).  There are some corner-cases that this will miss (such as enums that are nested in other objects, like pairs of enums), but I think that this may be the best that we can do without adding a per-record performance overhead or marking changes to the shuffle code.
    
    In case we have to add similar error checks in the future, I've factored the logic into a helper function in the `Partitioner` object.  I also improved the warning messages to reference the relevant JIRA issues.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-3847

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3795.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3795
    
----
commit 4ae4efca4d5f1dbe5207c71702cc46c943b23b56
Author: Josh Rosen <jo...@databricks.com>
Date:   2014-12-25T02:50:21Z

    [SPARK-3847] Raise exception when hashing Java enums

commit c41483d035ebf3b5b20acfb244c67b8bbea24412
Author: Josh Rosen <jo...@databricks.com>
Date:   2014-12-25T05:30:13Z

    Minor documentation fixes

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68090285
  
    There are a few alternative approaches that we could consider for adding warnings / errors, but they run up against various roadblocks:
    
    - Ideally, we could move the `Partitioner.assertPartitionerSupportsKeyClass` call to the `ShuffledRDD` constructor.  This would ensure that new transformations benefit from these checks, too, and minimizes the risk that someone forgets to add the call.  Unfortunately, ShuffledRDD does not have access to the key and value ClassManifests.  We can't change ShuffledRDD to capture ClassManifests because that would break binary compatibility for a class that's marked `@DeveloperApi`.  TypeTags would also fix this issue, but we can't use those because they would introduce binary compatibilities in public APIs.
    - We could move the checks for array/enum keys to the `PairRDDFunctions` constructor, but at that point we don't know which `Partitioner` class is being used.  If we threw an error, this would break any user code which relied on a custom `Partitioner` subclass to properly partition array / enum keys.
    - We could move these checks to runtime by reflecting on the first element of each partition's iterator.  This works around some erasure issues, but it may be difficult to implement: there's not a great way to peek at the first element of a Scala iterator, so we'd have to either wrap the iterator to let us push the first element back, or move the checking logic into the low-level code that consumes the iterator.  These iterators are consumed in the hash- and sort-shuffle writers, both of which are already fairly complicated, so I'm concerned that this would be an invasive change.
    
    There might be an argument in favor of never throwing warnings and only logging ERROR-level messages when using these key classes.  I think that we should definitely throw an exception whenever we see `HashPartitioner` being used with classes for which we know it has the wrong behavior, since this is probably the case that users are most likely to run into (I don't think many people use custom partitioners) and they might ignore log messages (e.g. if they're running in an interactive notebook).
    
    I think a reasonable compromise is to log a WARNING-level message in the `PairRDDFunctions` constructor whenever we see an array / enum key class, and keep the current exception-throwing logic for the `HashPartitioner` case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68088438
  
    We should probably add a warning about this in the documentation, too, although I'm not sure what would be the best place.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68095313
  
      [Test build #24813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24813/consoleFull) for   PR 3795 at commit [`09d837f`](https://github.com/apache/spark/commit/09d837f0f933a43cd7e2e1b8d2befec0f6516e6b).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by aarondav <gi...@git.apache.org>.

Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68088854
  
    In general, it seems that it would be better not to rely on hashCode
    directly in spark, but rather have some layer of indirection, similar to
    nonNegativeHashCode. After all, the ordinal of an enum seems like it would
    make a reasonable hash code.
    On Dec 24, 2014 9:47 PM, "Apache Spark QA" <no...@github.com> wrote:
    
    > Test build #24809 has started
    > <https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24809/consoleFull>
    > for PR 3795 at commit c41483d
    > <https://github.com/apache/spark/commit/c41483d035ebf3b5b20acfb244c67b8bbea24412>
    > .
    >
    >    - This patch merges cleanly.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/3795#issuecomment-68088533>.
    >


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Use portable hashcode for Java en...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3795#discussion_r22552583
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala ---
    @@ -72,6 +72,8 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
        * because it can push the sorting down into the shuffle machinery.
        */
       def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] = {
    +    val keyClass = implicitly[ClassTag[K]].runtimeClass
    +    Partitioner.assertHashCodeIsWellBehaved(keyClass)
    --- End diff --
    
    I think the concern here is that there's other code which might call `hashCode` on the items, so even if we use a custom partitioner we will still get incorrect results for Java arrays.  I think that this is safe as long as we can guarantee that the sorting phase doesn't call `hashCode`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68092463
  
      [Test build #24813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24813/consoleFull) for   PR 3795 at commit [`09d837f`](https://github.com/apache/spark/commit/09d837f0f933a43cd7e2e1b8d2befec0f6516e6b).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68092486
  
    I've pushed a commit which adds warnings in the PairRDDFunctions constructor.  Here's what that looks like in the REPL:
    
    ```scala
    scala> arrayPairs.groupByKey()
    14/12/24 23:48:19 WARN PairRDDFunctions: Using arrays as keys may lead to incorrect results (see SPARK-597)
    org.apache.spark.SparkException: HashPartitioner cannot partition array keys (see SPARK-597)
    	at org.apache.spark.Partitioner$.assertPartitionerSupportsKeyClass(Partitioner.scala:84)
    	at org.apache.spark.rdd.PairRDDFunctions.combineByKey(PairRDDFunctions.scala:91)
    	at org.apache.spark.rdd.PairRDDFunctions.groupByKey(PairRDDFunctions.scala:452)
    [...]
    ```
    
    In this case, we first get the warning when the constructor is called, and get the exception because the default HashPartitioner was used.
    
    Here's a case where we run successfully, because the method does not try to hash the keys, but still log a warning:
    
    ```scala
    scala> arrayPairs.mapValues(x => 2 * x).values
    14/12/24 23:52:44 WARN PairRDDFunctions: Using arrays as keys may lead to incorrect results (see SPARK-597)
    14/12/24 23:52:44 WARN PairRDDFunctions: Using arrays as keys may lead to incorrect results (see SPARK-597)
    res0: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[3] at values at <console>:15
    ```
    
    In this case, the PairRDDFunctions constructor was invoked twice, hence two warnings.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Use portable hashcode for Java en...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68118999
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24825/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Use portable hashcode for Java en...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68433103
  
    For Enums, this patch seems like a strict improvement over the status quo.  The strengthening of the array checks is the only potentially controversial change, but I think it's extremely unlikely to break user programs (it could only affect users who tried to use CombineByKey with array keys and a custom serializer, which seems like an unlikely use case); besides, any program that this breaks was likely giving the wrong answer / results, so it's better to fail loudly.
    
    I guess there are still a few cases that could slip through the cracks:
    
    - Java users who use custom serializers
    - Cases where the Java API uses the wrong manifest and can't tell that we've passed an array.
    
    I think both of these cases can only be detected with runtime-checks on the first record being shuffled.  Maybe we should add those as part of a separate PR, though, if we think they're worthwhile.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68089376
  
    @aarondav If we did that, we'd have to be careful to do the specialization statically; we end up calling `getPartition()` a ton of times, so an approach that used reflection for each record would be prohibitively expensive.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Use portable hashcode for Java en...

Posted by aarondav <gi...@git.apache.org>.

Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3795#discussion_r22548022
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala ---
    @@ -72,6 +72,8 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
        * because it can push the sorting down into the shuffle machinery.
        */
       def repartitionAndSortWithinPartitions(partitioner: Partitioner): RDD[(K, V)] = {
    +    val keyClass = implicitly[ClassTag[K]].runtimeClass
    +    Partitioner.assertHashCodeIsWellBehaved(keyClass)
    --- End diff --
    
    Technically, this method may work even though the hash code is not well-behaved, if the partitioner is not a HashPartitioner, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by aarondav <gi...@git.apache.org>.

Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68105479
  
    To note, my suggested solution would look more like this in HashPartitioner:
    
    ```scala
      def getPartition(key: Any): Int = key match {
        case null => 0
        case enum: Enum[_] => Utils.nonNegativeMod(enum.ordinal(), numPartitions)
        case _ => Utils.nonNegativeMod(key.hashCode, numPartitions)
      }
    ```
    
    This does not require reflection and Java is fast at doing instanceof checks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Use portable hashcode for Java en...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68813288
  
    @aarondav @pwendell Does one of you want to do the final sign-off on this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Use portable hashcode for Java en...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68117310
  
    I just realized that some of this error-checking for array might not work for Java API users due to type erasure / fake class manifests.  If that's the case, we might want to just move the check to runtime in HashPartitioner and throw an exception.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68091127
  
      [Test build #24809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24809/consoleFull) for   PR 3795 at commit [`c41483d`](https://github.com/apache/spark/commit/c41483d035ebf3b5b20acfb244c67b8bbea24412).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68095317
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24813/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68117045
  
    Alright, I've updated this to support Enums as @aarondav has described and have strengthened the array error-checking to prohibit most uses of arrays as keys in PairRDDFunctions, even when using a custom partitioner.  In order to properly handle those cases, we would need to make sure that the hashmaps that we use for aggregation will perform special-case hashing of arrays.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68116658
  
    @aarondav Good idea, I'll make that change.
    
    Note that we can't do a similar fix for arrays: many PairRDDFunctions methods rely on being able to use keys to index into hashmaps, and that will involve the arrays' Object.hashCode.  Therefore, we should probably strengthen the warnings for array into errors, since there's a high likelihood that users will get incorrect results.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68088533
  
      [Test build #24809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24809/consoleFull) for   PR 3795 at commit [`c41483d`](https://github.com/apache/spark/commit/c41483d035ebf3b5b20acfb244c67b8bbea24412).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68117017
  
      [Test build #24825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24825/consoleFull) for   PR 3795 at commit [`1cd87e0`](https://github.com/apache/spark/commit/1cd87e051c72ff7c17ee7c3aa8b9fd507167cdad).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68091128
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24809/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Use portable hashcode for Java en...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68964841
  
    Based on some offline with @aarondav, I'm going to re-work this to not allow Enums to be used, since the current behavior in this PR is misleading: we can automatically do the right thing for Enums used directly as keys, but cases like pairs of enums will still give incorrect output.  Throwing a warning for the cases that we can safely detect is a good opportunity to educate users.  I'll also add some loud warnings to the documentation if I can find a good place.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Use portable hashcode for Java en...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68118998
  
      [Test build #24825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24825/consoleFull) for   PR 3795 at commit [`1cd87e0`](https://github.com/apache/spark/commit/1cd87e051c72ff7c17ee7c3aa8b9fd507167cdad).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-68095076
  
    +1 LGTM. I remember this came up at least once, so good to guard against it directly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-3847] Use portable hashcode for Java en...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/3795#issuecomment-75147555
  
    As discussed offline, we will extend the existing checks against using Arrays as keys to using Enums as well, instead of trying to manually fix this specific case. Otherwise we may give the user a false sense of hope if they later try to use, say, tuples of Enums as keys, which won't work for the same reason.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org