Posted to commits@spark.apache.org by da...@apache.org on 2015/09/17 19:02:19 UTC

spark git commit: [SPARK-10642] [PYSPARK] Fix crash when calling rdd.lookup() on tuple keys

Repository: spark
Updated Branches:
  refs/heads/master c88bb5df9 -> 136c77d8b


[SPARK-10642] [PYSPARK] Fix crash when calling rdd.lookup() on tuple keys

JIRA: https://issues.apache.org/jira/browse/SPARK-10642

When calling `rdd.lookup()` on an RDD with tuple keys, `portable_hash` will return a `long`. That causes `DAGScheduler.submitJob` to throw `java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer`.
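
For context, a minimal standalone sketch of the tuple branch of `portable_hash` (the loop body is an assumption, reconstructed to match PySpark's CPython-style tuple hashing; only the tail of the function appears in the diff below). On 64-bit Python 2, the masked intermediate `h` has type `long`, and that type leaked out as the return value:

    import sys

    def portable_hash(x):
        # Sketch of PySpark's portable_hash; the tuple loop is assumed
        # to mirror CPython 2.7's tuple hash (python/pyspark/rdd.py).
        if x is None:
            return 0
        if isinstance(x, tuple):
            h = 0x345678
            for i in x:
                h ^= portable_hash(i)
                h *= 1000003
                h &= sys.maxsize  # on 64-bit Python 2 the masked value
                                  # has type `long`, not `int`
            h ^= len(x)
            if h == -1:
                h = -2
            return int(h)  # the fix: py4j maps a Python int to
                           # java.lang.Integer, but a long to java.lang.Long
        return hash(x)

    print(type(portable_hash(('a', 'b'))))  # <type 'long'> on Python 2
                                            # without the int() cast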

Author: Liang-Chi Hsieh <vi...@appier.com>

Closes #8796 from viirya/fix-pyrdd-lookup.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/136c77d8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/136c77d8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/136c77d8

Branch: refs/heads/master
Commit: 136c77d8bbf48f7c45dd7c3fbe261a0476f455fe
Parents: c88bb5d
Author: Liang-Chi Hsieh <vi...@appier.com>
Authored: Thu Sep 17 10:02:15 2015 -0700
Committer: Davies Liu <da...@gmail.com>
Committed: Thu Sep 17 10:02:15 2015 -0700

----------------------------------------------------------------------
 python/pyspark/rdd.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/136c77d8/python/pyspark/rdd.py
----------------------------------------------------------------------
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 9ef60a7..ab5aab1 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -84,7 +84,7 @@ def portable_hash(x):
         h ^= len(x)
         if h == -1:
             h = -2
-        return h
+        return int(h)
     return hash(x)
 
 
@@ -2192,6 +2192,9 @@ class RDD(object):
         [42]
         >>> sorted.lookup(1024)
         []
+        >>> rdd2 = sc.parallelize([(('a', 'b'), 'c')]).groupByKey()
+        >>> list(rdd2.lookup(('a', 'b'))[0])
+        ['c']
         """
         values = self.filter(lambda kv: kv[0] == key).values()
 

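The new doctest exercises exactly the path that crashed: `groupByKey` gives the RDD a partitioner, so `lookup()` submits a job for only the partition computed from `portable_hash(key)`, and that partition index crosses into the JVM. A usage sketch, assuming a running `SparkContext` named `sc` as in the doctest:

    rdd2 = sc.parallelize([(('a', 'b'), 'c')]).groupByKey()
    # With a partitioner set, lookup() computes the target partition as
    # portable_hash(key) % numPartitions and runs a job on just that
    # partition; the int() cast keeps the index a java.lang.Integer.
    print(list(rdd2.lookup(('a', 'b'))[0]))  # ['c']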
