Posted to commits@spark.apache.org by da...@apache.org on 2015/09/17 19:02:19 UTC
spark git commit: [SPARK-10642] [PYSPARK] Fix crash when calling rdd.lookup() on tuple keys
Repository: spark
Updated Branches:
refs/heads/master c88bb5df9 -> 136c77d8b
[SPARK-10642] [PYSPARK] Fix crash when calling rdd.lookup() on tuple keys
JIRA: https://issues.apache.org/jira/browse/SPARK-10642
When calling `rdd.lookup()` on an RDD with tuple keys, `portable_hash` will return a long. That causes `DAGScheduler.submitJob` to throw `java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer`.
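The failure mode can be sketched outside of Spark. The snippet below is an adaptation of the patched `portable_hash` for illustration, not the exact PySpark source: on Python 2, the masking arithmetic in the tuple branch could leave `h` with type `long` even when its value fit in a machine int, and Py4J would then ship it to the JVM as `java.lang.Long`. The one-line fix in this commit, `return int(h)`, coerces it back to a plain int.

```python
import sys

def portable_hash(x):
    """Deterministically hash None, tuples, and other hashable values.

    Sketch of the patched function. On Python 2, `h &= sys.maxsize`
    keeps the *value* in int range but the *type* can remain `long`;
    the final int() coercion (the fix in this commit) restores a
    plain int so the JVM receives a java.lang.Integer.
    """
    if x is None:
        return 0
    if isinstance(x, tuple):
        h = 0x345678
        for i in x:
            h ^= portable_hash(i)
            h *= 1000003
            h &= sys.maxsize   # value fits an int, but type may widen
        h ^= len(x)
        if h == -1:
            h = -2
        return int(h)          # the fix: coerce long -> int
    return hash(x)
```

With the coercion in place, `rdd.lookup(('a', 'b'))` hashes the tuple key to a plain int and `DAGScheduler.submitJob` no longer sees a `java.lang.Long`.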
Author: Liang-Chi Hsieh <vi...@appier.com>
Closes #8796 from viirya/fix-pyrdd-lookup.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/136c77d8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/136c77d8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/136c77d8
Branch: refs/heads/master
Commit: 136c77d8bbf48f7c45dd7c3fbe261a0476f455fe
Parents: c88bb5d
Author: Liang-Chi Hsieh <vi...@appier.com>
Authored: Thu Sep 17 10:02:15 2015 -0700
Committer: Davies Liu <da...@gmail.com>
Committed: Thu Sep 17 10:02:15 2015 -0700
----------------------------------------------------------------------
python/pyspark/rdd.py | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/136c77d8/python/pyspark/rdd.py
----------------------------------------------------------------------
diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py
index 9ef60a7..ab5aab1 100644
--- a/python/pyspark/rdd.py
+++ b/python/pyspark/rdd.py
@@ -84,7 +84,7 @@ def portable_hash(x):
h ^= len(x)
if h == -1:
h = -2
- return h
+ return int(h)
return hash(x)
@@ -2192,6 +2192,9 @@ class RDD(object):
[42]
>>> sorted.lookup(1024)
[]
+ >>> rdd2 = sc.parallelize([(('a', 'b'), 'c')]).groupByKey()
+ >>> list(rdd2.lookup(('a', 'b'))[0])
+ ['c']
"""
values = self.filter(lambda kv: kv[0] == key).values()
---------------------------------------------------------------------