Posted to issues@spark.apache.org by "Kay Ousterhout (JIRA)" <ji...@apache.org> on 2014/11/06 08:59:35 UTC
[jira] [Commented] (SPARK-1498) Spark can hang if pyspark tasks fail
[ https://issues.apache.org/jira/browse/SPARK-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199968#comment-14199968 ]
Kay Ousterhout commented on SPARK-1498:
---------------------------------------
I closed this since 0.9 seems pretty ancient now.
> Spark can hang if pyspark tasks fail
> ------------------------------------
>
> Key: SPARK-1498
> URL: https://issues.apache.org/jira/browse/SPARK-1498
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 0.9.0, 0.9.1, 0.9.2
> Reporter: Kay Ousterhout
> Fix For: 1.0.0
>
>
> In pyspark, when some kinds of jobs fail, Spark hangs rather than returning an error. This is partially a scheduler problem -- the scheduler sometimes thinks failed tasks succeeded, even though they produced a stack trace and an exception.
> You can reproduce this problem with:
> ardd = sc.parallelize([(1,2,3), (4,5,6)])
> brdd = sc.parallelize([(1,2,6), (4,5,9)])
> ardd.join(brdd).count()
> The last line will run forever (the problem in this code is that the RDD entries are 3-tuples rather than the (key, value) pairs that join() expects). I haven't verified whether this is a problem for 1.0 as well as 0.9.
> Thanks to Shivaram for helping diagnose this issue!
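[Editorial note: PySpark's join() operates on RDDs of (key, value) pairs. As a point of comparison, here is a minimal sketch of a well-formed version of the reproduction above (assuming an active SparkContext named sc, as in the report), which completes normally instead of hanging:

    # Pack the extra field into the value so each element is a (key, value) pair.
    ardd = sc.parallelize([(1, (2, 3)), (4, (5, 6))])
    brdd = sc.parallelize([(1, (2, 6)), (4, (5, 9))])
    ardd.join(brdd).count()  # joins on keys 1 and 4, returns 2

The hang in the original snippet comes from the malformed 3-tuple elements, which fail inside the Python worker rather than in the driver.]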
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)