Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/09/24 22:44:04 UTC

[jira] [Updated] (SPARK-10796) The stage's TaskSets may all be removed while the stage still has pending partitions after losing some executors

     [ https://issues.apache.org/jira/browse/SPARK-10796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-10796:
------------------------------
    Component/s: Scheduler

> The stage's TaskSets may all be removed while the stage still has pending partitions after losing some executors
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10796
>                 URL: https://issues.apache.org/jira/browse/SPARK-10796
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.3.0
>            Reporter: SuYan
>            Priority: Minor
>
> We hit this problem on Spark 1.3.0, and I have also checked the latest Spark code; I think the problem still exists.
> 1. When a stage hits a FetchFailed, the scheduler resubmits the running stage and marks the previous TaskSet as a zombie.
> 2. If an executor is then lost, the zombie TaskSet may lose the results of tasks that had already succeeded. In the current code those tasks are resubmitted, but this has no effect because the TaskSet is a zombie and will never be scheduled again.
> So once the active TaskSet and the zombie TaskSet have finished all of their running tasks, Spark considers them finished, but the stage still has pending partitions. The job then hangs, because no logic remains to re-run those pending partitions.
> The driver logic is complicated; it would be helpful if someone could check this.
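
Below is a minimal, self-contained Scala sketch of the race described above. It does not use Spark's actual scheduler classes; SimpleTaskSet and the finished/pending sets are illustrative assumptions, only meant to show why a lost result owned by a zombie attempt leaves a partition that nothing will ever reschedule.

    import scala.collection.mutable

    object ZombieTaskSetSketch {
      // Toy model of a task set attempt; not Spark's TaskSetManager.
      case class SimpleTaskSet(stageId: Int, isZombie: Boolean,
                               finished: mutable.Set[Int])

      def main(args: Array[String]): Unit = {
        val allPartitions = Set(0, 1, 2, 3)

        // Attempt 0 was marked zombie after a FetchFailed; it had already
        // finished partitions 0 and 1 before that.
        val zombie = SimpleTaskSet(stageId = 1, isZombie = true,
                                   finished = mutable.Set(0, 1))

        // The resubmitted attempt 1 only runs the partitions still missing.
        val active = SimpleTaskSet(stageId = 1, isZombie = false,
                                   finished = mutable.Set(2, 3))

        // An executor is lost and takes partition 1's output (owned by the
        // zombie attempt) with it. "Resubmitting" inside the zombie attempt
        // is a no-op, because zombie task sets are never offered resources.
        zombie.finished -= 1

        // Both attempts now have no running tasks, so the scheduler treats
        // them as done, yet the stage still has a pending partition and
        // nothing is left to re-run it: the job hangs.
        val done = zombie.finished ++ active.finished
        val pending = allPartitions -- done
        println(s"pending partitions: $pending")   // Set(1)
      }
    }

Running the sketch prints "pending partitions: Set(1)": both task sets are idle, yet partition 1 is still pending, which corresponds to the hang reported here.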



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org