Posted to issues@spark.apache.org by "Thomas Graves (JIRA)" <ji...@apache.org> on 2018/06/21 14:20:00 UTC

[jira] [Commented] (SPARK-24622) Task attempts in other stage attempts not killed when one task attempt succeeds

    [ https://issues.apache.org/jira/browse/SPARK-24622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519416#comment-16519416 ] 

Thomas Graves commented on SPARK-24622:
---------------------------------------

I need to investigate and test further to make sure I am not missing anything.

> Task attempts in other stage attempts not killed when one task attempt succeeds
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-24622
>                 URL: https://issues.apache.org/jira/browse/SPARK-24622
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 2.1.0
>            Reporter: Thomas Graves
>            Priority: Major
>
> While going through the code for https://github.com/apache/spark/pull/21577, I looked at how we kill task attempts.  I don't see anywhere that we actually kill task attempts belonging to stage attempts other than the one in which the task completed successfully.
>  
> For instance:
> Stage 0.0 (stage id 0, attempt 0)
>   - task 1.0 (task 1, attempt 0)
> Stage 0.1 (stage id 0, attempt 1), started due to a fetch failure, for instance
>   - task 1.0 (task 1, attempt 0). This is the equivalent of task 1.0 in stage 0.0; it runs because task 1.0 in stage 0.0 had neither finished nor failed.
>  
> Now if task 1.0 in stage 0.0 succeeds, it gets committed and marked as successful.  We will mark the task in stage 0.1 as completed, but I see nowhere in the code that actually kills task 1.0 in stage 0.1.
> Note that the scheduler does handle the case where we have 2 attempts (speculation) in a single stage attempt: it kills the other attempt when one of them succeeds.  See TaskSetManager.handleSuccessfulTask.
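
The scenario described above can be sketched with a toy model. This is only an illustration of the reported behavior, not Spark's scheduler code: the names TaskAttempt, ToyScheduler, launch, handle_success, and the kill_across_stage_attempts flag are all hypothetical and exist only for this example. The default path mimics what the description reports (only attempts in the same stage attempt get killed), and the flag shows what killing across stage attempts would look like in the same model.

```python
from dataclasses import dataclass


@dataclass
class TaskAttempt:
    stage_id: int
    stage_attempt: int
    task: int              # task (partition) index within the stage
    attempt: int           # attempt number of this task
    state: str = "RUNNING"  # one of RUNNING, SUCCEEDED, KILLED


class ToyScheduler:
    """Hypothetical stand-in for a scheduler; not Spark's actual classes."""

    def __init__(self):
        self.attempts = []

    def launch(self, stage_id, stage_attempt, task, attempt):
        t = TaskAttempt(stage_id, stage_attempt, task, attempt)
        self.attempts.append(t)
        return t

    def handle_success(self, winner, kill_across_stage_attempts=False):
        """Mark `winner` successful and kill other running attempts of the same task.

        By default only attempts in the same stage attempt are killed
        (the speculation case). The flag is an assumption made for this
        sketch; it also kills equivalent attempts in other stage attempts.
        """
        winner.state = "SUCCEEDED"
        for other in self.attempts:
            if other is winner or other.state != "RUNNING":
                continue
            same_task = (other.stage_id == winner.stage_id
                         and other.task == winner.task)
            same_stage_attempt = other.stage_attempt == winner.stage_attempt
            if same_task and (same_stage_attempt or kill_across_stage_attempts):
                other.state = "KILLED"


sched = ToyScheduler()
old = sched.launch(stage_id=0, stage_attempt=0, task=1, attempt=0)  # stage 0.0, task 1.0
new = sched.launch(stage_id=0, stage_attempt=1, task=1, attempt=0)  # stage 0.1, task 1.0
sched.handle_success(old)
print(new.state)  # RUNNING: the attempt in stage 0.1 is left alive
```

In this model, calling handle_success(old, kill_across_stage_attempts=True) instead would mark the stage 0.1 attempt as KILLED, which is the behavior the description says it cannot find in the code.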


