Posted to issues@spark.apache.org by "Ryan Blue (JIRA)" <ji...@apache.org> on 2018/06/28 19:11:00 UTC

[jira] [Resolved] (SPARK-24684) DAGScheduler reports the wrong attempt number to the commit coordinator

     [ https://issues.apache.org/jira/browse/SPARK-24684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Blue resolved SPARK-24684.
-------------------------------
    Resolution: Not A Problem

Closing this. In master, the attempt number is still used. It looks like I simply backported this incorrectly.

> DAGScheduler reports the wrong attempt number to the commit coordinator
> -----------------------------------------------------------------------
>
>                 Key: SPARK-24684
>                 URL: https://issues.apache.org/jira/browse/SPARK-24684
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.1.3, 2.3.2
>            Reporter: Ryan Blue
>            Priority: Major
>
> SPARK-24552 changes writers to pass the task ID to the output coordinator so that the coordinator tracks each task uniquely, because attempt numbers can be reused across stage attempts. However, the DAGScheduler still passes the attempt number when notifying the coordinator that a task has finished. As a result, when a task is authorized to commit and then fails due to OOM or a similar error, the coordinator is notified but doesn't remove the commit authorization, because the attempt number it receives doesn't match the task ID it recorded. This causes infinite task retries.
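
The mismatch described in the report can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyCommitCoordinator`, its method names, and the numeric IDs are hypothetical and are not Spark's actual API. It shows what happens when authorization is recorded under one identifier (a task ID) and the failure is reported under another (an attempt number). Per the resolution above, this situation arose only in the incorrect backport, not in master.

```python
class ToyCommitCoordinator:
    """Hypothetical stand-in for an output commit coordinator; not Spark's API."""

    def __init__(self):
        # (stage, partition) -> identifier of the attempt holding commit authorization
        self._authorized = {}

    def can_commit(self, stage, partition, attempt_key):
        # The first requester for a partition becomes the authorized committer.
        holder = self._authorized.setdefault((stage, partition), attempt_key)
        return holder == attempt_key

    def task_completed(self, stage, partition, attempt_key, succeeded):
        # Release the authorization only if the failed attempt is the one holding it.
        if not succeeded and self._authorized.get((stage, partition)) == attempt_key:
            del self._authorized[(stage, partition)]


coordinator = ToyCommitCoordinator()

# Writer side: authorization is requested with a unique task ID (1001, made up).
first_allowed = coordinator.can_commit(stage=0, partition=3, attempt_key=1001)

# Scheduler side: the failure is reported with the attempt number (0), so the
# key does not match and the authorization is never released.
coordinator.task_completed(stage=0, partition=3, attempt_key=0, succeeded=False)

# The retry (task ID 1002, made up) is denied; repeated, this is the retry loop.
retry_allowed = coordinator.can_commit(stage=0, partition=3, attempt_key=1002)

# When both sides use the same identifier, the release works and the retry commits.
coordinator.task_completed(stage=0, partition=3, attempt_key=1001, succeeded=False)
retry_allowed_when_consistent = coordinator.can_commit(stage=0, partition=3, attempt_key=1002)
```

The point of the sketch is only that both sides must use the same kind of identifier. Whether that is the task ID or the attempt number matters less than consistency between the writer and the scheduler.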


