Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2019/10/08 05:42:15 UTC

[jira] [Resolved] (SPARK-24848) When a stage fails onStageCompleted is called before onTaskEnd

     [ https://issues.apache.org/jira/browse/SPARK-24848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-24848.
----------------------------------
    Resolution: Incomplete

> When a stage fails onStageCompleted is called before onTaskEnd
> --------------------------------------------------------------
>
>                 Key: SPARK-24848
>                 URL: https://issues.apache.org/jira/browse/SPARK-24848
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Yavgeni Hotimsky
>            Priority: Minor
>              Labels: bulk-closed
>
> It seems that when a stage fails because one of its tasks failed too many times, the onStageCompleted callback of the SparkListener is called before the onTaskEnd callback for the failing task. We're using Structured Streaming in this case.
> We noticed this because we built a listener that tracks the precise number of active tasks, exported as a metric, and were using the stage callbacks to maintain a map from stage IDs to some metadata extracted from the jobGroupId. The onStageCompleted callback removed the entry from the map to prevent unbounded memory usage, and in this case I could see onTaskEnd being called after onStageCompleted, so it couldn't find the stageId in the map. We worked around it by replacing the map with a timed cache.
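
The timed-cache workaround described above can be sketched roughly as follows. This is a plain Python model and not Spark code: the class names (StageMetadataCache, ActiveTaskTracker), the method names that mirror the listener callbacks, the TTL value, and the injected clock are all assumptions made for illustration. The reporter's actual listener is not shown in the issue.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class StageMetadataCache:
    """Hypothetical timed cache: entries expire after a TTL instead of
    being removed when the stage completes."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised in tests
        self._entries: Dict[int, Tuple[float, str]] = {}

    def put(self, stage_id: int, metadata: str) -> None:
        self._evict_expired()
        self._entries[stage_id] = (self._clock() + self._ttl, metadata)

    def get(self, stage_id: int) -> Optional[str]:
        self._evict_expired()
        entry = self._entries.get(stage_id)
        return entry[1] if entry is not None else None

    def __len__(self) -> int:
        self._evict_expired()
        return len(self._entries)

    def _evict_expired(self) -> None:
        # Drop every entry whose deadline has passed; this bounds memory
        # without relying on the order of listener callbacks.
        now = self._clock()
        expired = [sid for sid, (deadline, _) in self._entries.items()
                   if deadline <= now]
        for sid in expired:
            del self._entries[sid]


class ActiveTaskTracker:
    """Hypothetical stand-in for the listener: counts active tasks per
    metadata value (for example, something derived from the job group)."""

    def __init__(self, cache: StageMetadataCache) -> None:
        self._cache = cache
        self.active_tasks: Dict[str, int] = {}

    def on_stage_submitted(self, stage_id: int, metadata: str) -> None:
        self._cache.put(stage_id, metadata)

    def on_task_start(self, stage_id: int) -> None:
        metadata = self._cache.get(stage_id)
        if metadata is not None:
            self.active_tasks[metadata] = self.active_tasks.get(metadata, 0) + 1

    def on_task_end(self, stage_id: int) -> None:
        # Still resolvable if the stage-completed callback arrived first,
        # as long as the TTL has not elapsed.
        metadata = self._cache.get(stage_id)
        if metadata is not None:
            current = self.active_tasks.get(metadata, 0)
            self.active_tasks[metadata] = max(0, current - 1)

    def on_stage_completed(self, stage_id: int) -> None:
        # Intentionally does not remove the entry; expiry handles cleanup.
        pass
```

In this sketch the stage-completed handler leaves the entry in place, so a task-end event that arrives late can still be attributed to its stage. The trade-off is that entries linger for up to the TTL, which would need to be longer than the longest expected delay between the two callbacks.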



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
