Posted to issues@spark.apache.org by "Yavgeni Hotimsky (JIRA)" <ji...@apache.org> on 2018/07/18 12:34:00 UTC

[jira] [Created] (SPARK-24848) When a stage fails onStageCompleted is called before onTaskEnd

Yavgeni Hotimsky created SPARK-24848:
----------------------------------------

             Summary: When a stage fails onStageCompleted is called before onTaskEnd
                 Key: SPARK-24848
                 URL: https://issues.apache.org/jira/browse/SPARK-24848
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.2.0
            Reporter: Yavgeni Hotimsky


It seems that when a stage fails because one of its tasks failed too many times, the SparkListener's onStageCompleted callback is called before the onTaskEnd callback for the failing task. We are using Structured Streaming in this case.
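The ordering problem can be stated as a check over the sequence of events a listener receives. The following is a library-free Python sketch, not Spark code. `StageCompleted`, `TaskEnd`, and `late_task_ends` are hypothetical stand-ins for `SparkListenerStageCompleted`, `SparkListenerTaskEnd`, and a listener-side check, and the stage and task IDs are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical stand-ins for SparkListenerStageCompleted / SparkListenerTaskEnd;
# only the fields needed for this illustration are modelled.
@dataclass(frozen=True)
class StageCompleted:
    stage_id: int


@dataclass(frozen=True)
class TaskEnd:
    stage_id: int
    task_id: int


Event = Union[StageCompleted, TaskEnd]


def late_task_ends(events: List[Event]) -> List[TaskEnd]:
    """Return every TaskEnd delivered after its own stage's StageCompleted."""
    completed = set()
    late = []
    for event in events:
        if isinstance(event, StageCompleted):
            completed.add(event.stage_id)
        elif event.stage_id in completed:
            # A task of an already-completed stage: the ordering in this issue.
            late.append(event)
    return late


# Ordering as reported: stage 3 fails because task 7 failed too many times,
# and the listener sees the stage completion before that task's end.
reported = [TaskEnd(3, 5), StageCompleted(3), TaskEnd(3, 7)]
print(late_task_ends(reported))  # [TaskEnd(stage_id=3, task_id=7)]
```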

We noticed this because we built a listener that tracks the exact number of active tasks for one of our processes and exports it as a metric. The listener uses the stage callbacks to maintain a map from stage IDs to some metadata extracted from the jobGroupId, and onStageCompleted removes the stage's entry from the map to keep memory bounded. In this case, onTaskEnd was called after onStageCompleted, so the listener could not find the stageId in the map. We worked around it by replacing the map with a timed cache.
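The timed-cache workaround can be sketched along the same lines. This is a library-free Python model under stated assumptions, not the reporter's listener: `ActiveTaskTracker`, its `on_*` methods (loosely mirroring `onStageSubmitted`, `onStageCompleted`, `onTaskStart`, `onTaskEnd`), the job-group strings, populating the map on stage submission, and the TTL value are all hypothetical. The issue does not say how its cache expired entries; here expiry starts only when the stage completes, so a long-running stage keeps its entry while a late `TaskEnd` can still be matched for `ttl_seconds`. The clock is injectable so the behaviour can be checked without sleeping.

```python
import time
from collections import Counter
from typing import Callable, Dict, Optional, Tuple


class ActiveTaskTracker:
    """Hypothetical model: counts running tasks per job group via a stage-id map.

    Instead of deleting a stage's entry when the stage completes, the entry is
    kept for ttl_seconds afterwards so that a late TaskEnd can still be matched.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        # stage id -> (job group, expiry time; None while the stage is running)
        self.stages: Dict[int, Tuple[str, Optional[float]]] = {}
        self.active: Counter = Counter()

    def _evict_expired(self) -> None:
        now = self.clock()
        expired = [s for s, (_, exp) in self.stages.items()
                   if exp is not None and exp <= now]
        for stage_id in expired:
            del self.stages[stage_id]

    def group_of(self, stage_id: int) -> Optional[str]:
        """Look up a stage's job group, or None if unknown or already evicted."""
        self._evict_expired()
        entry = self.stages.get(stage_id)
        return entry[0] if entry is not None else None

    def on_stage_submitted(self, stage_id: int, job_group: str) -> None:
        self._evict_expired()
        self.stages[stage_id] = (job_group, None)

    def on_stage_completed(self, stage_id: int) -> None:
        # A plain dict would delete the entry here; start the expiry clock instead.
        if stage_id in self.stages:
            group, _ = self.stages[stage_id]
            self.stages[stage_id] = (group, self.clock() + self.ttl)

    def on_task_start(self, stage_id: int) -> None:
        group = self.group_of(stage_id)
        if group is not None:
            self.active[group] += 1

    def on_task_end(self, stage_id: int) -> None:
        group = self.group_of(stage_id)
        if group is not None:
            self.active[group] -= 1


# Usage with a fake clock: the stage completes before its task ends.
now = [0.0]
tracker = ActiveTaskTracker(ttl_seconds=60.0, clock=lambda: now[0])
tracker.on_stage_submitted(3, "group-a")
tracker.on_task_start(3)
tracker.on_stage_completed(3)     # arrives before the failing task's TaskEnd
tracker.on_task_end(3)            # still matched: the entry has not expired
print(tracker.active["group-a"])  # 0
now[0] = 61.0
print(tracker.group_of(3))        # None: evicted once the TTL has elapsed
```

With a plain dict that deletes the entry in `on_stage_completed`, the late `on_task_end` in this example would find no group, and the count for `"group-a"` would stay at 1.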



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
