Posted to issues@spark.apache.org by "Imran Rashid (JIRA)" <ji...@apache.org> on 2018/02/05 18:44:01 UTC

[jira] [Commented] (SPARK-20087) Include accumulators / taskMetrics when sending TaskKilled to onTaskEnd listeners

    [ https://issues.apache.org/jira/browse/SPARK-20087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352750#comment-16352750 ] 

Imran Rashid commented on SPARK-20087:
--------------------------------------

cc [~holden.karau@gmail.com]

I think this makes sense, as accumulators are already biased towards counting based on the *computation* done, not based on the *data* itself, wrt retries, failures, etc.  But I wanted to get some more thoughts, as this is, in a way, a breaking change in the accumulator API.  I'd consider the previous behavior a bug, so it's OK in my book.
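
For context, here's a minimal sketch (the job and the accumulator name
are hypothetical) of why accumulator values already track computation
rather than data: an update made inside a transformation can be applied
once per *attempt*, so retries and speculative copies both contribute.

    import org.apache.spark.sql.SparkSession

    object AccumulatorRetrySketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("accumulator-retry-sketch")
          .getOrCreate()
        val sc = spark.sparkContext
        val recordsSeen = sc.longAccumulator("recordsSeen")

        sc.parallelize(1 to 1000, numSlices = 10)
          .map { x =>
            // Update inside a transformation: Spark only guarantees
            // exactly-once application for updates made inside actions,
            // so re-executed attempts can count the same record again.
            recordsSeen.add(1)
            x
          }
          .count()

        // Under retries or speculation this can exceed 1000 -- it
        // measures computation performed, not distinct input records.
        println(s"records seen: ${recordsSeen.value}")
        spark.stop()
      }
    }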

> Include accumulators / taskMetrics when sending TaskKilled to onTaskEnd listeners
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-20087
>                 URL: https://issues.apache.org/jira/browse/SPARK-20087
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Charles Lewis
>            Priority: Major
>
> When tasks end due to an ExceptionFailure, subscribers to onTaskEnd receive accumulators / task metrics for that task, if they are still available. These metrics are not currently sent when tasks are killed intentionally, such as when a speculative retry finishes and the original is killed (or vice versa). Since we're killing these tasks ourselves, these metrics should almost always exist, and we should treat them the same way as we treat ExceptionFailures.
> Sending these metrics with the TaskKilled end reason makes aggregation across all tasks in an app more accurate. This data can inform decisions about how to tune the speculation parameters to minimize duplicated work; more generally, the total cost of an app should include both successful and failed tasks whenever that information exists.
> PR: https://github.com/apache/spark/pull/17422
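
To make the quoted motivation concrete, here's a minimal sketch (the
listener class and counter names are mine, and it assumes a Spark
version where TaskKilled is a case class) of the kind of aggregation
this enables: a SparkListener that attributes executor run time to
killed tasks, which only works if TaskKilled events carry task metrics.

    import org.apache.spark.TaskKilled
    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    class TotalCostListener extends SparkListener {
      @volatile private var totalRunTimeMs = 0L
      @volatile private var killedRunTimeMs = 0L

      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        // taskMetrics can be null when the end reason did not carry
        // metrics -- historically the case for TaskKilled.
        Option(taskEnd.taskMetrics).foreach { m =>
          totalRunTimeMs += m.executorRunTime
          taskEnd.reason match {
            case _: TaskKilled => killedRunTimeMs += m.executorRunTime
            case _ =>
          }
        }
      }

      // Run time spent on tasks that were killed, e.g. losers of a
      // speculative race: a proxy for duplicated work.
      def duplicatedWorkMs: Long = killedRunTimeMs
      def totalWorkMs: Long = totalRunTimeMs
    }

Register it with sc.addSparkListener(new TotalCostListener); the ratio
of duplicated to total run time is then a direct signal for tuning the
speculation confs (spark.speculation.multiplier,
spark.speculation.quantile).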


