Posted to issues@spark.apache.org by "Jackey Lee (Jira)" <ji...@apache.org> on 2022/01/06 16:38:00 UTC

[jira] [Created] (SPARK-37831) Add task partition id in metrics

Jackey Lee created SPARK-37831:
----------------------------------

             Summary: Add task partition id in metrics
                 Key: SPARK-37831
                 URL: https://issues.apache.org/jira/browse/SPARK-37831
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.2.1, 3.3.0
            Reporter: Jackey Lee


The current metrics do not include a partition id, which makes it difficult to trace stage-level metrics such as stage shuffle read, especially when there are stage retries. It also makes it impossible to compare task metrics across different applications.
{code:scala}
class TaskData private[spark](
    val taskId: Long,
    val index: Int,
    val attempt: Int,
    val launchTime: Date,
    val resultFetchStart: Option[Date],
    @JsonDeserialize(contentAs = classOf[JLong])
    val duration: Option[Long],
    val executorId: String,
    val host: String,
    val status: String,
    val taskLocality: String,
    val speculative: Boolean,
    val accumulatorUpdates: Seq[AccumulableInfo],
    val errorMessage: Option[String] = None,
    val taskMetrics: Option[TaskMetrics] = None,
    val executorLogs: Map[String, String],
    val schedulerDelay: Long,
    val gettingResultTime: Long)
{code}
Adding a partitionId field to TaskData would not only make it easy to trace task metrics, but would also make it possible to collect metrics for a stage's actual output, especially when a stage is retried.
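
Note that the existing index field is not a substitute: on a partial stage retry only the failed partitions are resubmitted, so a task's index within the retried stage attempt no longer matches its partition id. As a minimal sketch of the proposal, the class could be extended as below; the field name partitionId and its position after attempt are illustrative assumptions, not a final API:
{code:scala}
import java.util.Date
import java.lang.{Long => JLong}

import com.fasterxml.jackson.databind.annotation.JsonDeserialize

import org.apache.spark.status.api.v1.{AccumulableInfo, TaskMetrics}

class TaskData private[spark](
    val taskId: Long,
    val index: Int,
    val attempt: Int,
    // Proposed field: the id of the RDD partition this task computed.
    // Unlike taskId, index, or attempt, it stays stable across task
    // and stage retries, so metrics can be keyed by partition.
    val partitionId: Int,
    val launchTime: Date,
    val resultFetchStart: Option[Date],
    @JsonDeserialize(contentAs = classOf[JLong])
    val duration: Option[Long],
    val executorId: String,
    val host: String,
    val status: String,
    val taskLocality: String,
    val speculative: Boolean,
    val accumulatorUpdates: Seq[AccumulableInfo],
    val errorMessage: Option[String] = None,
    val taskMetrics: Option[TaskMetrics] = None,
    val executorLogs: Map[String, String],
    val schedulerDelay: Long,
    val gettingResultTime: Long)
{code}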



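Under the same assumption, a monitoring listener could then key metrics by (stageId, partitionId), so a retried stage attempt overwrites the metrics of the failed attempt instead of double counting them. The listener below is a hypothetical sketch: taskInfo.partitionId is the proposed field, not an existing one, and PartitionShuffleReadListener is an illustrative name.
{code:scala}
import scala.collection.mutable

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sketch only: assumes taskEnd.taskInfo exposes the proposed partitionId.
class PartitionShuffleReadListener extends SparkListener {
  // (stageId, partitionId) -> shuffle read bytes of the latest successful task.
  // Keyed by partition, a retried stage attempt replaces the old entry.
  private val shuffleReadByPartition = mutable.Map[(Int, Int), Long]()

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    if (taskEnd.taskInfo.successful && metrics != null) {
      val key = (taskEnd.stageId, taskEnd.taskInfo.partitionId) // proposed field
      shuffleReadByPartition(key) = metrics.shuffleReadMetrics.totalBytesRead
    }
  }

  // Shuffle read for the stage's actual output, deduplicated across retries.
  def totalShuffleRead(stageId: Int): Long =
    shuffleReadByPartition.collect {
      case ((sid, _), bytes) if sid == stageId => bytes
    }.sum
}
{code}
Such a listener would be registered with sparkContext.addSparkListener(new PartitionShuffleReadListener()); without a partition id, the only available keys (taskId, or index plus attempt) cannot distinguish a retried partition from a new one.
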
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
