Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:23:13 UTC

[jira] [Updated] (SPARK-15845) Expose metrics for sub-stage transformations and action

     [ https://issues.apache.org/jira/browse/SPARK-15845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-15845:
---------------------------------
    Labels: bulk-closed  (was: )

> Expose metrics for sub-stage transformations and action 
> --------------------------------------------------------
>
>                 Key: SPARK-15845
>                 URL: https://issues.apache.org/jira/browse/SPARK-15845
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.5.2
>            Reporter: nirav patel
>            Priority: Major
>              Labels: bulk-closed
>
> Spark optimizes DAG processing by choosing stage boundaries efficiently. This makes a Spark stage a sequence of multiple transformations plus zero or one action. As a result, the stage Spark is currently running can internally be a series such as (map -> shuffle -> map -> map -> collect). Notice that it pipelines past the shuffle dependency and folds the subsequent transformations and the action into the same stage, so any task of this stage essentially performs all of those transformations/actions as a single unit, with no further visibility inside it. Network read, populating partitions, compute, shuffle write, shuffle read, compute, and writing final partitions to disk ALL happen within one stage, meaning every task of that stage performs all of those operations on a single partition as a unit. This takes away a huge amount of visibility into the user's transformations and actions: which one is taking longer, which one is the resource bottleneck, and which one is failing. (A small sketch at the end of this description illustrates the shape of such a chain.)
> The Spark UI just shows that it is currently running some action's stage. If the job fails at that point, the UI just says the action failed, but in fact the failure could be at any step in that lazily evaluated chain. Looking at executor logs gives some insight, but that is not always straightforward.
> I think we need more visibility into what's happening underneath a task (the series of Spark transformations/actions that make up a stage) so we can troubleshoot more easily, find bottlenecks, and optimize our DAGs.
> PS - I had positive feedback about this from a Databricks dev team member at Spark Summit.
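> A minimal word-count style sketch of such a chain (the app name and input path below are only placeholders, not from a real job): roughly, everything up to the reduceByKey shuffle write is reported as one stage and everything after the shuffle read as another, with no per-transformation breakdown inside either stage.
> {code:scala}
> import org.apache.spark.{SparkConf, SparkContext}
>
> object StagePipeliningSketch {
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf().setAppName("stage-pipelining-sketch").setMaster("local[*]")
>     val sc = new SparkContext(conf)
>
>     // Pipelined into one stage: read -> flatMap -> map -> shuffle write
>     val counts = sc.textFile("/tmp/words.txt")   // placeholder input path
>       .flatMap(_.split("\\s+"))
>       .map(word => (word, 1L))
>       .reduceByKey(_ + _)
>
>     // Pipelined into a second stage: shuffle read -> map -> collect
>     val formatted = counts.map { case (word, n) => s"$word: $n" }.collect()
>
>     formatted.take(5).foreach(println)
>     sc.stop()
>   }
> }
> {code}
> If the flatMap is the bottleneck or the final map throws, today's UI and metrics can only attribute that to the enclosing stage.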



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org