Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:21:24 UTC

[jira] [Updated] (SPARK-9739) Execution visualizer

     [ https://issues.apache.org/jira/browse/SPARK-9739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-9739:
--------------------------------
    Labels: bulk-closed  (was: )

> Execution visualizer
> --------------------
>
>                 Key: SPARK-9739
>                 URL: https://issues.apache.org/jira/browse/SPARK-9739
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>            Reporter: Zoltán Zvara
>            Priority: Major
>              Labels: bulk-closed
>
> Apache Spark, and especially the user interface provided by its Web UI component, lacks a tool that helps users understand the physical plan of the task scheduler and monitor execution at a very low level, including the communication triggered by data flow and remote block requests. We propose a tool that would let users monitor job executions in real time, and later replay and examine them, on any cluster currently supported by Spark.
> The visualizer we implement would allow users to monitor a Spark program’s data flow at the task level during execution, within the existing web user interface provided by the master. Users would be able to see, on a representative graph, where executors and tasks are deployed on the cluster, along with the communication triggered by tasks.
> To achieve this, we minimally modify Spark’s core so that it can collect information related to block requests. The tasks’ code needs slight modification and some straightforward refactoring so that tasks can report their execution state to the driver’s monitoring object, which has been added to SparkContext. Most aspects of the proposed module are configurable.
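As a rough illustration of the reporting path described above, the following Python sketch wraps a task's work with start and end reports and has a driver-side monitor append each report as one JSON line (the JSON log format is mentioned in the implementation notes). `ExecutionMonitor`, `TaskEvent`, `run_task` and all field names are hypothetical, not Spark APIs, and everything runs in one process; in Spark itself, tasks run on executors, so their reports would have to travel to the driver over the network.

```python
import json
import threading
import time
from dataclasses import asdict, dataclass, field
from typing import Callable, Optional, TextIO, TypeVar

T = TypeVar("T")


@dataclass
class TaskEvent:
    # Hypothetical report schema; every field name here is illustrative.
    kind: str                  # "task_start" or "task_end"
    stage_id: int
    task_id: int
    executor_id: str
    host: str
    timestamp: float = field(default_factory=time.time)
    succeeded: Optional[bool] = None  # set only on "task_end"


class ExecutionMonitor:
    """Hypothetical driver-side monitor that appends each report as one JSON line."""

    def __init__(self, sink: TextIO) -> None:
        self._sink = sink
        self._lock = threading.Lock()  # many tasks may report at the same time

    def report(self, event: TaskEvent) -> None:
        line = json.dumps(asdict(event), sort_keys=True)
        with self._lock:
            self._sink.write(line + "\n")
            self._sink.flush()


def run_task(monitor: ExecutionMonitor, stage_id: int, task_id: int,
             executor_id: str, host: str, work: Callable[[], T]) -> T:
    """Report before and after a task's actual work, including on failure."""
    monitor.report(TaskEvent("task_start", stage_id, task_id, executor_id, host))
    succeeded = False
    try:
        result = work()
        succeeded = True
        return result
    finally:
        # Runs on both success and exception, so every start has a matching end.
        monitor.report(TaskEvent("task_end", stage_id, task_id, executor_id, host,
                                 succeeded=succeeded))
```

Passing an `io.StringIO` (or an open file) as the sink and calling `run_task(monitor, 0, 7, "exec-1", "worker-a", lambda: sum(range(10)))` returns 45 and writes a `task_start` line followed by a `task_end` line with `"succeeded": true`. Reporting from a `finally` block is one way to keep failed tasks visible, since the proposal also wants failed tasks shown.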
> Our execution visualizer would not have any measurable performance impact on Spark programs, while offering the following benefits.
> *Benefits*
> We think the execution visualizer would give end users the following benefits:
> - help users understand Spark’s execution mechanism by demonstrating how executors and tasks work internally, which would attract new users;
> - through advanced visual monitoring of programs, the ability to discover issues with executors and tasks in a more detailed and convenient way;
> - the ability to highlight inefficient communication patterns in certain workflows, which would provide insight for advanced optimization strategies.
> *Implementation*
> We modified tasks to send more detailed information to the driver before and after their actual work; the driver collects this information as JSON on its file system. The visualizer, written using the D3 JavaScript library, would read these logs at a regular interval. The visualizer would provide the following main features:
> - dynamically show hosts, executors, and the tasks currently running and finishing in a graph;
> - show critical and additional backend information related to hosts and executors (along with available resources);
> - show useful information about running tasks: the RDD and split to compute, dependencies, stages and others;
> - show failed executors and tasks;
> - show task metrics and provide multiple ways to summarize them;
> - show communication as directed edges between executors, in the form of block requests;
> - let users replay executions at different speeds.
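The implementation notes above say the driver collects task reports as JSON and the D3 visualizer re-reads them at a regular interval, drawing block requests as directed edges between executors and supporting replay at different speeds. The Python sketch below illustrates only that log-side bookkeeping, not the D3 front end. It assumes a JSON-lines log and hypothetical `block_request` records with `requester`, `holder`, `bytes` and `timestamp` fields; none of these names come from the proposal, and pointing edges from the requesting executor to the executor holding the block is just one possible convention.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List, Tuple


class IncrementalLogReader:
    """Hypothetical reader returning only JSON-line records appended since the last poll."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._offset = 0  # byte offset just past the last complete line consumed

    def poll(self) -> List[dict]:
        records = []
        with open(self._path, "rb") as f:
            f.seek(self._offset)
            while True:
                line = f.readline()
                if not line.endswith(b"\n"):
                    break  # EOF, or a line still being written: retry on the next poll
                records.append(json.loads(line))
                self._offset = f.tell()
        return records


def block_request_edges(records: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Total requested bytes per directed (requester, holder) executor pair."""
    edges: Counter = Counter()
    for r in records:
        if r.get("kind") == "block_request":
            edges[(r["requester"], r["holder"])] += r["bytes"]
    return dict(edges)


def replay_offsets(records: Iterable[dict], speed: float) -> List[float]:
    """Seconds after replay start at which each event is shown, at `speed`x."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    times = sorted(r["timestamp"] for r in records)
    return [(t - times[0]) / speed for t in times]
```

A refresh loop would call `reader.poll()` once per interval and merge the new records into the graph. Keeping a byte offset means each poll reads only new data, and stopping at an incomplete final line avoids parsing a record the driver is still writing. For replay, events recorded at 10 s, 12 s and 14 s would be shown 0, 1 and 2 seconds into a replay at `speed=2.0`.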



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
