Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/08/28 22:12:10 UTC

[GitHub] [spark] srowen commented on a change in pull request #25598: [SPARK-28542][DOCS][WebUI] Stages Tab

URL: https://github.com/apache/spark/pull/25598#discussion_r318814341
 
 

 ##########
 File path: docs/web-ui.md
 ##########
 @@ -94,9 +94,76 @@ This page displays the details of a specific job identified by its job ID.
 </p>
 
 ## Stages Tab
+
 The Stages tab displays a summary page that shows the current state of all stages of all jobs in
-the Spark application, and, when you click on a stage, a details page for that stage. The details
-page shows the event timeline, DAG visualization, and all tasks for the stage.
+the Spark application.
+
+At the beginning of the page is a summary with the count of all stages by status (active, pending, completed, skipped, and failed).
+
+<p style="text-align: center;">
+  <img src="img/AllStagesPageDetail1.png" title="Stages header" alt="Stages header" width="30%">
+</p>
+
+In [Fair scheduling mode](job-scheduling.html#scheduling-within-an-application) there is a table that displays [pool properties](job-scheduling.html#configuring-pool-properties).
+
+<p style="text-align: center;">
+  <img src="img/AllStagesPageDetail2.png" title="Pool properties" alt="Pool properties">
+</p>
+
+After that are the details of stages per status (active, pending, completed, skipped, failed). Active stages can be killed with the kill button. The failure reason is shown only for failed stages. Clicking on a stage's description opens its task detail.
+
+<p style="text-align: center;">
+  <img src="img/AllStagesPageDetail3.png" title="Stages detail" alt="Stages detail">
+</p>
+
+### Stage detail
+The summary is at the beginning of the page with information like Total time across all tasks, [Locality level summary](tuning.html#data-locality), [Shuffle Read Size / Records](rdd-programming-guide.html#shuffle-operations) and Associated Job Ids.
+
+<p style="text-align: center;">
+  <img src="img/AllStagesPageDetail4.png" title="Stage header" alt="Stage header" width="30%">
+</p>
+
+There is also a visual representation of the directed acyclic graph (DAG) of this stage, where vertices represent the RDDs or DataFrames and the edges represent an operation to be applied.
+
+<p style="text-align: center;">
+  <img src="img/AllStagesPageDetail5.png" title="Stage DAG" alt="Stage DAG" width="50%">
+</p>
+
+Summary metrics for all tasks are represented in a table and in a timeline:
+* **[Tasks deserialization time](configuration.html#compression-and-serialization)**
+* **Duration of tasks**
+* **GC time**
+* **Result serialization time** is the time spent serializing the task result on an executor before sending it back to the driver
+* **Getting result time** is the time that the driver spends fetching task results from workers
+* **Scheduler delay** includes the time to ship the task from the scheduler to executors, and the time to send the task result from the executors to the scheduler
 
 Review comment:
   Isn't this the time the task waited to schedule for execution?
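
   For context on the question above, here is a minimal sketch of one way such a metric could be derived. It assumes scheduler delay is the remainder of a task's wall-clock duration once the separately reported components (executor run time, deserialization, result serialization, getting result) are subtracted. That formula and every name in the snippet are hypothetical illustrations, not something confirmed in this thread.

```python
def scheduler_delay_ms(duration, run_time, deserialize_time,
                       result_serialize_time, getting_result_time):
    """Hypothetical helper: time (ms) not accounted for by the other task metrics.

    Assumption: the delay is whatever remains of the wall-clock duration
    after the individually measured components are subtracted, clamped at
    zero so that timer skew never produces a negative value.
    """
    accounted = (run_time + deserialize_time
                 + result_serialize_time + getting_result_time)
    return max(0, duration - accounted)


# Made-up numbers for a single task, all in milliseconds.
delay = scheduler_delay_ms(duration=1200, run_time=900, deserialize_time=50,
                           result_serialize_time=20, getting_result_time=30)
print(delay)
```

   Under this assumption the value would cover both time spent waiting before the task starts on an executor and time spent shipping the task and its result, which may be why the two descriptions differ.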
