Posted to reviews@spark.apache.org by LucaCanali <gi...@git.apache.org> on 2018/08/21 07:35:08 UTC
[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...
GitHub user LucaCanali opened a pull request:
https://github.com/apache/spark/pull/22167
[SPARK-25170][DOC] Add list and short description of Spark Executor Task Metrics to the documentation
## What changes were proposed in this pull request?
Add description of Task Metrics to the documentation.
## How was this patch tested?
None.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/LucaCanali/spark docMonitoringTaskMetrics
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22167.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22167
----
commit a8db1605adbc271c785fda24b4945bf87149a4cd
Author: LucaCanali <lu...@...>
Date: 2018-08-20T14:12:52Z
Document Spark Executor Task Metrics
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22167: [SPARK-25170][DOC] Add list and short description of Spa...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22167
Can one of the admins verify this patch?
---
[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22167#discussion_r215315657
--- Diff: docs/monitoring.md ---
@@ -388,6 +388,163 @@ value triggering garbage collection on jobs, and `spark.ui.retainedStages` that
Note that the garbage collection takes place on playback: it is possible to retrieve
more entries by increasing these values and restarting the history server.
+### Executor Task Metrics
+
+The REST API exposes the values of the Task Metrics collected by Spark executors at the
+task execution level. The metrics can be used for performance troubleshooting.
+A list of the available metrics with a short description:
+
+<table class="table">
+ <tr><th>Spark Executor Task Metric name</th>
+ <th>Short description</th>
+ </tr>
+ <tr>
+ <td>executorRunTime</td>
+ <td>Time the executor spent running this task. This includes time fetching shuffle data.
+ The value is expressed in milliseconds.</td>
+ </tr>
+ <tr>
+ <td>executorCpuTime
+ <td>CPU Time the executor spent running this task. This includes time fetching shuffle data.
+ The value is expressed in nanoseconds.
+ </tr>
+ <tr>
+ <td>executorDeserializeTime</td>
+ <td>Time taken on the executor to deserialize this task.
+ The value is expressed in milliseconds.</td>
+ </tr>
+ <tr>
+ <td>executorDeserializeCpuTime</td>
+ <td>CPU Time taken on the executor to deserialize this task.
+ The value is expressed in nanoseconds.</td>
+ </tr>
+ <tr>
+ <td>resultSize</td>
+ <td>The number of bytes this task transmitted back to the driver as the TaskResult.</td>
+ </tr>
+ <tr>
+ <td>jvmGCTime</td>
+ <td>Amount of time the JVM spent in garbage collection while executing this task.
--- End diff --
Why do we start with `amount of` while the above parameters start `Time` or `CPU Time`?
---
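As context for the metrics quoted in the diff above, here is a minimal sketch of the kind of performance troubleshooting the documentation describes: flagging tasks that spend a large fraction of their run time in garbage collection. The JSON payload below is made up for illustration, and the exact field names and casing in the REST API response (e.g. `taskMetrics`, `jvmGcTime`) are assumptions that should be checked against a live endpoint such as `/applications/[app-id]/stages/[stage-id]/[attempt-id]/taskList`.

```python
import json

# Hypothetical sample of a monitoring REST API task-list response.
# Field names follow the metrics documented above; values are made up.
sample = json.loads("""
[
  {"taskId": 0, "taskMetrics": {"executorRunTime": 2000, "jvmGcTime": 150}},
  {"taskId": 1, "taskMetrics": {"executorRunTime": 1000, "jvmGcTime": 600}}
]
""")

def gc_fraction(task):
    """Fraction of a task's run time (both in ms) spent in JVM GC."""
    metrics = task["taskMetrics"]
    run_time = metrics["executorRunTime"]
    return metrics["jvmGcTime"] / run_time if run_time else 0.0

# Flag tasks spending more than 20% of their run time in garbage
# collection; such tasks are candidates for memory tuning.
suspects = [t["taskId"] for t in sample if gc_fraction(t) > 0.2]
print(suspects)  # task 1 spends 60% of its run time in GC
```

Both `executorRunTime` and `jvmGCTime` are reported in milliseconds, so no unit conversion is needed for this ratio.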
[GitHub] spark issue #22167: [SPARK-25170][DOC] Add list and short description of Spa...
Posted by LucaCanali <gi...@git.apache.org>.
Github user LucaCanali commented on the issue:
https://github.com/apache/spark/pull/22167
Thanks @kiszk for reviewing this. I have addressed your comments in a new commit.
Apologies: I have now moved this work to a new PR, https://github.com/apache/spark/pull/22397,
and am closing this one to avoid confusion.
---
[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22167#discussion_r215314710
--- Diff: docs/monitoring.md ---
@@ -388,6 +388,163 @@ value triggering garbage collection on jobs, and `spark.ui.retainedStages` that
Note that the garbage collection takes place on playback: it is possible to retrieve
more entries by increasing these values and restarting the history server.
+### Executor Task Metrics
+
+The REST API exposes the values of the Task Metrics collected by Spark executors at the
+task execution level. The metrics can be used for performance troubleshooting.
+A list of the available metrics with a short description:
+
+<table class="table">
+ <tr><th>Spark Executor Task Metric name</th>
+ <th>Short description</th>
+ </tr>
+ <tr>
+ <td>executorRunTime</td>
+ <td>Time the executor spent running this task. This includes time fetching shuffle data.
+ The value is expressed in milliseconds.</td>
+ </tr>
+ <tr>
+ <td>executorCpuTime
--- End diff --
Do we miss `<td>` in these two lines?
---
[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22167#discussion_r215315711
--- Diff: docs/monitoring.md ---
@@ -388,6 +388,163 @@ value triggering garbage collection on jobs, and `spark.ui.retainedStages` that
Note that the garbage collection takes place on playback: it is possible to retrieve
more entries by increasing these values and restarting the history server.
+### Executor Task Metrics
+
+The REST API exposes the values of the Task Metrics collected by Spark executors at the
+task execution level. The metrics can be used for performance troubleshooting.
+A list of the available metrics with a short description:
+
+<table class="table">
+ <tr><th>Spark Executor Task Metric name</th>
+ <th>Short description</th>
+ </tr>
+ <tr>
+ <td>executorRunTime</td>
+ <td>Time the executor spent running this task. This includes time fetching shuffle data.
+ The value is expressed in milliseconds.</td>
+ </tr>
+ <tr>
+ <td>executorCpuTime
+ <td>CPU Time the executor spent running this task. This includes time fetching shuffle data.
+ The value is expressed in nanoseconds.
+ </tr>
+ <tr>
+ <td>executorDeserializeTime</td>
+ <td>Time taken on the executor to deserialize this task.
+ The value is expressed in milliseconds.</td>
+ </tr>
+ <tr>
+ <td>executorDeserializeCpuTime</td>
+ <td>CPU Time taken on the executor to deserialize this task.
+ The value is expressed in nanoseconds.</td>
+ </tr>
+ <tr>
+ <td>resultSize</td>
+ <td>The number of bytes this task transmitted back to the driver as the TaskResult.</td>
+ </tr>
+ <tr>
+ <td>jvmGCTime</td>
+ <td>Amount of time the JVM spent in garbage collection while executing this task.
+ The value is expressed in milliseconds.</td>
+ </tr>
+ <tr>
+ <td>resultSerializationTime</td>
+ <td>Amount of time spent serializing the task result.
--- End diff --
ditto
---
[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...
Posted by LucaCanali <gi...@git.apache.org>.
Github user LucaCanali closed the pull request at:
https://github.com/apache/spark/pull/22167
---
[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22167#discussion_r215315965
--- Diff: docs/monitoring.md ---
@@ -388,6 +388,163 @@ value triggering garbage collection on jobs, and `spark.ui.retainedStages` that
Note that the garbage collection takes place on playback: it is possible to retrieve
more entries by increasing these values and restarting the history server.
+### Executor Task Metrics
+
+The REST API exposes the values of the Task Metrics collected by Spark executors at the
+task execution level. The metrics can be used for performance troubleshooting.
+A list of the available metrics with a short description:
+
+<table class="table">
+ <tr><th>Spark Executor Task Metric name</th>
+ <th>Short description</th>
+ </tr>
+ <tr>
+ <td>executorRunTime</td>
+ <td>Time the executor spent running this task. This includes time fetching shuffle data.
+ The value is expressed in milliseconds.</td>
+ </tr>
+ <tr>
+ <td>executorCpuTime
+ <td>CPU Time the executor spent running this task. This includes time fetching shuffle data.
--- End diff --
nit: `CPU time`?
---
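The casing nit above sits next to a detail worth noting in the documented table: `executorRunTime` is reported in milliseconds while `executorCpuTime` is reported in nanoseconds. A minimal sketch of combining the two into a CPU-utilization ratio, a common troubleshooting signal (the function name and example values are hypothetical):

```python
def cpu_utilization(executor_run_time_ms, executor_cpu_time_ns):
    """CPU time as a fraction of elapsed run time.

    Per the documented metrics table, executorRunTime is in
    milliseconds and executorCpuTime is in nanoseconds, so
    normalize to a common unit before dividing.
    """
    run_time_ns = executor_run_time_ms * 1_000_000
    return executor_cpu_time_ns / run_time_ns if run_time_ns else 0.0

# A task that ran for 4000 ms but used only 1 s of CPU spent most of
# its time waiting, e.g. on shuffle fetches or other I/O.
print(round(cpu_utilization(4000, 1_000_000_000), 2))  # 0.25
```

A low ratio points at wait time (shuffle, disk, network) rather than compute as the bottleneck, which is exactly the kind of diagnosis the proposed documentation is meant to enable.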
[GitHub] spark issue #22167: [SPARK-25170][DOC] Add list and short description of Spa...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/22167
It is good to add descriptions for these metrics.
---
[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...
Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22167#discussion_r215314527
--- Diff: docs/monitoring.md ---
@@ -388,6 +388,163 @@ value triggering garbage collection on jobs, and `spark.ui.retainedStages` that
Note that the garbage collection takes place on playback: it is possible to retrieve
more entries by increasing these values and restarting the history server.
+### Executor Task Metrics
+
+The REST API exposes the values of the Task Metrics collected by Spark executors at the
+task execution level. The metrics can be used for performance troubleshooting.
+A list of the available metrics with a short description:
+
+<table class="table">
+ <tr><th>Spark Executor Task Metric name</th>
+ <th>Short description</th>
+ </tr>
+ <tr>
+ <td>executorRunTime</td>
+ <td>Time the executor spent running this task. This includes time fetching shuffle data.
--- End diff --
Does `Time` mean `elapsed time` or other `time`?
---