Posted to reviews@spark.apache.org by LucaCanali <gi...@git.apache.org> on 2018/08/21 07:35:08 UTC

[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...

GitHub user LucaCanali opened a pull request:

    https://github.com/apache/spark/pull/22167

    [SPARK-25170][DOC] Add list and short description of Spark Executor Task Metrics to the documentation

    ## What changes were proposed in this pull request?
    
    Add description of Task Metrics to the documentation.
    
    ## How was this patch tested?
    
    None.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/LucaCanali/spark docMonitoringTaskMetrics

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22167.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22167
    
----
commit a8db1605adbc271c785fda24b4945bf87149a4cd
Author: LucaCanali <lu...@...>
Date:   2018-08-20T14:12:52Z

    Document Spark Executor Task Metrics

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22167: [SPARK-25170][DOC] Add list and short description of Spa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22167
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22167#discussion_r215315657
  
    --- Diff: docs/monitoring.md ---
    @@ -388,6 +388,163 @@ value triggering garbage collection on jobs, and `spark.ui.retainedStages` that
     Note that the garbage collection takes place on playback: it is possible to retrieve
     more entries by increasing these values and restarting the history server.
     
    +### Executor Task Metrics
    +
    +The REST API exposes the values of the Task Metrics collected by Spark executors at the
    +task execution level. The metrics can be used for performance troubleshooting.
    +A list of the available metrics with a short description:
    +
    +<table class="table">
    +  <tr><th>Spark Executor Task Metric name</th>
    +      <th>Short description</th>
    +  </tr>
    +  <tr>
    +    <td>executorRunTime</td>
    +    <td>Time the executor spent running this task. This includes time fetching shuffle data.
    +    The value is expressed in milliseconds.</td>
    +  </tr>
    +  <tr>
    +    <td>executorCpuTime
    +    <td>CPU Time the executor spent running this task. This includes time fetching shuffle data.
    +    The value is expressed in nanoseconds.
    +  </tr>
    +  <tr>
    +    <td>executorDeserializeTime</td>
    +    <td>Time taken on the executor to deserialize this task.
    +    The value is expressed in milliseconds.</td>
    +  </tr>
    +  <tr>
    +    <td>executorDeserializeCpuTime</td>
    +    <td>CPU Time taken on the executor to deserialize this task.
    +     The value is expressed in nanoseconds.</td>
    +  </tr>
    +  <tr>
    +    <td>resultSize</td>
    +    <td>The number of bytes this task transmitted back to the driver as the TaskResult.</td>
    +  </tr>
    +  <tr>
    +    <td>jvmGCTime</td>
    +    <td>Amount of time the JVM spent in garbage collection while executing this task.
    --- End diff --
    
    Why does this description start with `Amount of` while the ones above start with `Time` or `CPU Time`?
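
The table quoted above mixes units: `executorRunTime` is given in milliseconds and `executorCpuTime` in nanoseconds, so the two cannot be compared directly. The following is a minimal sketch of the conversion, not part of the pull request. It assumes the two values have already been read into a plain dictionary keyed by the metric names from the table, and it assumes `executorRunTime` is elapsed time, which the review thread itself questions. The function name `cpu_utilization` and the sample values are hypothetical.

```python
NANOS_PER_MILLI = 1_000_000


def cpu_utilization(metrics):
    """Return CPU time as a fraction of run time for a single task.

    `metrics` is a hypothetical dict keyed by the metric names in the table:
    executorRunTime in milliseconds, executorCpuTime in nanoseconds.
    """
    run_ms = metrics["executorRunTime"]
    cpu_ns = metrics["executorCpuTime"]
    if run_ms <= 0:
        # Avoid dividing by zero for tasks that report no run time.
        return 0.0
    cpu_ms = cpu_ns / NANOS_PER_MILLI  # bring both values to milliseconds
    return cpu_ms / run_ms


# Hypothetical sample: 2 s of run time, 1.5 s of CPU time.
sample = {"executorRunTime": 2000, "executorCpuTime": 1_500_000_000}
print(cpu_utilization(sample))
```

Under those assumptions, a ratio well below 1 suggests the task spent much of its run time off-CPU, for example waiting on I/O or on shuffle fetches.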


---



[GitHub] spark issue #22167: [SPARK-25170][DOC] Add list and short description of Spa...

Posted by LucaCanali <gi...@git.apache.org>.
Github user LucaCanali commented on the issue:

    https://github.com/apache/spark/pull/22167
  
    Thanks @kiszk for reviewing this. I have addressed your comments in a new commit.
    Apologies: I have moved this work to a new PR, https://github.com/apache/spark/pull/22397,
    and I am closing this one to avoid confusion.


---





[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22167#discussion_r215314710
  
    --- Diff: docs/monitoring.md ---
    @@ -388,6 +388,163 @@ value triggering garbage collection on jobs, and `spark.ui.retainedStages` that
     Note that the garbage collection takes place on playback: it is possible to retrieve
     more entries by increasing these values and restarting the history server.
     
    +### Executor Task Metrics
    +
    +The REST API exposes the values of the Task Metrics collected by Spark executors at the
    +task execution level. The metrics can be used for performance troubleshooting.
    +A list of the available metrics with a short description:
    +
    +<table class="table">
    +  <tr><th>Spark Executor Task Metric name</th>
    +      <th>Short description</th>
    +  </tr>
    +  <tr>
    +    <td>executorRunTime</td>
    +    <td>Time the executor spent running this task. This includes time fetching shuffle data.
    +    The value is expressed in milliseconds.</td>
    +  </tr>
    +  <tr>
    +    <td>executorCpuTime
    --- End diff --
    
    Are the closing `</td>` tags missing in these two lines?
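
The `executorCpuTime` row in the full diff opens two `<td>` cells and closes neither. As an illustration only, a check of this kind can be sketched with Python's standard `html.parser` module. The class `UnclosedCellFinder` and the helper `find_unclosed_cells` are hypothetical names, and the rule used (a `<td>` still open when the next `<td>` or `</tr>` arrives counts as unclosed) is a simplifying assumption, not how the Spark documentation build validates its HTML.

```python
from html.parser import HTMLParser


class UnclosedCellFinder(HTMLParser):
    """Record the line of every <td> that is never closed by </td>."""

    def __init__(self):
        super().__init__()
        self.open_cell_line = None  # line of the currently open <td>, if any
        self.unclosed_lines = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            if self.open_cell_line is not None:
                # A new cell started before the previous one was closed.
                self.unclosed_lines.append(self.open_cell_line)
            self.open_cell_line = self.getpos()[0]

    def handle_endtag(self, tag):
        if tag == "td":
            self.open_cell_line = None
        elif tag == "tr" and self.open_cell_line is not None:
            # The row ended while a cell was still open.
            self.unclosed_lines.append(self.open_cell_line)
            self.open_cell_line = None


def find_unclosed_cells(html_text):
    finder = UnclosedCellFinder()
    finder.feed(html_text)
    finder.close()
    return finder.unclosed_lines


# The row as it appears in the diff, without the leading "+" markers.
ROW = """<tr>
  <td>executorCpuTime
  <td>CPU Time the executor spent running this task.
  The value is expressed in nanoseconds.
</tr>"""

print(find_unclosed_cells(ROW))
```

Line numbers in the result are relative to the text passed in, so for `ROW` they point at the two `<td>` lines inside the row.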


---



[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22167#discussion_r215315711
  
    --- Diff: docs/monitoring.md ---
    @@ -388,6 +388,163 @@ value triggering garbage collection on jobs, and `spark.ui.retainedStages` that
     Note that the garbage collection takes place on playback: it is possible to retrieve
     more entries by increasing these values and restarting the history server.
     
    +### Executor Task Metrics
    +
    +The REST API exposes the values of the Task Metrics collected by Spark executors at the
    +task execution level. The metrics can be used for performance troubleshooting.
    +A list of the available metrics with a short description:
    +
    +<table class="table">
    +  <tr><th>Spark Executor Task Metric name</th>
    +      <th>Short description</th>
    +  </tr>
    +  <tr>
    +    <td>executorRunTime</td>
    +    <td>Time the executor spent running this task. This includes time fetching shuffle data.
    +    The value is expressed in milliseconds.</td>
    +  </tr>
    +  <tr>
    +    <td>executorCpuTime
    +    <td>CPU Time the executor spent running this task. This includes time fetching shuffle data.
    +    The value is expressed in nanoseconds.
    +  </tr>
    +  <tr>
    +    <td>executorDeserializeTime</td>
    +    <td>Time taken on the executor to deserialize this task.
    +    The value is expressed in milliseconds.</td>
    +  </tr>
    +  <tr>
    +    <td>executorDeserializeCpuTime</td>
    +    <td>CPU Time taken on the executor to deserialize this task.
    +     The value is expressed in nanoseconds.</td>
    +  </tr>
    +  <tr>
    +    <td>resultSize</td>
    +    <td>The number of bytes this task transmitted back to the driver as the TaskResult.</td>
    +  </tr>
    +  <tr>
    +    <td>jvmGCTime</td>
    +    <td>Amount of time the JVM spent in garbage collection while executing this task.
    +    The value is expressed in milliseconds.</td>
    +  </tr>
    +  <tr>
    +    <td>resultSerializationTime</td>
    +    <td>Amount of time spent serializing the task result.
    --- End diff --
    
    ditto


---





[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...

Posted by LucaCanali <gi...@git.apache.org>.
Github user LucaCanali closed the pull request at:

    https://github.com/apache/spark/pull/22167


---



[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22167#discussion_r215315965
  
    --- Diff: docs/monitoring.md ---
    @@ -388,6 +388,163 @@ value triggering garbage collection on jobs, and `spark.ui.retainedStages` that
     Note that the garbage collection takes place on playback: it is possible to retrieve
     more entries by increasing these values and restarting the history server.
     
    +### Executor Task Metrics
    +
    +The REST API exposes the values of the Task Metrics collected by Spark executors at the
    +task execution level. The metrics can be used for performance troubleshooting.
    +A list of the available metrics with a short description:
    +
    +<table class="table">
    +  <tr><th>Spark Executor Task Metric name</th>
    +      <th>Short description</th>
    +  </tr>
    +  <tr>
    +    <td>executorRunTime</td>
    +    <td>Time the executor spent running this task. This includes time fetching shuffle data.
    +    The value is expressed in milliseconds.</td>
    +  </tr>
    +  <tr>
    +    <td>executorCpuTime
    +    <td>CPU Time the executor spent running this task. This includes time fetching shuffle data.
    --- End diff --
    
    nit: `CPU time`?


---



[GitHub] spark issue #22167: [SPARK-25170][DOC] Add list and short description of Spa...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/22167
  
    I like the idea of adding descriptions for the metrics.


---



[GitHub] spark pull request #22167: [SPARK-25170][DOC] Add list and short description...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22167#discussion_r215314527
  
    --- Diff: docs/monitoring.md ---
    @@ -388,6 +388,163 @@ value triggering garbage collection on jobs, and `spark.ui.retainedStages` that
     Note that the garbage collection takes place on playback: it is possible to retrieve
     more entries by increasing these values and restarting the history server.
     
    +### Executor Task Metrics
    +
    +The REST API exposes the values of the Task Metrics collected by Spark executors at the
    +task execution level. The metrics can be used for performance troubleshooting.
    +A list of the available metrics with a short description:
    +
    +<table class="table">
    +  <tr><th>Spark Executor Task Metric name</th>
    +      <th>Short description</th>
    +  </tr>
    +  <tr>
    +    <td>executorRunTime</td>
    +    <td>Time the executor spent running this task. This includes time fetching shuffle data.
    --- End diff --
    
    Does `Time` here mean `elapsed time` or some other kind of time?


---
