Posted to reviews@spark.apache.org by JoshRosen <gi...@git.apache.org> on 2017/05/16 22:39:30 UTC

[GitHub] spark pull request #18008: [SPARK-20776] Fix perf. problems in TaskMetrics.n...

GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/18008

    [SPARK-20776] Fix perf. problems in TaskMetrics.nameToAccums map initialization

    ## What changes were proposed in this pull request?
    
    In 
    
    ```
    ./bin/spark-shell --master=local[64]
    ```
    
    I ran 
    
    ``` 
    sc.parallelize(1 to 100000, 100000).count()
    ```
    and profiled the time spent in the LiveListenerBus event processing thread. I discovered that the majority of the time was being spent initializing the `TaskMetrics.nameToAccums` map:
    
    ![image](https://cloud.githubusercontent.com/assets/50748/26131230/a9f83ee0-3a4d-11e7-9ac9-5b21e1c57083.png)
    
    By using a pre-sized Java hash map I was able to remove this performance bottleneck and prevent dropped listener events (the old code couldn't keep up with the event rate and dropped some events).
    
    ## How was this patch tested?
    
    Benchmarks described above.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark nametoaccums-improvements

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18008.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18008
    
----
commit 622951f5f97ac79235070fad6b82a2de1e4fdfa0
Author: Josh Rosen <jo...@databricks.com>
Date:   2016-06-10T23:26:50Z

    TaskMetrics nameToAccums improvements.

commit 4675b21b93e3f8912143ff0fe70268c22faa86bc
Author: Josh Rosen <jo...@databricks.com>
Date:   2017-05-16T22:37:11Z

    Add comment.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18008: [SPARK-20776] Fix perf. problems in TaskMetrics.n...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18008#discussion_r116876355
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
    @@ -200,32 +202,47 @@ class TaskMetrics private[spark] () extends Serializable {
     
     
       import InternalAccumulator._
    -  @transient private[spark] lazy val nameToAccums = LinkedHashMap(
    -    EXECUTOR_DESERIALIZE_TIME -> _executorDeserializeTime,
    -    EXECUTOR_DESERIALIZE_CPU_TIME -> _executorDeserializeCpuTime,
    -    EXECUTOR_RUN_TIME -> _executorRunTime,
    -    EXECUTOR_CPU_TIME -> _executorCpuTime,
    -    RESULT_SIZE -> _resultSize,
    -    JVM_GC_TIME -> _jvmGCTime,
    -    RESULT_SERIALIZATION_TIME -> _resultSerializationTime,
    -    MEMORY_BYTES_SPILLED -> _memoryBytesSpilled,
    -    DISK_BYTES_SPILLED -> _diskBytesSpilled,
    -    PEAK_EXECUTION_MEMORY -> _peakExecutionMemory,
    -    UPDATED_BLOCK_STATUSES -> _updatedBlockStatuses,
    -    shuffleRead.REMOTE_BLOCKS_FETCHED -> shuffleReadMetrics._remoteBlocksFetched,
    -    shuffleRead.LOCAL_BLOCKS_FETCHED -> shuffleReadMetrics._localBlocksFetched,
    -    shuffleRead.REMOTE_BYTES_READ -> shuffleReadMetrics._remoteBytesRead,
    -    shuffleRead.LOCAL_BYTES_READ -> shuffleReadMetrics._localBytesRead,
    -    shuffleRead.FETCH_WAIT_TIME -> shuffleReadMetrics._fetchWaitTime,
    -    shuffleRead.RECORDS_READ -> shuffleReadMetrics._recordsRead,
    -    shuffleWrite.BYTES_WRITTEN -> shuffleWriteMetrics._bytesWritten,
    -    shuffleWrite.RECORDS_WRITTEN -> shuffleWriteMetrics._recordsWritten,
    -    shuffleWrite.WRITE_TIME -> shuffleWriteMetrics._writeTime,
    -    input.BYTES_READ -> inputMetrics._bytesRead,
    -    input.RECORDS_READ -> inputMetrics._recordsRead,
    -    output.BYTES_WRITTEN -> outputMetrics._bytesWritten,
    -    output.RECORDS_WRITTEN -> outputMetrics._recordsWritten
    -  ) ++ testAccum.map(TEST_ACCUM -> _)
    +  @transient private[spark] lazy val nameToAccums = {
    +    // The construction of this map is a performance hotspot in the JobProgressListener, so we
    +    // optimize this by using a pre-sized Java hashmap; see SPARK-20776 for more details.
    +    val mapEntries = Array[(String, AccumulatorV2[_, _])](
    +      EXECUTOR_DESERIALIZE_TIME -> _executorDeserializeTime,
    +      EXECUTOR_DESERIALIZE_CPU_TIME -> _executorDeserializeCpuTime,
    +      EXECUTOR_RUN_TIME -> _executorRunTime,
    +      EXECUTOR_CPU_TIME -> _executorCpuTime,
    +      RESULT_SIZE -> _resultSize,
    +      JVM_GC_TIME -> _jvmGCTime,
    +      RESULT_SERIALIZATION_TIME -> _resultSerializationTime,
    +      MEMORY_BYTES_SPILLED -> _memoryBytesSpilled,
    +      DISK_BYTES_SPILLED -> _diskBytesSpilled,
    +      PEAK_EXECUTION_MEMORY -> _peakExecutionMemory,
    +      UPDATED_BLOCK_STATUSES -> _updatedBlockStatuses,
    +      shuffleRead.REMOTE_BLOCKS_FETCHED -> shuffleReadMetrics._remoteBlocksFetched,
    +      shuffleRead.LOCAL_BLOCKS_FETCHED -> shuffleReadMetrics._localBlocksFetched,
    +      shuffleRead.REMOTE_BYTES_READ -> shuffleReadMetrics._remoteBytesRead,
    +      shuffleRead.LOCAL_BYTES_READ -> shuffleReadMetrics._localBytesRead,
    +      shuffleRead.FETCH_WAIT_TIME -> shuffleReadMetrics._fetchWaitTime,
    +      shuffleRead.RECORDS_READ -> shuffleReadMetrics._recordsRead,
    +      shuffleWrite.BYTES_WRITTEN -> shuffleWriteMetrics._bytesWritten,
    +      shuffleWrite.RECORDS_WRITTEN -> shuffleWriteMetrics._recordsWritten,
    +      shuffleWrite.WRITE_TIME -> shuffleWriteMetrics._writeTime,
    +      input.BYTES_READ -> inputMetrics._bytesRead,
    +      input.RECORDS_READ -> inputMetrics._recordsRead,
    +      output.BYTES_WRITTEN -> outputMetrics._bytesWritten,
    +      output.RECORDS_WRITTEN -> outputMetrics._recordsWritten
    +    )
    +    val map = Maps.newHashMapWithExpectedSize[String, AccumulatorV2[_, _]](mapEntries.length)
    +    var i = 0
    +    while (i < mapEntries.length) {
    +      val e = mapEntries(i)
    +      map.put(e._1, e._2)
    +      i += 1
    +    }
    +    testAccum.foreach { accum =>
    +      map.put(TEST_ACCUM, accum)
    +    }
    +    map.asScala
    --- End diff --
    
    The map + wrapper might consume a little extra memory compared to the old code, but it doesn't matter because we don't have that many `TaskMetrics` resident in the JVM at the same time: in the executor, the only instances are in TaskContexts, and in the driver you only have one per stage in the scheduler plus some temporary ones in the listener bus queue, which are freed as soon as the queue events are processed (which happens faster now, outweighing the extra space usage).
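    
    As a rough, dependency-free sketch of the pre-sizing idea (not the patch itself): the diff above uses Guava's `Maps.newHashMapWithExpectedSize`, whereas this version sizes a plain `java.util.HashMap` by hand, assuming the default 0.75 load factor. `PresizedMapSketch` and `presizedMap` are hypothetical names used only for illustration.
    
    ```scala
    import java.util.{HashMap => JHashMap}
    import scala.collection.JavaConverters._
    import scala.collection.mutable
    
    object PresizedMapSketch {
      // Hypothetical helper, not part of the patch: fills a pre-sized java.util.HashMap
      // from an array of entries and exposes it through a Scala mutable.Map wrapper.
      def presizedMap[V](entries: Array[(String, V)]): mutable.Map[String, V] = {
        // Assumption: default load factor of 0.75. This capacity keeps entries.length
        // insertions below the resize threshold, so the table is never rehashed.
        val capacity = (entries.length / 0.75f).toInt + 1
        val map = new JHashMap[String, V](capacity)
        var i = 0
        while (i < entries.length) {
          val e = entries(i)
          map.put(e._1, e._2)
          i += 1
        }
        // asScala wraps the Java map without copying it.
        map.asScala
      }
    
      def main(args: Array[String]): Unit = {
        val m = presizedMap(Array("a" -> 1L, "b" -> 2L, "c" -> 3L))
        println(m.size)
      }
    }
    ```
    
    The wrapper returned by `asScala` is the "map + wrapper" mentioned above: one extra small object per map, in exchange for skipping the intermediate resizes.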




[GitHub] spark pull request #18008: [SPARK-20776] Fix perf. problems in JobProgressLi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18008#discussion_r116885970
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala ---
    @@ -112,9 +112,9 @@ private[spark] object UIData {
       /**
        * These are kept mutable and reused throughout a task's lifetime to avoid excessive reallocation.
        */
    -  class TaskUIData private(
    -      private var _taskInfo: TaskInfo,
    -      private var _metrics: Option[TaskMetricsUIData]) {
    +  class TaskUIData private(private var _taskInfo: TaskInfo) {
    +
    +    private[this] var _metrics: Option[TaskMetricsUIData] = Some(TaskMetricsUIData.EMPTY)
    --- End diff --
    
    when will this be None?




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    thanks, merging to master/2.2!




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76991/
    Test PASSed.




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    **[Test build #76991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76991/testReport)** for PR 18008 at commit [`feda785`](https://github.com/apache/spark/commit/feda785f81dd2d8ac915a96edd68d3def353359f).




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in TaskMetrics.nameToAc...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    Actually, stepping back a second, we might be able to completely remove this bottleneck by simply not constructing tons of empty TaskMetrics objects in JobProgressListener's hot path. Let me see if I can update to do that instead.




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    LGTM




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    @JoshRosen   I see, Thank you.




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    **[Test build #76988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76988/testReport)** for PR 18008 at commit [`6e66b80`](https://github.com/apache/spark/commit/6e66b80c6842c6a8ee10df34711bd6799a839439).




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    **[Test build #76988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76988/testReport)** for PR 18008 at commit [`6e66b80`](https://github.com/apache/spark/commit/6e66b80c6842c6a8ee10df34711bd6799a839439).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18008: [SPARK-20776] Fix perf. problems in JobProgressLi...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18008#discussion_r116886967
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala ---
    @@ -112,9 +112,9 @@ private[spark] object UIData {
       /**
        * These are kept mutable and reused throughout a task's lifetime to avoid excessive reallocation.
        */
    -  class TaskUIData private(
    -      private var _taskInfo: TaskInfo,
    -      private var _metrics: Option[TaskMetricsUIData]) {
    +  class TaskUIData private(private var _taskInfo: TaskInfo) {
    +
    +    private[this] var _metrics: Option[TaskMetricsUIData] = Some(TaskMetricsUIData.EMPTY)
    --- End diff --
    
    The only way for this to become `None` is if `updateTaskMetrics` is called with `None`.
    
    `updateTaskMetrics` is called in two places:
    
    - In JobProgressListener.onTaskEnd, where the metrics are from `Option(taskEnd.taskMetrics)`, where `taskEnd.taskMetrics` can be `null` in case the task has failed (according to docs).
    - In JobProgressListener.onExecutorMetricsUpdate, where the metrics are guaranteed to be defined / non-None.
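    
    For reference, a minimal sketch of why the `onTaskEnd` path can produce `None`: `Option(x)` turns a `null` reference into `None` and anything else into `Some(x)`. `FakeMetrics` and `wrap` are hypothetical stand-ins so the example compiles on its own; they are not Spark classes.
    
    ```scala
    object OptionNullSketch {
      // Hypothetical stand-in for TaskMetrics, only here to keep the example self-contained.
      final class FakeMetrics(val runTime: Long)
    
      // Option.apply maps null to None and any non-null reference to Some(...).
      def wrap(metrics: FakeMetrics): Option[FakeMetrics] = Option(metrics)
    
      def main(args: Array[String]): Unit = {
        val fromFailedTask: FakeMetrics = null
        println(wrap(fromFailedTask).isEmpty)
        println(wrap(new FakeMetrics(42L)).map(_.runTime))
      }
    }
    ```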




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76990/
    Test PASSed.




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    **[Test build #76984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76984/testReport)** for PR 18008 at commit [`4675b21`](https://github.com/apache/spark/commit/4675b21b93e3f8912143ff0fe70268c22faa86bc).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by witgo <gi...@git.apache.org>.
Github user witgo commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    @JoshRosen , what's the tool in your screenshot?




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in TaskMetrics.nameToAc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    **[Test build #76984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76984/testReport)** for PR 18008 at commit [`4675b21`](https://github.com/apache/spark/commit/4675b21b93e3f8912143ff0fe70268c22faa86bc).




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    @witgo, I'm using [YourKit Java Profiler](https://www.yourkit.com/java/profiler/) 2016.02. In these screenshots I enabled CPU sampling then took a performance snapshot and used the per-thread view, focusing on the time taken in the live listener bus thread by right-clicking on the subtree and choosing "focus subtree" from the context menu.




[GitHub] spark pull request #18008: [SPARK-20776] Fix perf. problems in JobProgressLi...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18008




[GitHub] spark pull request #18008: [SPARK-20776] Fix perf. problems in JobProgressLi...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18008#discussion_r116883398
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -405,7 +404,7 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
             updateAggregateMetrics(stageData, info.executorId, m, oldMetrics)
           }
     
    -      val taskData = stageData.taskData.getOrElseUpdate(info.taskId, TaskUIData(info, None))
    --- End diff --
    
    Important note here: in the old code, the `elseUpdate` branch would only be taken in rare error cases where we somehow purged the TaskUIData that should have been created when the task launched. It technically doesn't matter what we put in for the `Option[Metrics]` here, since it just gets unconditionally overwritten on line 410 in the old code. So while my new code constructs TaskUIData with default metrics, it doesn't actually change the behavior of this block.
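    
    A small sketch of the `getOrElseUpdate` behavior this relies on, using hypothetical stand-ins (`FakeTaskData`, `lookup`, and the `defaultsBuilt` counter are not Spark code): the default is a by-name argument, so it is only built when the key is missing, and whatever metrics it carries can be overwritten right afterwards.
    
    ```scala
    import scala.collection.mutable
    
    object GetOrElseUpdateSketch {
      // Hypothetical, simplified stand-in for TaskUIData: a mutable holder for optional metrics.
      final class FakeTaskData(var metrics: Option[Long])
    
      // Counts how often the default value is actually constructed.
      var defaultsBuilt = 0
    
      // The second argument of getOrElseUpdate is by-name: it runs only if taskId is absent.
      def lookup(tasks: mutable.HashMap[Long, FakeTaskData], taskId: Long): FakeTaskData =
        tasks.getOrElseUpdate(taskId, { defaultsBuilt += 1; new FakeTaskData(None) })
    
      def main(args: Array[String]): Unit = {
        val tasks = mutable.HashMap(1L -> new FakeTaskData(Some(10L)))
        lookup(tasks, 1L)             // key present: default is not constructed
        val fresh = lookup(tasks, 2L) // key missing: default is constructed and stored
        fresh.metrics = Some(99L)     // the initial metrics value is overwritten regardless
        println(defaultsBuilt)
      }
    }
    ```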




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76984/
    Test FAILed.




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    **[Test build #76990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76990/testReport)** for PR 18008 at commit [`1c62909`](https://github.com/apache/spark/commit/1c62909466732a70a359e547552b3be5f9a7b781).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18008: [SPARK-20776] Fix perf. problems in TaskMetrics.n...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18008#discussion_r116875860
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
    @@ -200,32 +202,47 @@ class TaskMetrics private[spark] () extends Serializable {
     
     
       import InternalAccumulator._
    -  @transient private[spark] lazy val nameToAccums = LinkedHashMap(
    --- End diff --
    
    It looks like the use of `LinkedHashMap` was added by @cloud-fan in #12612. As far as I can tell we don't actually rely on the ordering of the entries in this map, so I didn't preserve the use of `LinkedHashMap`.
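    
    To illustrate the trade-off with a sketch (the object name and keys below are made up for the example): a `LinkedHashMap` guarantees insertion-order iteration, while a plain hash map only guarantees lookups by key, which is enough when callers never depend on iteration order.
    
    ```scala
    import scala.collection.mutable
    
    object MapOrderingSketch {
      // Hypothetical metric names, inserted in a deliberately non-alphabetical order.
      val entries: Seq[(String, Int)] = Seq("c" -> 1, "a" -> 2, "b" -> 3)
    
      // LinkedHashMap iterates in insertion order.
      val linked: mutable.LinkedHashMap[String, Int] = mutable.LinkedHashMap(entries: _*)
    
      // HashMap makes no ordering promise, but key lookups behave the same.
      val hashed: mutable.HashMap[String, Int] = mutable.HashMap(entries: _*)
    
      def main(args: Array[String]): Unit = {
        println(linked.keys.toList)
        println(hashed("a") == linked("a"))
      }
    }
    ```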




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    **[Test build #76991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76991/testReport)** for PR 18008 at commit [`feda785`](https://github.com/apache/spark/commit/feda785f81dd2d8ac915a96edd68d3def353359f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    **[Test build #76990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76990/testReport)** for PR 18008 at commit [`1c62909`](https://github.com/apache/spark/commit/1c62909466732a70a359e547552b3be5f9a7b781).




[GitHub] spark issue #18008: [SPARK-20776] Fix perf. problems in JobProgressListener ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18008
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76988/
    Test PASSed.

