Posted to reviews@spark.apache.org by tgravescs <gi...@git.apache.org> on 2017/05/31 14:46:49 UTC

[GitHub] spark pull request #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockSta...

GitHub user tgravescs opened a pull request:

    https://github.com/apache/spark/pull/18162

    [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

    ## What changes were proposed in this pull request?
    Remove TaskMetrics._updatedBlockStatuses. As far as I can see it's not used by anything, and it uses a lot of memory when caching and processing a lot of blocks.  In my case it was taking 5GB of a 10GB heap, and even when I went up to a 50GB heap the job still ran out of memory.  With this change in place the same job easily runs in less than 10GB of heap.
    
    ## How was this patch tested?
    
    Ran the modified unit tests and manually tested a couple of jobs (with and without caching).  Clicked through the UI and didn't see anything missing.
    Ran my very large Hive query job with 200,000 small tasks and 1000 executors, caching 6+TB of data. It runs fine now, whereas without this change it would go into full GCs and eventually die.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgravescs/spark SPARK-20923

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18162.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18162
    
----
commit 9177d7a9b17103276af734b5697f133d98d945f5
Author: Tom Graves <tg...@yahoo-inc.com>
Date:   2017-05-30T19:35:37Z

    [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

commit 6f08dc249efa229815774c7645bfb07d281d120f
Author: Thomas Graves <tg...@thirteenroutine.corp.gq1.yahoo.com>
Date:   2017-05-30T21:29:01Z

    Fix tests

commit 65ca111cdfd55dc937f8060aad00e7e4137a94fe
Author: Thomas Graves <tg...@thirteenroutine.corp.gq1.yahoo.com>
Date:   2017-05-31T14:33:11Z

    fix test

commit 5327ba5d81cfcfc2e13eb6269271756dc1504f0e
Author: Thomas Graves <tg...@thirteenroutine.corp.gq1.yahoo.com>
Date:   2017-05-31T14:40:36Z

    remove empty line

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #77625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77625/testReport)** for PR 18162 at commit [`ec3c29d`](https://github.com/apache/spark/commit/ec3c29d4014e87a00d1988ce9342521af62a335e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77684/
    Test FAILed.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockSta...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r119421473
  
    --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
    @@ -368,8 +356,7 @@ private[spark] object JsonProtocol {
         ("Shuffle Read Metrics" -> shuffleReadMetrics) ~
         ("Shuffle Write Metrics" -> shuffleWriteMetrics) ~
         ("Input Metrics" -> inputMetrics) ~
    -    ("Output Metrics" -> outputMetrics) ~
    -    ("Updated Blocks" -> updatedBlocks)
    --- End diff --
    
    Good point, I'll look through some older versions and test out a few things with this.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #78468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78468/testReport)** for PR 18162 at commit [`289e993`](https://github.com/apache/spark/commit/289e993e6cd745f8a4fab4a74a7928a3687b20b9).




[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77594/
    Test PASSed.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78467/
    Test FAILed.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    sorry missed that you had commented, yes we can change that




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78300/
    Test FAILed.




[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Jenkins, test this please





[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #78300 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78300/testReport)** for PR 18162 at commit [`a875aac`](https://github.com/apache/spark/commit/a875aacdb32bd308f490270fbb3c072c5ef9c36e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    failure is from previous push of code. 




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #78467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78467/testReport)** for PR 18162 at commit [`68fbd9f`](https://github.com/apache/spark/commit/68fbd9f8c1c2981eff859b1fec078e238aa0280c).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockSta...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r119393010
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -996,12 +995,6 @@ private[spark] class BlockManager(
             // notified the master about the availability of this block, so we need to send an update
             // to remove this block location.
             removeBlockInternal(blockId, tellMaster = tellMaster)
    -        // The `putBody` code may have also added a new block status to TaskMetrics, so we need
    --- End diff --
    
    I'm having flashbacks to how tricky it was to figure out this logic when refactoring the block manager code (this used to be a lot messier). Really happy to see this get removed.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #77684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77684/testReport)** for PR 18162 at commit [`018955a`](https://github.com/apache/spark/commit/018955acc111b6a43cf016f23a15f1b4a73d8ed1).




[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #77603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77603/testReport)** for PR 18162 at commit [`41ed774`](https://github.com/apache/spark/commit/41ed7745d2be28e3d1de8ca1e2aa594a43f45760).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r119940037
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -528,7 +528,13 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
             new StageUIData
           })
           val taskData = stageData.taskData.get(taskId)
    -      val metrics = TaskMetrics.fromAccumulatorInfos(accumUpdates)
    +      val accumsFiltered = if (conf.get(TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES)) {
    +        accumUpdates
    +      } else {
    +        accumUpdates.filter(info => info.name.isDefined && info.update.isDefined && info.name !=
    --- End diff --
    
    By the time we reach here, I think we have already stored `updatedBlockStatus` in memory, so filtering them out here doesn't help.
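
One way to picture this concern, as a minimal sketch rather than Spark's actual code: if block statuses are collected at the moment each update is recorded, the memory is already spent before any listener sees them, so only a gate at the recording site avoids retaining them. `SketchTaskMetrics`, `recordBlockUpdate`, and `SketchDemo` are hypothetical names invented for illustration; they are not Spark classes.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a per-task metrics holder; this is NOT Spark's TaskMetrics.
final class SketchTaskMetrics(trackUpdatedBlockStatuses: Boolean) {
  // (blockId, memSize) pairs retained for the lifetime of this object.
  private val updatedBlocks = ArrayBuffer.empty[(String, Long)]

  // Gate where the update is recorded, so nothing is retained when tracking is off.
  def recordBlockUpdate(blockId: String, memSize: Long): Unit = {
    if (trackUpdatedBlockStatuses) {
      updatedBlocks += ((blockId, memSize))
    }
  }

  // Number of block updates currently held in memory.
  def retained: Int = updatedBlocks.size
}

object SketchDemo {
  // Records `updates` block updates and reports how many were kept.
  def retainedAfter(track: Boolean, updates: Int): Int = {
    val metrics = new SketchTaskMetrics(track)
    (0 until updates).foreach(i => metrics.recordBlockUpdate(s"rdd_0_$i", 1024L))
    metrics.retained
  }
}
```

With the gate off, `SketchDemo.retainedAfter(track = false, updates = 1000)` keeps nothing, whereas a filter applied later in a listener would only hide entries that had already been allocated.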




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #77684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77684/testReport)** for PR 18162 at commit [`018955a`](https://github.com/apache/spark/commit/018955acc111b6a43cf016f23a15f1b4a73d8ed1).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r123527201
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -295,4 +295,12 @@ package object config {
             "above this threshold. This is to avoid a giant request takes too much memory.")
           .bytesConf(ByteUnit.BYTE)
           .createWithDefaultString("200m")
    +
    +  private[spark] val TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES =
    +    ConfigBuilder("spark.taskMetrics.trackUpdatedBlockStatuses")
    --- End diff --
    
    Right, this is why I originally had this off by default and you requested it on.  I will turn it back off, and if a user finds they need it because they somehow extended the class, they can turn it on. Turning this off will be most beneficial for the majority of users.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #77687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77687/testReport)** for PR 18162 at commit [`a875aac`](https://github.com/apache/spark/commit/a875aacdb32bd308f490270fbb3c072c5ef9c36e).




[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #77594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77594/testReport)** for PR 18162 at commit [`5327ba5`](https://github.com/apache/spark/commit/5327ba5d81cfcfc2e13eb6269271756dc1504f0e).




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #78466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78466/testReport)** for PR 18162 at commit [`68fbd9f`](https://github.com/apache/spark/commit/68fbd9f8c1c2981eff859b1fec078e238aa0280c).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r119938877
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
    @@ -112,6 +112,12 @@ class TaskMetrics private[spark] () extends Serializable {
     
       /**
        * Storage statuses of any blocks that have been updated as a result of this task.
    +   *
    +   * Tracking the _updatedBlockStatuses can use a lot of memory.
    +   * It is not used anywhere inside of Spark so we would ideally remove it, but its exposed to
    +   * the user in SparkListenerTaskEnd so the api is kept for compatibility.
    +   * By default it is configured to not actually save the block statuses via config
    --- End diff --
    
    Remove this line, as it's not correct now.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78460/




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    will upmerge shortly, since there are conflicts




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #78468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78468/testReport)** for PR 18162 at commit [`289e993`](https://github.com/apache/spark/commit/289e993e6cd745f8a4fab4a74a7928a3687b20b9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18162




[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Updated. I put the TaskMetrics API back, marked as deprecated, and just had it return Nil.  @JoshRosen, were you thinking of adding more back?




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78466/




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    thanks, merging to master!




[GitHub] spark pull request #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockSta...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r119393878
  
    --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
    @@ -368,8 +356,7 @@ private[spark] object JsonProtocol {
         ("Shuffle Read Metrics" -> shuffleReadMetrics) ~
         ("Shuffle Write Metrics" -> shuffleWriteMetrics) ~
         ("Input Metrics" -> inputMetrics) ~
    -    ("Output Metrics" -> outputMetrics) ~
    -    ("Updated Blocks" -> updatedBlocks)
    --- End diff --
    
    The only thing that maybe gives me pause here is compatibility when reading Spark logs produced by a new version of Spark in an old version of the History Server: if we remove a key which was present from day 0 then we might run into problems when code assumes it will be present. That said, it looks like the read of this key down on line 842 was already using `Utils.jsonOption` so maybe this was added later and thus is already handled as an option in the existing History Server code.
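
The compatibility point above can be illustrated with a minimal sketch. It assumes a flat `Map` as a stand-in for an event-log record and does not reproduce Spark's `JsonProtocol` or `Utils.jsonOption`; `OptionalKeySketch`, `readOptional`, and `updatedBlocks` are hypothetical names used only for illustration. The idea is that a reader which looks the key up as an `Option` keeps working whether or not the writer emitted it.

```scala
object OptionalKeySketch {
  // Hypothetical stand-in for a parsed event-log record (not json4s).
  type Record = Map[String, Any]

  // Look the key up as an Option, so a missing key yields None
  // instead of throwing.
  def readOptional(record: Record, key: String): Option[Any] =
    record.get(key)

  // Treat "Updated Blocks" as optional and fall back to an empty list
  // when the key is absent or has an unexpected shape.
  def updatedBlocks(record: Record): Seq[String] =
    readOptional(record, "Updated Blocks") match {
      case Some(blocks: Seq[_]) => blocks.map(_.toString)
      case _                    => Seq.empty
    }
}
```

A reader that instead indexed the key directly and assumed it was present would be the case that breaks when the writer stops emitting it.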




[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #77594 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77594/testReport)** for PR 18162 at commit [`5327ba5`](https://github.com/apache/spark/commit/5327ba5d81cfcfc2e13eb6269271756dc1504f0e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    I'm wondering if it's doable to stop tracking the `updatedBlockStatus` according to a config... there are many places that update `updatedBlockStatus` right?




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #78466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78466/testReport)** for PR 18162 at commit [`68fbd9f`](https://github.com/apache/spark/commit/68fbd9f8c1c2981eff859b1fec078e238aa0280c).




[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r122363701
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -528,7 +528,13 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
             new StageUIData
           })
           val taskData = stageData.taskData.get(taskId)
    -      val metrics = TaskMetrics.fromAccumulatorInfos(accumUpdates)
    +      val accumsFiltered = if (conf.get(TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES)) {
    +        accumUpdates
    +      } else {
    +        accumUpdates.filter(info => info.name.isDefined && info.update.isDefined && info.name !=
    --- End diff --
    
    To be clearer: I think we should just assert here that there are no UPDATED_BLOCK_STATUSES accumulator updates, instead of filtering them out.
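
A minimal sketch of the difference between the two approaches, assuming simplified stand-in types. `AccumInfo`, `filtered`, `asserted`, and the accumulator name string are hypothetical and are not Spark's actual definitions. Filtering silently drops an unexpected update, while asserting fails fast if one shows up while tracking is disabled.

```scala
object AssertVsFilterSketch {
  // Hypothetical, simplified stand-in for an accumulator update.
  final case class AccumInfo(name: Option[String], update: Option[Any])

  // Assumed accumulator name, for illustration only.
  val UpdatedBlockStatuses = "internal.metrics.updatedBlockStatuses"

  private def isBlockStatusUpdate(info: AccumInfo): Boolean =
    info.name.contains(UpdatedBlockStatuses)

  // Filter: unexpected block-status updates are silently discarded.
  def filtered(trackingEnabled: Boolean, updates: Seq[AccumInfo]): Seq[AccumInfo] =
    if (trackingEnabled) updates else updates.filterNot(isBlockStatusUpdate)

  // Assert: an unexpected block-status update is reported as a bug.
  def asserted(trackingEnabled: Boolean, updates: Seq[AccumInfo]): Seq[AccumInfo] = {
    if (!trackingEnabled) {
      assert(!updates.exists(isBlockStatusUpdate),
        "block-status accumulator update received while tracking is disabled")
    }
    updates
  }
}
```

The assert documents the invariant (no such updates should exist when tracking is off) rather than hiding a violation of it.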




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #78300 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78300/testReport)** for PR 18162 at commit [`a875aac`](https://github.com/apache/spark/commit/a875aacdb32bd308f490270fbb3c072c5ef9c36e).




[GitHub] spark pull request #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockSta...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r119444253
  
    --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
    @@ -368,8 +356,7 @@ private[spark] object JsonProtocol {
         ("Shuffle Read Metrics" -> shuffleReadMetrics) ~
         ("Shuffle Write Metrics" -> shuffleWriteMetrics) ~
         ("Input Metrics" -> inputMetrics) ~
    -    ("Output Metrics" -> outputMetrics) ~
    -    ("Updated Blocks" -> updatedBlocks)
    --- End diff --
    
    I looked at this some more and I don't see any issue with it.  As you said, the read on line 842 below treats it as an option, and I tested both directions with the history server just to be sure: a new history file (without the "Updated Blocks" entry) with an old history server, and an old file (with the "Updated Blocks" entry) with a new history server.  Both work fine.
    
    If you think we should leave it, I can put it back with an empty value.




[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    @JoshRosen





[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77603/




[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    @cloud-fan has a good point about backwards-compatibility. In case any folks are actually relying on this behavior, I wonder whether we could mark it as deprecated and have a flag for disabling it instead of complete removal.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #78467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78467/testReport)** for PR 18162 at commit [`68fbd9f`](https://github.com/apache/spark/commit/68fbd9f8c1c2981eff859b1fec078e238aa0280c).




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #77682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77682/testReport)** for PR 18162 at commit [`fbe30cf`](https://github.com/apache/spark/commit/fbe30cf4a84863f0fc2cb8dd4e89e22e096e20c6).




[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r123262316
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -528,7 +528,13 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
             new StageUIData
           })
           val taskData = stageData.taskData.get(taskId)
    -      val metrics = TaskMetrics.fromAccumulatorInfos(accumUpdates)
    +      val accumsFiltered = if (conf.get(TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES)) {
    +        accumUpdates
    +      } else {
    +        accumUpdates.filter(info => info.name.isDefined && info.update.isDefined && info.name !=
    --- End diff --
    
    Yes, that is correct. As I said, it shouldn't really ever get there, so I can add an assert.  Sorry for the delay, I was out of office for a while. I'll update it.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Turned on by default for backwards compatibility, though I don't really agree with it.  We should make it more stable/usable for people by turning it off. I'm assuming anyone using this would be a very small minority of people, but to be safe we can leave it on, and I'll file a separate JIRA to turn it off and deprecate it.  Note that I'm out of office next week, so if there are comments I might not respond for a bit.




[GitHub] spark pull request #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockSta...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r119418862
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
    @@ -110,15 +109,6 @@ class TaskMetrics private[spark] () extends Serializable {
        */
       def peakExecutionMemory: Long = _peakExecutionMemory.sum
     
    -  /**
    -   * Storage statuses of any blocks that have been updated as a result of this task.
    -   */
    -  def updatedBlockStatuses: Seq[(BlockId, BlockStatus)] = {
    --- End diff --
    
    Unfortunately `TaskMetrics` is a public class (via `SparkListenerTaskEnd`), so this is a breaking change.
    
    But I do agree we should not track the updated block status. How about we keep this method and make it always return `Nil`? We can mark it as deprecated and add a comment saying that it's only here for binary compatibility.
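
A minimal sketch of that suggestion, assuming simplified stand-in types. `TaskMetricsSketch`, the two case classes, and the version string in the annotation are hypothetical and do not reproduce Spark's actual classes. The method keeps its signature so existing callers still link, but it no longer carries any data.

```scala
object DeprecatedStubSketch {
  // Hypothetical simplified types; the real BlockId and BlockStatus are richer.
  final case class BlockId(name: String)
  final case class BlockStatus(memSize: Long, diskSize: Long)

  class TaskMetricsSketch {
    /**
     * Kept only for binary compatibility: block statuses are no longer
     * tracked, so this always returns an empty list.
     */
    @deprecated("updated block statuses are no longer tracked", "hypothetical-version")
    def updatedBlockStatuses: Seq[(BlockId, BlockStatus)] = Nil
  }
}
```

Callers that compile against the stub get a deprecation warning, and callers that relied on the returned values see an empty result rather than a linkage error.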




[GitHub] spark pull request #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockSta...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r119422001
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
    @@ -110,15 +109,6 @@ class TaskMetrics private[spark] () extends Serializable {
        */
       def peakExecutionMemory: Long = _peakExecutionMemory.sum
     
    -  /**
    -   * Storage statuses of any blocks that have been updated as a result of this task.
    -   */
    -  def updatedBlockStatuses: Seq[(BlockId, BlockStatus)] = {
    --- End diff --
    
    yeah I'll update it




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77682/




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    ok, I'll update the default.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    @tgravescs, I deployed this to our production environment (based on 2.0.0) a few days ago and haven't hit any problems with it. I think this is good to go, unless something has been added recently that uses the block statuses.
    
    +1




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Yeah, I think we should merge this to 2.2, but we need to change the default value of the new config to `true`, so as not to surprise users.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78468/




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #78460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78460/testReport)** for PR 18162 at commit [`c9bf058`](https://github.com/apache/spark/commit/c9bf0582c3cf5430da36b50d30836c7dc6f8ca72).




[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    @tgravescs, I guess the question is whether any user has a SparkListener which actually uses the value of `updatedBlockStatuses` for some monitoring application or something similar. In that case returning `Nil` will preserve binary compatibility but will break semantics for their app. I don't know of any use-cases like this offhand so I don't have a personal stake in this, but I could imagine problems in case someone relies on the return value (hence suggestion of flagging if we want to be really conservative).




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Sorry, I've been out on vacation and probably won't have much time to respond this week, but I will update early next week. Thanks @rdblue. I am running this in our production environment as well and can clearly see the memory savings.




[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    TaskMetrics doesn't take the SparkConf or anything else that would give it access to a config, so we would have to gate every place where it is incrementing or adding things. I don't think that would be too hard. I'll put everything back and just add a config around those call sites, with the default off.
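
A minimal sketch of the approach described above, assuming the flag is read once by the caller and checked at each call site. `BlockUpdate`, `MetricsHolder`, and `BlockStore` are hypothetical stand-ins, not Spark's real classes; only the method name `incUpdatedBlockStatuses` is borrowed from the thread.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical record of one block update.
final case class BlockUpdate(blockId: String, memSize: Long)

// Like the TaskMetrics described above, this holder has no access to configuration.
final class MetricsHolder {
  private val updated = ArrayBuffer.empty[BlockUpdate]
  def incUpdatedBlockStatuses(u: BlockUpdate): Unit = updated += u
  def updatedBlockStatuses: Seq[BlockUpdate] = updated.toSeq
}

// The caller owns the flag and decides whether anything is recorded.
final class BlockStore(trackUpdatedBlockStatuses: Boolean) {
  def putBlock(metrics: MetricsHolder, u: BlockUpdate): Unit = {
    // ... the actual block storage work would happen here ...
    if (trackUpdatedBlockStatuses) metrics.incUpdatedBlockStatuses(u)
  }
}
```

The holder stays unchanged and the API keeps returning a sequence; with the flag off that sequence is simply never populated, so no per-block state accumulates.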




[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r121021598
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -528,7 +528,13 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
             new StageUIData
           })
           val taskData = stageData.taskData.get(taskId)
    -      val metrics = TaskMetrics.fromAccumulatorInfos(accumUpdates)
    +      val accumsFiltered = if (conf.get(TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES)) {
    +        accumUpdates
    +      } else {
    +        accumUpdates.filter(info => info.name.isDefined && info.update.isDefined && info.name !=
    --- End diff --
    
    ping @tgravescs, any ideas? If it's not doable, we can start a vote and decide whether to remove `updatedBlockStatus` entirely.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    @tgravescs can you address https://github.com/apache/spark/pull/18162#discussion_r122363701 ?




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #77682 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77682/testReport)** for PR 18162 at commit [`fbe30cf`](https://github.com/apache/spark/commit/fbe30cf4a84863f0fc2cb8dd4e89e22e096e20c6).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Jenkins, test this please




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #77687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77687/testReport)** for PR 18162 at commit [`a875aac`](https://github.com/apache/spark/commit/a875aacdb32bd308f490270fbb3c072c5ef9c36e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Jenkins, test this please




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Yeah I was figuring I would file another jira to remove it later.  I can add the deprecated flag here if you guys agree.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    @JoshRosen, what do you think: should we add the deprecation?




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    I'm not sure what you mean by it's not doable. Which places are you seeing update the block statuses that I haven't covered here? Most of it was done by the BlockManager. Maybe I'm missing something in my IntelliJ search or greps, but I don't think so. Look at where incUpdatedBlockStatuses and setUpdatedBlockStatuses (2 versions) are used.
    





[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    I do love the clean up you did by removing `updatedBlockStatuses` entirely... Since `SparkListenerTaskEnd` is marked as a developer API, is it acceptable to make `TaskMetrics.updatedBlockStatuses` always return `Nil` in Spark 2.3? Maybe we can send an email to dev list to ask about this.




[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r123424029
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -528,7 +528,13 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
             new StageUIData
           })
           val taskData = stageData.taskData.get(taskId)
    -      val metrics = TaskMetrics.fromAccumulatorInfos(accumUpdates)
    +      val accumsFiltered = if (conf.get(TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES)) {
    +        accumUpdates
    +      } else {
    +        accumUpdates.filter(info => info.name.isDefined && info.update.isDefined && info.name !=
    --- End diff --
    
    that also works




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    upmerged to master and updated default and removed unneeded changes.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #78460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78460/testReport)** for PR 18162 at commit [`c9bf058`](https://github.com/apache/spark/commit/c9bf0582c3cf5430da36b50d30836c7dc6f8ca72).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Build finished. Test PASSed.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77625/
    Test PASSed.




[GitHub] spark pull request #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockSta...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r119422345
  
    --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
    @@ -368,8 +356,7 @@ private[spark] object JsonProtocol {
         ("Shuffle Read Metrics" -> shuffleReadMetrics) ~
         ("Shuffle Write Metrics" -> shuffleWriteMetrics) ~
         ("Input Metrics" -> inputMetrics) ~
    -    ("Output Metrics" -> outputMetrics) ~
    -    ("Updated Blocks" -> updatedBlocks)
    --- End diff --
    
    If we aren't sure I can also just leave it here but let it be empty




[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r119533807
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
    @@ -112,6 +112,12 @@ class TaskMetrics private[spark] () extends Serializable {
     
       /**
        * Storage statuses of any blocks that have been updated as a result of this task.
    +   *
    +   * Tracking the _updatedBlockStatuses can use a lot of memory.
    +   * It is not used anywhere inside of Spark so we would ideally remove it, but its exposed to
    +   * the user in SparkListenerTaskEnd so the api is kept for compatibility.
    +   * By default it is configured to not actually save the block statuses via config
    +   * TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES.
        */
       def updatedBlockStatuses: Seq[(BlockId, BlockStatus)] = {
    --- End diff --
    
    Shall we deprecate it? Considering the cost of collecting this information, I think we should not encourage users to use it...
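
For illustration, deprecating an accessor in Scala could look roughly like the sketch below. `MetricsSketch` is a hypothetical stand-in, and the message and version string are placeholders rather than text from the actual patch.

```scala
// Hypothetical stand-in for a metrics class that keeps an accessor for compatibility.
class MetricsSketch {
  // Callers still compile, but the compiler emits a deprecation warning at each use.
  @deprecated("expensive to collect; tracking is off unless explicitly enabled", "x.y.z")
  def updatedBlockStatuses: Seq[(String, Long)] = Nil
}
```

The annotation leaves the method signature untouched, so it can be combined with the config flag without affecting existing callers.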




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r119954378
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -528,7 +528,13 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
             new StageUIData
           })
           val taskData = stageData.taskData.get(taskId)
    -      val metrics = TaskMetrics.fromAccumulatorInfos(accumUpdates)
    +      val accumsFiltered = if (conf.get(TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES)) {
    +        accumUpdates
    +      } else {
    +        accumUpdates.filter(info => info.name.isDefined && info.update.isDefined && info.name !=
    --- End diff --
    
    The `accumUpdates` are sent from executors, so if we have already stopped tracking `updatedBlockStatus` there, we don't need to filter here.
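
To make the point concrete, here is a rough sketch of a filter of the kind shown in the quoted diff. `AccumInfo` and `AccumFilter` are hypothetical stand-ins, and `BlockStatusesName` is a placeholder, since the value the diff compares against is cut off in the excerpt.

```scala
// Hypothetical stand-in for one accumulator update received from an executor.
final case class AccumInfo(name: Option[String], update: Option[Any])

object AccumFilter {
  // Placeholder name; the real comparison target is not visible in the quoted diff.
  val BlockStatusesName = "example.updatedBlockStatuses"

  // Keeps only named updates that carry a value and are not the block-status accumulator.
  def dropBlockStatuses(updates: Seq[AccumInfo]): Seq[AccumInfo] =
    updates.filter { info =>
      info.name.isDefined && info.update.isDefined && !info.name.contains(BlockStatusesName)
    }
}
```

If the executors never populate the block-status accumulator in the first place, the last condition has no entries carrying data to drop, which is why the filter on the driver side becomes redundant.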




[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r123273222
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -528,7 +528,13 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
             new StageUIData
           })
           val taskData = stageData.taskData.get(taskId)
    -      val metrics = TaskMetrics.fromAccumulatorInfos(accumUpdates)
    +      val accumsFiltered = if (conf.get(TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES)) {
    +        accumUpdates
    +      } else {
    +        accumUpdates.filter(info => info.name.isDefined && info.update.isDefined && info.name !=
    --- End diff --
    
    How about I just revert this? I'm not sure it's worth an assert here. The updated block statuses are just going to be empty, so setting them to empty isn't going to hurt anything.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #77625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77625/testReport)** for PR 18162 at commit [`ec3c29d`](https://github.com/apache/spark/commit/ec3c29d4014e87a00d1988ce9342521af62a335e).




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    thanks for the reviews @cloud-fan 




[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r123424121
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -295,4 +295,12 @@ package object config {
             "above this threshold. This is to avoid a giant request takes too much memory.")
           .bytesConf(ByteUnit.BYTE)
           .createWithDefaultString("200m")
    +
    +  private[spark] val TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES =
    +    ConfigBuilder("spark.taskMetrics.trackUpdatedBlockStatuses")
    --- End diff --
    
    you can document it in `configuration.md`.
    
    Actually, can we turn it off by default? I think this feature is useless for most users. cc @JoshRosen




[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r119943504
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
    @@ -528,7 +528,13 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
             new StageUIData
           })
           val taskData = stageData.taskData.get(taskId)
    -      val metrics = TaskMetrics.fromAccumulatorInfos(accumUpdates)
    +      val accumsFiltered = if (conf.get(TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES)) {
    +        accumUpdates
    +      } else {
    +        accumUpdates.filter(info => info.name.isDefined && info.update.isDefined && info.name !=
    --- End diff --
    
    I'm not sure what you mean by it's already stored. It gets stored into the TaskMetrics when the call below to TaskMetrics.fromAccumulatorInfos is made. The UpdatedBlockStatuses in the task metrics it returns aren't ever really used by this function, in updateAggregateMetrics or updateTaskMetric, so I didn't see any reason to set it.
    





[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    It would be nice to get this into spark 2.2 if we can 




[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18162#discussion_r119525021
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -295,4 +295,12 @@ package object config {
             "above this threshold. This is to avoid a giant request takes too much memory.")
           .bytesConf(ByteUnit.BYTE)
           .createWithDefaultString("200m")
    +
    +  private[spark] val TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES =
    +    ConfigBuilder("spark.taskMetrics.trackUpdatedBlockStatuses")
    --- End diff --
    
    I'm not sure whether we want to document this somewhere. If so, where do you suggest? There isn't any other config quite like this right now. It could go in either the configuration doc or the monitoring doc.




[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    **[Test build #77603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77603/testReport)** for PR 18162 at commit [`41ed774`](https://github.com/apache/spark/commit/41ed7745d2be28e3d1de8ca1e2aa594a43f45760).




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18162
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77687/
    Test FAILed.

