You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by uncleGen <gi...@git.apache.org> on 2014/08/26 09:06:24 UTC

[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

GitHub user uncleGen opened a pull request:

    https://github.com/apache/spark/pull/2131

    [SPARK-3170][CORE][BUG]:RDD info loss in "StorageTab" and "ExecutorTab"

    compeleted stage only need to remove its own partitions that are no longer cached. However, "StorageTab" may lost some rdds which are cached actually. Not only in "StorageTab", "ExectutorTab" may also lose some rdd info which have been overwritten by last rdd in a same task.
    1. "StorageTab": when multiple stages run simultaneously, completed stage will remove rdd info which belong to other stages that are still running.
    2. "ExectutorTab": taskcontext may lose some "updatedBlocks" info of  rdds  in a dependency chain. Like the following example:
             val r1 = sc.paralize(..).cache()
             val r2 = r1.map(...).cache()
             val n = r2.count()
    
    When count the r2, r1 and r2 will be cached finally. So in CacheManager.getOrCompute, the taskcontext should contain "updatedBlocks" of r1 and r2. Currently, the "updatedBlocks" only contain the info of r2. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark master_ui_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2131
    
----
commit c82ba82ae90c92244e63811f30e1aeb05608c57a
Author: uncleGen <hu...@gmail.com>
Date:   2014-08-26T06:54:04Z

    Bug Fix: RDD info loss in "StorageTab" and "ExecutorTab"

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2131#discussion_r16760036
  
    --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
    @@ -68,7 +68,9 @@ private[spark] class CacheManager(blockManager: BlockManager) extends Logging {
               // Otherwise, cache the values and keep track of any updates in block statuses
               val updatedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
               val cachedValues = putInBlockManager(key, computedValues, storageLevel, updatedBlocks)
    -          context.taskMetrics.updatedBlocks = Some(updatedBlocks)
    +          val metrics = context.taskMetrics
    +          val lastUpdatedBlocks = metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())
    +          metrics.updatedBlocks = Some(lastUpdatedBlocks ++ updatedBlocks.toSeq)
    --- End diff --
    
    I raised this in the other PR and since we have a new one I'll just ask again. Can you explain what this fixes? My understanding is that this can only be called once per task, and since this is the only place where we set `updatedBlocks` I don't see how the original `TaskMetrics` could already have updated blocks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on the pull request:

    https://github.com/apache/spark/pull/2131#issuecomment-53521662
  
    Hi @andrewor14, test it again please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2131#discussion_r16762096
  
    --- Diff: core/src/test/scala/org/apache/spark/CacheManagerSuite.scala ---
    @@ -87,4 +99,12 @@ class CacheManagerSuite extends FunSuite with BeforeAndAfter with EasyMockSugar
           assert(value.toList === List(1, 2, 3, 4))
         }
       }
    +
    +  test("verity task metrics updated correctly") {
    +    blockManager = sc.env.blockManager
    +    cacheManager = new CacheManager(blockManager)
    +    val context = new TaskContext(0, 0, 0)
    +    cacheManager.getOrCompute(rdd3, split, context, StorageLevel.MEMORY_ONLY)
    +    assert(context.taskMetrics.updatedBlocks.getOrElse(Seq()).size == 2)
    --- End diff --
    
    sorry for my poor coding, I will review again


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2131#discussion_r16758471
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala ---
    @@ -70,8 +70,10 @@ class StorageListener(storageStatusListener: StorageStatusListener) extends Spar
       }
     
       override def onStageCompleted(stageCompleted: SparkListenerStageCompleted) = synchronized {
    -    // Remove all partitions that are no longer cached
    -    _rddInfoMap.retain { case (_, info) => info.numCachedPartitions > 0 }
    +    // Remove all partitions that are no longer cached in current completed stage
    +    val completedRddInfoIds = Set[Int]() ++ stageCompleted.stageInfo.rddInfos.map(r => r.id)
    +    _rddInfoMap.retain { case (id, info) =>
    +               !completedRddInfoIds.contains(id) || info.numCachedPartitions > 0 }
    --- End diff --
    
    indentation is weird


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2131#discussion_r16760122
  
    --- Diff: core/src/test/scala/org/apache/spark/CacheManagerSuite.scala ---
    @@ -87,4 +99,12 @@ class CacheManagerSuite extends FunSuite with BeforeAndAfter with EasyMockSugar
           assert(value.toList === List(1, 2, 3, 4))
         }
       }
    +
    +  test("verity task metrics updated correctly") {
    +    blockManager = sc.env.blockManager
    +    cacheManager = new CacheManager(blockManager)
    --- End diff --
    
    is there a reason why you're not using `sc.env.cacheManager`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2131#discussion_r16758476
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala ---
    @@ -70,8 +70,10 @@ class StorageListener(storageStatusListener: StorageStatusListener) extends Spar
       }
     
       override def onStageCompleted(stageCompleted: SparkListenerStageCompleted) = synchronized {
    -    // Remove all partitions that are no longer cached
    -    _rddInfoMap.retain { case (_, info) => info.numCachedPartitions > 0 }
    +    // Remove all partitions that are no longer cached in current completed stage
    +    val completedRddInfoIds = Set[Int]() ++ stageCompleted.stageInfo.rddInfos.map(r => r.id)
    --- End diff --
    
    Why not just use `toSet`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2131#discussion_r16760352
  
    --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
    @@ -68,7 +68,9 @@ private[spark] class CacheManager(blockManager: BlockManager) extends Logging {
               // Otherwise, cache the values and keep track of any updates in block statuses
               val updatedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
               val cachedValues = putInBlockManager(key, computedValues, storageLevel, updatedBlocks)
    -          context.taskMetrics.updatedBlocks = Some(updatedBlocks)
    +          val metrics = context.taskMetrics
    +          val lastUpdatedBlocks = metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())
    +          metrics.updatedBlocks = Some(lastUpdatedBlocks ++ updatedBlocks.toSeq)
    --- End diff --
    
    A single task could call `getOrCompute` multiple times in a chain - if it is computing several pipelined RDD's.
    
    ```
    a.cache().filter(---).cache().count
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2131#discussion_r16760627
  
    --- Diff: core/src/test/scala/org/apache/spark/CacheManagerSuite.scala ---
    @@ -87,4 +99,12 @@ class CacheManagerSuite extends FunSuite with BeforeAndAfter with EasyMockSugar
           assert(value.toList === List(1, 2, 3, 4))
         }
       }
    +
    +  test("verity task metrics updated correctly") {
    +    blockManager = sc.env.blockManager
    +    cacheManager = new CacheManager(blockManager)
    +    val context = new TaskContext(0, 0, 0)
    +    cacheManager.getOrCompute(rdd3, split, context, StorageLevel.MEMORY_ONLY)
    +    assert(context.taskMetrics.updatedBlocks.getOrElse(Seq()).size == 2)
    --- End diff --
    
    Can you use `===` here instead of `==`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2131#issuecomment-53528924
  
    Looks like this failed one of your own tests


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2131#issuecomment-53525126
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19284/consoleFull) for   PR 2131 at commit [`56ea488`](https://github.com/apache/spark/commit/56ea48848fe023a2224692bd348cb5f1fa25d718).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2131#issuecomment-53455192
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19216/consoleFull) for   PR 2131 at commit [`c82ba82`](https://github.com/apache/spark/commit/c82ba82ae90c92244e63811f30e1aeb05608c57a).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2131#issuecomment-53550329
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19308/consoleFull) for   PR 2131 at commit [`a6a8a0b`](https://github.com/apache/spark/commit/a6a8a0b4a97893a9c47645e0858885192cb41abf).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2131


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2131#discussion_r16758470
  
    --- Diff: core/src/test/scala/org/apache/spark/CacheManagerSuite.scala ---
    @@ -87,4 +99,12 @@ class CacheManagerSuite extends FunSuite with BeforeAndAfter with EasyMockSugar
           assert(value.toList === List(1, 2, 3, 4))
         }
       }
    +
    +  test("verity task metrics updated correctly") {
    --- End diff --
    
    verify


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2131#issuecomment-53525012
  
    add to whitelist


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2131#issuecomment-53383213
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2131#discussion_r16761022
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala ---
    @@ -70,8 +70,10 @@ class StorageListener(storageStatusListener: StorageStatusListener) extends Spar
       }
     
       override def onStageCompleted(stageCompleted: SparkListenerStageCompleted) = synchronized {
    -    // Remove all partitions that are no longer cached
    -    _rddInfoMap.retain { case (_, info) => info.numCachedPartitions > 0 }
    +    // Remove all partitions that are no longer cached in current completed stage
    +    val completedRddInfoIds = Set[Int]() ++ stageCompleted.stageInfo.rddInfos.map(r => r.id)
    --- End diff --
    
    Also, I would just call this `completedRddIds`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2131#issuecomment-53554871
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19308/consoleFull) for   PR 2131 at commit [`a6a8a0b`](https://github.com/apache/spark/commit/a6a8a0b4a97893a9c47645e0858885192cb41abf).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2131#issuecomment-53527156
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19284/consoleFull) for   PR 2131 at commit [`56ea488`](https://github.com/apache/spark/commit/56ea48848fe023a2224692bd348cb5f1fa25d718).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2131#discussion_r16760497
  
    --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
    @@ -68,7 +68,9 @@ private[spark] class CacheManager(blockManager: BlockManager) extends Logging {
               // Otherwise, cache the values and keep track of any updates in block statuses
               val updatedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
               val cachedValues = putInBlockManager(key, computedValues, storageLevel, updatedBlocks)
    -          context.taskMetrics.updatedBlocks = Some(updatedBlocks)
    +          val metrics = context.taskMetrics
    +          val lastUpdatedBlocks = metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())
    +          metrics.updatedBlocks = Some(lastUpdatedBlocks ++ updatedBlocks.toSeq)
    --- End diff --
    
    Ah I see. Makes sense.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2131#discussion_r16761636
  
    --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
    @@ -68,7 +68,9 @@ private[spark] class CacheManager(blockManager: BlockManager) extends Logging {
               // Otherwise, cache the values and keep track of any updates in block statuses
               val updatedBlocks = new ArrayBuffer[(BlockId, BlockStatus)]
               val cachedValues = putInBlockManager(key, computedValues, storageLevel, updatedBlocks)
    -          context.taskMetrics.updatedBlocks = Some(updatedBlocks)
    +          val metrics = context.taskMetrics
    +          val lastUpdatedBlocks = metrics.updatedBlocks.getOrElse(Seq[(BlockId, BlockStatus)]())
    +          metrics.updatedBlocks = Some(lastUpdatedBlocks ++ updatedBlocks.toSeq)
    --- End diff --
    
    @andrewor14 IMHO, the "getOrCompute" can be called more than once per task (indirect recursively). In this code snippet: 
    
           val rdd1 = sc.parallelize(...).cache()
           val rdd2 = rdd1.map(...).cache()
           val count = rdd2.count()
    
    This code snippet will submit one stage . We take task-1 as an example. Task-1 firstly calls getOrCompute(rdd-2) , and then calls getOrCompute(rdd-1) inside "getOrCompute(rdd-2)". Therefore, it will generates and caches block rdd-1-1 and  block rdd-2-1 one by one. At the end of getOrCompute(rdd-1), the taskMetrics.updatedBlocks of task-1 will be seq(rdd-1-1). Then at the end of getOrCompute(rdd-2), the taskMetrics.updatedBlocks will be seq(rdd-1-1, rdd-2-1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2131#discussion_r16760052
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/storage/StorageTab.scala ---
    @@ -70,8 +70,10 @@ class StorageListener(storageStatusListener: StorageStatusListener) extends Spar
       }
     
       override def onStageCompleted(stageCompleted: SparkListenerStageCompleted) = synchronized {
    -    // Remove all partitions that are no longer cached
    -    _rddInfoMap.retain { case (_, info) => info.numCachedPartitions > 0 }
    +    // Remove all partitions that are no longer cached in current completed stage
    +    val completedRddInfoIds = Set[Int]() ++ stageCompleted.stageInfo.rddInfos.map(r => r.id)
    +    _rddInfoMap.retain { case (id, info) =>
    +               !completedRddInfoIds.contains(id) || info.numCachedPartitions > 0 }
    --- End diff --
    
    also, the style should be
    ```
    _rddInfoMap.retain { case (id, info) =>
      !completed...
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2131#discussion_r16758468
  
    --- Diff: core/src/test/scala/org/apache/spark/ui/storage/StorageTabSuite.scala ---
    @@ -162,4 +163,29 @@ class StorageTabSuite extends FunSuite with BeforeAndAfter {
         assert(storageListener._rddInfoMap(2).numCachedPartitions === 0)
       }
     
    +  test("verity StorageTab contains all cached rdds") {
    --- End diff --
    
    verify


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2131#issuecomment-53454018
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2131#issuecomment-53455331
  
    Hi @uncleGen there's another line too long...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on the pull request:

    https://github.com/apache/spark/pull/2131#issuecomment-53549872
  
    @andrewor14 sorry for my poor coding. Unit test passed locally, test it again pls.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2131#issuecomment-53455017
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19216/consoleFull) for   PR 2131 at commit [`c82ba82`](https://github.com/apache/spark/commit/c82ba82ae90c92244e63811f30e1aeb05608c57a).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3170][CORE][BUG]:RDD info loss in "Stor...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2131#issuecomment-53610224
  
    Thanks, I have merged this into master and branch-1.1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org