You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by caneGuy <gi...@git.apache.org> on 2018/02/24 11:01:46 UTC

[GitHub] spark pull request #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Bl...

GitHub user caneGuy opened a pull request:

    https://github.com/apache/spark/pull/20667

    [SPARK-23508][CORE] Use timeStampedHashMap for BlockmanagerId in case blockManagerIdCache…

    … cause oom
    
    ## What changes were proposed in this pull request?
    blockManagerIdCache in BlockManagerId will not remove old values which may cause oom
    
    `val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()`
    Since whenever we apply a new BlockManagerId, it will put into this map.
    
    This patch will use timestampHashMap instead for `JsonProtocol`.
    
    ## How was this patch tested?
    Exist tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/caneGuy/spark zhoukang/fix-history

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20667.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20667
    
----
commit fc1b6a0169c123a825a253defb021c73aebf1c98
Author: zhoukang <zh...@...>
Date:   2018-02-24T10:13:01Z

    Use timeStampedHashMap for BlockmanagerId in case blockManagerIdCache cause oom

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    **[Test build #87748 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87748/testReport)** for PR 20667 at commit [`bf79f4d`](https://github.com/apache/spark/commit/bf79f4d5c83c364c7f1fc05f158753d282409330).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Bl...

Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20667#discussion_r170516775
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
    @@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
         getCachedBlockManagerId(obj)
       }
     
    -  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
    +  val blockManagerIdCache = new TimeStampedHashMap[BlockManagerId, BlockManagerId](true)
     
    -  def getCachedBlockManagerId(id: BlockManagerId): BlockManagerId = {
    +  def getCachedBlockManagerId(id: BlockManagerId, clearOldValues: Boolean = false): BlockManagerId =
    +  {
         blockManagerIdCache.putIfAbsent(id, id)
    -    blockManagerIdCache.get(id)
    +    val blockManagerId = blockManagerIdCache.get(id)
    +    if (clearOldValues) {
    +      blockManagerIdCache.clearOldValues(System.currentTimeMillis - Utils.timeStringAsMs("10d"))
    --- End diff --
    
    @Ngone51 Thanks.i also though about remove when we delete a block.
    In this case, it is history replaying which will trigger this problem,and we do not delete any block actually.
    Maybe use `weakreference` better as @jiangxb1987 mentioned?WDYT?
    Thanks again!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20667#discussion_r171136135
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
    @@ -132,10 +133,17 @@ private[spark] object BlockManagerId {
         getCachedBlockManagerId(obj)
       }
     
    -  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
    +  /**
    +   * Here we set max cache size as 10000.Since the size of a BlockManagerId object
    --- End diff --
    
    nit:
    ```
    The max cache size is hardcoded to 10000, since the size of a BlockManagerId object is about 48B, the total memory cost should be below 1MB which is feasible.
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87748/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20667#discussion_r170865897
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
    @@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
         getCachedBlockManagerId(obj)
       }
     
    -  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
    +  val blockManagerIdCache = CacheBuilder.newBuilder()
    +    .maximumSize(500)
    --- End diff --
    
    I‘m +1 on hardcode this, but we shall also explain in the comment the reason why this number is chosen.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Bl...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20667#discussion_r170520957
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
    @@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
         getCachedBlockManagerId(obj)
       }
     
    -  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
    +  val blockManagerIdCache = new TimeStampedHashMap[BlockManagerId, BlockManagerId](true)
     
    -  def getCachedBlockManagerId(id: BlockManagerId): BlockManagerId = {
    +  def getCachedBlockManagerId(id: BlockManagerId, clearOldValues: Boolean = false): BlockManagerId =
    +  {
         blockManagerIdCache.putIfAbsent(id, id)
    -    blockManagerIdCache.get(id)
    +    val blockManagerId = blockManagerIdCache.get(id)
    +    if (clearOldValues) {
    +      blockManagerIdCache.clearOldValues(System.currentTimeMillis - Utils.timeStringAsMs("10d"))
    --- End diff --
    
    Can we simply use `com.google.common.cache.Cache`? which has a size limitation and we don't need to worry about OOM.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87761/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20667#discussion_r170881808
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
    @@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
         getCachedBlockManagerId(obj)
       }
     
    -  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
    +  val blockManagerIdCache = CacheBuilder.newBuilder()
    +    .maximumSize(500)
    --- End diff --
    
    Actually i think 500 executors can handle most applications.And for historyserver it is no need to cache too much `BlockManagerId`.If we set this number as 50 the max size of cache will below 30KB.
    Agree with that? @jiangxb1987 If ok i will update documentation.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Use softreference for BlockmanagerId...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Had a offline chat with @cloud-fan and we feel `com.google.common.cache.Cache` should be used here. You can find a example at `CodeGenerator.cache`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    thanks, merging to master/2.3/2.2!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Blockmana...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Thanks @cloud-fan @jiangxb1987 @kiszk @Ngone51 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20667#discussion_r170844541
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
    @@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
         getCachedBlockManagerId(obj)
       }
     
    -  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
    +  val blockManagerIdCache = CacheBuilder.newBuilder()
    +    .maximumSize(500)
    +    .build(new CacheLoader[BlockManagerId, BlockManagerId]() {
    +      override def load(id: BlockManagerId) = {
    --- End diff --
    
    nit: `override def load(id: BlockManagerId) = id`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Blockmana...

Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    @cloud-fan I just find commit log below
    `Modified StorageLevel and BlockManagerId to cache common objects and use cached object while deserializing.`
    I can't figure out why we need cache, since i think the cache miss may be high?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Update @jiangxb1987 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87744/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Use softreference for BlockmanagerId...

Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Nice @jiangxb1987 @cloud-fan I will modify later.Thanks!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20667#discussion_r170844624
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
    @@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
         getCachedBlockManagerId(obj)
       }
     
    -  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
    +  val blockManagerIdCache = CacheBuilder.newBuilder()
    --- End diff --
    
     is it thread-safe?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Blockmana...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20667#discussion_r170884966
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
    @@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
         getCachedBlockManagerId(obj)
       }
     
    -  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
    +  val blockManagerIdCache = CacheBuilder.newBuilder()
    +    .maximumSize(500)
    --- End diff --
    
    Please feel free to update any comment, and I would set the default value to 10000 since I think cost of 500KB memory is feasible.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    LGTM if we need caching.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    **[Test build #87744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87744/testReport)** for PR 20667 at commit [`3379899`](https://github.com/apache/spark/commit/337989945b0757dfc6a069315c4e7828afe77d00).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Bl...

Posted by Ngone51 <gi...@git.apache.org>.
Github user Ngone51 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20667#discussion_r170424196
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
    @@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
         getCachedBlockManagerId(obj)
       }
     
    -  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
    +  val blockManagerIdCache = new TimeStampedHashMap[BlockManagerId, BlockManagerId](true)
     
    -  def getCachedBlockManagerId(id: BlockManagerId): BlockManagerId = {
    +  def getCachedBlockManagerId(id: BlockManagerId, clearOldValues: Boolean = false): BlockManagerId =
    +  {
         blockManagerIdCache.putIfAbsent(id, id)
    -    blockManagerIdCache.get(id)
    +    val blockManagerId = blockManagerIdCache.get(id)
    +    if (clearOldValues) {
    +      blockManagerIdCache.clearOldValues(System.currentTimeMillis - Utils.timeStringAsMs("10d"))
    --- End diff --
    
    10 days? I don't think *time* can be a judging criteria to decide whether we should remove a cached id or not, even if you set the time threshold far less/greater than '10d'. Think about a extreamly case that a block could be frequently got all the time during the app‘s running. So, it would be certainly removed from cache due to the time threshold, and recached next time we get it, and repeatedly.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by Ngone51 <gi...@git.apache.org>.
Github user Ngone51 commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Hi, @caneGuy , sorry for my previous comment as I mixed up ```BlockId``` with ```BlockManagerId```, and leave some wrong comments. And thanks for your reply.
    
    Back to now, I have the same question with @cloud-fan ,
    > Why we need this cache?
    
    though, we have a better cache way(guava cache) now.
    
    My confusions:
    - It is weird  that we need to create a ```BlockManagerId ``` before  we get a same one from the cache.
    
    - And on executor side, when ```BlockManagerId ``` registered to master and return with an updated ```BlockManagerId ``` , the new ```BlockManagerId ``` does not be updated to ```blockManagerIdCache```. So, it seems executor side's ```BlockManagerId``` has little relevance with ```blockManagerIdCache```.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20667#discussion_r170864215
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
    @@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
         getCachedBlockManagerId(obj)
       }
     
    -  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
    +  val blockManagerIdCache = CacheBuilder.newBuilder()
    --- End diff --
    
    I think it is thread-safe which i refer from:
    https://google.github.io/guava/releases/22.0/api/docs/com/google/common/cache/LoadingCache.html
    and https://stackoverflow.com/questions/11124856/using-guava-for-high-performance-thread-safe-caching


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Bl...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20667#discussion_r170451578
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
    @@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
         getCachedBlockManagerId(obj)
       }
     
    -  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
    +  val blockManagerIdCache = new TimeStampedHashMap[BlockManagerId, BlockManagerId](true)
    --- End diff --
    
    Will this cause serious performance regression? `TimeStampedHashMap` will copy all the old values on update. cc @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    **[Test build #87746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87746/testReport)** for PR 20667 at commit [`3379899`](https://github.com/apache/spark/commit/337989945b0757dfc6a069315c4e7828afe77d00).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20667#discussion_r171121778
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
    @@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
         getCachedBlockManagerId(obj)
       }
     
    -  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
    +  val blockManagerIdCache = CacheBuilder.newBuilder()
    +    .maximumSize(500)
    --- End diff --
    
    Thanks @jiangxb1987 i have updated the comment


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    **[Test build #87761 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87761/testReport)** for PR 20667 at commit [`bf79f4d`](https://github.com/apache/spark/commit/bf79f4d5c83c364c7f1fc05f158753d282409330).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

Posted by caneGuy <gi...@git.apache.org>.
Github user caneGuy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20667#discussion_r170817107
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
    @@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
         getCachedBlockManagerId(obj)
       }
     
    -  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
    +  val blockManagerIdCache = CacheBuilder.newBuilder()
    +    .maximumSize(500)
    --- End diff --
    
    here i set 500
    since `blockmanagerId` about `48B` per object.
    I do not use spark conf since it is not convenient to get spark conf for historyserver when use BlockManagerId


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    In case the same `BlockManagerId` being created multiple times, this cache will ensure we always use the first one that is created, which make it possible for the rest `BlockManagerId` instances being recycled shortly. The downside is we have to persist all the distinct `BlockManagerId` created.
    
    Since the code is added long times ago, and it's actually hard to examine the performance with/without the cache, we'd like to keep it for now. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    **[Test build #87761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87761/testReport)** for PR 20667 at commit [`bf79f4d`](https://github.com/apache/spark/commit/bf79f4d5c83c364c7f1fc05f158753d282409330).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    add to whitelist


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87746/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Blockmana...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Why we need this cache?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20667#discussion_r170844718
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
    @@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
         getCachedBlockManagerId(obj)
       }
     
    -  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
    +  val blockManagerIdCache = CacheBuilder.newBuilder()
    +    .maximumSize(500)
    --- End diff --
    
    I'm ok to hardcode it for now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    **[Test build #87744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87744/testReport)** for PR 20667 at commit [`3379899`](https://github.com/apache/spark/commit/337989945b0757dfc6a069315c4e7828afe77d00).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Blockmana...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Can we use WeakReference here to keep cached `BlockManagerId` ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20667


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by Ngone51 <gi...@git.apache.org>.
Github user Ngone51 commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Hi, @jiangxb1987 , thanks for your kindly explanation.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    **[Test build #87746 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87746/testReport)** for PR 20667 at commit [`3379899`](https://github.com/apache/spark/commit/337989945b0757dfc6a069315c4e7828afe77d00).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    **[Test build #87748 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87748/testReport)** for PR 20667 at commit [`bf79f4d`](https://github.com/apache/spark/commit/bf79f4d5c83c364c7f1fc05f158753d282409330).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20667
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org