Posted to reviews@spark.apache.org by Earne <gi...@git.apache.org> on 2016/04/05 03:19:42 UTC

[GitHub] spark pull request: [SPARK-14289][WIP] Add support to multiple evi...

GitHub user Earne opened a pull request:

    https://github.com/apache/spark/pull/12162

    [SPARK-14289][WIP] Add support to multiple eviction strategies for cached RDD partitions

    ## What changes were proposed in this pull request?
    
    Currently, LRU is the only eviction strategy for cached RDD partitions in Spark.
    This pull request refactors the code and adds support for multiple eviction strategies, such as FIFO, LFU (WIP), and LCS (WIP).
    
    
    ## How was this patch tested?
    
    Tested manually by setting "spark.memory.entryEvictionPolicy" to LRU (default), FIFO, or LCS.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SCTS/spark SPARK-14289

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12162.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12162
    

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14289][WIP] Support multiple eviction s...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12162#issuecomment-205577231
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-14289][WIP] Support multiple eviction s...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12162#issuecomment-205652310
  
    Thanks for the pull request. Is this actually motivated by a real use case, or just doing it because it might be good to support more than one policy?




[GitHub] spark issue #12162: [SPARK-14289][WIP] Support multiple eviction strategies ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/12162
  
    @Earne, is this still active, and do you have any response to the comments above? Otherwise, I will propose closing this.




[GitHub] spark issue #12162: [SPARK-14289][WIP] Support multiple eviction strategies ...

Posted by mmakdessii <gi...@git.apache.org>.
Github user mmakdessii commented on the issue:

    https://github.com/apache/spark/pull/12162
  
    I'm working on my thesis to improve cache management systems, but I don't know anything about Spark! I found this program and I don't even know how to run it. If possible, can someone point me to a video or a set of steps for running it? If I can see a sample implementation of LRU and learn how it's built step by step, then I'll be able to implement my own algorithm. I would be very grateful if someone could offer their help!




[GitHub] spark pull request #12162: [SPARK-14289][WIP] Support multiple eviction stra...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12162




[GitHub] spark issue #12162: [SPARK-14289][WIP] Support multiple eviction strategies ...

Posted by michaelmior <gi...@git.apache.org>.
Github user michaelmior commented on the issue:

    https://github.com/apache/spark/pull/12162
  
    This branch appears to be incomplete. The configuration parameter `entryEvictionPolicy` does not exist and there is a good chunk of the code that does not do anything.




[GitHub] spark issue #12162: [SPARK-14289][WIP] Support multiple eviction strategies ...

Posted by mozinrat <gi...@git.apache.org>.
Github user mozinrat commented on the issue:

    https://github.com/apache/spark/pull/12162
  
    @Earne, was anything relevant merged in Spark 2.0.1? Do we have a FIFO eviction policy?
    If so, how can I leverage it?




[GitHub] spark pull request: [SPARK-14289][WIP] Support multiple eviction s...

Posted by Earne <gi...@git.apache.org>.
Github user Earne commented on the pull request:

    https://github.com/apache/spark/pull/12162#issuecomment-206238646
  
    @rxin The use case that motivates this is described below.
    
    - Java objects consume a factor of 2-5x more space than the “raw” data inside their fields.
    
    - I ran the graphx.LiveJournalPageRank example on an 8-node cluster (1 node works as the Master; each node is configured with 45GB of memory for Spark, running in legacy memory management mode). The dataset (about 30GB) is generated by HiBench. Over 5 iterations, the time taken by each iteration gets worse and worse.
    
    - By analyzing the log file, I realized that this is because the memory space for cached RDDs is not sufficient, and many partitions with a high recomputation cost are dropped. Recomputing these partitions takes a lot of time.
    
    - FIFO can be implemented by initializing [entries](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L90) with `LinkedHashMap[BlockId, MemoryEntry[_]](32, 0.75f, false)`. Even FIFO can get much better performance than LRU.
    
    - A storage level such as MEMORY_AND_DISK may partially solve the problem, but the effect is not very good.
    
    An eviction strategy that takes the computing cost into consideration may work well (even in unified memory mode or with the MEMORY_AND_DISK level). Some cost-aware replacement policies already exist in K-V stores, such as GD-Wheel (EuroSys’15).
    
    This PR can be separated into the sub-tasks below.
    - [ ] Refactor to support more than one policy (LRU, FIFO, LFU).
    
    - [ ] Add a policy that takes the computing cost into consideration.
    
    - [ ] Take serialization and deserialization cost into consideration.
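
To illustrate the FIFO point above: the third constructor argument of `java.util.LinkedHashMap` is `accessOrder`. With `false` the map iterates in insertion order (FIFO), and with `true` it iterates from least to most recently accessed (LRU). The sketch below is not Spark code; the class name, block names, and sizes are hypothetical and only show how the flag changes the order in which entries would be visited for eviction.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EvictionOrderDemo {
    // Returns block names in iteration order, i.e. the order in which
    // they would be considered for eviction when scanning from the head.
    static List<String> evictionOrder(boolean accessOrder) {
        Map<String, Integer> entries = new LinkedHashMap<>(32, 0.75f, accessOrder);
        entries.put("rdd_0_0", 100); // hypothetical block name and size
        entries.put("rdd_0_1", 100);
        entries.put("rdd_0_2", 100);
        entries.get("rdd_0_0");      // read the oldest block again
        return new ArrayList<>(entries.keySet());
    }

    public static void main(String[] args) {
        System.out.println("FIFO: " + evictionOrder(false));
        System.out.println("LRU:  " + evictionOrder(true));
    }
}
```

With `accessOrder = false` the re-read block stays at the head and would be dropped first; with `accessOrder = true` the read moves it to the tail, so it would be dropped last.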




[GitHub] spark issue #12162: [SPARK-14289][WIP] Support multiple eviction strategies ...

Posted by michaelmior <gi...@git.apache.org>.
Github user michaelmior commented on the issue:

    https://github.com/apache/spark/pull/12162
  
    As best I can tell, the code that was pushed here is incomplete. However, Spark's default cache eviction policy is LRU. You can find the code which performs eviction [here](https://github.com/apache/spark/blob/1e82335413bc2384073ead0d6d581c862036d0f5/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L501). It basically just works by storing all the data in a `LinkedHashMap` configured to track which elements were accessed most recently.
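
A minimal sketch of the idea described above, assuming a simplified store that maps a block id to its size in bytes. This is not the actual `MemoryStore` code: the class and method names are hypothetical, and the real implementation handles locking, per-RDD rules, and dropping blocks to disk, none of which is modeled here. It only shows how an access-ordered `LinkedHashMap` yields LRU eviction by iterating from the head.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LruEvictionSketch {
    // accessOrder = true: iteration runs from least to most recently accessed.
    private final LinkedHashMap<String, Long> entries =
        new LinkedHashMap<>(32, 0.75f, true);

    void put(String blockId, long sizeBytes) {
        entries.put(blockId, sizeBytes);
    }

    // A read counts as an access and moves the block to the tail of the map.
    boolean touch(String blockId) {
        return entries.get(blockId) != null;
    }

    // Drops least recently used blocks until at least `needed` bytes are
    // freed or the map is empty; returns the dropped ids in eviction order.
    List<String> evictToFree(long needed) {
        List<String> evicted = new ArrayList<>();
        long freed = 0;
        Iterator<Map.Entry<String, Long>> it = entries.entrySet().iterator();
        while (freed < needed && it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            freed += e.getValue();
            evicted.add(e.getKey());
            it.remove(); // safe removal while iterating
        }
        return evicted;
    }

    public static void main(String[] args) {
        LruEvictionSketch store = new LruEvictionSketch();
        store.put("a", 40);
        store.put("b", 40);
        store.put("c", 40);
        store.touch("a"); // "a" becomes the most recently used block
        System.out.println(store.evictToFree(50));
    }
}
```

Swapping in a different policy would mean changing how the next victim is chosen; in this sketch that choice is made entirely by the map's iteration order.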

