Posted to dev@systemds.apache.org by "phaniarnab (via GitHub)" <gi...@apache.org> on 2023/05/27 20:09:23 UTC

[GitHub] [systemds] phaniarnab opened a new pull request, #1834: [SYSTEMDS-3518] Eviction of lineage-cached RDDs from Spark storage

phaniarnab opened a new pull request, #1834:
URL: https://github.com/apache/systemds/pull/1834

   This patch extends the lineage cache eviction policies to support RDDs persisted at the executors.
   - We checkpoint (persist) an RDD only on its second cache hit, to reduce cache pollution.
   - While checkpointing, we rely on worst-case size estimates and later update the eviction data structures with the actual size once the RDD is persisted.
   - We split the Spark operators into two groups: expensive, shuffle-based operations and map-based operations. For the scoring function, we assume the shuffle-based group is twice as expensive as the map-based group.
   - We also track the reference counts of RDDs and use them in the scoring. More references (many consumers) indicate higher importance.
   - We reduce the score by one hit count if we collect a persisted RDD. This favors evicting intermediates that are cached in more than one location.
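   The heuristics above can be sketched as follows. This is a minimal, hypothetical Java sketch: the class RddEvictionSketch, the RddEntry fields, and the score formula (cost weight x hits x references / size) are assumptions for illustration, not the SystemDS lineage cache code. Spark's persist/unpersist calls are simulated with a boolean flag.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch of the RDD eviction heuristics described in the PR.
 * All names and the score formula are illustrative assumptions.
 */
public class RddEvictionSketch {

    // Assumed cost weights: shuffle-based operators count twice as much
    // as map-based operators when scoring.
    static final double SHUFFLE_WEIGHT = 2.0;
    static final double MAP_WEIGHT = 1.0;

    static class RddEntry {
        final String name;
        final boolean shuffleOp;   // produced by a shuffle-based operator?
        long sizeBytes;            // worst-case estimate until persisted
        int hits = 0;              // lineage cache hits
        int refCount = 0;          // number of consumers
        boolean persisted = false; // checkpointed in Spark storage?
        boolean collected = false; // also collected to the driver?

        RddEntry(String name, boolean shuffleOp, long worstCaseSize) {
            this.name = name;
            this.shuffleOp = shuffleOp;
            this.sizeBytes = worstCaseSize;
        }

        /** Record a cache hit; persist only on the second hit. */
        void onHit() {
            hits++;
            if (hits == 2 && !persisted) {
                persisted = true; // stands in for rdd.persist(...)
            }
        }

        /** Replace the worst-case estimate with the actual persisted size. */
        void onPersisted(long actualSize) {
            sizeBytes = actualSize;
        }

        /** Higher score means more valuable to keep. */
        double score() {
            double weight = shuffleOp ? SHUFFLE_WEIGHT : MAP_WEIGHT;
            // A collected RDD loses one hit: it is cached in more than one
            // place, so the executor copy is a better eviction candidate.
            int effectiveHits = Math.max(0, collected ? hits - 1 : hits);
            // More consumers imply higher importance.
            int refs = Math.max(1, refCount);
            return weight * effectiveHits * refs / (double) Math.max(1L, sizeBytes);
        }
    }

    /** Evict lowest-score persisted entries until usage fits the budget. */
    static List<String> evict(List<RddEntry> entries, long budgetBytes) {
        List<RddEntry> persisted = new ArrayList<>();
        long used = 0;
        for (RddEntry e : entries) {
            if (e.persisted) {
                persisted.add(e);
                used += e.sizeBytes;
            }
        }
        persisted.sort(Comparator.comparingDouble(RddEntry::score));
        List<String> evicted = new ArrayList<>();
        for (RddEntry e : persisted) {
            if (used <= budgetBytes) break;
            e.persisted = false; // stands in for rdd.unpersist()
            used -= e.sizeBytes;
            evicted.add(e.name);
        }
        return evicted;
    }

    public static void main(String[] args) {
        RddEntry agg = new RddEntry("shuffleAgg", true, 400);
        agg.refCount = 3;
        agg.onHit(); agg.onHit();   // persisted on the second hit
        agg.onPersisted(200);       // actual size replaces the estimate

        RddEntry scale = new RddEntry("mapScale", false, 300);
        scale.refCount = 1;
        scale.onHit(); scale.onHit();
        scale.onPersisted(300);

        RddEntry join = new RddEntry("collectedJoin", true, 150);
        join.refCount = 1;
        join.onHit(); join.onHit(); join.onHit();
        join.onPersisted(100);
        join.collected = true;      // collect costs one hit in the score

        System.out.println(evict(List.of(agg, scale, join), 400)); // [mapScale]
    }
}
```

   Dividing by size follows the common cost/size style of cache scoring, so large, cheap-to-recompute, rarely referenced RDDs go first. How the real patch combines these factors is not stated here, so treat the formula as a placeholder.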


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] phaniarnab closed pull request #1834: [SYSTEMDS-3518] Eviction of lineage-cached RDDs from Spark storage

Posted by "phaniarnab (via GitHub)" <gi...@apache.org>.
phaniarnab closed pull request #1834: [SYSTEMDS-3518] Eviction of lineage-cached RDDs from Spark storage
URL: https://github.com/apache/systemds/pull/1834
