Posted to dev@systemds.apache.org by GitBox <gi...@apache.org> on 2022/12/16 22:31:23 UTC

[GitHub] [systemds] phaniarnab opened a new pull request, #1751: [SYSTEMDS-3479] Persist and reuse of Spark RDDs

phaniarnab opened a new pull request, #1751:
URL: https://github.com/apache/systemds/pull/1751

   This patch extends the lineage cache to store checkpointed RDD objects. This allows the compiler to place a chkpoint instruction after a potentially redundant operator. At runtime, we then persist the RDD, save it in the lineage cache, and reuse it if the instruction repeats. Caching an RDD under the lineage trace of a previous instruction is a bit tricky. A better approach would be to allow marking any instruction to persist its result RDD and skip the chkpoint instruction altogether.
   Hyperparameter tuning of LmDS with 2.5k columns improves by 4x by caching the cpmm results in the executors.
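
The persist-and-reuse flow described above can be illustrated with a minimal, self-contained sketch. This is not the SystemDS implementation: `LineageCache`, `PersistedResult`, `get_or_compute`, and `tune` are hypothetical names, the lineage key is simplified to an opcode plus input identifiers, and plain Python lists stand in for Spark RDDs. It only shows the idea that a repeated instruction with the same lineage can reuse an already persisted result.

```python
from dataclasses import dataclass


@dataclass
class PersistedResult:
    """Hypothetical stand-in for a persisted Spark RDD handle."""
    value: object
    persisted: bool = True  # in Spark, this is where rdd.persist(...) would apply


class LineageCache:
    """Toy cache keyed by a simplified lineage trace (opcode + input ids)."""

    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, opcode, inputs, compute):
        key = (opcode, tuple(inputs))
        if key in self._entries:
            # Same lineage seen before: reuse the persisted result.
            self.hits += 1
            return self._entries[key]
        # First occurrence: compute, mark as persisted, and cache it.
        self.misses += 1
        result = PersistedResult(compute())
        self._entries[key] = result
        return result


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def tune(X, lambdas, cache):
    """Assumed tuning loop: t(X) %*% X is identical for every lambda."""
    results = []
    for lam in lambdas:
        xtx = cache.get_or_compute(
            "cpmm", ("t(X)", "X"), lambda: matmul(transpose(X), X))
        # Only the regularization term differs between iterations.
        reg = [[v + (lam if i == j else 0.0) for j, v in enumerate(row)]
               for i, row in enumerate(xtx.value)]
        results.append(reg)
    return results


cache = LineageCache()
out = tune([[1.0, 2.0], [3.0, 4.0]], [0.1, 1.0, 10.0], cache)
print(cache.misses, cache.hits)
```

In this sketch the matrix product is computed once and reused for the remaining two regularization values, which mirrors why caching the cpmm result would help a tuning loop.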


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@systemds.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [systemds] phaniarnab closed pull request #1751: [SYSTEMDS-3479] Persist and reuse of Spark RDDs

Posted by GitBox <gi...@apache.org>.
phaniarnab closed pull request #1751: [SYSTEMDS-3479] Persist and reuse of Spark RDDs
URL: https://github.com/apache/systemds/pull/1751

