Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2015/06/22 08:45:00 UTC

[jira] [Updated] (SPARK-6717) Clear shuffle files after checkpointing in ALS

     [ https://issues.apache.org/jira/browse/SPARK-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng updated SPARK-6717:
---------------------------------
    Affects Version/s:     (was: 1.3.1)

> Clear shuffle files after checkpointing in ALS
> ----------------------------------------------
>
>                 Key: SPARK-6717
>                 URL: https://issues.apache.org/jira/browse/SPARK-6717
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.4.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>              Labels: als
>
> In ALS iterations, we checkpoint RDDs to cut lineage and to reduce the number of shuffle files. However, shuffle files are cleaned only when the JVM garbage collector reclaims the corresponding RDD objects, and a GC may not be triggered during ALS iterations. So after checkpointing, and before we let the RDD object go out of scope, we should clean its shuffle dependencies explicitly. This function could either stay inside ALS or go to Core.
> Without this feature, we can call System.gc() periodically to clean the shuffle files of RDDs that have gone out of scope.
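
The System.gc() workaround described above can be sketched roughly as follows. This is only an illustration, not the ALS implementation: the object name PeriodicGcSketch, the iterate helper, the toy re-keying loop, and the checkpoint interval are all hypothetical stand-ins for the real ALS iterations. It assumes Spark 1.3 or later on the classpath and a local master.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical sketch: an iterative job that checkpoints every few
// iterations and then asks the driver JVM to run a GC, so that Spark can
// notice the unreachable RDDs and clean up their shuffle files.
object PeriodicGcSketch {

  def iterate(sc: SparkContext, numIterations: Int, checkpointInterval: Int): Double = {
    // 100 records with value 1.0 spread over 10 keys (toy data, not ALS factors).
    var current: RDD[(Int, Double)] =
      sc.parallelize(0 until 100).map(i => (i % 10, 1.0))

    for (iter <- 1 to numIterations) {
      // Re-keying drops the partitioner, so reduceByKey shuffles in every iteration.
      val next = current
        .map { case (k, v) => ((k + 1) % 10, v) }
        .reduceByKey(_ + _)

      if (iter % checkpointInterval == 0) {
        next.checkpoint() // must be requested before the first action on this RDD
        next.count()      // run a job so the checkpoint is actually written
        current = next    // the RDDs from earlier iterations are now unreachable
        System.gc()       // a hint to the JVM; collection is not guaranteed
      } else {
        current = next
      }
    }
    current.values.sum()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("periodic-gc-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // A temporary directory stands in for a real checkpoint location.
      sc.setCheckpointDir(Files.createTempDirectory("checkpoint-sketch").toString)
      println(iterate(sc, numIterations = 20, checkpointInterval = 5))
    } finally {
      sc.stop()
    }
  }
}
```

Because System.gc() is only a request to the JVM, this approach makes cleanup more likely rather than certain, which is the motivation for cleaning shuffle dependencies explicitly after checkpointing.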



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
