
Posted to issues@spark.apache.org by "yuhao yang (JIRA)" <ji...@apache.org> on 2017/07/25 21:55:00 UTC

[jira] [Created] (SPARK-21535) Reduce memory requirement for CrossValidator and TrainValidationSplit

yuhao yang created SPARK-21535:
----------------------------------

             Summary: Reduce memory requirement for CrossValidator and TrainValidationSplit 
                 Key: SPARK-21535
                 URL: https://issues.apache.org/jira/browse/SPARK-21535
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.2.0
            Reporter: yuhao yang


CrossValidator and TrainValidationSplit both use
{code}
models = est.fit(trainingDataset, epm)
{code}
to fit the models, where epm is an Array[ParamMap].

Even though the models are trained sequentially, the current implementation keeps all of the trained models in driver memory at the same time. This is unnecessary and often leads to out-of-memory exceptions in both CrossValidator and TrainValidationSplit. I propose changing the implementation to train and evaluate one model at a time, so that each model can be garbage-collected once its metric has been computed. This avoids the unnecessary OOM exceptions.
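A minimal sketch of the proposed loop shape follows. To keep it runnable without Spark, it uses hypothetical stand-ins named ParamMap, Estimator and Model (these are NOT the org.apache.spark.ml classes). The two functions metricsAllAtOnce and metricsOneAtATime are also illustrative names. The point is the structure: in the proposed version only a Double metric survives each iteration, so earlier models become unreachable. In Spark itself, the per-iteration call would presumably use the single-ParamMap overload est.fit(trainingDataset, paramMap) rather than the Array[ParamMap] one.

```scala
// Hypothetical stand-ins for Spark ML types (not the real Spark classes).
// Each model carries a large weight array to mimic a model that is
// expensive to keep on the driver.
final case class ParamMap(regParam: Double)

final class Model(val regParam: Double) {
  val weights: Array[Double] = Array.fill(100000)(regParam)
  // Higher is better: negated sum of squared errors on the validation data.
  def evaluate(validation: Seq[Double]): Double =
    -validation.map(x => (x - regParam) * (x - regParam)).sum
}

object Estimator {
  // Fit a single model for one parameter map.
  def fit(training: Seq[Double], pm: ParamMap): Model = new Model(pm.regParam)

  // Fit one model per parameter map and return all of them at once,
  // mirroring the multi-ParamMap fit described in the issue.
  def fit(training: Seq[Double], epm: Array[ParamMap]): Seq[Model] =
    epm.toSeq.map(pm => fit(training, pm))
}

// Current pattern: every model stays reachable until all are evaluated,
// so peak driver memory grows with epm.length.
def metricsAllAtOnce(train: Seq[Double], valid: Seq[Double], epm: Array[ParamMap]): Array[Double] = {
  val models = Estimator.fit(train, epm)
  models.map(_.evaluate(valid)).toArray
}

// Proposed pattern: fit, evaluate, then drop each model before fitting
// the next one, so at most one model is reachable at a time.
def metricsOneAtATime(train: Seq[Double], valid: Seq[Double], epm: Array[ParamMap]): Array[Double] =
  epm.map { pm =>
    val model = Estimator.fit(train, pm)
    model.evaluate(valid) // only the Double metric escapes this block
  }

// Usage: both approaches produce the same metrics and pick the same ParamMap.
val epm = Array(ParamMap(0.1), ParamMap(0.5), ParamMap(1.0))
val train = Seq(0.4, 0.5, 0.6)
val valid = Seq(0.45, 0.55)
val metrics = metricsOneAtATime(train, valid, epm)
val bestIndex = metrics.zipWithIndex.maxBy(_._1)._2
println(s"best regParam = ${epm(bestIndex).regParam}") // prints "best regParam = 0.5"
```

The trade-off is that the one-at-a-time loop no longer hands every model back to the caller. That is acceptable here because model selection only needs the per-ParamMap metrics, and the best model can be refit on the full dataset afterwards.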



