Posted to issues@spark.apache.org by "yuhao yang (JIRA)" <ji...@apache.org> on 2017/07/25 21:55:00 UTC
[jira] [Created] (SPARK-21535) Reduce memory requirement for CrossValidator and TrainValidationSplit
yuhao yang created SPARK-21535:
----------------------------------
Summary: Reduce memory requirement for CrossValidator and TrainValidationSplit
Key: SPARK-21535
URL: https://issues.apache.org/jira/browse/SPARK-21535
Project: Spark
Issue Type: Improvement
Components: ML
Affects Versions: 2.2.0
Reporter: yuhao yang
CrossValidator and TrainValidationSplit both use {code}models = est.fit(trainingDataset, epm){code} to fit the models, where epm is an Array[ParamMap].
Even though the models are trained sequentially, the current implementation holds all of the trained models in driver memory at the same time. This extra memory is unnecessary and often causes out-of-memory exceptions in both CrossValidator and TrainValidationSplit. I propose changing the training implementation to fit one model at a time, so that each local model can be garbage-collected once it has been used, avoiding these unnecessary OOM exceptions.
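The difference between the two strategies can be sketched in plain Scala. This is a minimal, self-contained illustration, not Spark code: `ToyModel`, `fitOne`, and `evaluate` are hypothetical stand-ins for a fitted Model, an Estimator's fit step, and an Evaluator, and a single `regParam: Double` plays the role of a ParamMap. Both functions return the same metrics; they differ only in how many fitted models are reachable at once.

```scala
// Hypothetical stand-ins for Spark ML types; this is NOT the Spark API.
final case class ToyModel(regParam: Double, weights: Array[Double])

// Hypothetical "fit": builds a model whose size grows with numFeatures.
def fitOne(regParam: Double, numFeatures: Int): ToyModel =
  ToyModel(regParam, Array.fill(numFeatures)(1.0 / (1.0 + regParam)))

// Hypothetical "evaluate": reduces a model to a single metric.
def evaluate(model: ToyModel): Double = model.weights.sum

// Current pattern: every model is materialised before any is evaluated,
// so all of them stay reachable until the whole collection is dropped.
def fitAllThenEvaluate(grid: Seq[Double], numFeatures: Int): Seq[Double] = {
  val models = grid.map(p => fitOne(p, numFeatures)) // all models alive at once
  models.map(evaluate)
}

// Proposed pattern: fit, evaluate, and keep only the metric, so each model
// becomes unreachable (and collectable by GC) before the next one is fitted.
def fitOneAtATime(grid: Seq[Double], numFeatures: Int): Seq[Double] =
  grid.map { p =>
    val model = fitOne(p, numFeatures)
    evaluate(model)
  }
```

In Spark terms, the loop body of `fitOneAtATime` roughly corresponds to calling `est.fit(trainingDataset, paramMap)` for a single ParamMap and scoring the result with an Evaluator, instead of calling `est.fit(trainingDataset, epm)` for the whole array up front.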
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)