Posted to user@spark.apache.org by Ewan Higgs <ew...@ugent.be> on 2016/02/19 17:26:22 UTC

Spark Random Forest Memory issues

Hi all,
Back in September, a set of machine learning benchmark results was 
published here:
https://github.com/szilard/benchm-ml/

Spark's Random Forest appeared to run out of memory at around 10 
million training rows:
https://github.com/szilard/benchm-ml/blob/master/2-rf/5c-spark-crash.txt

It was discussed for a bit here:
https://github.com/szilard/benchm-ml/issues/19

But I haven't seen an update. Is there an open ticket on the Spark JIRA?

I didn't see any in the searches I made:
https://issues.apache.org/jira/issues/?jql=text%20~%20%22bench-ml%22
https://issues.apache.org/jira/issues/?jql=text%20~%20%22randomforest%20gc%22

I have a user who is trying to use Spark's RF implementation and is 
running into memory issues that look exactly like the ones seen in the 
benchmark.
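In case it helps others reproduce or work around this, here is a sketch 
of the memory-related knobs we have been experimenting with. The values 
and the jar name are illustrative only, not a recommendation or a fix:

```shell
# Illustrative spark-submit invocation: give the driver and executors
# more headroom, since RF node-statistics aggregation can be memory-heavy.
# "your-rf-job.jar" is a placeholder for the actual application jar.
spark-submit \
  --driver-memory 8g \
  --executor-memory 8g \
  --conf spark.memory.fraction=0.6 \
  your-rf-job.jar
```

On the API side, spark.ml's RandomForestClassifier also exposes 
setMaxMemoryInMB(...) to bound the per-iteration aggregation buffers, 
and setCacheNodeIds(true) with setCheckpointInterval(...) to keep 
lineage short; lowering maxBins or maxDepth shrinks the buffers too. 
None of these has resolved the crash for us so far, hence the question 
about a JIRA ticket.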

Thanks,
Ewan

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org