Posted to dev@mahout.apache.org by "Pat Ferrel (JIRA)" <ji...@apache.org> on 2015/05/13 19:25:00 UTC
[jira] [Created] (MAHOUT-1707) Spark-itemsimilarity uses too much memory
Pat Ferrel created MAHOUT-1707:
----------------------------------
Summary: Spark-itemsimilarity uses too much memory
Key: MAHOUT-1707
URL: https://issues.apache.org/jira/browse/MAHOUT-1707
Project: Mahout
Issue Type: Bug
Components: Collaborative Filtering, cooccurrence
Affects Versions: 0.10.0
Environment: Spark
Reporter: Pat Ferrel
Assignee: Pat Ferrel
Fix For: 0.10.1
java.lang.OutOfMemoryError: Java heap space
The code has an unnecessary .collect() that forces all interaction data into the memory of the client/driver. Because the data lands on the driver, increasing the executor memory will not help.
As a workaround, remove the line linked below and rebuild Mahout.
https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157
The errant line reads:
interactions.collect()
This forces all of the user action data into the driver's memory, so heap usage grows with the size of the input.
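To illustrate why a collect-style call is costly, here is a minimal toy model in plain Scala. It is only a sketch: PartitionedData, CollectSketch, and the sample interactions are hypothetical names made up for this illustration and are not Spark or Mahout APIs. It contrasts materializing every partition on the caller with returning only a small per-partition result.

```scala
// Toy model only: these names are hypothetical, not Spark or Mahout APIs.
final class PartitionedData[A](partitions: Vector[() => Iterator[A]]) {

  // Analogous to a collect(): every element of every partition is copied
  // into one in-memory collection on the caller's side, so the caller's
  // memory use grows with the total input size.
  def collect(): Vector[A] = partitions.flatMap(p => p().toVector)

  // A per-partition aggregate: each partition is streamed once and only a
  // small result (a Long) is handed back to the caller.
  def count(): Long = partitions.map(p => p().size.toLong).sum
}

object CollectSketch {
  // Three hypothetical partitions of (userID, itemID) interactions,
  // produced lazily so nothing is held until a partition is read.
  val interactions = new PartitionedData[(String, String)](Vector(
    () => Iterator(("u1", "iphone"), ("u1", "ipad")),
    () => Iterator(("u2", "nexus")),
    () => Iterator(("u3", "ipad"), ("u3", "galaxy"))
  ))

  // All tuples are held by the caller at the same time.
  val allOnCaller: Vector[(String, String)] = interactions.collect()

  // Only the final number reaches the caller.
  val total: Long = interactions.count()
}
```

In this model the memory needed by collect() scales with the whole data set, while count() never holds more than one partition's iterator at a time, which is roughly why giving the executors more memory does nothing for a driver-side collect.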