Posted to dev@mahout.apache.org by "Pat Ferrel (JIRA)" <ji...@apache.org> on 2015/05/13 19:25:00 UTC

[jira] [Created] (MAHOUT-1707) Spark-itemsimilarity uses too much memory

Pat Ferrel created MAHOUT-1707:
----------------------------------

             Summary: Spark-itemsimilarity uses too much memory
                 Key: MAHOUT-1707
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1707
             Project: Mahout
          Issue Type: Bug
          Components: Collaborative Filtering, cooccurrence
    Affects Versions: 0.10.0
         Environment: Spark
            Reporter: Pat Ferrel
            Assignee: Pat Ferrel
             Fix For: 0.10.1


java.lang.OutOfMemoryError: Java heap space

The code has an unnecessary .collect(), which pulls all interaction data into the memory of the client/driver. Because the data lands in the driver's heap, increasing the executor memory will not help with this.

Workaround: remove this line and rebuild Mahout:
https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157

The errant line reads:

    interactions.collect()

This copies the entire user action dataset into the driver's memory, so heap usage there grows with the size of the input.
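
To illustrate why the call is costly, here is a minimal sketch. It is not the Mahout code: the object name, the helper `run`, the local master setting, and the tiny sample dataset are all assumptions made for illustration. It contrasts collect(), which returns every element to the driver as an Array, with a distributed action such as count(), which sends back only a single number.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-alone example; names and data are illustrative only.
object CollectSketch {

  // Returns (number of elements copied to the driver, distributed count).
  def run(sc: SparkContext): (Int, Long) = {
    // Stand-in for the (user, item) interaction data.
    val interactions = sc.parallelize(Seq(("u1", "i1"), ("u1", "i2"), ("u2", "i1")))

    // collect() materializes every element as an Array in the driver JVM,
    // so driver heap use grows with the size of the whole dataset.
    val onDriver: Array[(String, String)] = interactions.collect()

    // count() is computed on the executors; only a Long reaches the driver.
    val n: Long = interactions.count()

    (onDriver.length, n)
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a self-contained run.
    val conf = new SparkConf().setMaster("local[2]").setAppName("collect-sketch")
    val sc = new SparkContext(conf)
    try {
      val (collected, counted) = run(sc)
      println(s"collected $collected elements, counted $counted")
    } finally {
      sc.stop()
    }
  }
}
```

With a small sample both calls succeed. With a large interaction dataset, the collect() call is the one whose result has to fit in the driver's heap.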


