Posted to dev@mahout.apache.org by "Pat Ferrel (JIRA)" <ji...@apache.org> on 2015/05/13 19:25:00 UTC

[jira] [Updated] (MAHOUT-1707) Spark-itemsimilarity uses too much memory

     [ https://issues.apache.org/jira/browse/MAHOUT-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pat Ferrel updated MAHOUT-1707:
-------------------------------
    Description: 
java.lang.OutOfMemoryError: Java heap space

The code has an unnecessary .collect() that forces all interaction data into the memory of the client/driver. Because the data lands on the driver, increasing the executor memory will not help.

To work around this, remove the following line and rebuild Mahout:
https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157

The errant line reads:

    interactions.collect()

This forces all of the user action data into driver memory at once, which can exhaust the heap. Removing it should allow for better Spark memory management.
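
For illustration only, here is a minimal sketch of why the call is costly. It is not the Mahout reader code: the object name CollectSketch, the helper distinctItemCount, and the sample (userID, itemID) pairs are all hypothetical, and it assumes Spark is on the classpath and runs in local mode. The commented-out line shows the pattern described above; the helper shows an action that only returns a small value to the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CollectSketch {

  // Hypothetical helper: counts distinct item IDs on the executors and
  // returns only a single Long to the driver.
  def distinctItemCount(interactions: RDD[(String, String)]): Long =
    interactions.map(_._2).distinct().count()

  def main(args: Array[String]): Unit = {
    // Local master is an assumption made so the sketch is self-contained.
    val conf = new SparkConf().setAppName("collect-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-in for the (userID, itemID) interaction data.
      val interactions: RDD[(String, String)] = sc.parallelize(Seq(
        ("u1", "iphone"), ("u1", "ipad"), ("u2", "nexus"), ("u2", "iphone")))

      // Pattern from the issue: collect() copies every element of the RDD
      // into an Array on the driver heap, even when the result is unused.
      // interactions.collect()

      // Keeping the work distributed: only the count reaches the driver.
      println(s"distinct items: ${distinctItemCount(interactions)}")
    } finally {
      sc.stop()
    }
  }
}
```

With a small sample like this either form runs fine; the difference only matters when the interaction data is larger than the driver heap, which is the case this issue reports.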

  was:
java.lang.OutOfMemoryError: Java heap space

The code has an unnecessary .collect(), forcing all interaction data into memory of the client/driver. Increasing the executor memory will not help with this.

remove this line and rebuild Mahout.
https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157

The errant line reads:

    interactions.collect()

This forces the user action data into memory, a bad thing for memory consumption.


> Spark-itemsimilarity uses too much memory
> -----------------------------------------
>
>                 Key: MAHOUT-1707
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1707
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering, cooccurrence
>    Affects Versions: 0.10.0
>         Environment: Spark
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>             Fix For: 0.10.1
>
>
> java.lang.OutOfMemoryError: Java heap space
> The code has an unnecessary .collect() that forces all interaction data into the memory of the client/driver. Because the data lands on the driver, increasing the executor memory will not help.
> To work around this, remove the following line and rebuild Mahout:
> https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157
> The errant line reads:
>     interactions.collect()
> This forces all of the user action data into driver memory at once, which can exhaust the heap. Removing it should allow for better Spark memory management.


