Posted to user@mahout.apache.org by Partha Pratim Talukdar <pa...@cs.cmu.edu> on 2013/12/20 06:40:53 UTC

mahout svd OOM error

Hello,

I am running Mahout (v0.8) svd over a sparse matrix of size 5,064,569 x
44,543,104, with the matrix in the format described in [1]. However, I get an
OOM error immediately, as shown below. I have tried increasing JAVA_HEAP_MAX
and MAHOUT_HEAPSIZE in bin/mahout to 10GB, but to no effect (the exact
settings I am changing are shown after the stack trace). Does anyone know a
way out?


13/12/19 23:23:36 INFO common.AbstractJob: Command line arguments:
{--endPhase=[2147483647], --inMemory=[false],
--input=[/user/ppt/data/pra_svo/openie/input/openiev4_filtered_pred_arg1arg2_l2norm_center_sorted_col_mahout_inp.bin],
--maxError=[0.05], --minEigenvalue=[0.0], --numCols=[44543104],
--numRows=[5064569], --output=[/user/ppt/data/pra_svo/openie/output/],
--rank=[200], --startPhase=[0], --symmetric=[false],
--tempDir=[/user/ppt/data/pra_svo/openie/temp/],
--workingDir=[/user/ppt/data/pra_svo/openie/scratch/]}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:53)
        at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.getInitialVector(DistributedLanczosSolver.java:68)
        at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:203)
        at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.run(DistributedLanczosSolver.java:131)
        at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver$DistributedLanczosSolverJob.run(DistributedLanczosSolver.java:291)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.main(DistributedLanczosSolver.java:297)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:194)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
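
For completeness, here is how I am raising the heap. As far as I understand
the stock bin/mahout script, MAHOUT_HEAPSIZE is given in MB and is turned
into the JVM -Xmx setting (JAVA_HEAP_MAX), so this is effectively what I am
running (paths and sizes are the ones from the log above):

    # heap for the client JVM that bin/mahout launches (value is in MB)
    export MAHOUT_HEAPSIZE=10000
    # alternatively, edit the default in bin/mahout directly, e.g.:
    # JAVA_HEAP_MAX=-Xmx10g
    bin/mahout svd \
      --input /user/ppt/data/pra_svo/openie/input/openiev4_filtered_pred_arg1arg2_l2norm_center_sorted_col_mahout_inp.bin \
      --output /user/ppt/data/pra_svo/openie/output/ \
      --numRows 5064569 --numCols 44543104 --rank 200 \
      --symmetric false \
      --tempDir /user/ppt/data/pra_svo/openie/temp/ \
      --workingDir /user/ppt/data/pra_svo/openie/scratch/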

[1] http://bickson.blogspot.com/2011/02/mahout-svd-matrix-factorization.html

Re: mahout svd OOM error

Posted by Suneel Marthi <su...@yahoo.com>.
DistributedLanczosSolver has been deprecated (and the blog post you mention is old). Use Stochastic SVD (SSVD) instead.
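
Something along these lines should be closer to what you want. The option
names below are from memory and worth double-checking with 'mahout ssvd
--help'; the output path and the oversampling/power-iteration values are
just placeholders:

    # Stochastic SVD over the same input matrix (rank taken from your command)
    mahout ssvd \
      --input /user/ppt/data/pra_svo/openie/input/openiev4_filtered_pred_arg1arg2_l2norm_center_sorted_col_mahout_inp.bin \
      --output /user/ppt/data/pra_svo/openie/output_ssvd/ \
      --rank 200 \
      --oversampling 15 \
      --powerIter 1 \
      --tempDir /user/ppt/data/pra_svo/openie/temp/

Unlike the Lanczos solver, which dies in your trace while allocating a dense
vector of length numCols on the client (DenseVector.<init> via
getInitialVector), SSVD does its work in MapReduce and should cope much
better with 44M columns.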
