Posted to user@mahout.apache.org by tanweiguo <ta...@huawei.com> on 2010/08/04 09:32:13 UTC

Error: Java heap space when running FPGrowth

I just followed the wiki to test FPGrowth:
https://cwiki.apache.org/MAHOUT/parallel-frequent-pattern-mining.html
 
1. Unzip accidents.dat.gz and put accidents.dat into an HDFS folder named accidents.
2. Run on a Hadoop cluster (1 master and 3 slaves):

    hadoop jar mahout-examples-0.3.job org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver \
         -i accidents \
         -o patterns \
         -k 50 \
         -method mapreduce \
         -g 10 \
         -regex [\ ] \
         -s 2
 
The first two MapReduce jobs (Parallel Counting Driver running over input:
accidents; PFP Transaction Sorting running over input: accidents) succeed.
However, the third job (PFP Growth Driver running over input:
patterns/sortedoutput) always fails with this error message:
    
    10/08/04 15:23:45 INFO input.FileInputFormat: Total input paths to process : 1
    10/08/04 15:23:46 INFO mapred.JobClient: Running job: job_201007271506_0025
    10/08/04 15:23:47 INFO mapred.JobClient:  map 0% reduce 0%
    10/08/04 15:24:05 INFO mapred.JobClient:  map 13% reduce 0%
    10/08/04 15:24:08 INFO mapred.JobClient:  map 22% reduce 0%
    10/08/04 15:24:11 INFO mapred.JobClient:  map 24% reduce 0%
    10/08/04 15:24:29 INFO mapred.JobClient:  map 0% reduce 0%
    10/08/04 15:24:31 INFO mapred.JobClient: Task Id : attempt_201007271506_0025_m_000000_0, Status : FAILED
    Error: java.lang.OutOfMemoryError: Java heap space
            at org.apache.mahout.fpm.pfpgrowth.TransactionTree.resize(TransactionTree.java:446)
            at org.apache.mahout.fpm.pfpgrowth.TransactionTree.createNode(TransactionTree.java:409)
            at org.apache.mahout.fpm.pfpgrowth.TransactionTree.addPattern(TransactionTree.java:202)
            at org.apache.mahout.fpm.pfpgrowth.TransactionTree.getCompressedTree(TransactionTree.java:285)
            at org.apache.mahout.fpm.pfpgrowth.ParallelFPGrowthCombiner.reduce(ParallelFPGrowthCombiner.java:51)
            at org.apache.mahout.fpm.pfpgrowth.ParallelFPGrowthCombiner.reduce(ParallelFPGrowthCombiner.java:33)
            at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
            at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1214)
            at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1227)
            at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
            at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)
    
The parameter mapred.child.java.opts is set to -Xmx512m in my cluster.
I also tried -g 5 and -g 20; both failed with the same error message.
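
For reference, this is how the heap is configured in my mapred-site.xml. One
thing I could try (assuming the slave nodes have enough RAM) is raising the
-Xmx value, e.g. to 1024m:

    <!-- mapred-site.xml: heap for each child task JVM, currently 512 MB -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m</value>
    </property>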
 
Another question: I see that only one mapper runs. Which parameters should I
adjust to get more mappers and improve speed?
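
From what I have read, the number of map tasks follows the number of input
splits, so one idea might be to cap the split size so the single input file is
divided into several splits. A sketch of what I mean, in mapred-site.xml (the
property name is my assumption for this Hadoop version, and 16 MB is an
arbitrary example value; I have not verified this works with FPGrowthDriver):

    <!-- Sketch: cap each input split at ~16 MB so one file yields several map tasks -->
    <property>
      <name>mapred.max.split.size</name>
      <value>16777216</value>
    </property>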