Posted to common-user@hadoop.apache.org by Geet Garg <ga...@gmail.com> on 2010/10/27 19:44:00 UTC
Error while running Terrier-3.0 on hadoop or mahout-0.3 on hadoop
Hi,
I'm trying to run Terrier-3.0 on hadoop-0.18.3, with general configuration
settings. My hadoop cluster is running on 3 nodes (1 master, 3 slaves). If
I run Terrier Basic Single Pass Indexing (with default configurations) on a
small dataset (~1 GB), it works fine. But for a larger dataset (~10 GB), I
get this error:
attempt_201010272120_0001_m_000002_0: java.lang.OutOfMemoryError: GC overhead limit exceeded
attempt_201010272120_0001_m_000002_0: at org.terrier.structures.indexing.singlepass.hadoop.SplitEmittedTerm.createNewTerm(SplitEmittedTerm.java:64)
attempt_201010272120_0001_m_000002_0: at org.terrier.structures.indexing.singlepass.hadoop.HadoopRunWriter.writeTerm(HadoopRunWriter.java:84)
attempt_201010272120_0001_m_000002_0: at org.terrier.structures.indexing.singlepass.MemoryPostings.writeToWriter(MemoryPostings.java:151)
attempt_201010272120_0001_m_000002_0: at org.terrier.structures.indexing.singlepass.MemoryPostings.finish(MemoryPostings.java:112)
attempt_201010272120_0001_m_000002_0: at org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer.forceFlush(Hadoop_BasicSinglePassIndexer.java:308)
attempt_201010272120_0001_m_000002_0: at org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer.closeMap(Hadoop_BasicSinglePassIndexer.java:419)
attempt_201010272120_0001_m_000002_0: at org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer.close(Hadoop_BasicSinglePassIndexer.java:236)
attempt_201010272120_0001_m_000002_0: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
attempt_201010272120_0001_m_000002_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
attempt_201010272120_0001_m_000002_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)
Also, I tried running Mahout-0.3 on hadoop-0.20.2. It works fine for tasks
on small datasets (< 1 MB). But for even slightly larger datasets (~30 MB)
it fails with this error:
Error: java.lang.OutOfMemoryError: Java heap space
at org.apache.mahout.fpm.pfpgrowth.TransactionTree.resize(TransactionTree.java:446)
at org.apache.mahout.fpm.pfpgrowth.TransactionTree.createNode(TransactionTree.java:409)
at org.apache.mahout.fpm.pfpgrowth.TransactionTree.addPattern(TransactionTree.java:202)
at org.apache.mahout.fpm.pfpgrowth.TransactionTree.getCompressedTree(TransactionTree.java:285)
at org.apache.mahout.fpm.pfpgrowth.ParallelFPGrowthCombiner.reduce(ParallelFPGrowthCombiner.java:51)
at org.apache.mahout.fpm.pfpgrowth.ParallelFPGrowthCombiner.reduce(ParallelFPGrowthCombiner.java:33)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)
I'm absolutely stuck. I've tried increasing the java heap size in
hadoop-env.sh. I've tried using parallelGC. Nothing seems to work.
Can anyone help me please?
Thanks.
Regards,
Geet
--
Geet Garg
Final Year Dual Degree Student
Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur
INDIA
Phone: +91 97344 26187
e-Mail: garggeetus@gmail.com
Re: Error while running Terrier-3.0 on hadoop or mahout-0.3 on hadoop
Posted by Allen Wittenauer <aw...@linkedin.com>.
On Oct 27, 2010, at 10:44 AM, Geet Garg wrote:
>
> I'm absolutely stuck. I've tried increasing the java heap size in
> hadoop-env.sh. I've tried using parallelGC. Nothing seems to work.
>
> Can anyone help me please?
hadoop-env.sh is for the daemons. You need to increase the heap in mapred.child.java.opts.
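
For illustration, a minimal sketch of that setting. mapred.child.java.opts holds the JVM options for the child processes that run map and reduce tasks, and its shipped default is -Xmx200m, which would explain why both jobs run out of heap on larger inputs. The -Xmx1024m value below is only an assumption for the example; pick a value so that the number of task slots per node times the task heap still fits in that node's RAM.

```xml
<!-- Goes inside the existing <configuration> element of
     conf/mapred-site.xml (hadoop-0.20.x) or conf/hadoop-site.xml (hadoop-0.18.x). -->
<property>
  <name>mapred.child.java.opts</name>
  <!-- Heap size is an illustrative assumption, not a recommendation. -->
  <value>-Xmx1024m</value>
</property>
```

The property is read from the job configuration when a task JVM is launched, so it needs to be visible to the job being submitted, and only tasks started after the change pick it up.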