Posted to user@nutch.apache.org by Fadzi Ushewokunze <fa...@butterflycluster.net> on 2009/10/11 06:26:14 UTC

OutOfMemoryError: Java heap space

hi all,

I am getting the JVM error below during a recrawl, specifically during the execution of

$NUTCH_HOME/bin/nutch mergesegs crawl/MERGEDsegments crawl/segments/*

I am running on a single machine:
Linux 2.6.24-23-xen  x86_64
4G RAM
java-6-sun
nutch-1.0
JAVA_HEAP_MAX=-Xmx1000m 

Any suggestions? I am about to raise my heap max to -Xmx2000m.
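
To be concrete, the plan is just to bump the heap and re-run the merge; a rough sketch, assuming the stock bin/nutch script (which, as far as I can tell, builds JAVA_HEAP_MAX from the NUTCH_HEAPSIZE environment variable) and local-mode Hadoop, so the whole job runs inside that one JVM:

  # raise the Nutch JVM heap to ~2 GB, then retry the merge
  export NUTCH_HEAPSIZE=2000
  $NUTCH_HOME/bin/nutch mergesegs crawl/MERGEDsegments crawl/segments/*

(If this were running on a real Hadoop cluster rather than the LocalJobRunner, I assume the reduce task heap would instead come from mapred.child.java.opts in the job configuration.)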

I haven't encountered this before when running with the above specs, so I am not sure what could have changed.
Any suggestions will be greatly appreciated.

Thanks.


> 
> 
> 2009-10-11 14:29:56,752 INFO [org.apache.hadoop.mapred.LocalJobRunner] - reduce > reduce
> 2009-10-11 14:30:15,801 INFO [org.apache.hadoop.mapred.LocalJobRunner] - reduce > reduce
> 2009-10-11 14:31:19,197 INFO [org.apache.hadoop.mapred.TaskRunner] - Communication exception: java.lang.OutOfMemoryError: Java heap space
> 	at java.util.ResourceBundle$Control.getCandidateLocales(ResourceBundle.java:2220)
> 	at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1229)
> 	at java.util.ResourceBundle.getBundle(ResourceBundle.java:715)
> 	at org.apache.hadoop.mapred.Counters$Group.getResourceBundle(Counters.java:218)
> 	at org.apache.hadoop.mapred.Counters$Group.<init>(Counters.java:202)
> 	at org.apache.hadoop.mapred.Counters.getGroup(Counters.java:410)
> 	at org.apache.hadoop.mapred.Counters.incrAllCounters(Counters.java:491)
> 	at org.apache.hadoop.mapred.Counters.sum(Counters.java:506)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job.statusUpdate(LocalJobRunner.java:222)
> 	at org.apache.hadoop.mapred.Task$1.run(Task.java:418)
> 	at java.lang.Thread.run(Thread.java:619)
> 
> 2009-10-11 14:31:22,197 INFO [org.apache.hadoop.mapred.LocalJobRunner] - reduce > reduce
> 2009-10-11 14:31:25,197 INFO [org.apache.hadoop.mapred.LocalJobRunner] - reduce > reduce
> 2009-10-11 14:31:40,002 WARN [org.apache.hadoop.mapred.LocalJobRunner] - job_local_0001
> java.lang.OutOfMemoryError: Java heap space
> 	at java.util.concurrent.locks.ReentrantLock.<init>(ReentrantLock.java:234)
> 	at java.util.concurrent.ConcurrentHashMap$Segment.<init>(ConcurrentHashMap.java:289)
> 	at java.util.concurrent.ConcurrentHashMap.<init>(ConcurrentHashMap.java:613)
> 	at java.util.concurrent.ConcurrentHashMap.<init>(ConcurrentHashMap.java:652)
> 	at org.apache.hadoop.io.AbstractMapWritable.<init>(AbstractMapWritable.java:49)
> 	at org.apache.hadoop.io.MapWritable.<init>(MapWritable.java:42)
> 	at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:260)
> 	at org.apache.nutch.util.GenericWritableConfigurable.readFields(GenericWritableConfigurable.java:54)
> 	at org.apache.nutch.metadata.MetaWrapper.readFields(MetaWrapper.java:101)
> 	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
> 	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
> 	at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:940)
> 	at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:880)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:237)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:233)
> 	at org.apache.nutch.segment.SegmentMerger.reduce(SegmentMerger.java:377)
> 	at org.apache.nutch.segment.SegmentMerger.reduce(SegmentMerger.java:113)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)


RE: OutOfMemoryError: Java heap space

Posted by fa...@butterflycluster.net.
hi there,

that seems to work; thanks for that. I ended up using a topN during my
generate phase. I didn't really want to do it this way, although it does
seem to fix my problem.
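
For the record, this is roughly the generate call I am using now; the crawldb/segments paths are just my layout, and the 50000 cap is an arbitrary number I picked:

  # cap each generated segment at 50,000 URLs so later merges stay small
  $NUTCH_HOME/bin/nutch generate crawl/crawldb crawl/segments -topN 50000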

What I have also been observing is that the reduce phase seems to take a
very long time on large segments.
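
One thing that at least makes the size problem visible is checking the segments on disk before merging; just a plain shell check, nothing Nutch-specific:

  # per-segment size in KB, largest last
  du -s crawl/segments/* | sort -n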

Could anyone shed some light on the ratio of segment size to processing
time on a standard machine, say:
2G RAM
4-core server?

The segments in my test environment are now about 3 MB or so each. At one
point I had a 600 MB segment, and mergesegs seemed to take forever and
eventually stopped responding. The last I heard from the process was
something like:

2009-10-11 14:29:56,752 INFO [org.apache.hadoop.mapred.LocalJobRunner] - reduce > reduce .. ..

over and over, then finally silence...

any insight?

>
> I guess your segments are too big...
> Try merging just a few of them in one shot.
> If you have N segments, try starting by merging just N-1; if you still have
> the error, try N-2... until you find the best number of segments you can
> merge in one shot.
>
> thx
>



RE: OutOfMemoryError: Java heap space

Posted by BELLINI ADAM <mb...@msn.com>.
I guess your segments are too big...
Try merging just a few of them in one shot.
If you have N segments, try starting by merging just N-1; if you still have the error, try N-2... until you find the best number of segments you can merge in one shot.
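
for example, something like this; the segment names here are just placeholders for whatever is in your crawl/segments directory:

  # merge only a couple of segments at a time instead of the whole glob
  $NUTCH_HOME/bin/nutch mergesegs crawl/MERGEDsegments crawl/segments/20091011142956 crawl/segments/20091011143015

i think mergesegs also takes a -slice option in some versions, to cap how many URLs end up in each output segment... might be worth a look too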

thx

