Posted to user@nutch.apache.org by Brian Whitman <br...@variogr.am> on 2007/01/17 17:57:45 UTC

out of memory error at end of indexing

(nutch-nightly, hadoop 0.9.1, linux/686, 4GB RAM)

At the end of a long index in a crawl cycle I got a
java.lang.OutOfMemoryError: Java heap space from the indexer. I have
4GB of RAM. There appear to be 142,150 docs.

Any idea what this could be caused by?


The bin/nutch index commandline reported:

Indexer: java.io.IOException: Job failed!
         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
         at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
         at org.apache.nutch.indexer.Indexer.run(Indexer.java:319)
         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
         at org.apache.nutch.indexer.Indexer.main(Indexer.java:302)


And the hadoop log reported:

2007-01-17 05:09:40,257 INFO  indexer.Indexer - merging segments  
_2qf2 (125000 docs) _2sdx (2500 docs) _2ucs (2500 docs) _2wbn (2500  
docs) _2yai (2500 docs) _309d (2500 docs) _3288 (2500 docs) _329n (50  
docs) _32b2 (50 docs) _32ch (50 docs) _32dw (50 docs) _32fb (50 docs)  
_32gq (50 docs) _32i5 (50 docs) _32jk (50 docs) _32kz (50 docs) _32me  
(50 docs) _32nt (50 docs) _32p8 (50 docs) _32qn (50 docs) _32s2 (50  
docs) _32th (50 docs) _32uw (50 docs) _32wb (50 docs) _32xq (50 docs)  
_32z5 (50 docs) _330k (50 docs) _331z (50 docs) _333e (50 docs) _334t  
(50 docs) _3368 (50 docs) _337n (50 docs) _3392 (50 docs) _33ah (50  
docs) _33bw (50 docs) _33db (50 docs) _33eq (50 docs) _33g5 (50 docs)  
_33hk (50 docs) _33iz (50 docs) _33ke (50 docs) _33lt (50 docs) _33n8  
(50 docs) _33on (50 docs) _33q2 (50 docs) _33rh (50 docs) _33sw (50  
docs) _33ub (50 docs) _33vq (50 docs) _33x6 (50 docs) into _33x7  
(142150 docs)
2007-01-17 05:09:40,647 WARN  mapred.LocalJobRunner - job_w1h0ii
java.lang.OutOfMemoryError: Java heap space
2007-01-17 05:09:41,005 FATAL indexer.Indexer - Indexer: java.io.IOException: Job failed!
         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
         at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
         at org.apache.nutch.indexer.Indexer.run(Indexer.java:319)
         at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
         at org.apache.nutch.indexer.Indexer.main(Indexer.java:302)
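
A note in case it helps anyone who hits the same thing: the job ran
under LocalJobRunner, so the final Lucene merge (into one 142,150-doc
segment) happens inside the same JVM that bin/nutch starts, and it is
that JVM's heap cap that was exhausted, not the machine's 4GB. A
minimal sketch of the first thing worth trying, assuming the stock
bin/nutch script (which reads NUTCH_HEAPSIZE in MB and passes it to
the JVM as -Xmx, defaulting to 1000); the crawl/ paths below are just
placeholders:

    # give the indexing JVM more heap before re-running the job
    export NUTCH_HEAPSIZE=2000
    bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*

For what it's worth, the run of 50-doc segments in the log looks like
what the stock indexer settings produce, so the merge itself seems
normal and the heap limit is the first suspect.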



--
http://variogr.am/
brian.whitman@variogr.am




Re: out of memory error at end of indexing

Posted by Brian Whitman <br...@variogr.am>.
On Jan 17, 2007, at 11:57 AM, Brian Whitman wrote:
> (nutch-nightly, hadoop 0.9.1, linux/686, 4GB RAM)
>
> At the end of a long index in a crawl cycle I got a
> java.lang.OutOfMemoryError: Java heap space from the indexer. I
> have 4GB of RAM. There appear to be 142,150 docs.


Please ignore; there was a runaway bug in a plugin we are developing.
It didn't crop up until we had to index a lot of documents at once, I
guess.
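
For the archives, since threads like this turn up in searches: here is
a minimal hypothetical sketch of one common way a plugin can "run away"
like ours did (the class is made up, not our actual plugin code):

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical: per-document data lands in a static list that is
    // never cleared, so heap use grows with every document indexed and
    // the JVM only falls over once the crawl gets large.
    public class LeakyFilter {
        private static final List<char[]> CACHE = new ArrayList<char[]>();

        public void filter(String docText) {
            char[] copy = docText.toCharArray();
            CACHE.add(copy); // bug: nothing ever removes entries
            // ... build index fields from copy ...
        }
    }

If heap use climbs steadily per document rather than spiking only at
the final merge, suspect something like this in a plugin before blaming
the indexer itself.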