Posted to user@nutch.apache.org by Brian Whitman <br...@variogr.am> on 2007/01/17 17:57:45 UTC
out of memory error at end of indexing
(nutch-nightly, hadoop 0.9.1, linux/686, 4GB ram)
At the end of a long index in a crawl cycle I got a
java.lang.OutOfMemoryError: Java heap space from the indexer. I have
4GB of RAM. There appear to be 142150 docs.
Any idea what this could be caused by?
The bin/nutch index commandline reported:
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
at org.apache.nutch.indexer.Indexer.run(Indexer.java:319)
at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
at org.apache.nutch.indexer.Indexer.main(Indexer.java:302)
And the hadoop log reported:
2007-01-17 05:09:40,257 INFO indexer.Indexer - merging segments
_2qf2 (125000 docs) _2sdx (2500 docs) _2ucs (2500 docs) _2wbn (2500
docs) _2yai (2500 docs) _309d (2500 docs) _3288 (2500 docs) _329n (50
docs) _32b2 (50 docs) _32ch (50 docs) _32dw (50 docs) _32fb (50 docs)
_32gq (50 docs) _32i5 (50 docs) _32jk (50 docs) _32kz (50 docs) _32me
(50 docs) _32nt (50 docs) _32p8 (50 docs) _32qn (50 docs) _32s2 (50
docs) _32th (50 docs) _32uw (50 docs) _32wb (50 docs) _32xq (50 docs)
_32z5 (50 docs) _330k (50 docs) _331z (50 docs) _333e (50 docs) _334t
(50 docs) _3368 (50 docs) _337n (50 docs) _3392 (50 docs) _33ah (50
docs) _33bw (50 docs) _33db (50 docs) _33eq (50 docs) _33g5 (50 docs)
_33hk (50 docs) _33iz (50 docs) _33ke (50 docs) _33lt (50 docs) _33n8
(50 docs) _33on (50 docs) _33q2 (50 docs) _33rh (50 docs) _33sw (50
docs) _33ub (50 docs) _33vq (50 docs) _33x6 (50 docs) into _33x7
(142150 docs)
2007-01-17 05:09:40,647 WARN mapred.LocalJobRunner - job_w1h0ii
java.lang.OutOfMemoryError: Java heap space
2007-01-17 05:09:41,005 FATAL indexer.Indexer - Indexer:
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
at org.apache.nutch.indexer.Indexer.index(Indexer.java:297)
at org.apache.nutch.indexer.Indexer.run(Indexer.java:319)
at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
at org.apache.nutch.indexer.Indexer.main(Indexer.java:302)
--
http://variogr.am/
brian.whitman@variogr.am
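
A note for anyone landing here with the same symptom: the OOM occurs
during the final Lucene segment merge, and when it is the merge itself
exhausting the heap (rather than a leaking plugin, as it turned out to
be in this thread), two settings are usually worth checking. This is a
sketch, not a confirmed fix; the property names below are the stock
ones from Hadoop 0.9-era hadoop-default.xml and Nutch's
nutch-default.xml, so verify them against your own defaults files:

```
<!-- hadoop-site.xml: raise the per-task JVM heap
     (the era's default was around -Xmx200m) -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>

<!-- nutch-site.xml: merge fewer segments per pass; the log above shows
     ~49 segments merged at once, consistent with the default of 50 -->
<property>
  <name>indexer.mergeFactor</name>
  <value>10</value>
</property>
```

Lowering indexer.mergeFactor trades merge speed for a smaller peak
memory footprint during the final merge.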
Re: out of memory error at end of indexing
Posted by Brian Whitman <br...@variogr.am>.
On Jan 17, 2007, at 11:57 AM, Brian Whitman wrote:
> (nutch-nightly, hadoop 0.9.1, linux/686, 4GB ram)
>
> At the end of a long index in a crawl cycle I got a
> java.lang.OutOfMemoryError: Java heap space from the indexer. I
> have 4GB of RAM. There appear to be 142150 docs.
Please ignore; this was a runaway bug in a plugin we are developing.
It didn't crop up until we had to index a lot of documents at once, I
guess.