Posted to user@nutch.apache.org by Pierluigi D'Amadio <da...@ancitel.it> on 2007/07/18 12:23:16 UTC

OutOfMemoryError - Nutch 0.8.1

I've installed Nutch 0.8.1 on a single node (a dual-processor P4 3.0GHz
with 1.5GB of RAM) and my configuration is

threads = 20
depth = 1000
topN = 1000

on Linux 2.6.9.

I'm trying an intranet crawl of a site with more than 50,000 pages.
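For reference, I launch the crawl with the one-step crawl command, roughly
like this (the urls seed directory and the crawl output directory are just
the names I use, nothing special):

  bin/nutch crawl urls -dir crawl -threads 20 -depth 1000 -topN 1000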

I've launched the JVM with the standard heap option JAVA_HEAP_MAX=-Xmx1600m.
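If I understand the bin/nutch script correctly, that value is what ends up
as the JVM's -Xmx flag, and it can also be set from the environment via
NUTCH_HEAPSIZE (in MB) instead of editing the script, e.g.:

  export NUTCH_HEAPSIZE=1600
  bin/nutch crawl urls -dir crawl -threads 20 -depth 1000 -topN 1000

(which should give the same -Xmx1600m, assuming the script follows the
usual Hadoop-style heap handling).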

After 10 hours of crawling, my logs contain a lot of OutOfMemoryError entries:

2007-07-18 10:14:32,535 INFO  fetcher.Fetcher - fetch of 
http://www.anci.it/stampa.cfm?layout=dettaglio&IdSez=2446&IdDett=5936 
failed with: java.lang.OutOfMemoryError

and the Nutch process died.

Has anyone had this kind of experience? I believe I could increase
JAVA_HEAP_MAX, but the problem would probably just reappear later.
Is the problem related to my configuration, or is it a memory leak in
the Fetcher class? Do I need to set up a distributed configuration with
Hadoop?

Pierluigi D'Amadio