Posted to user@nutch.apache.org by Pierluigi D'Amadio <da...@ancitel.it> on 2007/07/18 12:23:16 UTC
OutOfMemoryError - Nutch 0.8.1
I've installed Nutch 0.8.1 on a single node (a dual-processor P4 3.0 GHz with 1.5 GB RAM) running Linux 2.6.9, and my configuration is:
threads = 20
depth = 1000
topN = 1000
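For reference, these settings would typically be passed on the command line of the Nutch 0.8.x one-step crawl tool. A minimal sketch, assuming the default `bin/nutch` layout; the seed directory name `urls` and output directory `crawl` are assumptions, not from the message:

```shell
# Sketch of a Nutch 0.8.x crawl invocation matching the settings above.
# "urls" (seed list directory) and "crawl" (output directory) are assumed names.
CRAWL_CMD="bin/nutch crawl urls -dir crawl -depth 1000 -topN 1000 -threads 20"
echo "$CRAWL_CMD"
```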
I'm trying an intranet crawl of a site with more than 50,000 pages.
I launched the JVM with the standard heap option JAVA_HEAP_MAX=-Xmx1600m.
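JAVA_HEAP_MAX is the environment variable the stock `bin/nutch` script reads to set the JVM's maximum heap. A sketch of overriding it, assuming the default script; note that -Xmx1600m on a machine with 1.5 GB of physical RAM can push the JVM into swap, so a value below physical memory may actually behave better:

```shell
# Override the heap ceiling bin/nutch passes to the JVM.
# 1200m is an assumed example value, chosen to stay under 1.5 GB physical RAM.
export JAVA_HEAP_MAX=-Xmx1200m
echo "$JAVA_HEAP_MAX"
```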
After 10 hours of crawling, my logs contain many OutOfMemoryError entries:
2007-07-18 10:14:32,535 INFO fetcher.Fetcher - fetch of
http://www.anci.it/stampa.cfm?layout=dettaglio&IdSez=2446&IdDett=5936
failed with: java.lang.OutOfMemoryError
and the Nutch process died.
Has anyone seen this kind of problem? I believe I could increase JAVA_HEAP_MAX, but the error would probably just reappear later.
Is the problem related to my configuration, or is it a memory leak in the Fetcher class? Do I need to set up a distributed configuration with Hadoop?
Pierluigi D'Amadio