Posted to user@nutch.apache.org by Mark Round <ma...@ahc.uk.com> on 2009/08/20 12:22:50 UTC
Possible memory leak in Nutch-1.0 ?
Hi all,
I am experiencing serious out-of-memory errors when querying Nutch, and
would appreciate any pointers or advice. I have a Nutch index that I'm
searching using a simple servlet. This servlet queries the index and
returns the results as XML, so other systems in my network can make use
of the index as a web service.
In a nutshell, the problem seems to be that after successive queries to
this servlet, the Tenured Gen increases until I run out of heap space.
I am running Nutch-1.0, with the NUTCH-738 and NUTCH-746 patches applied
(more about that below), Tomcat 6.0.20 and Sun's JVM, 1.6.0_12-b04 on
Debian Lenny 32-bit. I have also tested with OpenJDK, and got the same
results.
My servlet just does the following:
    // Load the default Nutch configuration, then add the per-site file
    Configuration nutchConf = NutchConfiguration.create();
    Path configPath = new Path(NUTCH_DIR + "/conf/" + site + "/nutch-site.xml");
    nutchConf.addResource(configPath);
    // Open a bean, run the query, and close the bean again
    NutchBean nutchBean = new NutchBean(nutchConf);
    Query nutchQuery = Query.parse(nutchSearchString, nutchConf);
    Hits nutchHits = nutchBean.search(nutchQuery, maxResults);
    // ... format the results as XML and output them ...
    nutchBean.close();
After querying it a few hundred times, my Tenured Gen is up to 50 MB;
after a few thousand requests, I end up with over 500 MB used. I can of
course increase my heap size, but no matter what I set it to, it
eventually all gets consumed and the only option is to restart Tomcat.
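For anyone wanting to watch the same growth from inside the JVM, here is
a minimal sketch using the standard java.lang.management API. The class
and method names (HeapPoolReport, usedBytesByHeapPool) are made up for
illustration and are not part of Nutch or of my servlet. The pool names
depend on the collector in use, so the old generation may be reported as
"Tenured Gen", "PS Old Gen" or similar.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports how many bytes each heap pool is using.
class HeapPoolReport {

    /** Returns used bytes per heap memory pool, keyed by pool name. */
    static Map<String, Long> usedBytesByHeapPool() {
        Map<String, Long> used = new LinkedHashMap<String, Long>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // getUsage() returns null if the pool is not valid
            }
            used.put(pool.getName(), usage.getUsed());
        }
        return used;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> e : usedBytesByHeapPool().entrySet()) {
            long megabytes = e.getValue() / (1024L * 1024L);
            System.out.println(e.getKey() + ": " + megabytes + " MB used");
        }
    }
}
```

Logging these figures after every N requests would show whether the old
generation keeps climbing even after a full GC.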
I have obtained a heap dump and run it through jhat, but to be honest
I'm not really sure what I'm looking for. I've made the dump available
at http://www.markround.com/static/tomcat.hprof, in case that helps
anyone investigate further.
For what it's worth, I didn't seem to get this issue with Nutch-0.9.
Regarding the two patches I have applied: I had to use them because
otherwise I get a lot of threads in the TIMED_WAITING state, which
according to Lambda Probe are stuck here:
    java.lang.Thread.sleep ( native code )
    org.apache.nutch.searcher.FetchedSegments$SegmentUpdater.run ( FetchedSegments.java:115 )
With the two patches applied, I still get lots of these "stuck" threads,
but they do seem to get cleaned up eventually. I wonder if this could
have anything to do with the problem?
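To put a number on those threads without Lambda Probe, something like
the following sketch could be called from the webapp. It only uses
Thread.getAllStackTraces() from the JDK; the class and method names
(StuckThreadCount, countTimedWaitingIn) are invented for this example,
and the prefix passed in main is simply the class from the trace above.

```java
import java.util.Map;

// Hypothetical helper: counts sleeping or timed-waiting threads by class.
class StuckThreadCount {

    /**
     * Counts live threads that are in TIMED_WAITING and have at least one
     * stack frame whose class name starts with the given prefix.
     */
    static int countTimedWaitingIn(String classNamePrefix) {
        int count = 0;
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            if (e.getKey().getState() != Thread.State.TIMED_WAITING) {
                continue;
            }
            for (StackTraceElement frame : e.getValue()) {
                if (frame.getClassName().startsWith(classNamePrefix)) {
                    count++;
                    break; // count each thread once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The prefix also matches the inner class FetchedSegments$SegmentUpdater.
        int n = countTimedWaitingIn("org.apache.nutch.searcher.FetchedSegments");
        System.out.println("TIMED_WAITING threads in FetchedSegments: " + n);
    }
}
```

If this count rises in step with the number of requests and only drops
later, that would suggest each request leaves an updater thread behind
for a while.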
Please let me know if there are any other diagnostics I can run, or
information I can provide.
Many thanks,
-Mark