Posted to mapreduce-user@hadoop.apache.org by Markus Jelsma <ma...@openindex.io> on 2011/12/23 15:19:06 UTC

Nutch unstable on 0.22.0

Hi,

In the past few weeks we evaluated and partially migrated from Hadoop 
0.20.203.0 to 0.22.0. Most stuff works fine locally and simple jobs do well on 
the cluster. However, the most essential part of Nutch, the fetcher, seems to 
be very unstable on 0.22.0. In every crawl I can now be almost certain that at 
least some mappers mysteriously freeze and eventually time out. Other mappers 
are killed straight away or after a few minutes because of OOM errors. Memory 
consumption is also a lot higher on 0.22.0.

Right now we have three clusters: an old 0.20.203 cluster, plus the unstable 
0.22.0 and a 0.20.205, both running on the same new hardware. When we run 
identical jobs on all three, 0.22.0 almost always fails, eating RAM and 
occasionally freezing a mapper. Stack traces of those mappers show all threads 
are blocked and sometimes we see jstack unable to print deadlocks (null).
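
For anyone who wants to look at the same symptoms from inside the JVM rather 
than through jstack, here is a minimal sketch that counts threads per state 
and asks the JVM for deadlocked threads. It uses only the standard 
java.lang.management API and is written against a current JDK. The class name 
ThreadStateDump and its method names are hypothetical, they are not part of 
Nutch or Hadoop, and this is not necessarily how the fetcher problem should be 
diagnosed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch or Hadoop.
public class ThreadStateDump {

    /** Counts the live threads of this JVM per state (RUNNABLE, BLOCKED, ...). */
    public static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // false/false: skip locked monitor and synchronizer details, which is cheaper.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // thread ended while the dump was taken
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the ids of deadlocked threads, or an empty array if there are none. */
    public static long[] deadlockedThreadIds() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() also covers java.util.concurrent locks, but it
        // is only available when the JVM supports synchronizer usage monitoring.
        long[] ids = mx.isSynchronizerUsageSupported()
                ? mx.findDeadlockedThreads()
                : mx.findMonitorDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        System.out.println("Thread states: " + countByState());
        long[] ids = deadlockedThreadIds();
        System.out.println("Deadlocked threads: " + ids.length);
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Print name, state and stack frames for each deadlocked thread, if any.
        for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
            if (info != null) {
                System.out.print(info);
            }
        }
    }
}
```

This only inspects the JVM it runs in, so to be useful for a frozen mapper it 
would have to be called from within the task itself, for example from a 
watchdog thread. A high BLOCKED count with no reported deadlock would suggest 
plain lock contention rather than a true deadlock.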

I tried many settings for 0.22.0 and very conservative settings for Nutch such 
as few threads to spare resources (which are abundant actually) but I cannot 
seem to find the issue. The fetcher job still uses the old mapred API.

I'd like to present a better issue report but I don't know what component in 
all this mess is actually responsible. It looks like the tasktracker but I'm 
unsure.

If anyone can point us in the right direction so we can find the issue and 
assist in fixing it, that would be great.

Thanks