Posted to user@nutch.apache.org by Luca Rondanini <lu...@translated.net> on 2007/07/20 15:33:14 UTC

Fetching problems: Nutch 0.9 Hung Threads

Hi all,

First of all, I've read all the posts regarding this problem on the mailing list! :)

I'm trying to index more than 200k documents, which I'm reading through an NFS-mounted partition. Everything seems fine until we reach 40k-50k documents; then the fetcher fails with the "Hung Threads" error!

These are the configurations I've tried (a sketch after the list shows where each setting actually lives):

1)	topN=20000
	fetcher.threads=10
	ulimit -n=1024
	MergeFactor=20
	file.limit=1M

----> Hung Threads

2)	topN=5000
	fetcher.threads=10
	ulimit -n=1024
	MergeFactor=20
	file.limit=1M

----> Hung Threads


3)	topN=5000
	fetcher.threads=5
	ulimit -n=1024
	MergeFactor=20
	file.limit=1M

----> Too many open files


4)	topN=5000
	fetcher.threads=5
	ulimit -n=4096
	MergeFactor=10
	file.limit=1M

----> Hung Threads
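
For reference, topN is just the -topN argument to bin/nutch generate (or bin/nutch crawl) and ulimit -n is the shell's open-file limit, not a Nutch setting; the other three are set in conf/nutch-site.xml. Here is a rough sketch of configuration 4 in property form, assuming the standard Nutch 0.9 property names (fetcher.threads -> fetcher.threads.fetch, MergeFactor -> indexer.mergeFactor, file.limit -> file.content.limit):

<!-- sketch only: configuration 4, using the standard Nutch 0.9 property names -->
<property>
     <name>fetcher.threads.fetch</name>
     <value>5</value>
     <description>Number of fetcher threads ("fetcher.threads" above).</description>
</property>
<property>
     <name>indexer.mergeFactor</name>
     <value>10</value>
     <description>Lucene merge factor used while indexing ("MergeFactor" above).</description>
</property>
<property>
     <name>file.content.limit</name>
     <value>1048576</value>
     <description>Maximum bytes read per file, i.e. 1M ("file.limit" above).</description>
</property>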



Can anyone please give me a clue as to what is going on?!?
Thanks,
Luca

Re: Fetching problems: Nutch 0.9 Hung Threads

Posted by Luca Rondanini <lu...@translated.net>.
Hi,

Forgot to say that the problem doesn't occur if I crawl the same files on the local file system.

Thanks!

Luca


Re: Fetching problems: Nutch 0.9 Hung Threads

Posted by Luca Rondanini <lu...@translated.net>.
After many tries, the problem seems to be solved!
I've changed the hadoop-site.xml file by adding these lines:

<property>
     <name>mapred.speculative.execution</name>
     <value>false</value>
</property>
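
For what it's worth, my understanding is that speculative execution launches backup copies of tasks it considers slow, so with it enabled two fetcher attempts can end up working on the same files over NFS at the same time; turning it off keeps a single attempt per split. On later Hadoop releases where this single switch is split per phase, the per-phase variants would presumably be:

<!-- only relevant on later Hadoop releases that split the setting per phase -->
<property>
     <name>mapred.map.tasks.speculative.execution</name>
     <value>false</value>
</property>
<property>
     <name>mapred.reduce.tasks.speculative.execution</name>
     <value>false</value>
</property>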

I hope this will help someone else!!
Thanks


Luca Rondanini
Research and Development
luca@translated.net
Tel: +39 06 91 62 00 55
Fax: +39 06 233 200 102

http://www.translated.net
