Posted to user@nutch.apache.org by Audrey Liu <au...@gmail.com> on 2007/07/20 22:56:52 UTC
tweaking config files for better performance
Hi,
I am using Nutch 0.9 to crawl our intranet site (~60,000
pages, ~28,000 of them HTML). Other posts mention crawlers reaching
20 pages/sec, but the best I have seen so far is only 8
pages/sec.
I've also read that fetcher threads tend to block when they try to fetch
pages from the same host, so I'm wondering which settings I should use
to get the best performance. My current configuration in
nutch-site.xml is as follows:
<property>
<name>fetcher.threads.fetch</name>
<value>200</value>
</property>
<property>
<name>fetcher.threads.per.host</name>
<value>50</value>
</property>
<property>
<name>http.max.delays</name>
<value>1</value>
</property>
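For context, the three overrides above would normally sit inside the <configuration> root element of nutch-site.xml. The following is a minimal sketch of a complete file, assuming the standard Nutch/Hadoop configuration layout; the values are the ones quoted above, not recommendations. The commented-out fetcher.server.delay entry is not part of the original post: it is the property that sets the pause between successive requests to the same server, and the value shown is only a placeholder assumption.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Sketch of a nutch-site.xml; entries here override nutch-default.xml. -->
<configuration>

  <!-- Total number of fetcher threads. -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>200</value>
  </property>

  <!-- Maximum number of threads allowed to access one host at a time. -->
  <property>
    <name>fetcher.threads.per.host</name>
    <value>50</value>
  </property>

  <!-- How many times a thread waits on a busy host before giving up on the page. -->
  <property>
    <name>http.max.delays</name>
    <value>1</value>
  </property>

  <!-- ASSUMPTION (not in the original post): pause in seconds between
       successive requests to the same server. Placeholder value only.
  <property>
    <name>fetcher.server.delay</name>
    <value>1.0</value>
  </property>
  -->

</configuration>
```

Since an intranet crawl usually targets a single host or a handful of hosts, the per-host settings are likely to matter more than the total thread count.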
Any pointers are greatly appreciated!! Thanks in advance.
AL
--
View this message in context: http://www.nabble.com/tweaking-config-files-for-better-performance-tf4119552.html#a11715927
Sent from the Nutch - User mailing list archive at Nabble.com.
Re: tweaking config files for better performance
Posted by Audrey Liu <au...@gmail.com>.
Hi,
Thanks for the reply!
I tried the configuration from the link, but it didn't seem to help
much, at least not enough to reach 20 pages/sec. Could it be because this
is an intranet crawl?
I'd really like to know how other people get their crawls to run
so fast.
Any pointers are appreciated! Thanks!!
Audrey