Posted to user@nutch.apache.org by Emmanuel JOKE <jo...@gmail.com> on 2007/06/20 14:55:12 UTC

Performance: Fetcher2 or Fetcher

Hi Guys,

I have a cluster of 2 machines. I tried to crawl a website that contains
over 1M pages. I noticed that it takes a few days to complete the crawl.
The logs report 0.5 pages/s at 200 kb/s, which seems very slow. I would like
to try Fetcher2; I guess it might improve the performance.

It might be a stupid question, but I'm wondering how to set up Nutch to
use Fetcher2 instead of Fetcher.
Could you help me understand?

Besides, what is the usual setting for fetcher.server.delay? I
was told that we should set this property to 1 second, but I can see in
nutch-default.xml that it is set to 5. What is the best value to
gain performance while staying polite enough?

More tricks to gain performance are welcome.

E

Re: Performance: Fetcher2 or Fetcher

Posted by Doğacan Güney <do...@gmail.com>.
On 6/20/07, Emmanuel JOKE <jo...@gmail.com> wrote:
> Hi Guys,
>
> I have a cluster of 2 machines. I tried to crawl a website that contains
> over 1M pages. I noticed that it takes a few days to complete the crawl.
> The logs report 0.5 pages/s at 200 kb/s, which seems very slow. I would like
> to try Fetcher2; I guess it might improve the performance.
>
> It might be a stupid question, but I'm wondering how to set up Nutch to
> use Fetcher2 instead of Fetcher.
> Could you help me understand?

Are you running Nutch with the 'crawl' command, with separate commands
(inject, generate, fetch, etc.), or something else?

If you are running separate commands, all you have to do is change
fetch to fetch2.

>
> Besides, what is the usual setting for fetcher.server.delay? I
> was told that we should set this property to 1 second, but I can see in
> nutch-default.xml that it is set to 5. What is the best value to
> gain performance while staying polite enough?

That's kind of between you and the server you are fetching from, but I
wouldn't recommend a delay lower than 5 seconds.
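
To make the trade-off concrete, here is a rough back-of-the-envelope sketch
of how the delay caps the fetch rate from a single server. It is only an
estimate under stated assumptions: one fetcher thread per host, and the
delay dominating the time per request (download time is ignored). The
function names are made up for illustration and are not part of Nutch.

```python
# Rough ceiling on fetch rate from a single host under a politeness delay.
# Assumptions (not from the thread): one fetcher thread per host, and the
# delay dominates the time per request (download time is ignored).

SECONDS_PER_DAY = 86_400


def max_pages_per_second(delay_seconds: float) -> float:
    """Upper bound on pages/s from one host when waiting delay_seconds between requests."""
    return 1.0 / delay_seconds


def days_to_fetch(pages: int, pages_per_second: float) -> float:
    """Days needed to fetch `pages` at a steady rate."""
    return pages / pages_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    pages = 1_000_000  # "over 1M pages" from the original question
    for delay in (5.0, 1.0):  # nutch-default.xml value vs. the suggested 1 second
        rate = max_pages_per_second(delay)
        print(f"delay={delay:.0f}s -> at most {rate:.2f} pages/s, "
              f"about {days_to_fetch(pages, rate):.1f} days for {pages} pages")
    # The rate reported in the logs, for comparison
    print(f"observed 0.5 pages/s -> about {days_to_fetch(pages, 0.5):.1f} days")
```

Under these assumptions, a 5 second delay caps a single host at 0.2 pages/s
(roughly 58 days for 1M pages) and a 1 second delay caps it at 1 page/s
(roughly 12 days). The delay applies per server, so a crawl spread over many
hosts can go faster overall without being less polite to any one of them.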

>
> More tricks to gain performance are welcome.
>
> E
>


-- 
Doğacan Güney