Posted to agent@nutch.apache.org by eyal edri <ey...@gmail.com> on 2007/09/03 07:51:37 UTC

Fetch2 vs Fetch

Hi,

Can anyone explain how fetch2 differs from fetch?
I've run fetch2, and I see it is restricted by the number of threads given
to it (in practice, when I run it with 1000 threads, it's much slower than
fetch).
I'm trying to understand the reasoning behind using it instead of fetch.

Thanks,

-- 
Eyal Edri

Re: Fetch2 vs Fetch

Posted by Doğacan Güney <do...@gmail.com>.
On 9/3/07, eyal edri <ey...@gmail.com> wrote:
> Hi,
>
> Can anyone explain how fetch2 differs from fetch?
> I've run fetch2, and I see it is restricted by the number of threads given
> to it (in practice, when I run it with 1000 threads, it's much slower than
> fetch).

When you fetch a URL from a host, Nutch blocks that host (that is, it
doesn't fetch another URL from it) for a while (5 seconds by default)
for politeness. If another URL from the same host comes up within those
5 seconds, one of the threads in "fetch" is blocked until the delay
expires and then fetches that URL. However, if the same URL is read in
"fetch2", fetch2 puts it into a queue (so that it can fetch it later)
and continues with the next URL (either from the input or from one of
the queues). So fetch2 should work better with a smaller number of
threads, say around 50, whereas fetch needs a lot of threads because
its threads are blocked most of the time.
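
To make the difference concrete, here is a rough simulation of the two
strategies with a single worker and a logical clock. This is only a
sketch, not Nutch's actual code: the function names, the example URLs
and the 1-second fetch time are made up for illustration, and only the
5-second delay comes from the explanation above.

```python
from collections import deque
from urllib.parse import urlparse

CRAWL_DELAY = 5.0  # seconds a host stays blocked after a fetch (default cited above)
FETCH_TIME = 1.0   # assumed duration of one fetch; purely illustrative

# Hypothetical input: three URLs each from two hosts, grouped by host.
URLS = [
    "http://a.example/1", "http://a.example/2", "http://a.example/3",
    "http://b.example/1", "http://b.example/2", "http://b.example/3",
]


def host_of(url):
    """Return the host part of a URL."""
    return urlparse(url).netloc


def blocking_style(urls):
    """'fetch'-like worker: takes URLs in input order and waits in place
    whenever the URL's host is still inside its politeness delay."""
    now = 0.0
    ready_at = {}  # host -> time at which it may be fetched again
    for url in urls:
        host = host_of(url)
        now = max(now, ready_at.get(host, 0.0))  # the worker sits idle here
        now += FETCH_TIME
        ready_at[host] = now + CRAWL_DELAY
    return now


def queue_style(urls):
    """'fetch2'-like worker: URLs go into per-host queues and the worker
    serves whichever host is currently allowed, idling only if none is."""
    now = 0.0
    ready_at = {}
    queues = {}  # host -> deque of pending URLs
    for url in urls:
        queues.setdefault(host_of(url), deque()).append(url)
    while queues:
        ready = [h for h in queues if ready_at.get(h, 0.0) <= now]
        if not ready:
            # Every remaining host is blocked: jump to the earliest release.
            now = min(ready_at[h] for h in queues)
            continue
        host = ready[0]
        queues[host].popleft()
        now += FETCH_TIME
        ready_at[host] = now + CRAWL_DELAY
        if not queues[host]:
            del queues[host]
    return now


if __name__ == "__main__":
    print("blocking:", blocking_style(URLS))
    print("queued:  ", queue_style(URLS))
```

With this input the blocking worker finishes at t = 26 s and the queued
worker at t = 14 s, because the queued worker fetches from the second
host while the first one is still in its delay. That is why the
blocking approach needs many more threads to keep the same throughput.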

> I'm trying to understand the logic in using it instead of fetch.
>
> thanks,
>
> --
> Eyal Edri
>


-- 
Doğacan Güney
