Posted to user@nutch.apache.org by Emmanuel <jo...@gmail.com> on 2007/09/10 15:22:23 UTC

Fetcher2 politeness?

I decided to use Fetcher2 instead of Fetcher, and I noticed that
Fetcher2 does not behave politely: it does not wait for
fetcher.server.delay before making another request to the same server.

In Fetcher2 (in the latest trunk), someone has set these options:
    // set non-blocking & no-robots mode for HTTP protocol plugins.
    getConf().setBoolean(Protocol.CHECK_BLOCKING, false);
    getConf().setBoolean(Protocol.CHECK_ROBOTS, false);

With these settings, the HTTP protocol plugin does not wait for
crawlDelay before making another request.
May I know exactly why?
Is this intended, or is it a bug?

Re: Fetcher2 politeness?

Posted by Andrzej Bialecki <ab...@getopt.org>.
Emmanuel wrote:
> I decided to use Fetcher2 instead of Fetcher, and I noticed that
> Fetcher2 does not behave politely: it does not wait for
> fetcher.server.delay before making another request to the same server.
> 
> In Fetcher2 (in the latest trunk), someone has set these options:
>     // set non-blocking & no-robots mode for HTTP protocol plugins.
>     getConf().setBoolean(Protocol.CHECK_BLOCKING, false);
>     getConf().setBoolean(Protocol.CHECK_ROBOTS, false);
> 
> With these settings, the HTTP protocol plugin does not wait for
> crawlDelay before making another request.
> May I know exactly why?
> Is this intended, or is it a bug?
> 

Have you actually observed this wrong behavior during fetching? Fetcher2
performs blocking differently from Fetcher: it controls the blocking
itself instead of delegating it to the protocol plugin. These two
properties are set to false on purpose.
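
To illustrate what fetcher-side blocking can look like, here is a minimal
sketch. It is not the actual Fetcher2 code: the class HostPoliteness and
its methods tryAcquire and release are hypothetical names, and the sketch
assumes a single fixed delay per host (standing in for
fetcher.server.delay) with at most one request in flight per host.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Nutch code: the fetcher records when each host
 * may be contacted again, so the protocol plugin does not need to block.
 */
class HostPoliteness {
    private final long delayMillis;
    // Earliest time (in ms) at which each host may be contacted again.
    private final Map<String, Long> nextAllowed = new HashMap<>();

    HostPoliteness(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /** Returns true and reserves the host if a request may start at nowMillis. */
    synchronized boolean tryAcquire(String host, long nowMillis) {
        Long next = nextAllowed.get(host);
        if (next != null && nowMillis < next) {
            return false; // still inside the delay, or a request is in flight
        }
        // Keep the host blocked until release() is called.
        nextAllowed.put(host, Long.MAX_VALUE);
        return true;
    }

    /** Marks the request as finished and starts the per-host delay. */
    synchronized void release(String host, long finishedMillis) {
        nextAllowed.put(host, finishedMillis + delayMillis);
    }
}
```

A fetcher thread in this sketch would call tryAcquire before handing a URL
to the protocol plugin and release once the response has arrived. Because
the delay is enforced at that level, the plugin can run in non-blocking
mode.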


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com