Posted to user@nutch.apache.org by Danicela nutch <Da...@mail.com> on 2012/02/14 16:10:09 UTC

fetcher.max.crawl.delay = -1 doesn't work?

Hi,

 I have fetcher.max.crawl.delay = -1 set in my nutch-site.xml.

 When I try to fetch a site whose robots.txt specifies a Crawl-delay, it doesn't work.
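
 For reference, by "a Crawl-delay" I mean a robots.txt along these lines (the 30-second value is just an illustration, not the real site's setting):

 User-agent: *
 Crawl-delay: 30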

 If I put fetcher.max.crawl.delay = 10000, it works.
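
 The relevant entry in my nutch-site.xml looks roughly like this (standard Hadoop-style property syntax; only the value differs between the two tests):

 <property>
   <name>fetcher.max.crawl.delay</name>
   <value>-1</value>
 </property>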

 I use Nutch 1.2, but according to the changelog, nothing related to this behavior has changed since that release.

 Is this a Nutch bug, or did I misuse something?

 Another thing: in hadoop.log, the pages that couldn't be fetched are still marked as "fetching". Is this normal? Shouldn't they be marked as "dropped" or something similar?

 Thanks.

Re: fetcher.max.crawl.delay = -1 doesn't work?

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Danicela,

Before I try this, have you configured any other overrides for generating
or fetching in nutch-site.xml?

Thanks




-- 
*Lewis*