Posted to user@nutch.apache.org by Danicela nutch <Da...@mail.com> on 2012/02/14 16:10:09 UTC
fetcher.max.crawl.delay = -1 doesn't work?
Hi,
I have in my nutch-site.xml the value fetcher.max.crawl.delay = -1.
When I try to fetch a site whose robots.txt specifies a Crawl-Delay, it doesn't work.
If I put fetcher.max.crawl.delay = 10000, it works.
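For reference, here is the relevant entry from my nutch-site.xml (a sketch; the comment paraphrases my understanding of the property description in nutch-default.xml):

```xml
<!-- Override in nutch-site.xml. The value is in seconds; as I read
     nutch-default.xml, -1 is supposed to mean "never skip a page
     because its robots.txt Crawl-Delay is too long". -->
<property>
  <name>fetcher.max.crawl.delay</name>
  <value>-1</value>
</property>
```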
I use Nutch 1.2, but according to the changelog nothing related to this has changed since that release.
Is this a Nutch bug, or did I misconfigure something?
Another thing: in hadoop.log, the pages that couldn't be fetched are still marked as "fetching". Is this normal? Shouldn't they be marked as "dropped" or something similar?
Thanks.
Re: fetcher.max.crawl.delay = -1 doesn't work?
Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Danicela,
Before I try this, have you configured any other overrides for generating
or fetching in nutch-site.xml?
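For example, overrides along these lines could interact with the crawl-delay handling (property names are from nutch-default.xml; the values here are purely illustrative, not recommendations):

```xml
<!-- Examples of generate/fetch overrides worth checking for in
     nutch-site.xml; values shown are illustrative only. -->
<property>
  <name>fetcher.server.delay</name>
  <value>5.0</value>
</property>
<property>
  <name>fetcher.threads.per.host</name>
  <value>1</value>
</property>
<property>
  <name>generate.max.per.host</name>
  <value>100</value>
</property>
```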
Thanks
--
*Lewis*