Posted to user@nutch.apache.org by "Meraj A. Khan" <me...@gmail.com> on 2015/01/03 21:44:31 UTC

Re: Question about db.default.fetch.interval.

Reposting my question.

Hi All,

I have a quick question regarding the db.default.fetch.interval
parameter. I have currently set it to 15 days, but my crawl cycle
itself takes longer than that, up to 30 days. Since the interval is
only 15 days, is it possible that, before one complete crawl has
finished, an already-fetched page gets re-fetched ahead of a page that
has never been fetched, so that fewer distinct pages are fetched
overall?

In other words, I am trying to find out whether setting
db.default.fetch.interval to a value smaller than the time it takes to
complete one full crawl of the web will lead to some kind of infinite
loop, where recently fetched pages are re-fetched before pages that
have never been fetched at all.
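
To make the scenario concrete, here is a minimal sketch of the situation
I am worried about. It is only a toy model, not Nutch's actual generator
logic: the is_due function, the page names, and the dates are all made
up for illustration, and I am assuming that a never-fetched page is
always eligible while a fetched page becomes eligible again once the
interval has elapsed.

```python
from datetime import datetime, timedelta

def is_due(last_fetch, interval_days, now):
    """Hypothetical eligibility rule (an assumption, not Nutch code):
    never-fetched pages are always due; fetched pages become due again
    once interval_days have passed since their last fetch."""
    if last_fetch is None:
        return True
    return last_fetch + timedelta(days=interval_days) <= now

crawl_start = datetime(2014, 12, 1)
now = crawl_start + timedelta(days=20)  # 20 days into a ~30-day crawl cycle

# Made-up pages: value is the last fetch time, or None if never fetched.
pages = {
    "fetched_on_day_1": crawl_start + timedelta(days=1),
    "fetched_on_day_10": crawl_start + timedelta(days=10),
    "never_fetched": None,
}

# With a 15-day interval, which pages are candidates on day 20?
due = sorted(name for name, last in pages.items() if is_due(last, 15, now))
print(due)
```

In this model the page fetched on day 1 is a candidate again on day 20,
alongside the page that was never fetched, which is exactly the
competition I am asking about.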


Thanks.


On Sun, Dec 28, 2014 at 11:18 AM, Meraj A. Khan <me...@gmail.com> wrote:
> Hi All,
>
> I have a quick question regarding the db.default.fetch.interval
> parameter. I have currently set it to 15 days, but my crawl cycle
> itself takes longer than that, up to 30 days. Since the interval is
> only 15 days, is it possible that, before one complete crawl has
> finished, an already-fetched page gets re-fetched ahead of a page that
> has never been fetched, so that fewer distinct pages are fetched
> overall?
>
> I guess I am trying to find out whether db.default.fetch.interval
> should be set to at least the duration of one comprehensive crawl cycle.
>
> Thanks.