Posted to user@nutch.apache.org by Raghavendra Prabhu <rr...@gmail.com> on 2006/03/28 15:04:34 UTC
adaptive fetch
Hi Andrzej
After applying the patch, I found some strange behaviour.
The fetch list for each URL was being created even though
db.default.fetch.interval had not been reached.
I thought it was supposed to work in this order:
1) For the particular URL/file, get the db fetch interval (which changes).
2) If the current date exceeds the db fetch interval, generate the fetch list for the
particular file/URL.
3) The fetch list checks the file's modified date and then decides whether to fetch the
latest contents of the file/URL.
It is supposed to function in the above manner, right? Did I miss
anything?
Rgds
Prabhu
Re: adaptive fetch
Posted by Andrzej Bialecki <ab...@getopt.org>.
Raghavendra Prabhu wrote:
> Hi Andrzej
>
> After applying the patch, I found some strange behaviour.
>
> The fetch list for each URL was being created even though
> db.default.fetch.interval had not been reached.
>
You probably forgot to change the interval from days to seconds; it's
now expressed in seconds. It defines the maximum allowed interval, and
any page whose interval is higher than that will be refetched anyway. So
if it's 30 (seconds :) ) then there is a high probability that you reach
this limit before each cycle completes...
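
To make the unit change concrete, here is a small sketch of the conversion. It is only arithmetic, not code from the patch; the helper name days_to_seconds and the example value of 30 days are assumptions for illustration.

```python
# Sketch only: converts an interval that used to be configured in days
# into the seconds value the patched code expects.
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def days_to_seconds(days):
    """Return the number of seconds in the given number of days."""
    return days * SECONDS_PER_DAY

# Hypothetical old setting: 30, meaning 30 days.
old_value_in_days = 30
new_value_in_seconds = days_to_seconds(old_value_in_days)
print(new_value_in_seconds)  # 2592000
```

Left unchanged, a value of 30 would be read as 30 seconds rather than 2592000 seconds.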
> I thought it was supposed to work in this order:
>
> 1) For the particular URL/file, get the db fetch interval (which changes).
>
> 2) If the current date exceeds the db fetch interval, generate the fetch list for the
> particular file/URL.
>
> 3) The fetch list checks the file's modified date and then decides whether to fetch the
> latest contents of the file/URL.
>
> It is supposed to function in the above manner, right? Did I miss
> anything?
>
>
Yes, this is how it's supposed to work.
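
The three steps above, together with the maximum interval, can be sketched roughly as follows. This is a simplified illustration in Python and not the Nutch implementation: PageRecord, is_due, generate_fetch_list and needs_download are hypothetical names, and capping the per-URL interval at the maximum is an assumption about how "refetched anyway" plays out.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds per day

@dataclass
class PageRecord:
    """Hypothetical per-URL record; all times are in seconds."""
    url: str
    last_fetch: int      # when the page was last fetched
    fetch_interval: int  # step 1: per-URL interval, which may change over time
    last_modified: int   # modification time of the copy already held

def is_due(page, now, max_interval):
    # Step 2: a page is due once its interval has elapsed. Assumption:
    # the configured maximum caps the per-URL interval.
    effective = min(page.fetch_interval, max_interval)
    return now - page.last_fetch >= effective

def generate_fetch_list(pages, now, max_interval):
    # Only pages that are due go on the fetch list.
    return [p for p in pages if is_due(p, now, max_interval)]

def needs_download(page, remote_modified):
    # Step 3: fetch the content only if the remote copy is newer.
    return remote_modified > page.last_modified

now = 10 * DAY
pages = [
    PageRecord("a", last_fetch=0, fetch_interval=7 * DAY, last_modified=0),
    PageRecord("b", last_fetch=8 * DAY, fetch_interval=7 * DAY, last_modified=0),
]

# Maximum given in seconds (30 days): only "a" is due.
print([p.url for p in generate_fetch_list(pages, now, 30 * DAY)])  # ['a']
# Maximum left at 30, now read as 30 seconds: every page is due.
print([p.url for p in generate_fetch_list(pages, now, 30)])  # ['a', 'b']
```

The second call shows the symptom described at the start of the thread: with the maximum still set to 30, every URL lands on the fetch list each cycle.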
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com