Posted to user@nutch.apache.org by Raghavendra Prabhu <rr...@gmail.com> on 2006/03/28 15:04:34 UTC

adaptive fetch

Hi Andrzej

After applying the patch, I found some strange behaviour.

The fetch list for each URL was being generated even though
db.default.fetch.interval had not been reached.

I thought it was supposed to work in this order:

1) For the particular URL/file, get its db fetch interval (which changes
over time).

2) If the current date is past the last fetch date plus that interval,
generate a fetch list entry for that URL/file.

3) At fetch time, check the file's modified date and then decide whether
to fetch the latest contents of the file/URL.
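
The three steps above can be sketched roughly as follows. This is a
minimal model under stated assumptions, not the patch's code: PageRecord,
due_for_generate, needs_download and generate are hypothetical names (not
Nutch classes), and all times are plain integers in seconds.

```python
from dataclasses import dataclass


# Hypothetical page record; field names are illustrative, not Nutch's API.
@dataclass
class PageRecord:
    url: str
    last_fetch: int      # time of the last fetch, in seconds
    fetch_interval: int  # per-page interval in seconds (adapted over time)
    last_modified: int   # modification time seen at the last fetch


def due_for_generate(page: PageRecord, now: int) -> bool:
    """Steps 1 and 2: include the URL only once its own interval has elapsed."""
    return now >= page.last_fetch + page.fetch_interval


def needs_download(page: PageRecord, remote_modified: int) -> bool:
    """Step 3: at fetch time, download new content only if the file changed."""
    return remote_modified > page.last_modified


def generate(pages: list, now: int) -> list:
    """Build a fetch list containing only the URLs that are due."""
    return [p.url for p in pages if due_for_generate(p, now)]


DAY = 24 * 60 * 60
pages = [
    PageRecord("file:///docs/a.txt", last_fetch=0, fetch_interval=7 * DAY, last_modified=0),
    PageRecord("file:///docs/b.txt", last_fetch=0, fetch_interval=1 * DAY, last_modified=0),
]
print(generate(pages, now=2 * DAY))  # ['file:///docs/b.txt']
```

Two days after the last fetch only b.txt (1-day interval) is due; a.txt
(7-day interval) stays out of the fetch list until its own interval passes.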

It is supposed to function in the above manner, right? Did I miss
anything?


Rgds
Prabhu

Re: adaptive fetch

Posted by Andrzej Bialecki <ab...@getopt.org>.
Raghavendra Prabhu wrote:
> Hi Andrzej
>
> After applying the patch, I found some strange behaviour.
>
> The fetch list for each URL was being generated even though
> db.default.fetch.interval had not been reached.
>   

You probably forgot to change the interval from days to seconds: it is
now expressed in seconds. This value defines the maximum allowed interval,
and any page with an interval higher than that will be refetched anyway.
So if it's 30 (seconds :) ), there is a high probability that you reach
this limit before each cycle completes...
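
A minimal sketch of why leaving the old value of 30 misbehaves once it is
read as seconds. It assumes a simplified rule (a page is due when its own
interval has elapsed, or whenever that interval exceeds the configured
maximum); days_to_seconds and is_due are hypothetical helpers, not code
from the patch.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_seconds(days: int) -> int:
    """Convert an interval given in days to seconds."""
    return days * SECONDS_PER_DAY


def is_due(last_fetch: int, page_interval: int, now: int, max_interval: int) -> bool:
    """Simplified model (an assumption): should the page go into the next fetch list?

    All arguments are in seconds. A page whose adapted interval exceeds the
    configured maximum is refetched anyway.
    """
    if page_interval > max_interval:
        return True
    return now >= last_fetch + page_interval


week = days_to_seconds(7)
one_day_later = days_to_seconds(1)

# Old value of 30 left unchanged but now read as 30 seconds: every page exceeds it.
print(is_due(0, week, one_day_later, max_interval=30))  # True

# The same limit expressed as 30 days in seconds: the page is not due yet.
print(is_due(0, week, one_day_later, max_interval=days_to_seconds(30)))  # False
```

With a 30-second maximum, a page with a one-week interval is selected one
day after its last fetch; with 30 days (2592000 seconds) it waits for its
own interval.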

> I thought it was supposed to work in this order:
>
> 1) For the particular URL/file, get its db fetch interval (which changes
> over time).
>
> 2) If the current date is past the last fetch date plus that interval,
> generate a fetch list entry for that URL/file.
>
> 3) At fetch time, check the file's modified date and then decide whether
> to fetch the latest contents of the file/URL.
>
> It is supposed to function in the above manner, right? Did I miss
> anything?
>
>   

Yes, this is how it's supposed to work.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com