Posted to user@nutch.apache.org by Briggs <ac...@gmail.com> on 2007/04/19 21:02:24 UTC

Nutch and Crawl Frequency

Nutch 0.9

Does anyone know whether crawl frequency can be controlled at a finer
granularity?  That is, I would like some sites to be crawled more often
than others.  For example, a news site should be crawled every day, but
an average business website only every 30 days.  So, is it possible to
specify a crawl frequency for specific URLs, or is it only a global
setting for the whole crawldb?  I suppose I could keep several crawldbs
and deal with it that way, but I'm just curious.

Thanks
-- 
"Conscious decisions by conscious minds are what make reality real"

Re: Nutch and Crawl Frequency

Posted by Tomi N/A <he...@gmail.com>.
There's something like that in the Nutch JIRA (I couldn't find the
issue, though). That issue is about an adaptive algorithm, as opposed
to user-provided settings: it would estimate the rate of content change
at any given URL and adapt the crawl frequency accordingly. I don't
know if it's more than a wish at this point.
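
To make the adaptive idea concrete, here is a minimal sketch in Python.
It is not the algorithm from the JIRA issue and not Nutch code; the
function name, the rates, and the one-day/30-day bounds are all
assumptions chosen for illustration. The interval shrinks when a page
was found changed and grows when it was not, clamped to fixed limits.

```python
DAY = 24 * 60 * 60  # seconds in a day


def next_interval(current, changed,
                  inc_rate=0.4, dec_rate=0.2,
                  min_interval=DAY, max_interval=30 * DAY):
    """Return the next re-fetch interval in seconds.

    Hypothetical helper: all parameter names and defaults are
    illustrative assumptions, not settings taken from Nutch.
    """
    if changed:
        # Content changed since the last fetch: come back sooner.
        proposed = current * (1.0 - dec_rate)
    else:
        # Content unchanged: back off and come back later.
        proposed = current * (1.0 + inc_rate)
    # Keep the result within the configured bounds.
    return max(min_interval, min(max_interval, proposed))


if __name__ == "__main__":
    interval = 10 * DAY
    # Simulate a page that changes on every visit.
    for _ in range(5):
        interval = next_interval(interval, changed=True)
        print(round(interval / DAY, 2), "days")
```

With these assumed rates, a page that keeps changing drifts toward the
one-day floor and a static page drifts toward the 30-day ceiling, which
roughly matches the news-site versus business-site split in the
original question.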

Cheers,
t.n.a.

Re: Nutch and Crawl Frequency

Posted by Briggs <ac...@gmail.com>.
Cool, cool.  Thanks!

On 4/19/07, Gal Nitzan <ga...@gmail.com> wrote:
> As it is right now... You answered the question yourself :-) ...
>
> Separate db's and the whole ceremony...

-- 
"Conscious decisions by conscious minds are what make reality real"

RE: Nutch and Crawl Frequency

Posted by Gal Nitzan <ga...@gmail.com>.
As it is right now... You answered the question yourself :-) ...

Separate db's and the whole ceremony...
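
One way to prepare the "separate db's" approach is to split the seed
URLs by how often each host should be crawled. This is a minimal sketch
in Python under stated assumptions: the function name, the
host-to-interval mapping, and the 30-day default are hypothetical, and
nothing here calls Nutch itself.

```python
from collections import defaultdict
from urllib.parse import urlparse

DEFAULT_INTERVAL_DAYS = 30  # assumed fallback for hosts not listed


def group_by_interval(urls, host_intervals, default=DEFAULT_INTERVAL_DAYS):
    """Group seed URLs by the desired crawl interval (in days) of their host.

    host_intervals maps a hostname to an interval in days; it is a
    hypothetical user-maintained table, not a Nutch configuration file.
    """
    groups = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname or ""
        groups[host_intervals.get(host, default)].append(url)
    return dict(groups)


if __name__ == "__main__":
    seeds = [
        "http://news.example.com/a",
        "http://shop.example.org/",
        "http://news.example.com/b",
    ]
    for days, members in group_by_interval(seeds, {"news.example.com": 1}).items():
        print(days, members)
```

Each resulting group could then be written to its own seed list and
injected into its own crawldb, with that crawl re-run on the matching
schedule (daily for one group, every 30 days for the other).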

