Posted to user@nutch.apache.org by Fred Zimmerman <wf...@gmail.com> on 2018/04/09 19:13:41 UTC
how do fetch wait times work?
When I run bin/crawl once and it generates a segment list with a bunch of
fetch dates in the future, does nutch proactively run those fetches on
those future dates, or do I have to do something to make that happen?
Re: how do fetch wait times work?
Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi Fred,
Nutch does nothing "proactively"; the crawl jobs must be launched explicitly.
But you need no special command:
- let's say you didn't change the defaults and
db.fetch.interval.default == 30 days
- if you launch bin/crawl one month later, all pages are refetched,
and optionally reindexed (404s removed)
- just to clarify: new segments will be created; old segments can be
removed, unless you need them for recovery, e.g. if the index is lost
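
The scheduling idea above can be sketched as a toy model. This is not Nutch code: the function names, the example URLs, and the in-memory dictionary are made up for illustration, and the only value taken from this thread is the 30-day default interval. It just shows that a recorded future fetch date is a "not before" marker that only matters when a crawl is actually launched.

```python
from datetime import datetime, timedelta

# Assumption: the default re-fetch interval of 30 days from the message above.
FETCH_INTERVAL = timedelta(days=30)


def next_fetch_time(fetched_at: datetime) -> datetime:
    """Date recorded for the next fetch after a page has been fetched."""
    return fetched_at + FETCH_INTERVAL


def due_urls(schedule: dict[str, datetime], now: datetime) -> list[str]:
    """URLs whose recorded fetch date is not in the future at `now`."""
    return sorted(url for url, due in schedule.items() if due <= now)


# Hypothetical first crawl on the day of the original question.
first_run = datetime(2018, 4, 9)
schedule = {
    "http://example.com/a": next_fetch_time(first_run),
    "http://example.com/b": next_fetch_time(first_run),
}

# A crawl launched one day later finds nothing due; a crawl launched
# 31 days later finds both pages due. Nothing happens in between
# unless someone (or a scheduler) starts the crawl.
print(due_urls(schedule, first_run + timedelta(days=1)))
print(due_urls(schedule, first_run + timedelta(days=31)))
```

In this model, nothing watches the dates: they are only consulted at the moment a crawl is started, so a periodic run has to be triggered from outside, e.g. by hand or from a scheduler.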
Best,
Sebastian
On 04/09/2018 09:13 PM, Fred Zimmerman wrote:
> When I run bin/crawl once and it generates a segment list with a bunch of
> fetch dates in the future, does nutch proactively run those fetches on
> those future dates, or do I have to do something to make that happen?
>