
Posted to user@nutch.apache.org by Fred Zimmerman <wf...@gmail.com> on 2018/04/09 19:13:41 UTC

how do fetch wait times work?

When I run bin/crawl once and it generates a segment list with a bunch of
fetch dates in the future, does Nutch proactively run those fetches on
those future dates, or do I have to do something to make that happen?

Re: how do fetch wait times work?

Posted by Sebastian Nagel <wa...@googlemail.com>.

Hi Fred,

Nutch does nothing "proactively": the crawl jobs must be launched explicitly.
But no special command is needed:
- let's say you didn't change the default,
  db.fetch.interval.default == 30 days (2592000 seconds)
- if you launch bin/crawl one month later, all pages are refetched,
  and optionally reindexed (404s removed)
- just to clarify: new segments will be created, and old segments can be
  removed unless you need them for recovery, e.g. if the index is lost
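The behaviour described above can be illustrated with a small, simplified model. This is a hedged sketch, not Nutch's code or API: `PageEntry`, `is_due`, `record_fetch` and `crawl` are hypothetical names. The model assumes that each page stores a next fetch time, that a crawl only selects pages whose fetch time has passed, and that a fetch pushes the next fetch time one interval (30 days by default) into the future. Nutch's real logic lives in its pluggable fetch schedule (`db.fetch.schedule.class`) and generator, and it handles more cases. The key point the sketch shows is that nothing is fetched unless `crawl` is actually called.

```python
from dataclasses import dataclass
from typing import List

DAY = 24 * 60 * 60  # seconds per day

# Assumption: mirrors the default db.fetch.interval.default (2592000 s = 30 days)
DEFAULT_INTERVAL = 30 * DAY


@dataclass
class PageEntry:
    """Hypothetical stand-in for a CrawlDb record (not Nutch's CrawlDatum)."""
    url: str
    fetch_time: int                  # earliest time (seconds) the page may be fetched
    interval: int = DEFAULT_INTERVAL


def is_due(entry: PageEntry, now: int) -> bool:
    """A page is selected only once its scheduled fetch time has passed."""
    return entry.fetch_time <= now


def record_fetch(entry: PageEntry, now: int) -> None:
    """After a fetch, schedule the next one a full interval later."""
    entry.fetch_time = now + entry.interval


def crawl(db: List[PageEntry], now: int) -> List[str]:
    """One explicit crawl run: select due pages, 'fetch' them, reschedule them.
    Pages with future fetch times are simply skipped until a later run."""
    due = [e for e in db if is_due(e, now)]
    for e in due:
        record_fetch(e, now)
    return [e.url for e in due]


if __name__ == "__main__":
    db = [PageEntry("https://example.com/", fetch_time=0),
          PageEntry("https://example.com/about", fetch_time=0)]
    print(crawl(db, now=0))         # ['https://example.com/', 'https://example.com/about']
    print(crawl(db, now=15 * DAY))  # [] (nothing is due yet)
    print(crawl(db, now=31 * DAY))  # both pages are due again
```

In this model, running the crawl at day 15 does nothing, and no fetch happens at day 30 on its own. The pages are only refetched when someone launches the next crawl after the interval has elapsed, for example from a periodic job.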

Best,
Sebastian
