Posted to dev@nutch.apache.org by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/09/30 08:02:00 UTC
[jira] [Created] (NUTCH-2737) Generator: count and log reason of rejections during selection
Sebastian Nagel created NUTCH-2737:
--------------------------------------
Summary: Generator: count and log reason of rejections during selection
Key: NUTCH-2737
URL: https://issues.apache.org/jira/browse/NUTCH-2737
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.16
Reporter: Sebastian Nagel
Fix For: 1.17
During the map phase of the selection step, the generator rejects many items (usually most of them) for various reasons:
- not yet time for a refetch (returned by the fetch scheduler)
- generator score too low
- status does not match the restrict status
- Jexl expression not matched
and a few more. It would be useful to count and log these reasons, especially when the CrawlDb grows large and multiple options are used to restrict the selection.
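A minimal sketch of the idea, assuming a simplified selection loop: every class, field and enum constant below (SelectionSketch, Entry, RejectionReason, select, rejectionCounts, logRejections) is hypothetical and not part of Nutch's actual Generator API. The fetch-scheduler check is reduced to comparing a stored fetch time with "now", and a plain java.util.function.Predicate stands in for the Jexl expression. Each rejected entry increments a counter for the first check it fails.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: count why entries are rejected during generator selection.
 * None of these names belong to Nutch's actual Generator API.
 */
class SelectionSketch {

  /** Hypothetical reasons, mirroring the list in the issue. */
  enum RejectionReason { NOT_DUE_FOR_REFETCH, SCORE_TOO_LOW, STATUS_MISMATCH, EXPR_NOT_MATCHED }

  /** Minimal stand-in for a CrawlDb entry; field names are assumptions. */
  static final class Entry {
    final String url;
    final long fetchTime; // simplified: earliest time a refetch is allowed
    final float score;    // generator score
    final String status;  // CrawlDb status name, e.g. "db_unfetched"

    Entry(String url, long fetchTime, float score, String status) {
      this.url = url;
      this.fetchTime = fetchTime;
      this.score = score;
      this.status = status;
    }
  }

  private final Map<RejectionReason, Long> rejections = new EnumMap<>(RejectionReason.class);
  private final long now;
  private final float minScore;
  private final String restrictStatus; // null means "no status restriction"
  private final Predicate<Entry> expr;  // stands in for a Jexl expression; null means "none"

  SelectionSketch(long now, float minScore, String restrictStatus, Predicate<Entry> expr) {
    this.now = now;
    this.minScore = minScore;
    this.restrictStatus = restrictStatus;
    this.expr = expr;
  }

  /** Returns true if the entry is selected; otherwise counts the first failing check. */
  boolean select(Entry e) {
    RejectionReason reason = null;
    if (e.fetchTime > now) {
      reason = RejectionReason.NOT_DUE_FOR_REFETCH;
    } else if (e.score < minScore) {
      reason = RejectionReason.SCORE_TOO_LOW;
    } else if (restrictStatus != null && !restrictStatus.equals(e.status)) {
      reason = RejectionReason.STATUS_MISMATCH;
    } else if (expr != null && !expr.test(e)) {
      reason = RejectionReason.EXPR_NOT_MATCHED;
    }
    if (reason == null) {
      return true;
    }
    rejections.merge(reason, 1L, Long::sum);
    return false;
  }

  /** Rejection counts per reason; reasons that never occurred are absent. */
  Map<RejectionReason, Long> rejectionCounts() {
    return rejections;
  }

  /** Prints one line per reason; a real job would use a logger instead. */
  void logRejections() {
    rejections.forEach((reason, n) -> System.out.println("Rejected (" + reason + "): " + n));
  }
}
```

Only the first failing check is counted, so the order of the checks determines which reason an entry is attributed to. Since the selection runs inside a Hadoop mapper in Nutch 1.x, Hadoop counters (`context.getCounter(group, name).increment(1)`) would be a natural sink for these counts, because the framework sums them across all map tasks and reports them when the job finishes; the `EnumMap` here only keeps the sketch self-contained.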