Posted to user@nutch.apache.org by Vladimir Loubenski <vl...@opentext.com> on 2016/10/12 17:18:49 UTC

Nutch 2.3.1 OPICscoring filter

Hi,
My understanding is that the Nutch "Generate" job filters the URLs in the Nutch database by two criteria:
1. fetchTime + fetchInterval must be less than the current time.
2. The number of URLs selected for the "Fetch" job must not exceed the -topN parameter value. The "score" field from the database is used for this selection.
During crawling I see only two values in the "score" field: 1 is always set by the "Inject" job, and 0 is always set by the "Parse" job. Looking at the code, I see that the OPIC scoring plugin is used to define these values.
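
To make the two criteria concrete, here is a minimal Python sketch of the selection as I understand it. It is only an illustration of the description above, not Nutch code: the Page class, its field names, and the generate function are hypothetical, and times are plain integers in seconds.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical stand-in for a row in the Nutch database.
    url: str
    fetch_time: int      # last fetch time, seconds
    fetch_interval: int  # re-fetch interval, seconds
    score: float


def generate(pages, now, top_n):
    """Select pages for fetching, following the two criteria above."""
    # Criterion 1: keep only pages whose fetchTime + fetchInterval has passed.
    due = [p for p in pages if p.fetch_time + p.fetch_interval < now]
    # Criterion 2: order by score (highest first) and keep at most top_n.
    due.sort(key=lambda p: p.score, reverse=True)
    return due[:top_n]


pages = [
    Page("http://a.example/", 0, 100, 1.0),
    Page("http://b.example/", 0, 100, 0.0),
    Page("http://c.example/", 900, 200, 1.0),  # not yet due at now=1000
    Page("http://d.example/", 0, 50, 0.5),
]

selected = generate(pages, now=1000, top_n=2)
print([p.url for p in selected])
```

With scores limited to 0 and 1, every page in the same state ranks equally, which is why I am asking how other values can arise.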

Is my understanding correct? How can a "score" value other than 0 or 1 be defined?

Regards,
Vladimir.