Posted to dev@nutch.apache.org by "Arthur B (JIRA)" <ji...@apache.org> on 2016/10/18 13:16:58 UTC
[jira] [Created] (NUTCH-2328) GeneratorJob does not generate anything on second run
Arthur B created NUTCH-2328:
-------------------------------
Summary: GeneratorJob does not generate anything on second run
Key: NUTCH-2328
URL: https://issues.apache.org/jira/browse/NUTCH-2328
Project: Nutch
Issue Type: Bug
Components: generator
Affects Versions: 2.3.1, 2.2.1, 2.3, 2.2, 2.4, 2.5
Environment: Ubuntu 16.04 / Hadoop 2.7.1
Reporter: Arthur B
Given a topN parameter (e.g. 10), the GeneratorJob fails to generate anything new on subsequent runs within the same process space.
To reproduce the issue, submit the GeneratorJob to the M/R framework twice, one run after the other. The second run reports that it generated 0 URLs.
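This issue is caused by the use of a static count field (org.apache.nutch.crawl.GeneratorReducer#count) to determine whether the topN value has been reached. Because the field is static, its value carries over from the first run to the next one in the same JVM, so the second run sees the limit as already reached.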
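
The effect of a static counter can be illustrated with a minimal, self-contained sketch. This is not the Nutch code: StaticCountReducer, InstanceCountReducer, the run method, and the sample URLs are hypothetical names made up for illustration, and the instance-field variant is only one possible way to avoid the problem, not necessarily the fix the project would adopt.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a reducer that limits output to topN entries.
// It only mimics the static-counter pattern described in the report.
class StaticCountReducer {
    // Static: shared by all instances in the JVM and never reset between runs.
    static long count = 0;
    private final long limit;

    StaticCountReducer(long limit) {
        this.limit = limit;
    }

    // Returns how many URLs this run emitted before the limit was hit.
    int run(List<String> urls) {
        int emitted = 0;
        for (String url : urls) {
            if (count >= limit) {
                break;
            }
            count++;
            emitted++;
        }
        return emitted;
    }
}

// Assumed alternative: the counter is an instance field, so each new
// reducer instance (one per run in this sketch) starts from zero.
class InstanceCountReducer {
    private long count = 0;
    private final long limit;

    InstanceCountReducer(long limit) {
        this.limit = limit;
    }

    int run(List<String> urls) {
        int emitted = 0;
        for (String url : urls) {
            if (count >= limit) {
                break;
            }
            count++;
            emitted++;
        }
        return emitted;
    }
}

class StaticCountDemo {
    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://a/", "http://b/", "http://c/");
        long topN = 2;

        // Two consecutive "jobs" in the same JVM with the static counter.
        System.out.println(new StaticCountReducer(topN).run(urls)); // prints 2
        System.out.println(new StaticCountReducer(topN).run(urls)); // prints 0

        // The same two runs with a per-instance counter.
        System.out.println(new InstanceCountReducer(topN).run(urls)); // prints 2
        System.out.println(new InstanceCountReducer(topN).run(urls)); // prints 2
    }
}
```

In this sketch the second StaticCountReducer emits nothing because count is already equal to topN when it starts, which matches the "generated 0 URLs" symptom described above.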
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)