Posted to dev@nutch.apache.org by "Arthur B (JIRA)" <ji...@apache.org> on 2016/10/18 13:16:58 UTC

[jira] [Created] (NUTCH-2328) GeneratorJob does not generate anything on second run

Arthur B created NUTCH-2328:
-------------------------------

             Summary: GeneratorJob does not generate anything on second run
                 Key: NUTCH-2328
                 URL: https://issues.apache.org/jira/browse/NUTCH-2328
             Project: Nutch
          Issue Type: Bug
          Components: generator
    Affects Versions: 2.3.1, 2.2.1, 2.3, 2.2, 2.4, 2.5
         Environment: Ubuntu 16.04 / Hadoop 2.7.1
            Reporter: Arthur B


Given a topN parameter (e.g. 10), the GeneratorJob fails to generate anything new on subsequent runs within the same process space.
To reproduce the issue, submit the GeneratorJob to the M/R framework twice, one run after the other. The second run will report that it generated 0 URLs.
The cause is the static count field (org.apache.nutch.crawl.GeneratorReducer#count), which is used to determine whether the topN value has been reached. Being static, the field presumably keeps its value from the first run, so the second run sees the limit as already reached.
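
The behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the Nutch code: StaticCountReducer, InstanceCountReducer and GeneratorCountDemo are hypothetical stand-ins, and the "job" is just a plain method call in one JVM. It only shows how a static counter carries over between runs, and how a per-instance counter (one possible remedy, assumed here rather than taken from any actual fix) avoids that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a reducer that enforces topN with a STATIC counter.
class StaticCountReducer {
    // Shared by every instance in the same JVM and never reset between runs.
    private static long count = 0;
    private final long topN;

    StaticCountReducer(long topN) {
        this.topN = topN;
    }

    List<String> reduce(List<String> urls) {
        List<String> generated = new ArrayList<>();
        for (String url : urls) {
            if (count >= topN) {
                break; // limit appears to be reached, possibly by an earlier run
            }
            generated.add(url);
            count++;
        }
        return generated;
    }
}

// Same logic, but the counter belongs to the instance (assumed remedy).
class InstanceCountReducer {
    private long count = 0;
    private final long topN;

    InstanceCountReducer(long topN) {
        this.topN = topN;
    }

    List<String> reduce(List<String> urls) {
        List<String> generated = new ArrayList<>();
        for (String url : urls) {
            if (count >= topN) {
                break;
            }
            generated.add(url);
            count++;
        }
        return generated;
    }
}

class GeneratorCountDemo {
    // Each call simulates one job submission: a fresh reducer object per run.
    static int runStatic(long topN, List<String> urls) {
        return new StaticCountReducer(topN).reduce(urls).size();
    }

    static int runInstance(long topN, List<String> urls) {
        return new InstanceCountReducer(topN).reduce(urls).size();
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println("static, run 1:   " + runStatic(3, urls));   // 3
        System.out.println("static, run 2:   " + runStatic(3, urls));   // 0
        System.out.println("instance, run 1: " + runInstance(3, urls)); // 3
        System.out.println("instance, run 2: " + runInstance(3, urls)); // 3
    }
}
```

In the sketch, the second static run returns nothing even though a new reducer object is created, which matches the "generated 0 URLs" symptom. Whether the real reducer is best fixed by making the field non-static or by resetting it at the start of each job is left open here.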



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)