Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2012/04/23 10:53:44 UTC

[jira] [Updated] (NUTCH-1322) Indexer not to reindex unmodified docs

     [ https://issues.apache.org/jira/browse/NUTCH-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1322:
---------------------------------

    Assignee:     (was: Markus Jelsma)
    
> Indexer not to reindex unmodified docs
> --------------------------------------
>
>                 Key: NUTCH-1322
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1322
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 1.4
>            Reporter: Markus Jelsma
>
> IndexerMapReduce already tries to skip unmodified pages when their fetch status is set to unmodified. This does not always work, however: some documents lack that fetch status even though they have not changed at all.
> The indexer should optionally be able to skip reindexing these pages as well.
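
One way the requested behaviour could look is sketched below. It is only an illustration under stated assumptions: it assumes the indexer can see a content signature from the previous crawl and one from the latest fetch, and that comparing them is an acceptable test for "not modified". The class `UnmodifiedCheck`, the method `shouldIndex`, and all parameter names are hypothetical and are not part of Nutch.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not a Nutch class. */
final class UnmodifiedCheck {

    private UnmodifiedCheck() {}

    /**
     * Decides whether a page should be sent to the index.
     *
     * @param fetchNotModified   true if the fetch status already says "not modified"
     * @param previousSignature  content signature from the previous crawl, or null if unknown
     * @param currentSignature   content signature from the latest fetch, or null if unknown
     * @param compareSignatures  the optional switch: also skip pages whose signature is unchanged
     * @return true if the page should be (re)indexed
     */
    static boolean shouldIndex(boolean fetchNotModified,
                               byte[] previousSignature,
                               byte[] currentSignature,
                               boolean compareSignatures) {
        // Existing behaviour described in the issue: trust the fetch status.
        if (fetchNotModified) {
            return false;
        }
        // Option disabled: index everything that was not flagged by the fetcher.
        if (!compareSignatures) {
            return true;
        }
        // Without both signatures we cannot prove the page is unchanged, so index it.
        if (previousSignature == null || currentSignature == null) {
            return true;
        }
        // Assumption: equal signatures mean the content has not changed.
        return !Arrays.equals(previousSignature, currentSignature);
    }
}
```

With the switch off, the sketch behaves like the status-only check; with it on, a page whose two signatures match is skipped even when its fetch status does not say unmodified.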
