Posted to dev@nutch.apache.org by "Markus Jelsma (Created) (JIRA)" <ji...@apache.org> on 2012/03/30 22:07:26 UTC

[jira] [Created] (NUTCH-1322) Indexer not to reindex unmodified docs

Indexer not to reindex unmodified docs
--------------------------------------

                 Key: NUTCH-1322
                 URL: https://issues.apache.org/jira/browse/NUTCH-1322
             Project: Nutch
          Issue Type: Improvement
          Components: indexer
    Affects Versions: 1.4
            Reporter: Markus Jelsma
            Assignee: Markus Jelsma


IndexerMapReduce already tries to skip unmodified pages by checking whether their fetch status is set to unmodified. This does not always work, however: some documents do not have that fetch status even though they have not been modified at all.

The indexer should offer an option to skip reindexing these pages as well.
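
The behaviour requested above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Nutch code: the class ReindexCheck, the enum FetchStatus, the method shouldIndex and the skipUnchanged flag are all hypothetical names, and comparing a stored content signature against the newly computed one is just one possible way to detect an unchanged document; the issue itself does not say how detection should work.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names come from Nutch itself.
class ReindexCheck {

    // Stand-in for a fetch status; not Nutch's actual status constants.
    enum FetchStatus { SUCCESS, NOT_MODIFIED }

    /**
     * Decides whether a document should be sent to the index.
     *
     * @param status        fetch status reported for the document
     * @param oldSignature  content signature from the previous fetch, or null if none
     * @param newSignature  content signature from the current fetch
     * @param skipUnchanged hypothetical option: also skip documents whose
     *                      signature did not change
     */
    static boolean shouldIndex(FetchStatus status, byte[] oldSignature,
                               byte[] newSignature, boolean skipUnchanged) {
        // Existing behaviour described in the issue: skip on "unmodified" status.
        if (status == FetchStatus.NOT_MODIFIED) {
            return false;
        }
        // Assumed extension: the status says "fetched", but the content is identical.
        if (skipUnchanged && oldSignature != null
                && Arrays.equals(oldSignature, newSignature)) {
            return false;
        }
        return true;
    }
}
```

Keeping the second check behind a flag mirrors the "optionally" in the request: with the flag off, only the fetch status decides, as before.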

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (NUTCH-1322) Indexer not to reindex unmodified docs

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1322:
---------------------------------

    Assignee:     (was: Markus Jelsma)
    


        

[jira] [Closed] (NUTCH-1322) Indexer not to reindex unmodified docs

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma closed NUTCH-1322.
--------------------------------

    Resolution: Duplicate
    
