Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2012/11/22 15:58:59 UTC

[jira] [Updated] (NUTCH-1430) Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule

     [ https://issues.apache.org/jira/browse/NUTCH-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1430:
----------------------------------------

    Patch Info: Patch Available
    
> Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule
> --------------------------------------------------------------------------
>
>                 Key: NUTCH-1430
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1430
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb
>    Affects Versions: 1.5
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Critical
>             Fix For: 1.6
>
>         Attachments: NUTCH-1430-1.6-1.patch, NUTCH-1430-1.6-2.patch
>
>
> Steps to reproduce:
> Without AdaptiveFetchSchedule:
> {code}
> $ bin/nutch readdb crawl/crawldb/ -url http://www.openindex.io/en/home.html
> URL: http://www.openindex.io/en/home.html
> Version: 7
> Status: 2 (db_fetched)
> Fetch time: Thu Aug 16 13:58:23 CEST 2012
> Modified time: Thu Jan 01 01:00:00 CET 1970
> Retries since fetch: 0
> Retry interval: 2592000 seconds (30 days)
> Score: 0.0
> Signature: c2601ca503f2fc5edcb286501d7fb271
> Metadata: Content-Type: text/html_pst_: success(1), lastModified=0
> {code}
> With AdaptiveFetchSchedule:
> {code}
> $ bin/nutch readdb crawl/crawldb/ -url http://www.openindex.io/en/home.html
> URL: http://www.openindex.io/en/home.html
> Version: 7
> Status: 2 (db_fetched)
> Fetch time: Tue Jul 17 13:56:33 CEST 2012
> Modified time: Tue Jul 17 13:55:33 CEST 2012
> Retries since fetch: 0
> Retry interval: 60 seconds (0 days)
> Score: 0.0
> Signature: 23567bb52ee8b905b8649c4305ed82ee
> Metadata: Content-Type: text/html_pst_: success(1), lastModified=0
> {code}
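
To see at a glance which fields differ between the two records quoted above, the two readdb dumps can be compared with a small script. The following is only a sketch in Python and is not part of Nutch: the function names parse_readdb and changed_fields are hypothetical, and the Metadata line, which is identical in both dumps, is left out for brevity.

```python
# Hypothetical helper, not part of Nutch: compares two `readdb -url` dumps
# copied from the issue description and reports the fields that differ.

def parse_readdb(text):
    """Parse the 'Key: value' lines of one readdb record into a dict."""
    record = {}
    for line in text.strip().splitlines():
        # Split on the first ': ' only, so timestamps and URLs stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            record[key.strip()] = value.strip()
    return record


def changed_fields(before, after):
    """Return {field: (before_value, after_value)} for fields that differ."""
    a, b = parse_readdb(before), parse_readdb(after)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


WITHOUT_ADAPTIVE = """
URL: http://www.openindex.io/en/home.html
Version: 7
Status: 2 (db_fetched)
Fetch time: Thu Aug 16 13:58:23 CEST 2012
Modified time: Thu Jan 01 01:00:00 CET 1970
Retries since fetch: 0
Retry interval: 2592000 seconds (30 days)
Score: 0.0
Signature: c2601ca503f2fc5edcb286501d7fb271
"""

WITH_ADAPTIVE = """
URL: http://www.openindex.io/en/home.html
Version: 7
Status: 2 (db_fetched)
Fetch time: Tue Jul 17 13:56:33 CEST 2012
Modified time: Tue Jul 17 13:55:33 CEST 2012
Retries since fetch: 0
Retry interval: 60 seconds (0 days)
Score: 0.0
Signature: 23567bb52ee8b905b8649c4305ed82ee
"""

if __name__ == "__main__":
    for field, (old, new) in changed_fields(WITHOUT_ADAPTIVE, WITH_ADAPTIVE).items():
        print(f"{field}: {old!r} -> {new!r}")
```

Run against the two dumps, the script lists the fetch time, modified time, retry interval and signature as the differing fields; URL, version, status, retries and score are the same in both.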
