Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2012/07/17 14:02:36 UTC

[jira] [Created] (NUTCH-1430) Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule

Markus Jelsma created NUTCH-1430:
------------------------------------

             Summary: Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule
                 Key: NUTCH-1430
                 URL: https://issues.apache.org/jira/browse/NUTCH-1430
             Project: Nutch
          Issue Type: Bug
          Components: crawldb
    Affects Versions: 1.5
            Reporter: Markus Jelsma
            Priority: Critical
             Fix For: 1.6


Steps to reproduce:

Without AdaptiveFetchSchedule:

{code}
$ bin/nutch readdb crawl/crawldb/ -url http://www.openindex.io/en/home.html
URL: http://www.openindex.io/en/home.html
Version: 7
Status: 2 (db_fetched)
Fetch time: Thu Aug 16 13:58:23 CEST 2012
Modified time: Thu Jan 01 01:00:00 CET 1970
Retries since fetch: 0
Retry interval: 2592000 seconds (30 days)
Score: 0.0
Signature: c2601ca503f2fc5edcb286501d7fb271
Metadata: Content-Type: text/html_pst_: success(1), lastModified=0
{code}

With AdaptiveFetchSchedule:

{code}
$ bin/nutch readdb crawl/crawldb/ -url http://www.openindex.io/en/home.html
URL: http://www.openindex.io/en/home.html
Version: 7
Status: 2 (db_fetched)
Fetch time: Tue Jul 17 13:56:33 CEST 2012
Modified time: Tue Jul 17 13:55:33 CEST 2012
Retries since fetch: 0
Retry interval: 60 seconds (0 days)
Score: 0.0
Signature: 23567bb52ee8b905b8649c4305ed82ee
Metadata: Content-Type: text/html_pst_: success(1), lastModified=0
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (NUTCH-1430) Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440101#comment-13440101 ] 

Markus Jelsma commented on NUTCH-1430:
--------------------------------------

This is a serious issue for anyone who uses the FreeGenerator tool. Any comments? We've had this fix running in production for quite some time now. I consider it solved.
                


        

[jira] [Commented] (NUTCH-1430) Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416947#comment-13416947 ] 

Markus Jelsma commented on NUTCH-1430:
--------------------------------------

I've double-checked the patch a couple of times now and it fixes the issue. Any comments?
                


        

[jira] [Updated] (NUTCH-1430) Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1430:
----------------------------------------

    Patch Info: Patch Available
    


[jira] [Commented] (NUTCH-1430) Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416124#comment-13416124 ] 

Markus Jelsma commented on NUTCH-1430:
--------------------------------------

If a record already exists in the CrawlDB, it is simply overwritten. The bug has been present in all recent versions. Until it is fixed, it's a bad idea to use the FreeGenerator tool with AdaptiveFetchSchedule enabled on an existing CrawlDB.
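
To make the symptom concrete, here is a toy sketch of it. It is not CrawlDB's actual update code: a plain HashMap stands in for the CrawlDB, and `OverwriteSketch`, `Entry` and `naiveUpdate` are hypothetical names made up for this illustration. It only shows how a stored 30-day interval is lost when an incoming record without an interval replaces the existing one.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the symptom; a HashMap stands in for the CrawlDB.
// All names here are hypothetical and do not come from the Nutch source.
final class OverwriteSketch {
    /** Stored state reduced to the one field that matters here. */
    static final class Entry {
        final int retryIntervalSeconds;
        Entry(int retryIntervalSeconds) { this.retryIntervalSeconds = retryIntervalSeconds; }
    }

    /** Naive update: the incoming record replaces whatever was stored. */
    static void naiveUpdate(Map<String, Entry> db, String url, Entry incoming) {
        db.put(url, incoming);
    }

    public static void main(String[] args) {
        Map<String, Entry> db = new HashMap<>();
        String url = "http://www.openindex.io/en/home.html";
        db.put(url, new Entry(2_592_000));   // existing record: 30 days
        naiveUpdate(db, url, new Entry(0));  // incoming record carries no interval
        System.out.println(db.get(url).retryIntervalSeconds); // prints 0
    }
}
```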
                


        

[jira] [Updated] (NUTCH-1430) Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1430:
---------------------------------

    Attachment: NUTCH-1430-1.6-2.patch

Here's a new patch. It sets a defaultInterval for all free-generated records. This also solves the problem of fetching 404s with the FreeGenerator; those records would otherwise keep an interval of zero!

Any comments on this approach?
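
For readers without the patch at hand, the idea can be sketched roughly as follows. This is not the patch itself: `FreeGenSketch` and `generate` are hypothetical names, a map of URL to interval stands in for the generated records, and the 30-day default is simply the interval shown in the readdb output in the issue description.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for free-generated records; NOT the FreeGenerator source.
final class FreeGenSketch {
    // 30 days, the interval shown in the readdb output of the issue description.
    static final int DEFAULT_INTERVAL = 2_592_000;

    /** Builds one entry per URL and gives each the default interval up front. */
    static Map<String, Integer> generate(List<String> urls, int defaultInterval) {
        Map<String, Integer> intervals = new LinkedHashMap<>();
        for (String url : urls) {
            // Assign the interval at creation time so no record starts at zero.
            intervals.put(url, defaultInterval);
        }
        return intervals;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://www.openindex.io/en/home.html");
        System.out.println(generate(urls, DEFAULT_INTERVAL));
    }
}
```

Because every record gets an interval when it is created, the outcome of the fetch (success or 404) no longer decides whether the record ever receives one.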
                


[jira] [Commented] (NUTCH-1430) Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419153#comment-13419153 ] 

Markus Jelsma commented on NUTCH-1430:
--------------------------------------

Now that I've got (or had) your attention anyway: has anyone else seen this issue? Would you implement the change differently?
                


        

[jira] [Assigned] (NUTCH-1430) Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma reassigned NUTCH-1430:
------------------------------------

    Assignee: Markus Jelsma
    


        

[jira] [Updated] (NUTCH-1430) Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule

Posted by "Markus Jelsma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/NUTCH-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-1430:
---------------------------------

    Attachment: NUTCH-1430-1.6-1.patch

Patch for 1.6. This fixes the issue by setting a default interval for CrawlDatum records without one before proceeding with the scheduler's other code.
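
A minimal sketch of that guard, under stated assumptions, might look like this. It is not the patch: `IntervalGuardSketch` and its methods are hypothetical, and the rates and bounds are assumed values chosen for illustration (only the 30-day default and the 60-second result appear in the readdb output above). The sketch assumes the adaptive step scales the interval multiplicatively and then clamps it, which would explain why a record arriving with no interval ends up at the minimum.

```java
// Stand-in for the scheduling step; NOT the Nutch source. All names are hypothetical.
final class IntervalGuardSketch {
    static final int DEFAULT_INTERVAL = 2_592_000; // 30 days, as in the readdb output
    static final int MIN_INTERVAL = 60;            // assumed floor; matches the 60-second symptom
    static final int MAX_INTERVAL = 31_536_000;    // assumed ceiling (365 days)
    static final double INC_RATE = 0.4;            // assumed growth rate for unmodified pages
    static final double DEC_RATE = 0.2;            // assumed shrink rate for modified pages

    /** Adaptive step without a guard: a zero interval stays zero until it is clamped. */
    static int nextIntervalUnguarded(int interval, boolean modified) {
        double next = modified ? interval * (1.0 - DEC_RATE) : interval * (1.0 + INC_RATE);
        long clamped = Math.max(MIN_INTERVAL, Math.min(MAX_INTERVAL, Math.round(next)));
        return (int) clamped;
    }

    /** Same step, but a record that arrives without an interval gets the default first. */
    static int nextIntervalGuarded(int interval, boolean modified) {
        int effective = (interval == 0) ? DEFAULT_INTERVAL : interval;
        return nextIntervalUnguarded(effective, modified);
    }

    public static void main(String[] args) {
        System.out.println(nextIntervalUnguarded(0, true)); // prints 60
        System.out.println(nextIntervalGuarded(0, true));   // prints 2073600
    }
}
```

With the guard in place, the adaptive adjustment starts from the default interval rather than from zero, so the result stays in the range of days instead of collapsing to the floor.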


                


        

[jira] [Commented] (NUTCH-1430) Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule

Posted by "Lewis John McGibbney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/NUTCH-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440189#comment-13440189 ] 

Lewis John McGibbney commented on NUTCH-1430:
---------------------------------------------

Hi Markus, yeah, you are right (although I am not using FreeGenerator): this is a bad one. The last thing we want is for the default interval to disappear (be overwritten) and our ModifiedTime to space-hop back to the 70's... not a good thought, though I do like Led Zeppelin. Anyway, I'm +1 for this. Although you've had it running in production, it would be really nice to try to test for this. Basically I'm +1. Thanks
                
