You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2013/01/12 20:38:15 UTC

[jira] [Updated] (NUTCH-475) Adaptive crawl delay

     [ https://issues.apache.org/jira/browse/NUTCH-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-475:
---------------------------------------

    Fix Version/s: 1.7
    
> Adaptive crawl delay
> --------------------
>
>                 Key: NUTCH-475
>                 URL: https://issues.apache.org/jira/browse/NUTCH-475
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>            Reporter: Doğacan Güney
>             Fix For: 1.7
>
>         Attachments: adaptive-delay_draft.patch, NUTCH-475.patch
>
>
> Current fetcher implementation waits a default interval before making another request to the same server (if crawl-delay is not specified in robots.txt). IMHO, an adaptive implementation will be better. If the server is under little load and can server requests fast, then fetcher can ask for more pages in a given interval. Similarly, if the server is suffering from heavy load, fetcher can slow down(w.r.t that host), easing the load on the server.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira