Posted to dev@nutch.apache.org by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2007/04/26 07:55:15 UTC

[jira] Commented: (NUTCH-475) Adaptive crawl delay

    [ https://issues.apache.org/jira/browse/NUTCH-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491882 ] 

Enis Soztutar commented on NUTCH-475:
-------------------------------------

We can use a formula like:

delay = alpha * delay + (1 - alpha) * (k * t)

where 0 < alpha <= 1

so that the waiting time is less sensitive to varying reply times of the server. 
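
A minimal sketch of how the formula above could be applied per host. The comment does not define k or t, so this assumes t is the server's most recent reply time in milliseconds and k is a constant multiplier. The class name AdaptiveDelay and its methods are hypothetical and are not taken from Nutch or from the attached patch.

```java
/**
 * Hypothetical per-host delay tracker (illustration only, not Nutch code).
 * Smooths the crawl delay so a single slow or fast reply does not
 * change the waiting time abruptly.
 */
final class AdaptiveDelay {
    private final double alpha; // weight kept on the previous delay, 0 < alpha <= 1
    private final double k;     // assumed constant multiplier on the reply time
    private double delayMs;     // current smoothed delay in milliseconds

    AdaptiveDelay(double alpha, double k, double initialDelayMs) {
        if (!(alpha > 0.0 && alpha <= 1.0)) {
            throw new IllegalArgumentException("alpha must satisfy 0 < alpha <= 1");
        }
        this.alpha = alpha;
        this.k = k;
        this.delayMs = initialDelayMs;
    }

    /** Applies delay = alpha * delay + (1 - alpha) * (k * t) and returns the new delay. */
    double update(double replyTimeMs) {
        delayMs = alpha * delayMs + (1.0 - alpha) * (k * replyTimeMs);
        return delayMs;
    }

    double getDelayMs() {
        return delayMs;
    }
}
```

For example, with alpha = 0.5, k = 2 and an initial delay of 1000 ms, a reply time of 100 ms moves the delay to 600 ms rather than straight to 200 ms. A larger alpha makes the delay react more slowly, and alpha = 1 leaves it unchanged.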


> Adaptive crawl delay
> --------------------
>
>                 Key: NUTCH-475
>                 URL: https://issues.apache.org/jira/browse/NUTCH-475
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>            Reporter: Doğacan Güney
>             Fix For: 1.0.0
>
>         Attachments: adaptive-delay_draft.patch
>
>
> The current fetcher implementation waits a default interval before making another request to the same server (if crawl-delay is not specified in robots.txt). IMHO, an adaptive implementation would be better. If the server is under little load and can serve requests quickly, the fetcher can ask for more pages in a given interval. Similarly, if the server is suffering from heavy load, the fetcher can slow down (w.r.t. that host), easing the load on the server.
