Posted to dev@nutch.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2006/09/26 11:57:53 UTC

[jira] Closed: (NUTCH-372) Fetcher halting and throttling

     [ http://issues.apache.org/jira/browse/NUTCH-372?page=all ]

Andrzej Bialecki  closed NUTCH-372.
-----------------------------------

    Resolution: Invalid

When I submitted this issue, JIRA reported SQL errors and refused to continue, giving the impression that this sub-task had not been created, so I decided to put the code under the original issue. Please see NUTCH-368 for the code.

> Fetcher halting and throttling
> ------------------------------
>
>                 Key: NUTCH-372
>                 URL: http://issues.apache.org/jira/browse/NUTCH-372
>             Project: Nutch
>          Issue Type: Sub-task
>          Components: fetcher
>            Reporter: Andrzej Bialecki 
>         Assigned To: Andrzej Bialecki 
>
> This patch uses the message queueing framework to implement the following functionality:
> * ability to gracefully stop fetching the current segment. This is different from simply killing the job in that the partial results (partially fetched segment) are available and can be further processed. This is especially useful for fetching large segments with long "tails", i.e. pages which are fetched very slowly, either because of politeness settings or the target site's bandwidth limitations.
> * ability to dynamically adjust the number of fetcher threads. For a long-running fetch job it makes sense to decrease the number of fetcher threads during the day, and increase it during the night. This can be done now with a cron script, using the MsgQueueTool command-line.
> It's worthwhile to note that the patch itself is trivial, and most of the work is done by the MQ framework.
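
The two behaviours described above (a graceful halt that keeps partial results, and a thread count that can be changed while the job runs) could be sketched roughly as follows. This is not the patch attached to NUTCH-368 and does not use the Nutch MQ framework or MsgQueueTool; the class name FetcherControlSketch, the "HALT" and "THREADS <n>" message strings, and the simulated fetch() are all assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names come from Nutch or NUTCH-368.
public class FetcherControlSketch {
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    private final Queue<String> fetched = new ConcurrentLinkedQueue<>();
    private final List<Thread> workers = new ArrayList<>();
    private final AtomicInteger active = new AtomicInteger();
    private volatile boolean halted = false;
    private volatile int targetThreads = 0;

    public FetcherControlSketch(List<String> urls) {
        pending.addAll(urls);
    }

    // Applies one control message; the message format is an assumption.
    public synchronized void apply(String message) {
        String[] parts = message.trim().split("\\s+");
        if (parts.length == 1 && parts[0].equals("HALT")) {
            halted = true; // workers finish their current URL, then exit
        } else if (parts.length == 2 && parts[0].equals("THREADS")) {
            targetThreads = Math.max(0, Integer.parseInt(parts[1]));
            // Growing: start workers until the target is reached.
            // Shrinking is handled by the workers themselves in run().
            while (!halted && active.get() < targetThreads) {
                active.incrementAndGet();
                Thread t = new Thread(this::run);
                workers.add(t);
                t.start();
            }
        } else {
            throw new IllegalArgumentException("unknown message: " + message);
        }
    }

    private void run() {
        while (!halted) {
            int n = active.get();
            if (n > targetThreads) {
                // Surplus worker: retire if we win the race to decrement.
                if (active.compareAndSet(n, n - 1)) return;
                continue;
            }
            String url = pending.poll();
            if (url == null) break; // nothing left to fetch
            fetched.add(fetch(url));
        }
        active.decrementAndGet();
    }

    // Stand-in for a real fetch; sleeps briefly to simulate network latency.
    private String fetch(String url) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "fetched:" + url;
    }

    // Waits for every worker started so far to exit.
    public void join() {
        List<Thread> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(workers);
        }
        for (Thread t : snapshot) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public int fetchedCount() { return fetched.size(); }
    public int pendingCount() { return pending.size(); }
    public int activeThreads() { return active.get(); }
}
```

In this sketch, a cron job would play the role of the caller that sends "THREADS 2" in the morning and "THREADS 10" at night, and "HALT" leaves whatever has been fetched so far in the fetched queue while the unfetched URLs remain in pending. A real fetcher would also have to flush partial segment output on halt, which is omitted here.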

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira