Posted to dev@nutch.apache.org by "applepear (JIRA)" <ji...@apache.org> on 2012/07/24 23:59:34 UTC

[jira] [Commented] (NUTCH-1238) Fetcher throughput threshold must start before feeder finished

    [ https://issues.apache.org/jira/browse/NUTCH-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421804#comment-13421804 ] 

applepear commented on NUTCH-1238:
----------------------------------

The fix is not correct. With the fix, when the throughput falls below the threshold, the queue is emptied and the throughput threshold is disabled. However, if the feeder is still alive, it will continue to feed more URLs of the same domain, which is problematic: because the throughput threshold is now disabled, the map task will no longer stop due to low throughput.
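
To make the scenario concrete, here is a minimal toy model of the behaviour described above. It is only a sketch and not Nutch's actual Fetcher code: the class name, the method, the constant throughput, and all numbers are assumptions chosen for illustration. It counts how many URLs of a slow domain are still fetched after the first low-throughput event, once with the check switched off after it fires and once with the check left armed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified model of the scenario; NOT the real Nutch Fetcher. */
public class ThroughputSketch {

    /**
     * Runs a toy fetch loop and returns how many slow URLs are fetched
     * after the low-throughput check has fired for the first time.
     *
     * @param disableAfterTrigger true models the criticized behaviour (the
     *        check is switched off once it fires); false keeps it armed.
     */
    public static int slowFetchesAfterTrigger(boolean disableAfterTrigger) {
        Deque<String> queue = new ArrayDeque<>();
        int remainingInFeeder = 6;   // URLs the feeder has not queued yet (assumed)
        int pagesPerSecond = 1;      // constant throughput of the slow domain (assumed)
        int threshold = 5;           // minimum pages per second (assumed)
        boolean checkEnabled = true;
        boolean triggered = false;
        int slowFetches = 0;

        // URLs already queued before the loop starts.
        for (int i = 0; i < 3; i++) {
            queue.add("http://slow.example/" + i);
        }

        while (!queue.isEmpty() || remainingInFeeder > 0) {
            // The feeder is still alive and keeps adding URLs of the same domain.
            if (remainingInFeeder > 0) {
                queue.add("http://slow.example/more" + remainingInFeeder);
                remainingInFeeder--;
            }

            // Throughput check: empty the queue when throughput is too low.
            if (checkEnabled && pagesPerSecond < threshold) {
                queue.clear();
                triggered = true;
                if (disableAfterTrigger) {
                    checkEnabled = false; // nothing will stop the slow fetch from now on
                }
                continue;
            }

            // "Fetch" one URL from the slow domain.
            queue.poll();
            if (triggered) {
                slowFetches++;
            }
        }
        return slowFetches;
    }

    public static void main(String[] args) {
        System.out.println("check disabled after trigger: " + slowFetchesAfterTrigger(true));
        System.out.println("check kept armed:             " + slowFetchesAfterTrigger(false));
    }
}
```

In this toy model, disabling the check lets every URL the feeder adds afterwards (5 here) be fetched at the slow rate, while keeping the check armed drops them all (0). The second variant is only a contrast case for illustration, not a proposed patch.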
                
> Fetcher throughput threshold must start before feeder finished
> --------------------------------------------------------------
>
>                 Key: NUTCH-1238
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1238
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.4
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Trivial
>             Fix For: 1.5
>
>         Attachments: NUTCH-1238-1.5-1.patch
>
>
> Right now the fetcher's minimum throughput threshold is activated only when the feeder has finished. However, a running fetch can be slow for various reasons. This issue should change the feature so that checking starts earlier, but not right after initialization.
