Posted to dev@nutch.apache.org by "Julien Nioche (JIRA)" <ji...@apache.org> on 2012/05/01 11:36:50 UTC

[jira] [Commented] (NUTCH-1347) fetcher politeness related to map-reduce

    [ https://issues.apache.org/jira/browse/NUTCH-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265728#comment-13265728 ] 

Julien Nioche commented on NUTCH-1347:
--------------------------------------

It's not clear what the issue is. You can group URLs into a map input by host, domain or IP, and then into each fetch queue based on the same criteria, so that all URLs for a given host, domain or IP are handled by a single map task.
BTW, why not ask on the mailing list before filing a JIRA? You've opened quite a few, which is good, but you don't reply to comments or questions on them, which defeats the object.
Thanks
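
To illustrate the grouping described in the comment above, here is a minimal sketch in plain Java. It is not Nutch code: the class and method names (HostPartitionSketch, hostKey, partitionFor, group) and the sample URLs are made up for illustration, and it only covers the "by host" case. The idea is that if the partition is derived from the host alone, every URL of a host lands in the same map task, so that task's queue is the only one sending requests to that host. In Nutch 1.x the grouping mode is, as far as I know, selected with the partition.url.mode and fetcher.queue.mode properties; check nutch-default.xml for your version.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class HostPartitionSketch {

    // Hypothetical helper: the grouping key is the lower-cased host name.
    // Falls back to the whole URL string when no host can be parsed.
    static String hostKey(String url) {
        String host = URI.create(url).getHost();
        return host == null ? url : host.toLowerCase(Locale.ROOT);
    }

    // Maps a URL to a partition (one partition per map task in this sketch).
    // Only the host is hashed, so same-host URLs always share a partition.
    static int partitionFor(String url, int numPartitions) {
        return (hostKey(url).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Groups URLs by partition number, keeping input order inside each group.
    static Map<Integer, List<String>> group(List<String> urls, int numPartitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String url : urls) {
            byPartition
                .computeIfAbsent(partitionFor(url, numPartitions), k -> new ArrayList<>())
                .add(url);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Sample URLs are assumptions chosen for the example.
        List<String> urls = List.of(
            "http://example.com/a",
            "http://example.com/b",
            "http://example.org/x",
            "http://EXAMPLE.com/c");
        System.out.println(group(urls, 4));
    }
}
```

With this kind of partitioning, the per-queue delay and the per-queue limit on concurrent requests apply to all traffic for a host, because no other map task holds URLs for it.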
                
> fetcher politeness related to map-reduce
> ----------------------------------------
>
>                 Key: NUTCH-1347
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1347
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.4
>            Reporter: behnam nikbakht
>              Labels: fetch
>
> When Nutch runs on Hadoop, each map task, following the map-reduce model, works only on its own data. Each fetcher map task therefore works with its own queues and knows nothing about the queues of other map tasks, so it enforces the delay between successive requests and the maximum number of concurrent requests only on its own queues. With a simple test we found that this is not a good politeness mechanism when there are multiple map tasks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira