Posted to user@nutch.apache.org by chethan <ch...@gmail.com> on 2014/05/07 13:09:22 UTC
Nutch fetching on only one node
Hi,
I'm running Nutch 1.7 on 10 nodes, but the fetch happens on only one node. I
realize that this is because the generator has only 1 reduce task and
generated only 1 fetch list. How do you change that? I would like the fetch
to happen on all nodes, thereby improving performance drastically. Thanks for
your help!
Regards,
--
Chethan Prasad
Re: Nutch fetching on only one node
Posted by Julien Nioche <li...@gmail.com>.
Hi,
Usage: Generator <crawldb> <segments_dir> [-force] [-topN N] [-numFetchers
numFetchers] [-adddays numDays] [-noFilter] [-noNorm] [-maxNumSegments num]
Set -numFetchers 10 to use all your slaves. Of course, if all your URLs
belong to the same host, they'll end up being processed by a single mapper.
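A rough way to check whether the same-host caveat applies is to count the distinct hosts in a URL list before generating. The sketch below is only an illustration: the helper name distinct_hosts and the sample URLs are hypothetical, and it treats the text between "//" and the next "/" as the host (so a port, if present, counts as part of it).

```shell
# Print the number of distinct hosts among URLs read from stdin.
# Hypothetical helper, not part of Nutch.
distinct_hosts() {
  # Splitting on "/" puts the host in field 3: "http:" "" "host" "path..."
  awk -F/ 'NF >= 3 { print tolower($3) }' | sort -u | wc -l | tr -d ' '
}

# Sample input: three URLs spread over two hosts.
printf '%s\n' \
  'http://example.com/a' \
  'http://example.com/b' \
  'http://example.org/' | distinct_hosts
# prints: 2
```

If the count is lower than the value passed to -numFetchers, some of the fetch lists will probably stay empty or very small.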
See crawl script
> #############################################
> # MODIFY THE PARAMETERS BELOW TO YOUR NEEDS #
> #############################################
>
> # set the number of slaves nodes
> numSlaves=1
and further down
>   echo "Generating a new segment"
>   $bin/nutch generate $commonOptions $CRAWL_PATH/crawldb $CRAWL_PATH/segments -topN $sizeFetchlist -numFetchers $numSlaves -noFilter
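For a 10-node cluster, the line above would end up with -numFetchers 10 once numSlaves is changed. The following dry-run sketch only prints the resulting command instead of running it. The helper name generate_cmd, the crawl directory "crawl" and the -topN value are placeholders chosen for illustration, and $commonOptions is left out for brevity.

```shell
# Print (do not execute) a generate command for a given setup.
# $1 = crawl directory, $2 = -topN value, $3 = number of fetch lists
generate_cmd() {
  echo "bin/nutch generate $1/crawldb $1/segments -topN $2 -numFetchers $3 -noFilter"
}

# Hypothetical values: crawl dir "crawl", 50000 URLs per segment, 10 slaves.
generate_cmd crawl 50000 10
# prints: bin/nutch generate crawl/crawldb crawl/segments -topN 50000 -numFetchers 10 -noFilter
```

Printing the command first makes it easy to confirm that the -numFetchers value matches the number of slave nodes before launching the real job.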
Julien
--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble