Posted to user@nutch.apache.org by chethan <ch...@gmail.com> on 2014/05/07 13:09:22 UTC

Nutch fetching on only one node

Hi,

I'm running Nutch 1.7 on 10 nodes, but the fetch happens on only one node. I
realize that this is because the generator has only 1 reduce task and
generates only 1 fetch list. How do you change that? I would like the fetch
to happen on all nodes, thereby improving performance drastically. Thanks for
your help!

Regards,

--
Chethan Prasad

Re: Nutch fetching on only one node

Posted by Julien Nioche <li...@gmail.com>.
Hi,

Usage: Generator <crawldb> <segments_dir> [-force] [-topN N] [-numFetchers numFetchers]
       [-adddays numDays] [-noFilter] [-noNorm] [-maxNumSegments num]

Set -numFetchers 10 to use all your slaves. Of course, if all your URLs
belong to the same host, they'll end up being processed by a single mapper.
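To see why a single host still ends up in one fetch list, here is a simplified sketch of host-based partitioning, assuming the default behaviour of grouping URLs by host: each URL's host is hashed and taken modulo the number of fetch lists, so every URL from the same host lands in the same list. This is not Nutch's code; `cksum` is a stand-in for the hash Nutch actually uses, and `host_of`, `partition_of` and `NUM_FETCHERS` are hypothetical names.

```bash
#!/usr/bin/env bash
# Simplified sketch, NOT Nutch's implementation: assign URLs to fetch lists
# by hashing the host. cksum stands in for the real hash function.
set -euo pipefail

NUM_FETCHERS=10   # corresponds to -numFetchers 10

# Print the lower-cased host of a URL: strip the scheme, then cut at the
# first '/', ':', '?' or '#'.
host_of() {
  local rest=${1#*://}
  rest=${rest%%[/:?#]*}
  printf '%s' "$rest" | tr '[:upper:]' '[:lower:]'
}

# Print the fetch-list number in [0, NUM_FETCHERS) for a URL.
# Only the host is hashed, so all URLs of one host share a fetch list.
partition_of() {
  local crc
  crc=$(host_of "$1" | cksum | awk '{print $1}')
  echo $(( crc % NUM_FETCHERS ))
}

# Demo: print "fetch-list<TAB>url" for a few example URLs.
for url in "http://www.example.com/a.html" \
           "http://www.example.com/b/c.html" \
           "https://nutch.apache.org/"; do
  printf '%s\t%s\n' "$(partition_of "$url")" "$url"
done
```

Running it prints the same fetch-list number for both www.example.com URLs; only URLs spread over many hosts get spread over the 10 fetch lists.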

See the crawl script:


>   #############################################
>   # MODIFY THE PARAMETERS BELOW TO YOUR NEEDS #
>   #############################################
>
>   # set the number of slaves nodes
>   numSlaves=1


and further down


>   echo "Generating a new segment"
>   $bin/nutch generate $commonOptions $CRAWL_PATH/crawldb $CRAWL_PATH/segments -topN $sizeFetchlist -numFetchers $numSlaves -noFilter
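If you drive the crawl with the crawl script, the change is the numSlaves parameter quoted above, since it is passed straight to -numFetchers. The dry-run sketch below sets it to 10 (one per node in this setup) and only prints the generate command it would run. The install path, crawl directory and 50,000-URLs-per-slave fetch-list size are assumed example values, and the script's $commonOptions (extra Hadoop -D settings) are left out.

```bash
#!/usr/bin/env bash
# Dry-run sketch: builds and prints the generate command with numSlaves=10.
# bin, CRAWL_PATH and the per-slave fetch-list size are ASSUMED example
# values, and $commonOptions from the real crawl script is omitted.
set -euo pipefail

# set the number of slaves nodes (was numSlaves=1)
numSlaves=10

bin=./bin                               # assumed Nutch bin directory
CRAWL_PATH=crawl                        # assumed crawl directory
sizeFetchlist=$(( numSlaves * 50000 ))  # assumed: 50,000 URLs per slave

# One fetch list per slave: -numFetchers follows numSlaves.
generate_cmd=( "$bin/nutch" generate
  "$CRAWL_PATH/crawldb" "$CRAWL_PATH/segments"
  -topN "$sizeFetchlist" -numFetchers "$numSlaves" -noFilter )

echo "Generating a new segment"
echo "${generate_cmd[*]}"   # print only; run "${generate_cmd[@]}" for real
```

Keeping -numFetchers tied to numSlaves means adding nodes only requires changing one value at the top of the script.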


Julien


-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble