Posted to user@nutch.apache.org by Brian Tingle <Br...@ucop.edu> on 2009/07/24 05:16:19 UTC

adding [-numFetchers numFetchers] to crawl

How do I set the number of map tasks when I run a command like the following?

hadoop jar nutch-1.0.job org.apache.nutch.crawl.Crawl

I think I'm going to try out the change below. Is there any reason not
to do it? Or is Crawl meant to be more of a demo, in which case I should
write a script or my own crawler class?

> diff Crawl.java.orig Crawl.java
53c53
<         ("Usage: Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]");
---
>         ("Usage: Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N] [-numFetchers n]");
65a66
>     int numFetchers = -1;
78a80,82
>       } else if ("-numFetchers".equals(args[i])) {
>           numFetchers = Integer.parseInt(args[i+1]);
>           i++;
116c120
<       Path segment = generator.generate(crawlDb, segments, -1, topN, System
---
>       Path segment = generator.generate(crawlDb, segments, numFetchers, topN, System
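
To try the argument handling on its own, the parsing part of the patch can be sketched as a standalone class. This is only an illustration: NumFetchersArg and parseNumFetchers are hypothetical names that are not part of Nutch, and the check for a missing value is an addition that the patch above does not have. The default of -1 is the value Crawl currently passes to generator.generate.

```java
// Hypothetical standalone sketch; this class does not exist in Nutch.
class NumFetchersArg {

    // Same default the patch uses: the value Crawl passes today.
    static final int DEFAULT_NUM_FETCHERS = -1;

    /**
     * Scans the argument list for "-numFetchers <n>" and returns n,
     * or DEFAULT_NUM_FETCHERS when the flag is absent.
     */
    static int parseNumFetchers(String[] args) {
        int numFetchers = DEFAULT_NUM_FETCHERS;
        for (int i = 0; i < args.length; i++) {
            if ("-numFetchers".equals(args[i])) {
                // Assumption: fail with a clear message instead of an
                // ArrayIndexOutOfBoundsException when the value is missing.
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException(
                        "-numFetchers requires a value");
                }
                numFetchers = Integer.parseInt(args[i + 1]);
                i++; // skip the value that was just consumed
            }
        }
        return numFetchers;
    }

    public static void main(String[] args) {
        System.out.println(parseNumFetchers(args));
    }
}
```

The parsed value would then take the place of the hard-coded -1 in the generator.generate call, as in the last hunk of the diff.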