Posted to dev@nutch.apache.org by David Podunavac <da...@wyona.com> on 2006/08/25 16:26:29 UTC
reading crawl dir from nutch-default.xml
Hi,
I think this patch will make Nutch much easier to configure: the crawl dir
will be read from nutch-default.xml instead of being a path relative to the
directory the command was executed from.
So nutch-default.xml will have its
<property>
<name>searcher.dir</name>
<value>PATH_TO_CRAWL_DIR</value>
<description>
and this value will be used instead
Index: nutch-0.8/src/java/org/apache/nutch/crawl/Crawl.java
===================================================================
--- nutch-0.8/src/java/org/apache/nutch/crawl/Crawl.java (revision 436809)
+++ nutch-0.8/src/java/org/apache/nutch/crawl/Crawl.java (working copy)
@@ -53,10 +53,12 @@
Configuration conf = NutchConfiguration.create();
conf.addDefaultResource("crawl-tool.xml");
+ conf.addDefaultResource("nutch-default.xml");
JobConf job = new NutchJob(conf);
Path rootUrlDir = null;
- Path dir = new Path("crawl-" + getDate());
+ String path2crawlDir = conf.get("searcher.dir");
+ Path dir = new Path(path2crawlDir);
int threads = job.getInt("fetcher.threads.fetch", 10);
int depth = 5;
int topN = Integer.MAX_VALUE;
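One case the change above leaves open is an unset searcher.dir, since conf.get would then return null. Below is a minimal sketch of a fallback to the old "crawl-" + date style name. It uses java.util.Properties as a stand-in for the Nutch Configuration object so that it runs on its own; the class CrawlDirResolver, the method resolveCrawlDir and the example paths are hypothetical and not part of Nutch.

```java
import java.util.Properties;

public class CrawlDirResolver {

    // Returns the configured crawl directory, or the given fallback when
    // "searcher.dir" is missing or blank. Properties stands in for the
    // Nutch Configuration here (an assumption made for this sketch).
    public static String resolveCrawlDir(Properties conf, String fallback) {
        String configured = conf.getProperty("searcher.dir");
        if (configured == null || configured.trim().isEmpty()) {
            return fallback;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();

        // Property not set: the fallback name is used.
        System.out.println(resolveCrawlDir(conf, "crawl-fallback"));

        // Property set: the configured path wins.
        conf.setProperty("searcher.dir", "/data/crawl");
        System.out.println(resolveCrawlDir(conf, "crawl-fallback"));
    }
}
```

In Crawl.java the same check could wrap the conf.get("searcher.dir") call, so that running without the property keeps the previous behaviour.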
And this patch makes CrawlDbReader find that crawl directory:
Index: nutch-0.8/src/java/org/apache/nutch/crawl/CrawlDbReader.java
===================================================================
--- nutch-0.8/src/java/org/apache/nutch/crawl/CrawlDbReader.java (revision 436809)
+++ nutch-0.8/src/java/org/apache/nutch/crawl/CrawlDbReader.java (working copy)
@@ -406,8 +406,10 @@
return;
}
String param = null;
- String crawlDb = args[0];
+ //String crawlDb = args[0];
Configuration conf = NutchConfiguration.create();
+ conf.addDefaultResource("nutch-default.xml");
+ String crawlDb = conf.get("searcher.dir") + "/crawldb";
for (int i = 1; i < args.length; i++) {
if (args[i].equals("-stats")) {
dbr.processStatJob(crawlDb, conf);
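The concatenation conf.get("searcher.dir") + "/crawldb" yields a doubled slash when the configured value already ends in "/", and the string "null/crawldb" when the property is unset. A small sketch of a more defensive join follows; the class CrawlDbPath and the method crawlDbUnder are hypothetical names used only for illustration, not Nutch API.

```java
public class CrawlDbPath {

    // Appends "crawldb" to the crawl directory without doubling the slash.
    // Rejects null so that an unset property fails early with a clear
    // message instead of producing "null/crawldb".
    public static String crawlDbUnder(String crawlDir) {
        if (crawlDir == null) {
            throw new IllegalArgumentException("searcher.dir is not set");
        }
        String base = crawlDir;
        // Strip trailing slashes, but keep a lone "/" (filesystem root).
        while (base.length() > 1 && base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        return base.endsWith("/") ? base + "crawldb" : base + "/crawldb";
    }

    public static void main(String[] args) {
        System.out.println(crawlDbUnder("/data/crawl"));
        System.out.println(crawlDbUnder("/data/crawl/"));
    }
}
```

Note also that the loop in the patched code still starts at index 1, so the first command-line argument is still skipped even though it is no longer read as the crawldb path.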
WDYT
thanks
David