Posted to user@nutch.apache.org by Hemant Bist <he...@gmail.com> on 2008/06/14 07:47:21 UTC
problem running nutch from eclipse 3.2 in ubuntu hardy.
Hi,
I am trying to build and run Nutch from trunk in Eclipse 3.2 on Ubuntu
Hardy. After compiling it, I am unable to get it to crawl any site. As far as
I can tell, there is something wrong in my configuration, but I can't figure
out what it is.
I am following [http://wiki.apache.org/nutch/RunNutchInEclipse0.9],
have included conf in .classpath, and have modified nutch-default.xml to set
plugin.folders and http.agent.name.
The final warning message I get is (the complete hadoop.log is attached):
WARN crawl.Crawl - No URLs to fetch - check your seed list and URL filters.
Some of the earlier warning messages are:
WARN mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2008-06-13 22:29:34,978 WARN regex.RegexURLNormalizer - Can't load the default config file! /nutch/home/work/nutch/trunk/conf/regex-normalize.xml
2008-06-13 22:29:34,990 WARN suffix.SuffixURLFilter - Missing urlfilter.suffix.file, all URLs will be rejected!
2008-06-13 22:29:34,994 FATAL api.RegexURLFilterBase - Can't find resource: crawl-urlfilter.txt
2008-06-13 22:29:34,995 FATAL api.RegexURLFilterBase - Can't find resource: automaton-urlfilter.txt
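A minimal sketch of a check for this, assuming these files are looked up as classpath resources (which the "Can't find resource" messages suggest but do not prove). The class name ClasspathCheck and the method findMissing are hypothetical and not part of Nutch; the file names are taken from the log lines above. Run from the same Eclipse launch configuration, it prints where each file resolves from, or NOT FOUND if the runtime classpath does not contain it.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: reports which resource names
// cannot be resolved by the class loader of the current launch.
class ClasspathCheck {

    // Returns the subset of names that the class loader cannot find.
    static List<String> findMissing(List<String> names) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            URL url = cl.getResource(name);
            System.out.println(name + " -> " + (url == null ? "NOT FOUND" : url.toString()));
            if (url == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // File names copied from the warnings in hadoop.log.
        List<String> names = Arrays.asList(
                "regex-normalize.xml",
                "crawl-urlfilter.txt",
                "automaton-urlfilter.txt");
        List<String> missing = findMissing(names);
        System.out.println(missing.size() + " of " + names.size() + " resources not on the classpath");
    }
}
```

If every name prints NOT FOUND, the conf directory is probably not on the runtime classpath of the launch configuration, even if it is listed in .classpath for the build.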
I would appreciate any pointers in debugging this.
Thanks,
HB