Posted to user@nutch.apache.org by Hemant Bist <he...@gmail.com> on 2008/06/14 07:47:21 UTC

Problem running Nutch from Eclipse 3.2 in Ubuntu Hardy

Hi,
I am trying to build and run Nutch from trunk in Eclipse 3.2 on Ubuntu
Hardy. After compiling it, I am unable to get it to crawl any site. As far as
I can tell, something is wrong in my configuration, but I can't figure
out what it is.

I am following [http://wiki.apache.org/nutch/RunNutchInEclipse0.9].
I have included conf in .classpath and modified nutch-defaults.xml to set
plugin.folders and http.agent.name.


The final warning message I get is (the complete hadoop.log is attached):
WARN  crawl.Crawl - No URLs to fetch - check your seed list and URL filters.
and some of the earlier warning messages are:
 WARN  mapred.JobClient - No job jar file set.  User classes may not be
found. See JobConf(Class) or JobConf#setJar(String).
2008-06-13 22:29:34,978 WARN  regex.RegexURLNormalizer - Can't load the
default config file! /nutch/home/work/nutch/trunk/conf/regex-normalize.xml
2008-06-13 22:29:34,990 WARN  suffix.SuffixURLFilter - Missing
urlfilter.suffix.file, all URLs will be rejected!
2008-06-13 22:29:34,994 FATAL api.RegexURLFilterBase - Can't find resource:
crawl-urlfilter.txt
2008-06-13 22:29:34,995 FATAL api.RegexURLFilterBase - Can't find resource:
automaton-urlfilter.txt
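
Several of the messages above say that a configuration file could not be
found. One way to narrow this down might be to check whether those files
are visible as classpath resources under the same Eclipse run configuration.
The following is only a minimal sketch: ClasspathCheck is a hypothetical
helper, not part of Nutch, and it assumes the files are meant to be found
through the class loader (that is, that the conf directory is on the run
classpath). The file names are taken from the log messages.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: reports which named resources
// cannot be found on the classpath of the current run configuration.
class ClasspathCheck {

    // Returns the names for which the class loader finds no resource.
    // Prints the resolved location of every resource that is found.
    static List<String> findMissing(List<String> names, ClassLoader loader) {
        List<String> missing = new ArrayList<String>();
        for (String name : names) {
            URL url = loader.getResource(name);
            if (url == null) {
                missing.add(name);
            } else {
                System.out.println("found   " + name + " -> " + url);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // File names copied from the WARN/FATAL messages in hadoop.log.
        List<String> names = Arrays.asList(
                "regex-normalize.xml",
                "crawl-urlfilter.txt",
                "automaton-urlfilter.txt");
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : findMissing(names, loader)) {
            System.out.println("MISSING " + name);
        }
    }
}
```

If this is run from the same Eclipse project and any name is printed as
MISSING, that would suggest the conf directory is not actually reaching the
runtime classpath, even though it is listed in .classpath.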



I would appreciate any pointers in debugging this.

Thanks,
HB