Posted to dev@nutch.apache.org by stan_lee <le...@gmail.com> on 2010/08/07 19:49:00 UTC

How to let Fetcher get "parse-plugins.xml" resource?

Hi Experts,

I hit the following error when running a fetch job:

CHILD PROCESS: java.lang.RuntimeException: Parse Plugins preferences could not be loaded.
CHILD PROCESS:  at org.apache.nutch.parse.ParserFactory.<init>(ParserFactory.java:79)
CHILD PROCESS:  at org.apache.nutch.parse.ParseUtil.<init>(ParseUtil.java:50)
CHILD PROCESS:  at org.apache.nutch.fetcher.FetcherEmptyRobots$FetcherThread.<init>(FetcherEmptyRobots.java:460)
CHILD PROCESS:  at org.apache.nutch.fetcher.FetcherEmptyRobots.run(FetcherEmptyRobots.java:909)
CHILD PROCESS:  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
CHILD PROCESS:  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
CHILD PROCESS:  at org.apache.hadoop.mapred.Child.main(Child.java:190)

I checked the code and could not find any resource called
"parse-plugins.xml"; it seems it is never loaded as a resource. I think the
conf object is constructed in Child.java and passed down to
ParserFactory.java, but I could not find anywhere in Child.java that adds
resources other than the Hadoop default resources and one more XML file named
"/hadoop/mapred/local/taskTracker/jobcache/job_201008080845_0006/attempt_201008080845_0006_m_000000_1/job.xml".
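
To narrow this down, a small probe like the one below could be run inside the
task JVM to see whether the file is visible there at all. This is only a
sketch: it assumes the file is resolved by name through the class loader
(which, to my understanding, is how Hadoop's Configuration locates named
resources), and the class name ResourceCheck and its methods are made up for
illustration; they are not part of Nutch or Hadoop.

```java
import java.net.URL;

// Hypothetical helper, not part of Nutch or Hadoop.
public class ResourceCheck {

    // Look the name up through the thread context class loader,
    // falling back to this class's own loader if none is set.
    public static URL findOnClasspath(String name) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ResourceCheck.class.getClassLoader();
        }
        return cl.getResource(name); // null when the resource is not visible
    }

    // Build a one-line report that is easy to spot in the task log.
    public static String describe(String name) {
        URL url = findOnClasspath(name);
        return (url == null)
                ? name + " NOT found on classpath"
                : name + " found at " + url;
    }

    public static void main(String[] args) {
        System.out.println(describe("parse-plugins.xml"));
    }
}
```

If the probe reports the file as not found in the child process, the problem
would be the task classpath rather than the list of resources added to the
conf object.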

So my question is: does Nutch add "parse-plugins.xml" as a resource
internally? If so, where does that code live? Or do we need to add it
ourselves before submitting the job, like below?

                setConf(NutchConfiguration.create());
                getConf().addDefaultResource("parse-plugins.xml");
                JobConf fetchjob = new NutchJob(getConf());
                ....
                JobClient.runJob(fetchjob);

I tried the approach above, but it does not seem to work. Can anyone tell me
why? Thank you very much!

Stan 