Posted to user@nutch.apache.org by Luis Cappa Banda <lu...@gmail.com> on 2011/09/15 17:00:41 UTC
Integrating Nutch-1.3 SVN version into another project.
Hello.
I've checked out Nutch 1.3 from Subversion and modified a few classes. My
goal is to build the artifacts of this "hacked" Nutch version with Maven and
use them from another Maven project that declares a dependency on the hacked
version. Both projects (the customized Nutch and the other project) sit
inside a parent project that orchestrates compilation by modules. The
configuration apparently looks good, and everything compiles correctly.
When I launch a crawl with the Solr indexing option, the following error
appears:
2011-09-15 16:57:07,137 0 [main] INFO
es.desa.empleate.infojobs.CrawlingProperties - Loading property file...
2011-09-15 16:57:07,144 7 [main] INFO
es.desa.empleate.infojobs.CrawlingProperties - Property file loaded!
2011-09-15 16:57:07,145 8 [main] INFO
es.desa.empleate.infojobs.CrawlingProperties - Retrieving property
'URLS_DIR'
2011-09-15 16:57:07,145 8 [main] INFO
es.desa.empleate.infojobs.CrawlingProperties - Retrieving property
'SOLR_SERVER'
2011-09-15 16:57:07,145 8 [main] INFO
es.desa.empleate.infojobs.CrawlingProperties - Retrieving property 'DEPTH'
2011-09-15 16:57:07,145 8 [main] INFO
es.desa.empleate.infojobs.CrawlingProperties - Retrieving property
'THREADS'
2011-09-15 16:57:08,259 1122 [main] INFO
es.desa.empleate.infojobs.CrawlingProcess - > Crawling process started...
2011-09-15 16:57:09,653 2516 [main] INFO org.apache.nutch.crawl.Crawl -
crawl started in: crawl-20110915165709
2011-09-15 16:57:09,653 2516 [main] INFO org.apache.nutch.crawl.Crawl -
rootUrlDir =urls
2011-09-15 16:57:09,653 2516 [main] INFO org.apache.nutch.crawl.Crawl -
threads = 10
2011-09-15 16:57:09,653 2516 [main] INFO org.apache.nutch.crawl.Crawl -
depth = 3
2011-09-15 16:57:09,653 2516 [main] INFO org.apache.nutch.crawl.Crawl -
solrUrl=http://localhost:8080/server_infojobs
2011-09-15 16:57:10,090 2953 [main] INFO org.apache.nutch.crawl.Injector
- Injector: starting at 2011-09-15 16:57:10
2011-09-15 16:57:10,090 2953 [main] INFO org.apache.nutch.crawl.Injector
- Injector: crawlDb: crawl-20110915165709/crawldb
2011-09-15 16:57:10,090 2953 [main] INFO org.apache.nutch.crawl.Injector
- Injector: urlDir:
/home/lcappa/Escritorio/workspaces/Tomcats/Tomcat2/apache-tomcat-6.0.29/urls
2011-09-15 16:57:10,236 3099 [main] INFO org.apache.nutch.crawl.Injector
- Injector: Converting injected urls to crawl db entries.
2011-09-15 16:57:10,258 3121 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with
processName=JobTracker, sessionId=
* 2011-09-15 16:57:10,328 3191 [main] WARN
org.apache.hadoop.mapred.JobClient - No job jar file set. User classes may
not be found. See JobConf(Class) or JobConf#setJar(String).*
2011-09-15 16:57:10,344 3207 [main] INFO
org.apache.hadoop.mapred.FileInputFormat - Total input paths to process : 1
2011-09-15 16:57:10,567 3430 [Thread-10] INFO
org.apache.hadoop.mapred.FileInputFormat - Total input paths to process : 1
2011-09-15 16:57:10,584 3447 [main] INFO
org.apache.hadoop.mapred.JobClient - Running job: job_local_0001
2011-09-15 16:57:10,642 3505 [Thread-10] INFO
org.apache.hadoop.mapred.MapTask - numReduceTasks: 1
2011-09-15 16:57:10,648 3511 [Thread-10] INFO
org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2011-09-15 16:57:10,772 3635 [Thread-10] INFO
org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2011-09-15 16:57:10,772 3635 [Thread-10] INFO
org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2011-09-15 16:57:10,794 3657 [Thread-10] WARN
org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
* java.lang.RuntimeException: Error in configuring object*
at
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 5 more
Caused by: java.lang.RuntimeException: Error in configuring object
at
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 10 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 13 more
Caused by: java.lang.IllegalArgumentException: plugin.folders is not defined
at
org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:78)
at
org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:71)
at
org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:117)
at
org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
... 18 more
2011-09-15 16:57:11,587 4450 [main] INFO
org.apache.hadoop.mapred.JobClient - map 0% reduce 0%
2011-09-15 16:57:11,590 4453 [main] INFO
org.apache.hadoop.mapred.JobClient - Job complete: job_local_0001
2011-09-15 16:57:11,591 4454 [main] INFO
org.apache.hadoop.mapred.JobClient - Counters: 0
2011-09-15 16:57:11,591 4454 [main] ERROR
es.desa.empleate.infojobs.CrawlingProcess - > INFOJOBS CRAWLING ERROR: Job
failed!
2011-09-15 16:57:11,591 4454 [main] INFO
es.desa.empleate.infojobs.CrawlingProcess - > Crawling process finished.
Looking at the error, I think I also need to include the Nutch .job artifact.
Is that so? If it is, how can I include it with Maven? Any recommendations?
Thank you very much.
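
A note on the root cause in the trace above ("plugin.folders is not
defined"): as far as I know, plugin.folders is normally set in
nutch-default.xml, which Nutch loads from the classpath. One plausible
reading, offered as an assumption and not a confirmed diagnosis, is that the
Maven-built jar does not carry the conf files, so the property is never
defined. The sketch below uses only the JDK to check whether the two usual
Nutch configuration files are visible on the classpath of the embedding
application. The class name NutchConfCheck and the method locate are
hypothetical names made up for this illustration.

```java
import java.net.URL;

// Hypothetical diagnostic helper (not part of Nutch): reports whether the
// Nutch configuration files can be found on the current classpath.
public class NutchConfCheck {

    // Returns the classpath URL of the named resource, or null if it is absent.
    static URL locate(String resourceName) {
        return NutchConfCheck.class.getClassLoader().getResource(resourceName);
    }

    public static void main(String[] args) {
        // Assumption: these are the two files Nutch reads its settings from.
        String[] names = {"nutch-default.xml", "nutch-site.xml"};
        for (String name : names) {
            URL url = locate(name);
            System.out.println(name + " -> "
                    + (url == null ? "NOT on classpath" : url.toString()));
        }
    }
}
```

If both files are reported as missing when this runs with the same classpath
as the crawling application, putting the Nutch conf directory (and a
reachable plugins directory) on that classpath would be the first thing to
try before adding the .job artifact.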