Posted to dev@nutch.apache.org by "Armel T. Nene" <ar...@idna-solutions.com> on 2007/04/10 12:21:05 UTC
Nutch java.io.exception
Hi guys,
I am currently running Nutch 0.8.2-dev on MS Windows Vista using Sun JVM 6. I
have set up Nutch in my IDE (NetBeans) and it works great. I then applied
NUTCH-61 (https://issues.apache.org/jira/browse/NUTCH-61) to my local
version. When I run Nutch within the IDE, all the steps are performed
with no problem: I can view the content of the crawldb, and the segments and
index are fine. If I run it in a loop, the process executes without any problem.
I then packaged that version and ran it in a testing environment. At first no
indexes were being created. Nutch wasn't reporting any errors, so I set the
Hadoop log level to debug. Some of the debug lines from Hadoop look
suspicious. Below is an extract:
From the log, I can see that the problem occurs at the Inject and Generate
stages. Can anybody help me overcome this problem? I will be glad
to provide a working version of NUTCH-61 once it is tested.
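Since Nutch itself reports no errors, one quick check is whether the Hadoop
log holds anything above INFO. Below is a minimal sketch in Java, assuming the
log layout used in the extract in this post (timestamp, level, logger,
message). LogTriage and problems are hypothetical names, not part of Nutch or
Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch or Hadoop.
class LogTriage {
    // Assumed layout: "yyyy-MM-dd HH:mm:ss,SSS LEVEL logger - message".
    private static final Pattern ENTRY =
        Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3} (\\w+) .*");

    // Return only the entries logged at WARN, ERROR or FATAL.
    static List<String> problems(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line);
            if (!m.matches()) {
                continue; // continuation line: wrapped text or a stack frame
            }
            String level = m.group(1);
            if (level.equals("WARN") || level.equals("ERROR") || level.equals("FATAL")) {
                out.add(line);
            }
        }
        return out;
    }
}
```

An empty result would suggest that the log contains only DEBUG and INFO
entries, in which case the cause of the missing index may lie outside what
this log records.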
2007-04-05 16:35:30,976 INFO mapred.LocalJobRunner - E:/iDna-nutch-RC1/iDna-nutch-launcher/test/urls/urls:0+55
2007-04-05 16:35:31,073 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2007-04-05 16:35:31,074 INFO crawl.FetchSchedule - defaultInterval=7.46496E9
2007-04-05 16:35:31,074 INFO crawl.FetchSchedule - maxInterval=2592000.0
2007-04-05 16:35:31,084 DEBUG io.SequenceFile - running sort pass
2007-04-05 16:35:31,096 INFO io.SequenceFile - flushing segment 0
2007-04-05 16:35:31,928 INFO mapred.JobClient - map 100% reduce 0%
2007-04-05 16:35:31,940 INFO mapred.LocalJobRunner - reduce > reduce
2007-04-05 16:35:32,928 INFO mapred.JobClient - Job complete: job_ui1cje
2007-04-05 16:35:32,928 INFO crawl.Injector - Injector: Merging injected urls into crawl db.
2007-04-05 16:35:32,938 DEBUG conf.Configuration - java.io.IOException: config(config)
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:76)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:86)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:97)
    at org.apache.nutch.util.NutchJob.<init>(NutchJob.java:26)
    at org.apache.nutch.crawl.CrawlDb.createJob(CrawlDb.java:74)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:222)
    at org.apache.nutch.crawl.Injector.main(Injector.java:242)
    at com.idna.nutch.launcher.CrawlerManager.injector(CrawlerManager.java:63)
    at com.idna.nutch.launcher.CrawlerManager.main(CrawlerManager.java:209)
2007-04-05 16:35:32,943 INFO conf.Configuration - parsing jar:file:/E:/iDna-nutch-RC1/nutch-0.8.2-dev/lib/hadoop-0.4.0-patched.jar!/hadoop-default.xml
2007-04-05 16:35:32,951 INFO conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/nutch-default.xml
2007-04-05 16:35:32,961 INFO conf.Configuration - parsing jar:file:/E:/iDna-nutch-RC1/nutch-0.8.2-dev/lib/hadoop-0.4.0-patched.jar!/mapred-default.xml
2007-04-05 16:35:32,966 INFO conf.Configuration - parsing jar:file:/E:/iDna-nutch-RC1/nutch-0.8.2-dev/lib/hadoop-0.4.0-patched.jar!/mapred-default.xml
2007-04-05 16:35:32,973 INFO conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/nutch-site.xml
2007-04-05 16:35:32,980 INFO conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/hadoop-site.xml
2007-04-05 16:35:33,040 DEBUG conf.Configuration - java.io.IOException: config(config)
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:76)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:86)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:58)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:182)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:292)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:224)
    at org.apache.nutch.crawl.Injector.main(Injector.java:242)
    at com.idna.nutch.launcher.CrawlerManager.injector(CrawlerManager.java:63)
    at com.idna.nutch.launcher.CrawlerManager.main(CrawlerManager.java:209)
2007-04-05 16:35:33,501 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2007-04-05 16:35:33,501 INFO crawl.FetchSchedule - defaultInterval=7.46496E9
2007-04-05 16:35:33,501 INFO crawl.FetchSchedule - maxInterval=2592000.0
2007-04-05 16:35:33,508 DEBUG io.SequenceFile - running sort pass
2007-04-05 16:35:33,514 INFO io.SequenceFile - flushing segment 0
2007-04-05 16:35:33,639 INFO mapred.LocalJobRunner - reduce > reduce
2007-04-05 16:35:34,120 INFO mapred.JobClient - Job complete: job_qzwgkh
2007-04-05 16:35:34,429 INFO crawl.Injector - Injector: done
2007-04-05 16:35:34,439 INFO crawl.Generator - topN: 100
2007-04-05 16:35:34,439 DEBUG conf.Configuration - java.io.IOException: config()
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:67)
    at org.apache.nutch.util.NutchConfiguration.create(NutchConfiguration.java:50)
    at org.apache.nutch.crawl.Generator.main(Generator.java:416)
    at com.idna.nutch.launcher.CrawlerManager.autoGenSegList(CrawlerManager.java:80)
    at com.idna.nutch.launcher.CrawlerManager.main(CrawlerManager.java:211)
2007-04-05 16:35:34,443 INFO conf.Configuration - parsing jar:file:/E:/iDna-nutch-RC1/nutch-0.8.2-dev/lib/hadoop-0.4.0-patched.jar!/hadoop-default.xml
2007-04-05 16:35:34,450 INFO conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/nutch-default.xml
2007-04-05 16:35:34,462 INFO conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/nutch-site.xml
2007-04-05 16:35:34,468 INFO conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/hadoop-site.xml
2007-04-05 16:35:35,470 INFO crawl.Generator - Generator: starting
2007-04-05 16:35:35,470 INFO crawl.Generator - Generator: segment: test/segments/20070405163535
2007-04-05 16:35:35,470 INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
2007-04-05 16:35:35,471 DEBUG conf.Configuration - java.io.IOException: config(config)
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:76)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:86)
    at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:97)
    at org.apache.nutch.util.NutchJob.<init>(NutchJob.java:26)
    at org.apache.nutch.crawl.Generator.generate(Generator.java:309)
    at org.apache.nutch.crawl.Generator.main(Generator.java:417)
    at com.idna.nutch.launcher.CrawlerManager.autoGenSegList(CrawlerManager.java:80)
    at com.idna.nutch.launcher.CrawlerManager.main(CrawlerManager.java:211)
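
A note on reading the DEBUG entries above: each trace starts inside the
Configuration constructor, is logged at DEBUG, and the job carries on
afterwards (the Injector goes on to report "done"). That is consistent with a
stack trace logged only to record where an object was created, not with an
exception that was actually thrown. The Java sketch below illustrates that
general pattern. TracedConfig, debugEnabled and lastDebugMessage are
hypothetical names, and this is an assumption about what the log shows, not a
copy of Hadoop's code.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for a class that records where it was constructed.
class TracedConfig {
    // Assumption: a plain flag stands in for "DEBUG logging is enabled".
    static boolean debugEnabled = true;

    // Last message "logged", kept in a field so the example is easy to inspect.
    static String lastDebugMessage = null;

    TracedConfig() {
        if (debugEnabled) {
            // The exception is created only to capture the current call stack.
            // It is never thrown, so construction always succeeds.
            lastDebugMessage = stringify(new IOException("config()"));
            System.out.println("DEBUG " + lastDebugMessage);
        }
    }

    // Render a Throwable and its stack trace as text.
    static String stringify(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }
}
```

If that reading is right, these entries would not by themselves explain the
missing index, and any WARN or ERROR entries elsewhere in the log would be the
more useful lead.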
===========================
Armel T. Nene
iDNA Solutions LTD
Tel: +44 (20) 7257 6124
Mobile: +44 (7886)950 483
Web: http://www.idna-solutions.com
Blog: http://blog.idna-solutions.com