Posted to dev@nutch.apache.org by "Armel T. Nene" <ar...@idna-solutions.com> on 2007/04/10 12:21:05 UTC

Nutch java.io.exception

 

Hi guys,

 

I am currently running Nutch 0.8.2-dev on MS Windows Vista using Sun JVM 6. I
have set up Nutch in my IDE (NetBeans) and it works great. Afterwards, I
applied NUTCH-61 (https://issues.apache.org/jira/browse/NUTCH-61) to my local
version. When I run Nutch within the IDE, all the steps are performed
with no problem: I can view the content of the crawldb, and the segments and
index are fine. If I run it in a loop, the process executes without any problem.

 

I then packaged the version and ran it in a testing environment. At first, no
index was being created. Since Nutch wasn't giving any errors, I set the
Hadoop logging to DEBUG. Some of the debug lines from Hadoop look
suspicious. Below is an extract:

 

From the log, I can see that the problem occurs at the Generate and
Inject stages. Can anybody help me overcome this problem? I will be glad
to provide a working version of the NUTCH-61 patch once it is tested.

 

2007-04-05 16:35:30,976 INFO  mapred.LocalJobRunner - E:/iDna-nutch-RC1/iDna-nutch-launcher/test/urls/urls:0+55
2007-04-05 16:35:31,073 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2007-04-05 16:35:31,074 INFO  crawl.FetchSchedule - defaultInterval=7.46496E9
2007-04-05 16:35:31,074 INFO  crawl.FetchSchedule - maxInterval=2592000.0
2007-04-05 16:35:31,084 DEBUG io.SequenceFile - running sort pass
2007-04-05 16:35:31,096 INFO  io.SequenceFile - flushing segment 0
2007-04-05 16:35:31,928 INFO  mapred.JobClient -  map 100%  reduce 0%
2007-04-05 16:35:31,940 INFO  mapred.LocalJobRunner - reduce > reduce
2007-04-05 16:35:32,928 INFO  mapred.JobClient - Job complete: job_ui1cje
2007-04-05 16:35:32,928 INFO  crawl.Injector - Injector: Merging injected urls into crawl db.

2007-04-05 16:35:32,938 DEBUG conf.Configuration - java.io.IOException: config(config)
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:76)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:86)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:97)
        at org.apache.nutch.util.NutchJob.<init>(NutchJob.java:26)
        at org.apache.nutch.crawl.CrawlDb.createJob(CrawlDb.java:74)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:222)
        at org.apache.nutch.crawl.Injector.main(Injector.java:242)
        at com.idna.nutch.launcher.CrawlerManager.injector(CrawlerManager.java:63)
        at com.idna.nutch.launcher.CrawlerManager.main(CrawlerManager.java:209)

 

2007-04-05 16:35:32,943 INFO  conf.Configuration - parsing jar:file:/E:/iDna-nutch-RC1/nutch-0.8.2-dev/lib/hadoop-0.4.0-patched.jar!/hadoop-default.xml
2007-04-05 16:35:32,951 INFO  conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/nutch-default.xml
2007-04-05 16:35:32,961 INFO  conf.Configuration - parsing jar:file:/E:/iDna-nutch-RC1/nutch-0.8.2-dev/lib/hadoop-0.4.0-patched.jar!/mapred-default.xml
2007-04-05 16:35:32,966 INFO  conf.Configuration - parsing jar:file:/E:/iDna-nutch-RC1/nutch-0.8.2-dev/lib/hadoop-0.4.0-patched.jar!/mapred-default.xml
2007-04-05 16:35:32,973 INFO  conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/nutch-site.xml
2007-04-05 16:35:32,980 INFO  conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/hadoop-site.xml

2007-04-05 16:35:33,040 DEBUG conf.Configuration - java.io.IOException: config(config)
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:76)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:86)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:58)
        at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:182)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:292)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:224)
        at org.apache.nutch.crawl.Injector.main(Injector.java:242)
        at com.idna.nutch.launcher.CrawlerManager.injector(CrawlerManager.java:63)
        at com.idna.nutch.launcher.CrawlerManager.main(CrawlerManager.java:209)

 

2007-04-05 16:35:33,501 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2007-04-05 16:35:33,501 INFO  crawl.FetchSchedule - defaultInterval=7.46496E9
2007-04-05 16:35:33,501 INFO  crawl.FetchSchedule - maxInterval=2592000.0
2007-04-05 16:35:33,508 DEBUG io.SequenceFile - running sort pass
2007-04-05 16:35:33,514 INFO  io.SequenceFile - flushing segment 0
2007-04-05 16:35:33,639 INFO  mapred.LocalJobRunner - reduce > reduce
2007-04-05 16:35:34,120 INFO  mapred.JobClient - Job complete: job_qzwgkh
2007-04-05 16:35:34,429 INFO  crawl.Injector - Injector: done
2007-04-05 16:35:34,439 INFO  crawl.Generator - topN: 100

2007-04-05 16:35:34,439 DEBUG conf.Configuration - java.io.IOException: config()
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:67)
        at org.apache.nutch.util.NutchConfiguration.create(NutchConfiguration.java:50)
        at org.apache.nutch.crawl.Generator.main(Generator.java:416)
        at com.idna.nutch.launcher.CrawlerManager.autoGenSegList(CrawlerManager.java:80)
        at com.idna.nutch.launcher.CrawlerManager.main(CrawlerManager.java:211)

 

2007-04-05 16:35:34,443 INFO  conf.Configuration - parsing jar:file:/E:/iDna-nutch-RC1/nutch-0.8.2-dev/lib/hadoop-0.4.0-patched.jar!/hadoop-default.xml
2007-04-05 16:35:34,450 INFO  conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/nutch-default.xml
2007-04-05 16:35:34,462 INFO  conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/nutch-site.xml
2007-04-05 16:35:34,468 INFO  conf.Configuration - parsing file:/E:/iDna-nutch-RC1/iDna-nutch-launcher/test/conf/hadoop-site.xml
2007-04-05 16:35:35,470 INFO  crawl.Generator - Generator: starting
2007-04-05 16:35:35,470 INFO  crawl.Generator - Generator: segment: test/segments/20070405163535
2007-04-05 16:35:35,470 INFO  crawl.Generator - Generator: Selecting best-scoring urls due for fetch.

2007-04-05 16:35:35,471 DEBUG conf.Configuration - java.io.IOException: config(config)
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:76)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:86)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:97)
        at org.apache.nutch.util.NutchJob.<init>(NutchJob.java:26)
        at org.apache.nutch.crawl.Generator.generate(Generator.java:309)
        at org.apache.nutch.crawl.Generator.main(Generator.java:417)
        at com.idna.nutch.launcher.CrawlerManager.autoGenSegList(CrawlerManager.java:80)
        at com.idna.nutch.launcher.CrawlerManager.main(CrawlerManager.java:211)
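
In case it helps anyone check which stages these DEBUG entries come from, here is a minimal sketch in plain Java (not part of Nutch or Hadoop) that scans a log in the format above and counts, for each DEBUG java.io.IOException entry, the org.apache.nutch.crawl tool whose main() appears in the stack trace. The class name DebugTraceTally and the log file argument are placeholders of mine, and the timestamp pattern is assumed from the extract above. It uses lambdas, so it needs Java 8 or later rather than the JVM 6 mentioned earlier.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DebugTraceTally {
    // A new log entry starts with a timestamp such as "2007-04-05 16:35:32,938 ".
    private static final Pattern ENTRY_START =
        Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3} ");

    // Stack frame of a Nutch crawl tool's main(), e.g. Injector or Generator.
    private static final Pattern TOOL_MAIN =
        Pattern.compile("org\\.apache\\.nutch\\.crawl\\.(\\w+)\\.main\\(");

    /** Counts DEBUG IOException traces per Nutch tool found in each trace. */
    public static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        boolean inTrace = false;
        for (String line : lines) {
            if (ENTRY_START.matcher(line).find()) {
                // Only a DEBUG entry carrying an IOException opens a trace.
                inTrace = line.contains(" DEBUG ")
                    && line.contains("java.io.IOException");
                continue;
            }
            if (!inTrace) {
                continue;
            }
            Matcher m = TOOL_MAIN.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
                inTrace = false; // count each trace only once
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the log file (hypothetical, supplied by the caller).
        List<String> lines =
            java.nio.file.Files.readAllLines(java.nio.file.Paths.get(args[0]));
        tally(lines).forEach((tool, n) -> System.out.println(tool + ": " + n));
    }
}
```

Because it matches frames line by line, it should work whether or not the mail client has wrapped the "at" lines of the trace.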

 

===========================

Armel T. Nene

iDNA Solutions LTD

Tel: +44 (20) 7257 6124

Mobile: +44 (7886)950 483 

Web: http://www.idna-solutions.com

Blog: http://blog.idna-solutions.com