Posted to user@nutch.apache.org by Lokkju <lo...@gmail.com> on 2005/10/20 19:39:16 UTC

java.io.FileNotFoundException errors on Windows while running the CrawlTool

I am hoping someone might have some suggestions for me.  While working
with Nutch under Windows XP, I found that the crawler fails with a
java.io.FileNotFoundException in roughly two out of three runs.
Interestingly, it does not always fail on the same file.  I suspected
a threading problem, but setting the thread count to 1 made no
difference.  Below are some of the error messages.

This looks very similar to issue NUTCH-40
(http://issues.apache.org/jira/browse/NUTCH-40), which was marked as
unreproducible.  You will notice that a couple of the error messages
say "Access is denied", but I am running as a full administrator, and
about every third run the crawl finishes successfully.  Again, any
help or suggestions would be appreciated.


nick jacobsen

Sample Errors:
****************************************************************************************
051020 102124 FetchListTool completed
Exception in thread "main" java.io.FileNotFoundException: C:\ovfa.crawl\segment\tmp_20051020102123\fetchlist\data
        at org.apache.nutch.fs.LocalFileSystem.open(LocalFileSystem.java:93)
        at org.apache.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:194)
        at org.apache.nutch.io.SequenceFile$Reader.<init>(SequenceFile.java:187)
        at org.apache.nutch.io.MapFile$Reader.<init>(MapFile.java:190)
        at org.apache.nutch.io.MapFile$Reader.<init>(MapFile.java:179)
        at org.apache.nutch.io.ArrayFile$Reader.<init>(ArrayFile.java:50)
        at org.apache.nutch.fetcher.Fetcher.<init>(Fetcher.java:298)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:475)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:140)
******************************************************************************************
051020 103610 Processing pagesByURL: Sorted 1500.0 instructions/second
Exception in thread "main" java.io.IOException: already exists: C:\VSS\Working\SearchSpider\nutch-0.7.1\bin\crawl-20051020103230\db\webdb.new\pagesByURL
        at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:86)
        at org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:549)
        at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1544)
        at org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:321)
        at org.apache.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:371)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:141)
******************************************************************************************
Exception in thread "main" java.io.FileNotFoundException: C:\VSS\Working\SearchSpider\nutch-0.7.1\bin\crawl-20051020103741\db\webdb.new\tmp\linksByMD5.out (Access is denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
        at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.<init>(LocalFileSystem.java:112)
        at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:141)
        at org.apache.nutch.fs.LocalFileSystem.create(LocalFileSystem.java:102)
        at org.apache.nutch.fs.FileUtil.copyContents(FileUtil.java:57)
        at org.apache.nutch.fs.LocalFileSystem.rename(LocalFileSystem.java:149)
        at org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:543)
        at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1612)
        at org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:321)
        at org.apache.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:371)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:141)
******************************************************************************************