Posted to user@nutch.apache.org by Edward Quick <ed...@hotmail.com> on 2005/09/21 22:23:05 UTC

resuming intranet crawl

Hi,

I ran out of disk space whilst doing a crawl of our intranet (which has so
far taken 24 hours). Is there a way to pick up the crawl from where it left
off, or do I have to restart it?

Thanks,

Ed.


050921 151225 Processing document 52000
050921 151300 Finishing update
Exception in thread "main" java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:126)
        at org.apache.nutch.fs.NFSDataOutputStream$PositionCache.write(NFSDataOutputStream.java:36)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:66)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:110)
        at java.io.DataOutputStream.write(DataOutputStream.java:85)
        at org.apache.nutch.io.SequenceFile$Writer.append(SequenceFile.java:154)
        at org.apache.nutch.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:753)
        at org.apache.nutch.io.SequenceFile$Sorter$MergePass.run(SequenceFile.java:654)
        at org.apache.nutch.io.SequenceFile$Sorter.mergePass(SequenceFile.java:591)
        at org.apache.nutch.io.SequenceFile$Sorter.sort(SequenceFile.java:419)
        at org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:535)
        at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1544)
        at org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:321)
        at org.apache.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:371)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:141)