Posted to user@nutch.apache.org by "carmmello@globo.com" <ca...@globo.com> on 2005/05/05 20:12:16 UTC

Index fails

I am trying to crawl about 300 sites to depth 4.  Every time, when it
gets to segment 4 (Nutch creates 4 segments, the last being the biggest
one), I get the following error message:


"050505 141422 found resource common-terms.utf8 at file:/usr/local/nutch-nightly/conf/common-terms.utf8
050505 141901  Processed 20000 records (71.39339 rec/s)
050505 142403  Processed 40000 records (66.24074 rec/s)
050505 143005  Processed 60000 records (55.29903 rec/s)

050505 144953  Processed 120000 records (16.83333 rec/s)
050505 145726  Processed 140000 records (44.17517 rec/s)
Exception in thread "main" java.io.FileNotFoundException: /mnt/C/maio_4/segments/20050503220741/index/_2r7e.prx (No space left on device)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
        at org.apache.lucene.store.FSOutputStream.<init>(FSDirectory.java:461)
        at org.apache.lucene.store.FSDirectory.createFile(FSDirectory.java:263)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:248)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:93)
        at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:487)
        at org.apache.lucene.index.IndexWriter.maybeMergeSegments(IndexWriter.java:458)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:310)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:294)
        at org.apache.nutch.indexer.IndexSegment.indexPages(IndexSegment.java:148)
        at org.apache.nutch.indexer.IndexSegment.main(IndexSegment.java:254)
[root@localhost nutch-nightly]#

 Of course, it is not a lack of disk space.  So, what is going on?

Thanks,

Wilson Melo



Re: Index fails

Posted by Byron Miller <By...@compaid.com>.
What is "/mnt/C/maio_4"?

Are you mounting a FAT partition or something else under Linux? Or are you
running this under Cygwin?

-byron
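
One way to answer the question above from the Java side is to ask the JVM
which filesystem backs the index directory and how much space it reports as
usable. The following is only a minimal sketch, not part of Nutch or Lucene:
the class name IndexVolumeCheck and its helper methods are made up for
illustration, and it relies on java.nio.file.FileStore, which needs Java 7
or later (newer than the JVMs in use when this thread was written). The
filesystem type strings it matches ("vfat", "msdos", "FAT32") are what
common platforms tend to report and should be treated as an assumption.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper, not part of Nutch or Lucene.
public class IndexVolumeCheck {

    /** One-line summary of the volume that holds the given path. */
    static String describe(Path path) throws IOException {
        FileStore store = Files.getFileStore(path);
        return String.format("%s: type=%s, usable=%d MiB, total=%d MiB",
                path, store.type(),
                store.getUsableSpace() >> 20,   // bytes to MiB
                store.getTotalSpace() >> 20);
    }

    /** True if the reported type name looks like a FAT variant (assumed names). */
    static boolean looksLikeFat(String fsType) {
        String t = fsType.toLowerCase(Locale.ROOT);
        return t.equals("vfat") || t.equals("msdos") || t.startsWith("fat");
    }

    public static void main(String[] args) throws IOException {
        // Pass the segment's index directory as the first argument;
        // the current directory is used when none is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(describe(dir));
        if (looksLikeFat(Files.getFileStore(dir).type())) {
            System.out.println("This directory is on a FAT-style volume.");
        }
    }
}
```

Run against the directory from the stack trace (for example
"java IndexVolumeCheck /mnt/C/maio_4/segments"), this prints the filesystem
type and the usable space as the JVM sees them, which may differ from what
is expected when the path sits on a separately mounted partition.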

-----Original Message-----
From: "carmmello@globo.com" <ca...@globo.com>
To: nutch-user@incubator.apache.org
Date: Thu, 05 May 2005 18:12:16 +0000
Subject: Index fails
