Posted to java-user@lucene.apache.org by 1world1love <jd...@yahoo.com> on 2009/01/06 23:07:54 UTC

java.io.IOException: read past EOF non-corrupt index

Greetings all. I have an index that I have optimized and when I try to open
the index I get this:

java.io.IOException: read past EOF
	at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java)
	at org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java)
	at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java)
	at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java)
	at org.apache.lucene.store.IndexInput.readInt(IndexInput.java)
	at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java)
	at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java)
	at org.apache.lucene.index.SegmentReader.get(SegmentReader.java)
	at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:269)
	at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:99)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java)
	at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:111)
	at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
	at org.apache.lucene.index.IndexReader.open(IndexReader.java:227)
	at LuceneTest.MyMethod(LuceneTest.java:226)

There are a few threads on this already, and most seem to point to a
corrupted index, but I ran CheckIndex from a different machine and this is
the result:

Opening index @ /lucenedata/index3

Segments file=segments_1q9 numSegments=1 version=FORMAT_HAS_PROX [Lucene 2.4]
  1 of 1: name=_v3 docCount=12695236
    compound=true
    hasProx=true
    numFiles=1
    size (MB)=17,679.742
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [6 fields]
    test: terms, freq, prox...OK [18507503 terms; 1204902303 terms/docs pairs; 1978598629 tokens]
    test: stored fields.......OK [76171416 total field count; avg 6 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

No problems were detected with this index.

Any ideas why this may be happening?
-- 
View this message in context: http://www.nabble.com/java.io.IOException%3A-read-past-EOF-non-corrupt-index-tp21319971p21319971.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: java.io.IOException: read past EOF non-corrupt index

Posted by 1world1love <jd...@yahoo.com>.
OK. Just to follow up, I performed the same steps with another of our indexes
and did not have the same issue:

Opening index @ /lucenedata/index4

Segments file=segments_85 numSegments=1 version=FORMAT_HAS_PROX [Lucene 2.4]
  1 of 1: name=_42 docCount=3986767
    compound=true
    hasProx=true
    numFiles=1
    size (MB)=3,467.235
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [6 fields]
    test: terms, freq, prox...OK [6678265 terms; 285071252 terms/docs pairs; 335057297 tokens]
    test: stored fields.......OK [23920602 total field count; avg 6 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

No problems were detected with this index.


Opening the index works fine and searching works fine on this index.

This makes me wonder whether the root cause is some sort of memory/buffer
issue. I don't know what happens when Lucene opens the index, but 18GB is a
pretty big file.

My admins say that Oracle has as much memory as it needs, but I am not sure.
Maybe Marcelo has some thoughts on this.
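
One way to test the large-file hunch is to check, from inside the same JVM
that throws the exception, whether the whole file is actually visible and
readable. The sketch below is only a diagnostic idea, not something taken
from this thread: the class and method names (LargeFileProbe, readLastByte,
describe) are hypothetical, and it uses nothing but java.io. It is written
in Java 5 style (try/finally rather than try-with-resources) on the
assumption that it may need to run on an older JVM.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: reports what this JVM sees for a (possibly very large) file.
final class LargeFileProbe {

    // Seeks to the final byte and reads it. If the JVM or the mount cannot
    // address offsets that far into the file, this is where it should fail.
    static int readLastByte(File f) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(f, "r");
        try {
            long len = raf.length();
            if (len == 0) {
                throw new IOException("file is empty: " + f);
            }
            raf.seek(len - 1);
            return raf.read(); // 0..255, or -1 if the read hit EOF anyway
        } finally {
            raf.close();
        }
    }

    // One-line summary to compare against "ls -l" on the same path.
    static String describe(File f) throws IOException {
        return f.getName() + " length=" + f.length() + " lastByte=" + readLastByte(f);
    }
}
```

Calling LargeFileProbe.describe(new File("/lucenedata/index3/_v3.cfs")) on
both machines and comparing the reported length with the size shown by
"ls -l" would show whether the two JVMs agree about the file. A mismatch
or an exception on one side only would point at the environment rather
than at the index.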


Re: java.io.IOException: read past EOF non-corrupt index

Posted by 1world1love <jd...@yahoo.com>.


Erick Erickson wrote:
> 
> I guess my first question, based on your statement that you ran
> checkindex from a different machine would be whether you have
> the same version of Lucene installed on both machines? And how
> did you get your index where it is now? Did you optimize it in place
> or did you optimize it somewhere else and copy it?
> 
> And what happens if you open it with Luke? I believe that Luke will
> give you some idea how it was created, but I'm not totally sure.
> 
> 

Thanks Erick.

I do have the same version on both machines (2.4.0). The original index was
created in place with 2.3. I made a copy of the original and deleted some
documents from the copy and then optimized it.

The index sits on storage that is mounted on both machines.

The caveat is that the machine on which I get the error is an Oracle DB
server. The code is called from a Java stored procedure within the OJVM. I
also created the original index from a stored procedure within the OJVM.

I can't open it with Luke because I only have access through the CLI or
the OJVM.


Re: java.io.IOException: read past EOF non-corrupt index

Posted by Erick Erickson <er...@gmail.com>.
I guess my first question, based on your statement that you ran
checkindex from a different machine would be whether you have
the same version of Lucene installed on both machines? And how
did you get your index where it is now? Did you optimize it in place
or did you optimize it somewhere else and copy it?

And what happens if you open it with Luke? I believe that Luke will
give you some idea how it was created, but I'm not totally sure.

Best
Erick


Re: java.io.IOException: read past EOF non-corrupt index

Posted by 1world1love <jd...@yahoo.com>.

Toke Eskildsen wrote:
> 
> A quick check when a corrupt index problem is encountered:
> Do any of your machines run Java 1.6.0_04-1.6.0_10b25?
> 

Thanks Toke.

As I mentioned in my response to Erick, this is complicated by the fact that
the error occurs within a Java stored procedure in Oracle. The OJVM is version
1.5.0_10. From what I understand, the OJVM is pretty true to the Sun
implementation, and we have not run into issues before.

On the other hand, we have never optimized before either.





Re: java.io.IOException: read past EOF non-corrupt index

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Tue, 2009-01-06 at 23:07 +0100, 1world1love wrote:
> Greetings all. I have an index that I have optimized and when I try to open
> the index I get this:
> 
> java.io.IOException: read past EOF

A quick check when a corrupt index problem is encountered:
Do any of your machines run Java 1.6.0_04-1.6.0_10b25?
If so, that is probably the cause.

https://issues.apache.org/jira/browse/LUCENE-1282

We got burned by this and added a fail-fast check to our application,
which checks for the affected Java versions. The version can be
requested by System.getProperty("java.runtime.version").
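
A minimal sketch of such a fail-fast check, under stated assumptions: the
class and method names (JvmVersionCheck, isAffected, failFast) are made up
for illustration, the version string is assumed to look like
"1.6.0_04-b12" or "1.6.0_10-beta-b25", and "1.6.0_10b25" is read as
update 10, build 25. It is not the code we actually run, and the exact
boundaries should be checked against the JIRA issue.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical fail-fast guard for the JVM range mentioned above.
final class JvmVersionCheck {

    // Group 1: update number after "1.6.0_". Group 2: trailing build number, if any.
    private static final Pattern JAVA6 =
            Pattern.compile("^1\\.6\\.0_(\\d+)(?:.*-b(\\d+))?.*$");

    // True if the version string falls in 1.6.0_04 .. 1.6.0_10 build 25 (assumed reading).
    static boolean isAffected(String runtimeVersion) {
        if (runtimeVersion == null) {
            return false;
        }
        Matcher m = JAVA6.matcher(runtimeVersion);
        if (!m.matches()) {
            return false; // not a 1.6.0_NN version at all
        }
        int update = Integer.parseInt(m.group(1));
        if (update < 4 || update > 10) {
            return false;
        }
        if (update < 10) {
            return true;
        }
        // Update 10: only builds up to b25 are treated as affected.
        // An unknown build number is conservatively treated as affected.
        String build = m.group(2);
        return build == null || Integer.parseInt(build) <= 25;
    }

    // Call once at application startup, before touching any index.
    static void failFast() {
        String v = System.getProperty("java.runtime.version");
        if (isAffected(v)) {
            throw new IllegalStateException(
                    "JVM " + v + " is in the range reported in LUCENE-1282");
        }
    }
}
```

Throwing at startup is deliberate: it is cheaper to refuse to run than to
find out later that an index was written by a suspect JVM.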




Re: java.io.IOException: read past EOF non-corrupt index

Posted by Marcelo Ochoa <ma...@gmail.com>.
Hi:
   Could you try opening the index with Luke, but using the JDK bundled
with the Oracle DB?
   I mean, try to use Luke as a standalone application on the same
machine but outside the OJVM, using the JDK at:
   $ORACLE_HOME/jdk
   which was used to compile most of the classes running inside the OJVM.
   You can also drop me an email (not to the list, because it is quite
off topic) for help debugging an application inside the OJVM.
   I have experience writing the Lucene Domain Index for Oracle 11g,
which is a Lucene 2.4.0 implementation running inside the OJVM that
replaces the file system store with a BLOB store.
   http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg
   Best regards, Marcelo.



-- 
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
______________
Want to integrate Lucene and Oracle?
http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
Is Oracle 11g REST ready?
http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html



Re: java.io.IOException: read past EOF non-corrupt index

Posted by 1world1love <jd...@yahoo.com>.



Michael McCandless-2 wrote:
> 
> That exception seems to indicate that the fdx file being opened by
> FieldsReader is 0 length (it's trying to read the first int from that
> file).
> 
> Is the exception repeatable, if you try again to call
> IndexReader.open?
> 
> It's odd that CheckIndex finds no problem with the index, but opening
> an IndexReader does.
> 
> Or: did you try to open the IndexReader while the IndexWriter was
> still open?  Or had IndexWriter already been closed?
> 

Thanks Michael.

The exception happens every time I call open on the IndexReader.

There are no open IndexWriters on the index as far as I know. I am not
certain exactly how Oracle manages objects and GC, but I assume that when I
close a reader it is actually closed, so none should still be open.
Although it is not efficient, the readers and writers are only open for
the duration of the stored procedure call.


Michael McCandless-2 wrote:
> 
> Can you post "ls -l" output from your index dir?
> 

drwxrwxrwx   2 sdapp sdapp 2.0K Jan  7 09:53 .
drwxr-xr-x  15 sdapp 10001 2.0K Jan  6 15:22 ..
-rwxrwxrwx   1 sdapp sdapp   59 Dec 30 13:22 segments_1q9
-rwxrwxrwx   1 sdapp sdapp   20 Dec 30 13:22 segments.gen
-rwxrwxrwx   1 sdapp sdapp  18G Dec 30 13:22 _v3.cfs



Re: java.io.IOException: read past EOF non-corrupt index

Posted by Michael McCandless <lu...@mikemccandless.com>.
That exception seems to indicate that the fdx file being opened by
FieldsReader is 0 length (it's trying to read the first int from that
file).

Is the exception repeatable, if you try again to call
IndexReader.open?

It's odd that CheckIndex finds no problem with the index, but opening
an IndexReader does.

Or: did you try to open the IndexReader while the IndexWriter was
still open?  Or had IndexWriter already been closed?

Can you post "ls -l" output from your index dir?

Mike

