Posted to dev@lucene.apache.org by Paul <pa...@waite.net.nz> on 2004/04/14 09:44:29 UTC

Optimize crash

I posted this to the lucene-user list a few days ago, to a resounding silence
and so thought I'd try my luck here on the dev list. ;-)

Cheers,
Paul.


The problem I have is that when I try to execute an optimize on my Lucene
index I get the following error thrown (see below).

If anyone can help, and the answer requires some digging, then I have the
very index tarred and gzipped for anonymous FTP access at ftp.catalyst.net.nz
(in the "pub" sub-directory). This is 462 MB, and unpacks to roughly twice
that size. There is also a README file there.

Here is the error I get very quickly when optimize runs:

--- CUT ---
java.lang.ArrayIndexOutOfBoundsException: 111 >= 23
        at java.util.Vector.elementAt(Vector.java(Compiled Code))
        at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java(Compiled Code))
        at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java(Compiled Code))
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java(Compiled Code))
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java(Compiled Code))
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:92)
        at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:473)
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:354)
        at nz.net.catalyst.lucene.server.Optimize.execute(Optimize.java:80)
        at nz.net.catalyst.lucene.server.Control.optimize(Control.java:87)
        at nz.net.catalyst.lucene.server.Control.execute(Control.java:49)
        at nz.net.catalyst.lucene.server.Dialogue.process(Dialogue.java:111)
        at nz.net.catalyst.lucene.server.Session.communicate(Session.java:125)
        at nz.net.catalyst.SocketClient.run(SocketClient.java:70)
        at java.lang.Thread.run(Thread.java:512)
--- CUT ---

This was actually thrown by Lucene v1.4-rc2, which I was testing to see if it
solved my problem. I am currently running v1.3-Final on my live site, and it
does the same thing. This is running on Debian Linux (Woody), using the IBM
Runtime Environment for Linux, Java(TM) 2 Technology Edition, Version 1.3.1 JRE.

It should be noted that I have had this problem before, and I solved it by
completely re-indexing the article set from scratch (starting with no index
at all). After that, the optimize worked fine. Then, after many days of
indexing new articles and running an optimize every day at about 3.30am, the
problem returned.

The articles being indexed are all homogeneous in terms of fields being
indexed, details below:

FIELD DEFINITIONS
Field name      Field type      Stored?         Indexed?
----------      ----------      -------         --------
Domain          Text            STORED          INDEXED
Id              Id              STORED          INDEXED
date            Date            STORED          INDEXED
datetime        Date            STORED          INDEXED
added           Date            STORED          INDEXED
category        Text            STORED          INDEXED
subcategory     Text            STORED          INDEXED
source          Text            STORED          INDEXED
title           Text            STORED          NOT INDEXED
slug            Text            STORED          NOT INDEXED
type            Text            STORED          NOT INDEXED
sourcetype      Text            NOT STORED      INDEXED
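For reference, the table above can be written out as data. This is a minimal sketch and not the Catalyst server code: the FieldDef class and the numbering scheme (field number = order of first appearance, the way a simple field table might assign numbers) are assumptions made for illustration. Under that assumption these twelve definitions would occupy numbers 0 to 11, so a field number such as 111 cannot name any of them.

```java
import java.util.List;

/** The FIELD DEFINITIONS table as data (hypothetical illustration, not Lucene code). */
public class FieldTable {
    /** One row of the table; FieldDef is an illustrative name, not a Lucene class. */
    static final class FieldDef {
        final String name;
        final String type;
        final boolean stored;
        final boolean indexed;

        FieldDef(String name, String type, boolean stored, boolean indexed) {
            this.name = name;
            this.type = type;
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    // Rows in table order; the list position is the ASSUMED field number.
    static final List<FieldDef> FIELDS = List.of(
        new FieldDef("Domain", "Text", true, true),
        new FieldDef("Id", "Id", true, true),
        new FieldDef("date", "Date", true, true),
        new FieldDef("datetime", "Date", true, true),
        new FieldDef("added", "Date", true, true),
        new FieldDef("category", "Text", true, true),
        new FieldDef("subcategory", "Text", true, true),
        new FieldDef("source", "Text", true, true),
        new FieldDef("title", "Text", true, false),
        new FieldDef("slug", "Text", true, false),
        new FieldDef("type", "Text", true, false),
        new FieldDef("sourcetype", "Text", false, true));

    /** Returns the assumed field number for a name, or -1 if the name is unknown. */
    static int fieldNumber(String name) {
        for (int i = 0; i < FIELDS.size(); i++) {
            if (FIELDS.get(i).name.equals(name)) {
                return i;
            }
        }
        return -1;
    }

    /** True if 0 <= number < FIELDS.size(). */
    static boolean isValidFieldNumber(int number) {
        return number >= 0 && number < FIELDS.size();
    }

    static int countStored() {
        int n = 0;
        for (FieldDef f : FIELDS) {
            if (f.stored) n++;
        }
        return n;
    }

    static int countIndexed() {
        int n = 0;
        for (FieldDef f : FIELDS) {
            if (f.indexed) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // prints: 12 fields, 11 stored, 9 indexed
        System.out.println(FIELDS.size() + " fields, " + countStored() + " stored, "
                + countIndexed() + " indexed");
        // prints: 111 valid? false
        System.out.println("111 valid? " + isValidFieldNumber(111));
    }
}
```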


Any help greatly appreciated.

Cheers,
Paul.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Optimize crash

Posted by Paul <pa...@waite.net.nz>.
Sam Hough wrote:
> Paul,
>
> Again, more information but no comfort. It looks like your index is corrupt
> rather than there being a problem with the optimise method.
>
> Document numbers 185414 and 259128 are corrupt. I tried marking these
> two documents as deleted but it then seems term info is also corrupt as it
> blows up in SegmentTermPositions.next() line 111
>
> I can't even imagine how to track down how that happened from here. From
> my little knowledge of Lucene, the storage code is very solid, so maybe the
> JVM etc. start to become suspects.
>
> Sorry I couldn't really help.

Not at all, thanks very much for all your efforts on it and thanks for the
idea on the JVM.

Those documents are pretty old ones (the index was up to about 700,000
documents by then), so this is interesting. I had re-indexed a while back, to
get around this very same problem the first time it occurred, and optimize
worked at that point. So these two documents were presumably not corrupt then.

It's very unlikely that they were edited, being so old, so they should not
have been re-indexed since. So how they became corrupt is a mystery.

Cheers,
Paul.



Re: Optimize crash

Posted by Paul <pa...@waite.net.nz>.
Sam Hough wrote:
> Paul,
>
> For what it is worth I can duplicate the problem with your index file
> on Win2k, Sun JDK1.4.0_04 and Lucene CVS HEAD.
>
> I get the same stack trace:
>
> java.lang.ArrayIndexOutOfBoundsException: 111 >= 23
> at java.util.Vector.elementAt(Vector.java:427)
> at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
> at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:66)
> at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:245)
> at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:180)
> at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:92)
> at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:473)
> at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:354)
>
> Do normal searches never fail against this index?

No, they don't seem to fail; it searches fine as far as I know. The live
index, of which the one you mention above is just a snapshot, is being added
to at a rate of about 1,000 new articles a day, and is being searched pretty
hard constantly.

Thanks for the verification.

Cheers,
Paul.



Re: Optimize crash

Posted by Sam Hough <sa...@redspr.com>.
Paul,

For what it is worth I can duplicate the problem with your index file
on Win2k, Sun JDK1.4.0_04 and Lucene CVS HEAD.

I get the same stack trace:

java.lang.ArrayIndexOutOfBoundsException: 111 >= 23
at java.util.Vector.elementAt(Vector.java:427)
at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:66)
at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:245)
at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:180)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:92)
at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:473)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:354)
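Read together, the top two frames narrow this down: FieldInfos.fieldInfo looks a field up by number in a java.util.Vector, and Vector.elementAt reports an out-of-range index as "index >= size". So the message says field number 111 was requested from a table that held only 23 entries. The following minimal sketch uses only the JDK to trigger the same message; ElementAtDemo and its placeholder entries are made up for illustration and have nothing to do with the real field names.

```java
import java.util.Vector;

/** Reproduces the "111 >= 23" message from Vector.elementAt (illustrative only). */
public class ElementAtDemo {
    /**
     * Builds a Vector with `size` placeholder entries, asks for element `index`,
     * and returns the exception message, or null if the index was in range.
     */
    static String failureMessage(int size, int index) {
        Vector<String> table = new Vector<>();
        for (int i = 0; i < size; i++) {
            table.addElement("field" + i); // placeholder, not a real field name
        }
        try {
            table.elementAt(index);
            return null;
        } catch (ArrayIndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(failureMessage(23, 111)); // prints: 111 >= 23
    }
}
```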

Do normal searches never fail against this index?

Cheers

Sam





Re: Optimize crash

Posted by Sam Hough <sa...@redspr.com>.
Paul,

Again, more information but no comfort. It looks like your index is corrupt
rather than there being a problem with the optimise method.

Document numbers 185414 and 259128 are corrupt. I tried marking these
two documents as deleted but it then seems term info is also corrupt as it 
blows up in SegmentTermPositions.next() line 111
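One way to locate documents like these (not necessarily how they were found here) is brute force: try to load every document number and note the ones that throw, rather than letting the first failure abort the pass the way the merge does. This is a minimal sketch under stated assumptions: CorruptDocScanner and findUnreadable are hypothetical names, and the loader is a plain callback so the example runs without Lucene. Against a real index the callback would presumably call IndexReader.document(n), wrapping its IOException, and skip numbers for which IndexReader.isDeleted(n) is true.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/** Hypothetical helper: scan every document number and collect the ones that fail to load. */
public class CorruptDocScanner {
    /**
     * Calls loadDocument for 0..maxDoc-1 and returns the numbers whose load threw
     * a RuntimeException (ArrayIndexOutOfBoundsException is one).
     */
    static List<Integer> findUnreadable(int maxDoc, IntConsumer loadDocument) {
        List<Integer> unreadable = new ArrayList<>();
        for (int n = 0; n < maxDoc; n++) {
            try {
                loadDocument.accept(n);
            } catch (RuntimeException e) {
                unreadable.add(n); // keep going instead of aborting on the first bad document
            }
        }
        return unreadable;
    }

    public static void main(String[] args) {
        // Stand-in loader that fails for the two document numbers reported in this thread.
        IntConsumer fakeLoader = n -> {
            if (n == 185414 || n == 259128) {
                throw new ArrayIndexOutOfBoundsException("111 >= 23");
            }
        };
        System.out.println(findUnreadable(300000, fakeLoader)); // prints: [185414, 259128]
    }
}
```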

I can't even imagine how to track down how that happened from here. From
my little knowledge of Lucene, the storage code is very solid, so maybe the
JVM etc. start to become suspects.

Sorry I couldn't really help.

Cheers

Sam



Re: Optimize crash

Posted by Paul <pa...@waite.net.nz>.
Erik Hatcher wrote:

> If it makes you feel any better, I'll at least reply and say that I've
> read your previous reports but I have not experienced this myself nor
> know what causes it.  Sorry.

Yep, feeling much better now. ;-)


> If it's possible to duplicate this, perhaps you could package up your
> index and a small piece of code that causes the crash and make it
> available for someone to take a look at.

Well, the index is at the FTP site quoted below, though I'm afraid it's 462 MB,
because every attempt to reproduce this with a smaller index has failed so
far. There's a README with it. As for code, it simply crashes when you call
the optimize method on that index.

I'm assuming the index is corrupted in some way, and wondered whether the
error message gave anyone any clues as to what might have corrupted it in a
way that would produce this error. Or is it so non-specific as to be largely
useless?
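As one plausible reading of the message (not a diagnosis): the Lucene file-format documentation describes each document's stored-field record as a VInt field count followed, for each field, by a VInt field number, a flags byte and the value. A field number of 111 where the segment's field table has 23 entries suggests the reader was decoding bytes that were not a valid record, for example because the data or the offset pointing at it was damaged. The sketch below decodes a hypothetical, stripped-down record that keeps only the count and the field numbers (no flags or values) and rejects out-of-range numbers with a descriptive error instead of an ArrayIndexOutOfBoundsException. StoredFieldCheck and its methods are illustrative names, not Lucene code.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Decodes a HYPOTHETICAL stripped-down stored-field record:
 * FieldCount (VInt), then FieldCount field numbers (VInt each).
 * Real Lucene records also carry a flags byte and a value per field.
 */
public class StoredFieldCheck {
    /** Reads a VInt: 7 data bits per byte, low-order group first, high bit set = more bytes follow. */
    static int readVInt(InputStream in) throws IOException {
        int value = 0;
        for (int shift = 0; shift <= 28; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("truncated VInt");
            }
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
        throw new IOException("VInt longer than 5 bytes");
    }

    /** Returns the field numbers, rejecting any outside [0, knownFields). */
    static int[] readFieldNumbers(byte[] record, int knownFields) throws IOException {
        InputStream in = new ByteArrayInputStream(record);
        int count = readVInt(in);
        // Each field number needs at least one byte, so a larger count cannot be genuine.
        if (count < 0 || count > record.length) {
            throw new IOException("implausible field count " + count);
        }
        int[] numbers = new int[count];
        for (int i = 0; i < count; i++) {
            int fieldNumber = readVInt(in);
            if (fieldNumber < 0 || fieldNumber >= knownFields) {
                throw new IOException("field number " + fieldNumber
                        + " out of range [0, " + knownFields + ")");
            }
            numbers[i] = fieldNumber;
        }
        return numbers;
    }

    /** Convenience wrapper: "ok [..]" on success, otherwise the error message. */
    static String describe(byte[] record, int knownFields) {
        try {
            return "ok " + Arrays.toString(readFieldNumbers(record, knownFields));
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new byte[] {2, 0, 5}, 23)); // prints: ok [0, 5]
        System.out.println(describe(new byte[] {1, 111}, 23));  // prints: field number 111 out of range [0, 23)
    }
}
```

Validating the field number before the table lookup turns an anonymous index error into a message that names the offending value, which is the kind of detail this thread was missing.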

Thanks once again for the reply.

Cheers,
Paul.

> > index tarred and gzipped for anon FTP access at ftp.catalyst.net.nz (in
> > the "pub" sub-directory).



Re: Optimize crash

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
If it makes you feel any better, I'll at least reply and say that I've
read your previous reports but I have not experienced this myself nor
know what causes it.  Sorry.

If it's possible to duplicate this, perhaps you could package up your
index and a small piece of code that causes the crash and make it
available for someone to take a look at.

	Erik


On May 8, 2004, at 4:50 PM, Paul wrote:

> I'm posting this yet again on behalf of myself and a few other users who have
> this serious problem, and who have yet to obtain any assistance on it.
>
> Once again, any help or feedback would be appreciated from you folks who
> are using and developing with Lucene, and who may have an idea as to what
> on earth is going on here.
>
> If there just isn't enough info, or the question is plain stupid, then please
> tell me so. If there is something I need to do in debugging it, to get you
> more information, then likewise please suggest the best way forward.
>
> Cheers,
> Paul.


Re: Optimize crash

Posted by Paul <pa...@waite.net.nz>.
I'm posting this yet again on behalf of myself and a few other users who have
this serious problem, and who have yet to obtain any assistance on it.

Once again, any help or feedback would be appreciated from you folks who
are using and developing with Lucene, and who may have an idea as to what
on earth is going on here.

If there just isn't enough info, or the question is plain stupid, then please
tell me so. If there is something I need to do in debugging it, to get you
more information, then likewise please suggest the best way forward.

Cheers,
Paul.




Re: Optimize crash

Posted by Paul <pa...@waite.net.nz>.
Dear all,

I hate to be insistent, but I have a large live website with a growing,
un-optimizable Lucene index, which therefore has its appointment
with destiny pencilled into The Diary of Doom on a date roughly
three weeks hence.

So if I'm doing something stupid, or there's a workaround, or someone
is already looking into this problem, *please* let me know. My alternative
is to spend two days re-indexing the archive, and then to just wait for the 
inevitable repeat of this problem, like Groundhog Day, which isn't a
particularly attractive option.

(NB: The original message is under the same subject line in the archive.)

Thanks.

Cheers,
Paul.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org