You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by John O'Brien <jo...@criticalpath.net> on 2008/08/12 18:27:08 UTC

RE: CheckIndex possibly not detecting/fixing all corruptions?

Hi Mike,
	Apologies for the delay in getting back.
I have since figured out that the reason Luke gave an error when we searched
on the "fixed" index was (possibly) because it was a really old version (0.6
2005/02/16) - I tried again with v 0.8.1 (2008-02-13) and Luke can search on
the "fixed" index just fine.

When I first ran CheckIndex the exception was as follows:

java.lang.RuntimeException: term body:03: doc -2147483645 < lastDoc 44
        at org.apache.lucene.index.CheckIndex.check(CheckIndex.java:195)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:373)

WARNING: 1 broken segments detected
WARNING: 162 documents will be lost

So for now I think the problem may lie in CLucene (or possibly our use of
it) rather than CheckIndex not detecting and repairing properly.
I will follow up with an update if/when I figure it out!
In the meantime if you have any ideas as to how/why a corruption of this
nature can occur I'd be glad to hear them.

Thanks for your help,
John.

-----Original Message-----
>From Michael McCandless <lu...@mikemccandless.com> 
Subject Re: CheckIndex possibly not detecting/fixing all corruptions? 
Date Wed, 30 Jul 2008 10:43:29 GMT 

Do you have the exception Luke produced?   That'd be a good clue as to  
what CheckIndex is not detecting.  It's hard for me to tell from that  
GDB trace exactly what's gone wrong...

When you first ran CheckIndex, and it detected one corrupt segment,  
what exception did it report as the cause of the corruption on that  
segment?

If you could upload the index somewhere, that'd be great -- I'll try  
to figure out why CheckIndex fails to detect the corruption.  It's  
particularly odd that you were able to successfully optimize the index.

If you run searches on your index use Lucene java, do you get any odd  
exceptions?

Mike
 

-----Original Message-----
From: John O'Brien [mailto:john.obrien@criticalpath.net] 
Sent: 30 July 2008 11:22
To: 'java-user@lucene.apache.org'
Subject: CheckIndex possibly not detecting/fixing all corruptions?


Hi,
	I already posted this question on the CLucene dev list but it was
suggested that I may be able to get some help on the Java list so here goes.
We use Clucene 0.9.20 in our search engine. One of the indexes appears to
have become corrupt (still investigating the cause of the corruption). We
tried using the Java Lucene CheckIndex tool to fix the corruption(s).
CheckIndex detected 1 broken segment and removed 162 documents and wrote a
new segments file. Subsequent runs of CheckIndex report "No problems were
detected with this index.". However our search engine still crashes when its
performing a search on the "fixed" index. Here is the relevant backtrace:

#8  0x001c0faa in lucene::index::SegmentTermDocs::read (this=0x664f26c8, 
    docs=0x664f9944, freqs=0x664f99c4, length=32)
    at CLucene/util/BitSet.h:35
#9  0x001b4a4a in lucene::index::MultiTermDocs::read (this=0x664e5d38, 
    docs=0x664f9944, freqs=0x664f99c4, length=32)
    at ../src/CLucene/index/MultiReader.cpp:400
#10 0x001ef7dc in lucene::search::TermScorer::next (this=0x664f9920)
    at ../src/CLucene/search/TermScorer.cpp:41
#11 0x001d2d23 in lucene::search::BooleanScorer::next (this=0x665f2fc0)
    at ../src/CLucene/search/BooleanScorer.cpp:63
#12 0x001d2d23 in lucene::search::BooleanScorer::next (this=0x664e5ec8)
    at ../src/CLucene/search/BooleanScorer.cpp:63
#13 0x001e3275 in lucene::search::IndexSearcher::_search (
    this=0x65ba6410, query=0x66734ed8, filter=0x0, nDocs=100)
    at ../src/CLucene/search/Scorer.h:39
#14 0x001e1fb0 in lucene::search::Hits::getMoreDocs (this=0x66881440, 
    m=50) at ../src/CLucene/search/Hits.cpp:110
#15 0x001e1acf in lucene::search::Hits::Hits ()
    at CLucene/util/PriorityQueue.h:35

We tried to perform a search using Luke tool but this also resulted in an
error. 
Also tried after optimizing the index db, but the same error persists. 
So it looks like the index db might still be corrupt.

Any ideas as to why CheckIndex appears not to have detected/fixed all
corruptions? 
Are there any other suggestions as to how to detect/repair index
corruptions?

Thanks in advance,
John.

P.s. I still have the index in question before and after running CheckIndex
fix on it so if anyone's willing to take a look at it I can upload it
somewhere for you to download.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: CheckIndex possibly not detecting/fixing all corruptions?

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK thanks for bringing closure to this John, and good luck tracking it down.

Mike

John O'Brien <jo...@criticalpath.net> wrote:
> Hi Mike,
>        Apologies for the delay in getting back.
> I have since figured out that the reason Luke gave an error when we searched
> on the "fixed" index was (possibly) because it was a really old version (0.6
> 2005/02/16) - I tried again with v 0.8.1 (2008-02-13) and Luke can search on
> the "fixed" index just fine.
>
> When I first ran CheckIndex the exception was as follows:
>
> java.lang.RuntimeException: term body:03: doc -2147483645 < lastDoc 44
>        at org.apache.lucene.index.CheckIndex.check(CheckIndex.java:195)
>        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:373)
>
> WARNING: 1 broken segments detected
> WARNING: 162 documents will be lost
>
> So for now I think the problem may lie in CLucene (or possibly our use of
> it) rather than CheckIndex not detecting and repairing properly.
> I will follow up with an update if/when I figure it out!
> In the meantime if you have any ideas as to how/why a corruption of this
> nature can occur I'd be glad to hear them.
>
> Thanks for your help,
> John.
>
> -----Original Message-----
> From Michael McCandless <lu...@mikemccandless.com>
> Subject Re: CheckIndex possibly not detecting/fixing all corruptions?
> Date Wed, 30 Jul 2008 10:43:29 GMT
>
> Do you have the exception Luke produced?   That'd be a good clue as to
> what CheckIndex is not detecting.  It's hard for me to tell from that
> GDB trace exactly what's gone wrong...
>
> When you first ran CheckIndex, and it detected one corrupt segment,
> what exception did it report as the cause of the corruption on that
> segment?
>
> If you could upload the index somewhere, that'd be great -- I'll try
> to figure out why CheckIndex fails to detect the corruption.  It's
> particularly odd that you were able to successfully optimize the index.
>
> If you run searches on your index use Lucene java, do you get any odd
> exceptions?
>
> Mike
>
>
> -----Original Message-----
> From: John O'Brien [mailto:john.obrien@criticalpath.net]
> Sent: 30 July 2008 11:22
> To: 'java-user@lucene.apache.org'
> Subject: CheckIndex possibly not detecting/fixing all corruptions?
>
>
> Hi,
>        I already posted this question on the CLucene dev list but it was
> suggested that I may be able to get some help on the Java list so here goes.
> We use Clucene 0.9.20 in our search engine. One of the indexes appears to
> have become corrupt (still investigating the cause of the corruption). We
> tried using the Java Lucene CheckIndex tool to fix the corruption(s).
> CheckIndex detected 1 broken segment and removed 162 documents and wrote a
> new segments file. Subsequent runs of CheckIndex report "No problems were
> detected with this index.". However our search engine still crashes when its
> performing a search on the "fixed" index. Here is the relevant backtrace:
>
> #8  0x001c0faa in lucene::index::SegmentTermDocs::read (this=0x664f26c8,
>    docs=0x664f9944, freqs=0x664f99c4, length=32)
>    at CLucene/util/BitSet.h:35
> #9  0x001b4a4a in lucene::index::MultiTermDocs::read (this=0x664e5d38,
>    docs=0x664f9944, freqs=0x664f99c4, length=32)
>    at ../src/CLucene/index/MultiReader.cpp:400
> #10 0x001ef7dc in lucene::search::TermScorer::next (this=0x664f9920)
>    at ../src/CLucene/search/TermScorer.cpp:41
> #11 0x001d2d23 in lucene::search::BooleanScorer::next (this=0x665f2fc0)
>    at ../src/CLucene/search/BooleanScorer.cpp:63
> #12 0x001d2d23 in lucene::search::BooleanScorer::next (this=0x664e5ec8)
>    at ../src/CLucene/search/BooleanScorer.cpp:63
> #13 0x001e3275 in lucene::search::IndexSearcher::_search (
>    this=0x65ba6410, query=0x66734ed8, filter=0x0, nDocs=100)
>    at ../src/CLucene/search/Scorer.h:39
> #14 0x001e1fb0 in lucene::search::Hits::getMoreDocs (this=0x66881440,
>    m=50) at ../src/CLucene/search/Hits.cpp:110
> #15 0x001e1acf in lucene::search::Hits::Hits ()
>    at CLucene/util/PriorityQueue.h:35
>
> We tried to perform a search using Luke tool but this also resulted in an
> error.
> Also tried after optimizing the index db, but the same error persists.
> So it looks like the index db might still be corrupt.
>
> Any ideas as to why CheckIndex appears not to have detected/fixed all
> corruptions?
> Are there any other suggestions as to how to detect/repair index
> corruptions?
>
> Thanks in advance,
> John.
>
> P.s. I still have the index in question before and after running CheckIndex
> fix on it so if anyone's willing to take a look at it I can upload it
> somewhere for you to download.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org