Posted to java-user@lucene.apache.org by wa...@Cyveillance.com on 2004/07/13 23:40:50 UTC

corrupt indexes?

Has anyone had any experience with their index getting corrupted?

Are there any tools to repair it should it get corrupted?

I have not had any problems, but was curious about how resilient this data
store seems to be.

-Will



Re: corrupt indexes?

Posted by Paul <pa...@waite.net.nz>.

On Tue, 13 Jul 2004 17:40:50 -0400, wallen@cyveillance.com
<wa...@cyveillance.com> wrote:
> Has anyone had any experience with their index getting corrupted?

See my previous postings under the subject "Optimize Crash" from around
4th April. I provided the index via FTP (again, see the previous posts; it is
still available).

Sam Hough dug into it and found that two documents were corrupt in this
index, causing the optimize to trip up. He tried marking these as deleted
but reported that the term info was also corrupt so this didn't work.

My modus operandi prior to this happening was as follows. We index new
documents at around 1,000-1,500 per day, so I was doing a single optimize
in the early hours of each morning. Searching goes on all the while, but is
pretty quiet outside working hours. No result code was returned from the
optimization, so I wasn't aware of the corruption at the time it actually
occurred.
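
To be clear about that last point: IndexWriter.optimize() is void, so the only
signal you get is an IOException, and the bare minimum is to catch and log it.
An untested sketch against the Lucene 1.3-era API (the index path is invented):

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class NightlyOptimize {
    public static void main(String[] args) {
        IndexWriter writer = null;
        try {
            // open the existing index (create = false); the path is hypothetical
            writer = new IndexWriter("/data/lucene/index", new StandardAnalyzer(), false);
            // optimize() returns void; trouble only shows up as an exception here
            writer.optimize();
        } catch (IOException e) {
            // the only "result code" we get: log it so the failure is noticed
            System.err.println("Nightly optimize failed: " + e);
        } finally {
            if (writer != null) {
                try {
                    writer.close();
                } catch (IOException e) {
                    System.err.println("Failed to close index writer: " + e);
                }
            }
        }
    }
}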

So far this has happened twice. The first time it happened, I just re-indexed
the whole archive (about 700,000 docs), and also moved to 1.3-Final, since
that had just come out.

Unfortunately the same problem then recurred, and this is when I posted to
this group and made the index available for anyone interested.

Obviously I still don't know what caused it. It may have nothing to do with
Lucene at all - possibly a JVM or hardware issue. It's very hard to diagnose.

My approach has therefore been as follows:

First of all I stopped doing any optimisation. The index still has the corrupt
documents in it, but otherwise behaves normally for searching and indexing
new content etc. Performance is fine.

Our optimisation run is going to be enhanced to check whether it succeeded,
with a failure treated as an indication that index corruption has occurred. If
the run succeeds, we will merge the index into an empty directory, creating a
'last known good' backup; if it fails, we will report the error.

If corruption occurs, we will then take the last known good backup and just
roll it forward through the few hundred new articles indexed since the backup
was taken, which will only take about 10-15 minutes.
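
In outline, the enhanced run would look something like the sketch below
(untested, against the Lucene 1.3-era API; the paths are invented). The
'last known good' copy is produced with IndexWriter.addIndexes(Directory[]),
which merges the live index into a freshly created empty one:

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class OptimizeAndBackup {
    public static void main(String[] args) throws IOException {
        // hypothetical path to the live index
        Directory live = FSDirectory.getDirectory("/data/lucene/index", false);

        IndexWriter writer = new IndexWriter(live, new StandardAnalyzer(), false);
        try {
            writer.optimize();
        } catch (IOException e) {
            // optimize failed: report it and leave the existing backup untouched
            System.err.println("Optimize failed, index may be corrupt: " + e);
            return;
        } finally {
            writer.close();
        }

        // optimize succeeded: merge the live index into an empty directory
        // (create = true) to produce the 'last known good' copy
        IndexWriter backup =
            new IndexWriter("/data/lucene/backup", new StandardAnalyzer(), true);
        backup.addIndexes(new Directory[] { live });
        backup.close();
    }
}

Rolling forward is then just a matter of opening the 'last known good' copy and
re-adding the few hundred documents indexed since it was taken.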

Cheers,
Paul.



Re: corrupt indexes?

Posted by Giulio Cesare Solaroli <gi...@gmail.com>.
We had a nightmare story of our own, but we never understood what had happened.

The net result was that an index of almost 6 million documents, which had
been updated constantly (adding and deleting documents) for a few months,
one morning reported 0 documents, even though the space used on disk was
still consistent with 6 million documents (about 30 GB).
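
In hindsight, even a trivial check of the document count against what we know
should be there would have caught this before anything relied on the index. A
rough, untested sketch against the Lucene 1.3-era API (the path and the
expected minimum are invented):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;

public class IndexSanityCheck {
    public static void main(String[] args) throws IOException {
        String indexPath = "/data/lucene/index"; // hypothetical path
        int expectedMinimum = 5000000;           // roughly how many docs should be there

        IndexReader reader = IndexReader.open(indexPath);
        try {
            int live = reader.numDocs();  // documents not marked as deleted
            int total = reader.maxDoc();  // upper bound, including deleted documents
            System.out.println("live=" + live + ", total=" + total);
            if (live < expectedMinimum) {
                System.err.println("Index reports far fewer documents than expected"
                        + " - possible corruption, do not trust search results");
            }
        } finally {
            reader.close();
        }
    }
}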

We did not have enough knowledge to debug the problem, so we immediately
started creating a new index from the whole set of documents.

Besides this nasty problem, we have experienced constant errors trying to
create a new index on a Sun V440 (with 4 CPUs and 8 GB of RAM) running a
Sun JRE.

I will collect all the relevant information and post back full details on this
latter problem. We have been forced to use other systems (an Xserve with
Mac OS X 10.2) because on the V440 the index became corrupted regularly.
We have no other Sun hardware with enough performance to support our
indexing needs, so we haven't investigated this problem further.

Regards,

Giulio Cesare Solaroli

On Tue, 13 Jul 2004 17:40:50 -0400, wallen@cyveillance.com
<wa...@cyveillance.com> wrote:
> Has anyone had any experience with their index getting corrupted?
