Posted to solr-user@lucene.apache.org by nettadalet <ns...@dalet.com> on 2020/05/19 13:33:11 UTC

Corrupted .cfs file

I get the following exception:
Caused by: org.apache.lucene.index.CorruptIndexException: length should be
104004663 bytes, but is 104856631 instead
(resource=MMapIndexInput(path="path_to_index\index\_jlp.cfs"))

What may be the cause of this?
How can the length of the .cfs file change so that it becomes corrupted?
Can I simply delete this .cfs file and then synchronize the index against
the database, so that only the missing files are indexed, instead of
reindexing all the files?

Thanks in advance.



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Corrupted .cfs file

Posted by nettadalet <ns...@dalet.com>.
Sorry for the late reply, but you were right: the problem was that the disk
got full.
Thank you very much!




Re: Corrupted .cfs file

Posted by Erick Erickson <er...@gmail.com>.
Usually this is caused by one of:
1> the file on disk getting corrupted, i.e. the disk going bad.
2> the disk getting full at some point and writing a partial segment.
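Both causes surface the same way: the length recorded for the file no longer matches the length found on disk. The two lengths in the exception above differ by 851,968 bytes. A minimal sketch of that kind of check, assuming the expected length has already been read from the index's own metadata; `check_file_length` and `CorruptFileError` are hypothetical names, not Lucene APIs:

```python
import os


class CorruptFileError(Exception):
    """Hypothetical stand-in for a corruption error (not Lucene's class)."""


def check_file_length(path: str, expected_length: int) -> None:
    # Compare the length the index expects with the file's size on disk.
    actual_length = os.path.getsize(path)
    if actual_length != expected_length:
        raise CorruptFileError(
            f"length should be {expected_length} bytes, "
            f"but is {actual_length} instead (resource={path})"
        )


# Size difference between the two lengths reported in the exception:
print(104856631 - 104004663)  # 851968
```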

No, you cannot delete the .cfs file and re-index only the documents
that were in it, because you have no way of knowing exactly which
documents those are. Segments are merged in the background as
part of normal indexing, so figuring out which docs were in a given
segment isn’t really possible. (OK, it’s deterministic, but there are
so many variables that it might as well be impossible.)

CheckIndex -fix (named -exorcise in newer Lucene versions) will remove
the corrupted segments, leaving holes in your index: every document in
those segments is gone. You can’t just delete the .cfs file yourself,
because the segments_N file, which tells Lucene which segments are
current, references it. CheckIndex takes care of both parts for you.
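A sketch of driving CheckIndex from a script, assuming Solr is stopped, the index has been backed up first, and `lucene-core.jar` stands for the Lucene core jar matching your version (the path is a placeholder). The segment-removing option is a parameter because its name depends on the Lucene release; without it, CheckIndex only reports problems:

```python
import subprocess
from typing import List, Optional


def checkindex_command(lucene_core_jar: str, index_dir: str,
                       fix_flag: Optional[str] = None) -> List[str]:
    """Build a CheckIndex command line; without fix_flag it only reports."""
    cmd = ["java", "-cp", lucene_core_jar,
           "org.apache.lucene.index.CheckIndex", index_dir]
    if fix_flag is not None:
        # "-exorcise" in newer Lucene, "-fix" in older releases.
        # Writes a new segments file without the corrupt segments,
        # so every document in them is lost.
        cmd.append(fix_flag)
    return cmd


def run_checkindex(cmd: List[str]) -> str:
    """Run CheckIndex (Solr stopped, index backed up) and return its report."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout
```

Running the report-only command first and reading its output before repeating it with `fix_flag="-exorcise"` keeps the destructive step deliberate.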

If you really can’t re-index everything, you could use a streaming
expression to get a list of all the IDs in the index, compare that
against your DB, and index only the difference, but I don’t know
whether that’s more work than just reindexing anyway.
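A minimal sketch of that comparison, assuming the IDs were fetched by sending a streaming expression such as `search(mycollection, q="*:*", fl="id", sort="id asc", qt="/export")` to the `/stream` handler (the collection name is a placeholder, and `/export` needs docValues on `id`), and that you can list the primary keys from your database yourself. The `/stream` response ends its `result-set.docs` array with an `EOF` tuple, which the parser skips:

```python
import json


def ids_from_stream_response(body: str) -> set:
    """Collect "id" values from a /stream JSON response, skipping the EOF tuple."""
    ids = set()
    for doc in json.loads(body)["result-set"]["docs"]:
        if "EXCEPTION" in doc:
            # An error tuple means the list is incomplete; don't diff against it.
            raise RuntimeError(doc["EXCEPTION"])
        if "EOF" not in doc:
            ids.add(str(doc["id"]))
    return ids


def diff_ids(index_ids: set, db_ids: set) -> tuple:
    """Return (missing_from_index, not_in_db)."""
    missing_from_index = db_ids - index_ids  # candidates to re-index
    not_in_db = index_ids - db_ids           # candidates to delete
    return missing_from_index, not_in_db


sample = '{"result-set":{"docs":[{"id":"1"},{"id":"2"},{"EOF":true,"RESPONSE_TIME":5}]}}'
missing, extra = diff_ids(ids_from_stream_response(sample), {"1", "2", "3"})
print(sorted(missing), sorted(extra))  # ['3'] []
```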

You don’t say whether you’re using SolrCloud, but if you are _and_
you have more than one replica, just use DELETEREPLICA on the bad one
and ADDREPLICA to put it back. The new replica will sync with the
leader automatically.
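A sketch of the two Collections API requests, assuming a node at `http://localhost:8983/solr` and placeholder collection, shard, replica (core node name) and target node values; the real names can be looked up with `action=CLUSTERSTATUS`. Issuing the DELETEREPLICA request and confirming it succeeded before sending ADDREPLICA avoids ending up with an extra replica:

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # placeholder base URL


def collections_api_url(action: str, **params: str) -> str:
    """Build a Collections API URL, e.g. for DELETEREPLICA or ADDREPLICA."""
    query = urlencode({"action": action, **params})
    return f"{SOLR}/admin/collections?{query}"


# Hypothetical names: substitute the values from CLUSTERSTATUS.
delete_url = collections_api_url("DELETEREPLICA", collection="mycollection",
                                 shard="shard1", replica="core_node3")
add_url = collections_api_url("ADDREPLICA", collection="mycollection",
                              shard="shard1", node="solrhost:8983_solr")
print(delete_url)
print(add_url)
```

Each URL can then be sent with any HTTP client, for example `urllib.request.urlopen`.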

Best,
Erick
