Posted to java-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2004/12/13 00:05:15 UTC
Re: Finding unused segment files?
Hello George,
Here is a quick hack (with a few TODOs). I only tested it a bit, so
the actual delete calls are still commented out. If this works for
you, and especially if you take care of TODOs, I may put this in the
Lucene Sandbox.
Otis
P.S.
Usage example showing how the tool found some unused segments (these were
left behind by a bug in one of the earlier 1.4 versions of Lucene).
[otis@cosmo java]$ java org.apache.lucene.index.SegmentPurger /simpy/users/1/index
Candidate non-Lucene file found: _1b2.del
Candidate unused Lucene file found: _1b2.cfs
Candidate unused Lucene file found: _1bm.cfs
Candidate unused Lucene file found: _1c6.cfs
Candidate unused Lucene file found: _1cq.cfs
Candidate unused Lucene file found: _1da.cfs
Candidate unused Lucene file found: _1du.cfs
Candidate unused Lucene file found: _1ee.cfs
Candidate unused Lucene file found: _1ey.cfs
[otis@cosmo java]$
[otis@cosmo java]$ strings /simpy/users/1/index/segments
_3o0
[otis@cosmo java]$ ls -al /simpy/users/1/index/
total 647
drwxrwsr-x 2 otis simpy 1024 Dec 7 14:39 .
drwxrwsr-x 3 otis simpy 1024 Sep 16 20:39 ..
-rw-rw-r-- 1 otis simpy 212815 Nov 17 18:36 _1b2.cfs
-rw-rw-r-- 1 otis simpy 104 Nov 17 18:40 _1b2.del
-rw-rw-r-- 1 otis simpy 3380 Nov 17 18:40 _1bm.cfs
-rw-rw-r-- 1 otis simpy 3533 Nov 17 18:40 _1c6.cfs
-rw-rw-r-- 1 otis simpy 4774 Nov 17 18:40 _1cq.cfs
-rw-rw-r-- 1 otis simpy 3389 Nov 17 18:40 _1da.cfs
-rw-rw-r-- 1 otis simpy 3809 Nov 17 18:40 _1du.cfs
-rw-rw-r-- 1 otis simpy 3423 Nov 17 18:40 _1ee.cfs
-rw-rw-r-- 1 otis simpy 4016 Nov 17 18:40 _1ey.cfs
-rw-rw-r-- 1 otis simpy 410299 Dec 7 14:39 _3o0.cfs
-rw-rw-r-- 1 otis simpy 4 Dec 7 14:39 deletable
-rw-rw-r-- 1 otis simpy 29 Dec 7 14:39 segments
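
For readers without the attachment, the kind of check that produces output like the above can be sketched roughly as follows. This is not the SegmentPurger source; it is a minimal illustration that assumes the live segment names are already known (for example "_3o0" from the segments file) and that the extension list below covers the per-segment files of a Lucene 1.4 index. The class name, method names and extension list are assumptions made for this example. Unlike the run shown above, this sketch treats .del as a Lucene extension, and it only reports candidates, it never deletes anything.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnusedSegmentFileFinder {
    // Assumed per-segment extensions for a Lucene 1.4 index; norms files
    // (.f0, .f1, ...) are matched separately below.
    private static final Set<String> EXTENSIONS = new HashSet<>(Arrays.asList(
            "cfs", "fnm", "fdx", "fdt", "tii", "tis", "frq", "prx", "del",
            "tvx", "tvd", "tvf"));

    // True if the file name ends in one of the assumed Lucene extensions.
    static boolean hasLuceneExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return false;
        }
        String ext = fileName.substring(dot + 1);
        return EXTENSIONS.contains(ext) || ext.matches("f\\d+");
    }

    // The segment name is the part of the file name before the extension.
    static String segmentName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    // Returns, sorted, the files that look like Lucene segment files but
    // belong to a segment that is not in the live set.
    static List<String> findCandidates(String[] fileNames, Set<String> liveSegments) {
        List<String> candidates = new ArrayList<>();
        for (String name : fileNames) {
            if (name.startsWith("_")
                    && hasLuceneExtension(name)
                    && !liveSegments.contains(segmentName(name))) {
                candidates.add(name);
            }
        }
        Collections.sort(candidates);
        return candidates;
    }

    // Usage: java UnusedSegmentFileFinder <indexDir> <liveSegment>...
    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: UnusedSegmentFileFinder <indexDir> <liveSegment>...");
            return;
        }
        File dir = new File(args[0]);
        String[] names = dir.list();
        if (names == null) {
            System.err.println("Not a readable directory: " + dir);
            return;
        }
        Set<String> live = new HashSet<>(Arrays.asList(args).subList(1, args.length));
        for (String name : findCandidates(names, live)) {
            System.out.println("Candidate unused Lucene file found: " + name);
        }
    }
}
```

Files such as "segments" and "deletable" are skipped simply because they do not start with an underscore or carry a segment extension. Anything reported should still be checked by hand before it is removed.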
--- gwithers@connected.com wrote:
> Hello all.
>
> I recently ran into a problem where errors during indexing or optimization
> (perhaps related to running out of disk space) left me with a working index
> in a directory but with additional segment files (partial) that were
> unneeded. The solution for finding the ~40 files to keep out of the ~900
> files in the directory amounted to dumping the segments file and noting that
> only 5 segments were in fact "live". The index is a non-compound index
> using FSDirectory.
>
> Is there (or would it be possible to add, and I'd be willing to submit code
> if it made sense to people) some sort of interrogation on the index of what
> files belong to it? I looked first at FSDirectory itself, thinking that
> its "list()" method should return the subset of index-related files, but
> looking deeper it seems that Directory is at a lower level, abstracting
> simple I/O, and thus wouldn't "know".
>
> So any thoughts? Would it make sense to have a form of clean on
> IndexWriter()? I hesitate since it seems there isn't a charter that only
> Lucene files could exist in the directory, thus what is ideal for my
> application (since I know I won't mingle other files) might not be ideal for
> all. Would it be fair to look for Lucene known extensions and file naming
> signatures to identify unused files that might be failed or dead segments?
>
> Thanks,
>
> -George
>
>
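
The "dump the segments file" step George describes can be sketched in plain Java. This is only an illustration and it assumes the segments layout used by Lucene 1.4: a format marker of -1 (int), a version (long), a name counter (int), a segment count (int), then a name (VInt length followed by the characters) and a document count (int) for each segment. That layout adds up to the 29 bytes shown for the one-segment index in the listing above, but treat it as an assumption and check it against the file formats documentation for the Lucene version in use. The class and method names are made up for this example, and segment names are assumed to be plain ASCII.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class SegmentsFileDump {
    // Reads a variable-length int: 7 data bits per byte, low-order group
    // first; a set high bit means another byte follows.
    static int readVInt(DataInputStream in) throws IOException {
        int b = in.readUnsignedByte();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readUnsignedByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Reads a length-prefixed string, assuming ASCII-only content
    // (segment names such as "_3o0").
    static String readAsciiString(DataInputStream in) throws IOException {
        int length = readVInt(in);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            int c = in.readUnsignedByte();
            if (c > 0x7F) {
                throw new IOException("Non-ASCII character in segment name");
            }
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Returns the names of the live segments listed in the stream.
    static List<String> readSegmentNames(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int format = in.readInt();
        if (format != -1) {
            // Other layouts are not handled by this sketch.
            throw new IOException("Unexpected segments format: " + format);
        }
        in.readLong();              // version, ignored here
        in.readInt();               // name counter, ignored here
        int count = in.readInt();   // number of live segments
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add(readAsciiString(in));
            in.readInt();           // document count, ignored here
        }
        return names;
    }

    // Usage: java SegmentsFileDump <path to segments file>
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: SegmentsFileDump <segments file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            for (String name : readSegmentNames(in)) {
                System.out.println(name);
            }
        }
    }
}
```

Any file in the index directory whose segment prefix is not in the printed list is then a candidate for cleanup, subject to the caveat about non-Lucene files sharing the directory.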