You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Ian Soboroff <ia...@nist.gov> on 2005/02/04 17:58:24 UTC

Reconstruct segments file?

Idiot question.  I've managed to blow away the "segments" file.  This
is an optimized index, so there's only one segment.  Is there an easy
way to reconstruct the segments file?

I've looked over the file formats web page, and poked at a known-good
segments file from a separate, similar index using od(1) and such.  I
guess what I'm not sure how to do is to recover the SegSize from the
segment I have.

Ian



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Reconstruct segments file?

Posted by Doug Cutting <cu...@apache.org>.
Ian Soboroff wrote:
> Speaking of Counter, I have a dumb question.  If the segments are
> named using an integer counter which is incremented, what is the point
> in converting that counter into a string for the segment filename?
> Why not just name the segments e.g. "1.frq", etc.?

The names are prefixed with an underscore, since it turns out that some 
filesystems have trouble (DOS?) with certain all-digit names.  Other 
than that, they are integers, just with a large radix.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Reconstruct segments file?

Posted by Ian Soboroff <ia...@nist.gov>.
Doug Cutting <cu...@apache.org> writes:

> Ian Soboroff wrote:
>> I've looked over the file formats web page, and poked at a known-good
>> segments file from a separate, similar index using od(1) and such.  I
>> guess what I'm not sure how to do is to recover the SegSize from the
>> segment I have.
>
> The SegSize should be the same as the length in bytes of any of the 
> .f[0-9]+ files in the segment.  If your segment is in compound format 
> then you can use IndexReader.main() in the current SVN version to list 
> the files and sizes in the .cfs file, including its contained .f[0-9]+ 
> files.

Thanks, Doug, that is a huge help.

BTW, the fileformats.html page on the Lucene web site is incorrect
with regards to the segments file.  The description should read:

Segments --> Format, Version, Counter, SegCount, 
             <SegName, SegSize>^SegCount

That is, the Counter field is missing.  The Counter field is a UInt32.
Counter is used to generate the next segment name (see
IndexWriter.newSegmentName()).

Speaking of Counter, I have a dumb question.  If the segments are
named using an integer counter which is incremented, what is the point
in converting that counter into a string for the segment filename?
Why not just name the segments e.g. "1.frq", etc.?

Ian




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Reconstruct segments file?

Posted by Doug Cutting <cu...@apache.org>.
Ian Soboroff wrote:
> I've looked over the file formats web page, and poked at a known-good
> segments file from a separate, similar index using od(1) and such.  I
> guess what I'm not sure how to do is to recover the SegSize from the
> segment I have.

The SegSize should be the same as the length in bytes of any of the 
.f[0-9]+ files in the segment.  If your segment is in compound format 
then you can use IndexReader.main() in the current SVN version to list 
the files and sizes in the .cfs file, including its contained .f[0-9]+ 
files.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org