Posted to java-user@lucene.apache.org by Bill Tschumy <bi...@otherwise.com> on 2005/10/27 05:20:28 UTC

Segments file format

I have been trying to reconstitute a corrupted index. I have been
looking at the segments file with a hex editor, and its format doesn't
quite seem to agree with the description found at:
<http://lucene.apache.org/java/docs/fileformats.html>

It indicates the segments file looks like this:
Segments --> Format, Version, SegCount, <SegName, SegSize>^SegCount

However, the couple I have looked at all have an additional  
undocumented UInt32 (probably) field after the Version field.  What  
is this and why is it not in the documentation?

Here is the hex from a small segments file to show you. The 00 00 00
4E at the end of the first line seems like it should not be there.
Am I misreading something?

FF FF FF FF 00 00 00 00 00 00 00 28 00 00 00 4E
00 00 00 04 02 5F 6A 00 00 00 0A 03 5F 31 33 00
00 00 0A 03 5F 31 6E 00 00 00 0A 03 5F 32 35 00
00 00 09

-- 
Bill Tschumy
Otherwise -- Austin, TX
http://www.otherwise.com



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Segments file format

Posted by Yonik Seeley <ys...@gmail.com>.
There is a currently undocumented extra int32.
Here's the code for writing the segment file:

output.writeInt(FORMAT); // write FORMAT
output.writeLong(++version); // every write changes the index
output.writeInt(counter); // write counter
output.writeInt(size()); // write infos
for (int i = 0; i < size(); i++) {
  SegmentInfo si = info(i);
  output.writeString(si.name);
  output.writeInt(si.docCount);
}

The counter is just used to name new segments, so it's not needed for
reading an index.
It was added at the same time Format was added (Apr 2004).
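To make the counter's role concrete, here is a minimal sketch. It assumes new segment names are formed as "_" followed by the counter written in base 36 (Character.MAX_RADIX). That scheme is not stated in this thread, but it is consistent with Bill's dump: the four segment names there decode to 19, 39, 59 and 77, all below the stored counter of 78 (0x4E). The class and method names are hypothetical, not Lucene's API.

```java
import java.util.List;

/** Hypothetical helper: maps a segments-file counter to a segment name and back. */
public class SegmentNames {

    // ASSUMED naming scheme: "_" + counter rendered in base 36.
    public static String nameFor(int counter) {
        return "_" + Integer.toString(counter, Character.MAX_RADIX);
    }

    // Inverse of nameFor: drop the leading underscore and parse the rest as base 36.
    public static int counterFor(String segmentName) {
        return Integer.parseInt(segmentName.substring(1), Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        int counter = 0x4E; // the "extra" int32 from Bill's dump
        // Segment names that appear in the same dump.
        List<String> existing = List.of("_j", "_13", "_1n", "_25");
        for (String name : existing) {
            System.out.println(name + " -> " + counterFor(name));
        }
        // Under the assumed scheme, the next new segment would get this name.
        System.out.println("next: " + nameFor(counter));
    }
}
```

Run as is, it prints 19, 39, 59 and 77 for the existing names and "next: _26" for the counter. This illustrates why a reader can skip the field: it only matters to whatever creates the next segment.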

-Yonik
Now hiring -- http://forms.cnet.com/slink?231706


Re: Segments file format

Posted by Yonik Seeley <ys...@gmail.com>.
Hi Bill,
I can't seem to correctly parse it either...

Format = FF FF FF FF
Version = 00 00 00 00 00 00 00 28
SegCount = 00 00 00 4E
???? = 00 00 00 04
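For comparison, here is a minimal sketch that reads the same bytes in the order of the writer code quoted in the other reply: Format, Version, counter, SegCount, then name/docCount pairs. Read this way, 00 00 00 4E is the counter (78) and 00 00 00 04 is SegCount (4). The sketch treats ints and longs as big-endian, which matches the dump. It assumes each name is a VInt length followed by one byte per character, which holds for ASCII names like these but is not a general string decoder. The class and record names are hypothetical, and this is not Lucene's own reader.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical decoder for the segments-file layout produced by the writer code in this thread. */
public class SegmentsDump {

    public record Segment(String name, int docCount) {}

    public record Segments(int format, long version, int counter, List<Segment> segments) {}

    // VInt: 7 bits per byte, low-order group first; a set high bit means "more bytes follow".
    static int readVInt(ByteBuffer buf) {
        int b = buf.get() & 0xFF;
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf.get() & 0xFF;
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // ASSUMPTION: names are ASCII, so each character occupies exactly one byte.
    static String readAsciiString(ByteBuffer buf) {
        int length = readVInt(buf);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) (buf.get() & 0xFF));
        }
        return sb.toString();
    }

    public static Segments parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        int format = buf.getInt();    // FORMAT (-1 in this dump)
        long version = buf.getLong(); // incremented on every write
        int counter = buf.getInt();   // the undocumented field: only used to name new segments
        int segCount = buf.getInt();  // SegCount
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < segCount; i++) {
            String name = readAsciiString(buf);
            int docCount = buf.getInt();
            segments.add(new Segment(name, docCount));
        }
        return new Segments(format, version, counter, segments);
    }

    // Turns a whitespace-separated hex dump into raw bytes.
    public static byte[] fromHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    public static void main(String[] args) {
        String dump = "FF FF FF FF 00 00 00 00 00 00 00 28 00 00 00 4E "
                + "00 00 00 04 02 5F 6A 00 00 00 0A 03 5F 31 33 00 "
                + "00 00 0A 03 5F 31 6E 00 00 00 0A 03 5F 32 35 00 "
                + "00 00 09";
        Segments s = parse(fromHex(dump));
        System.out.println("format=" + s.format() + " version=" + s.version()
                + " counter=" + s.counter() + " segCount=" + s.segments().size());
        for (Segment seg : s.segments()) {
            System.out.println(seg.name() + " docCount=" + seg.docCount());
        }
    }
}
```

On Bill's 51 bytes this consumes the input exactly. It reports format=-1, version=40, counter=78 and four segments (_j, _13 and _1n with 10 documents each, _25 with 9). The clean fit at the end is a good sign that the field order is right.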

-Yonik