Posted to java-user@lucene.apache.org by Stanislav Jordanov <st...@sirma.bg> on 2006/09/05 15:08:16 UTC
obtaining the number of documents stored in a .cfs file
Suppose I have a bunch of valid .cfs files, but the
segments/segments.new file is missing or invalid.
The task is to 'recover' the existing .cfs files into a valid index.
I think it is necessary and sufficient to create a segments file
that references the .cfs files.
The only problem I've encountered in generating a valid, well-formed
segments file is that I need to know the number of docs in each .cfs file.
So I have a couple of questions:
Do I have to put the right number of docs for each segment, or will any
(dummy) number do?
If I have to put the right number there, how do I get it from the .cfs
file?
Stanislav
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: obtaining the number of documents stored in a .cfs file
Posted by Andrzej Bialecki <ab...@getopt.org>.
Stanislav Jordanov wrote:
> Suppose I have a bunch of valid .cfs files, but the
> segments/segments.new file is missing or invalid.
> The task is to 'recover' the existing .cfs files into a valid index.
> I think it is necessary and sufficient to create a segments file
> that references the .cfs files.
> The only problem I've encountered in generating a valid,
> well-formed segments file is that I need to know the number of docs in
> each .cfs file.
> So I have a couple of questions:
> Do I have to put the right number of docs for each segment, or will any
> (dummy) number do?
Not sure, but I doubt anything other than a valid number would work.
> If I have to put the right number there, how do I get it from the
> .cfs file?
Look at the size of the _xx.f1 file inside the CFS file; this is a norms
file (one byte per document), so its size in bytes is the same as the
number of documents in that segment.
(You can use the CompoundFileReader.list() and fileLength() methods.)
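
For anyone who cannot call the package-private CompoundFileReader, the same sizes can probably be read straight from the table at the start of the .cfs file. The sketch below is not Lucene code. It assumes the compound-file layout used by Lucene releases of that period (a VInt entry count, then for each entry a big-endian long data offset and a length-prefixed name, with the data stored in table order) and ASCII file names. The names CfsEntries, entryLengths and readVInt are made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class CfsEntries {

    /** Reads a VInt: 7 data bits per byte, low-order group first, high bit means "more bytes follow". */
    static int readVInt(DataInputStream in) throws IOException {
        byte b = in.readByte();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    /**
     * Parses the entry table of a compound file and returns entry name -> length in bytes.
     * ASSUMPTION: layout is VInt count, then (long offset, VInt name length, name bytes) per entry.
     */
    static Map<String, Long> entryLengths(InputStream raw, long totalLength) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int count = readVInt(in);
        String[] names = new String[count];
        long[] offsets = new long[count];
        for (int i = 0; i < count; i++) {
            offsets[i] = in.readLong();                // DataInputStream reads big-endian
            byte[] nameBytes = new byte[readVInt(in)]; // ASSUMPTION: ASCII, one byte per char
            in.readFully(nameBytes);
            names[i] = new String(nameBytes, StandardCharsets.US_ASCII);
        }
        // An entry ends where the next one starts; the last one ends at end of file.
        Map<String, Long> lengths = new LinkedHashMap<String, Long>();
        for (int i = 0; i < count; i++) {
            long end = (i + 1 < count) ? offsets[i + 1] : totalLength;
            lengths.put(names[i], end - offsets[i]);
        }
        return lengths;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CfsEntries <path to .cfs file>");
            return;
        }
        File cfs = new File(args[0]);
        InputStream in = new BufferedInputStream(new FileInputStream(cfs));
        try {
            for (Map.Entry<String, Long> e : entryLengths(in, cfs.length()).entrySet()) {
                System.out.println(e.getKey() + "\t" + e.getValue() + " bytes");
            }
        } finally {
            in.close();
        }
    }
}
```

If the layout assumption holds, a norms entry of N bytes (or an .fdx entry of 8 * N bytes) indicates N documents in that segment.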
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: obtaining the number of documents stored in a .cfs file
Posted by Volodymyr Bychkoviak <vb...@i-hypergrid.com>.
One more note:
the recovery code I posted should be in package 'org.apache.lucene.index',
because it uses some package-private classes :)
--
regards,
Volodymyr Bychkoviak
Re: obtaining the number of documents stored in a .cfs file
Posted by Volodymyr Bychkoviak <vb...@i-hypergrid.com>.
There is one mistake in the code I posted. It should be
infos.counter = ++counter;
instead of
infos.counter = counter++;
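
To see why the difference matters, here is a small stand-alone sketch. It assumes that the stored counter is the number used to name the next new segment, and that segment names are "_" plus the number in base 36, as the parsing in the recovery code suggests. The class and method names (SegmentCounterDemo, segmentName, segmentNumber) are made up for illustration.

```java
final class SegmentCounterDemo {

    /** Builds a segment name from its number, e.g. 35 -> "_z" (base 36). */
    static String segmentName(int number) {
        return "_" + Integer.toString(number, Character.MAX_RADIX);
    }

    /** Parses the number back out of a segment name, e.g. "_z" -> 35. */
    static int segmentNumber(String name) {
        return Integer.parseInt(name.substring(1), Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        int highest = segmentNumber("_z"); // highest existing segment

        int counter = highest;
        int stored = counter++;            // post-increment yields the OLD value
        System.out.println("counter++ stores " + stored
                + ", next segment would be " + segmentName(stored));

        counter = highest;
        stored = ++counter;                // pre-increment yields the NEW value
        System.out.println("++counter stores " + stored
                + ", next segment would be " + segmentName(stored));
    }
}
```

With counter++ the stored value is still the number of the highest existing segment, so the next segment written would reuse the name _z. With ++counter it is one past it, and the next segment gets the fresh name _10.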
--
regards,
Volodymyr Bychkoviak
Re: obtaining the number of documents stored in a .cfs file
Posted by Volodymyr Bychkoviak <vb...@i-hypergrid.com>.
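I've used the following code to recover an index. Note: it only works with
.cfs files.

// Must live in package org.apache.lucene.index (uses package-private classes).
// Needs: java.io.File, java.io.FilenameFilter,
//        org.apache.lucene.store.Directory, FSDirectory, IndexInput
String path = // path to index
File file = new File(path);
Directory directory = FSDirectory.getDirectory(file, false);

// Each .cfs file in the index directory is one compound segment.
String[] files = file.list(new FilenameFilter() {
    public boolean accept(File dir, String name) {
        return name.endsWith(".cfs");
    }
});

SegmentInfos infos = new SegmentInfos();
int counter = 0;
for (int i = 0; i < files.length; i++) {
    String fileName = files[i];
    // File names look like "_<n>.cfs", where <n> is the segment number in base 36.
    String segmentName = fileName.substring(1, fileName.lastIndexOf('.'));
    int segmentInt = Integer.parseInt(segmentName, Character.MAX_RADIX);
    counter = Math.max(counter, segmentInt);
    segmentName = fileName.substring(0, fileName.lastIndexOf('.'));

    // The .fdx file holds one 8-byte pointer per document,
    // so its length divided by 8 is the segment's doc count.
    Directory fileReader = new CompoundFileReader(directory, fileName);
    IndexInput indexStream = fileReader.openInput(segmentName + ".fdx");
    int size = (int) (indexStream.length() / 8);
    indexStream.close();
    fileReader.close();

    SegmentInfo segmentInfo = new SegmentInfo(segmentName, size, directory);
    infos.addElement(segmentInfo);
}

// The counter must be one past the highest existing segment number,
// hence ++counter (the counter++ originally posted here stored the old value).
infos.counter = ++counter;
infos.write(directory);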
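
The cast in "(int)(indexStream.length() / 8)" silently accepts a truncated or oversized .fdx file. As a hedged sketch, the conversion could be guarded as below. It assumes, like the code above, that the .fdx file is nothing but one 8-byte pointer per document, which may not hold for other Lucene versions. FdxDocCount and docCount are hypothetical names, not Lucene API.

```java
final class FdxDocCount {

    /** ASSUMPTION: the .fdx file contains exactly one 8-byte pointer per document. */
    static final int BYTES_PER_DOC = 8;

    /** Converts an .fdx length in bytes to a document count, rejecting implausible lengths. */
    static int docCount(long fdxLength) {
        if (fdxLength < 0 || fdxLength % BYTES_PER_DOC != 0) {
            throw new IllegalArgumentException(
                    "unexpected .fdx length (not a multiple of 8): " + fdxLength);
        }
        long docs = fdxLength / BYTES_PER_DOC;
        if (docs > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many documents: " + docs);
        }
        return (int) docs;
    }

    public static void main(String[] args) {
        System.out.println(docCount(24)); // prints 3
    }
}
```

A length that is not a multiple of 8 most likely means the segment was cut off mid-write, in which case it is probably safer to leave that .cfs file out of the rebuilt segments file.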
--
regards,
Volodymyr Bychkoviak