Posted to java-user@lucene.apache.org by Stanislav Jordanov <st...@sirma.bg> on 2006/09/05 15:08:16 UTC

obtaining the number of documents stored in a .cfs file

Suppose I have a bunch of valid .cfs files while the 
segments/segments.new file is missing or invalid.
The task is to 'recover' the present .cfs files into a valid index.
I think it will be necessary and sufficient to create a segments file 
that references the .cfs files.
The only problem I've encountered in generating a valid and well-formed 
segments file is that I need to know the number of docs in each .cfs file.
So I have two questions:
Do I have to put the right number of docs for each segment, or will any 
(dummy) number do?
If I have to put the right number there, how do I get it from the .cfs 
file?

Stanislav

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: obtaining the number of documents stored in a .cfs file

Posted by Andrzej Bialecki <ab...@getopt.org>.
Stanislav Jordanov wrote:
> Suppose I have a bunch of valid .cfs files while the 
> segments/segments.new file is missing or invalid.
> The task is to 'recover' the present .cfs files into a valid index.
> I think it will be necessary and sufficient to create a segments file 
> that references the .cfs files.
> The only problem I've encountered in generating a valid and 
> well-formed segments file is that I need to know the number of docs in 
> each .cfs file.
> So I have two questions:
> Do I have to put the right number of docs for each segment, or will any 
> (dummy) number do?

Not sure, but I doubt anything other than the correct number would work.

> If I have to put the right number there, how do I get it from the 
> .cfs file?

Look at the size of the _xx.f1 file inside the CFS file (or any other 
_xx.fN norms file); a norms file holds one byte per document, so its size 
in bytes is the same as the number of documents in that segment.

(You can use the CompoundFileReader.list() and fileLength() methods.)
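For the case where the package-private CompoundFileReader is out of reach, here is a minimal plain-Java sketch that reads the same information straight from the .cfs header and then applies the norms rule above. It assumes the compound-file layout described in the Lucene 1.9/2.0 file-format documentation (a VInt entry count, then an 8-byte big-endian data offset and a file name per entry, then the file data) and ASCII file names; the class and method names are hypothetical. As in CompoundFileReader, an entry's length is taken to be the next entry's offset (or the end of the file) minus its own offset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; assumes the Lucene 1.9/2.0 compound file layout.
final class CfsInspector {

  // Lucene VInt: 7 bits per byte, low-order group first,
  // high bit set means "another byte follows".
  static int readVInt(ByteBuffer buf) {
    int b = buf.get() & 0xFF;
    int value = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = buf.get() & 0xFF;
      value |= (b & 0x7F) << shift;
    }
    return value;
  }

  // Assumes ASCII file names, so the stored char count equals the byte count.
  static String readAsciiString(ByteBuffer buf) {
    byte[] bytes = new byte[readVInt(buf)];
    buf.get(bytes);
    return new String(bytes, StandardCharsets.US_ASCII);
  }

  // Maps each entry name in the .cfs header to its length in bytes.
  // The buffer must cover the whole .cfs file (its limit is the file length).
  static Map<String, Long> listEntries(ByteBuffer cfs) {
    ByteBuffer buf = cfs.duplicate(); // own position; byte order is BIG_ENDIAN
    buf.rewind();                     // the header starts at byte 0
    long total = buf.limit();

    int count = readVInt(buf);
    String[] names = new String[count];
    long[] offsets = new long[count];
    for (int i = 0; i < count; i++) {
      offsets[i] = buf.getLong();     // big-endian, like Lucene's writeLong
      names[i] = readAsciiString(buf);
    }

    // An entry runs up to the next entry's offset; the last one to end of file.
    Map<String, Long> lengths = new LinkedHashMap<>();
    for (int i = 0; i < count; i++) {
      long end = (i + 1 < count) ? offsets[i + 1] : total;
      lengths.put(names[i], end - offsets[i]);
    }
    return lengths;
  }

  // Document count from any norms entry (_xx.fN, one byte per document),
  // or -1 if the segment has no norms file (e.g. every field omits norms).
  static long docCountFromNorms(Map<String, Long> lengths) {
    for (Map.Entry<String, Long> e : lengths.entrySet()) {
      if (e.getKey().matches(".*\\.f[0-9]+")) {
        return e.getValue();
      }
    }
    return -1;
  }
}
```

To inspect a real file, one option is to pass a read-only mapping such as FileChannel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size()), which covers the whole file as listEntries expects.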

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




Re: obtaining the number of documents stored in a .cfs file

Posted by Volodymyr Bychkoviak <vb...@i-hypergrid.com>.
One more note:
this code has to be in the package 'org.apache.lucene.index' because it 
uses some package-visible classes :)


-- 
regards,
Volodymyr Bychkoviak


Re: obtaining the number of documents stored in a .cfs file

Posted by Volodymyr Bychkoviak <vb...@i-hypergrid.com>.
One mistake in this code; it should be
    infos.counter = ++counter;
instead of
    infos.counter = counter++;
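A small sketch of why the prefix form matters. It assumes, as in Lucene 2.0's IndexWriter, that a new segment is named "_" plus the counter written in base 36 (Character.MAX_RADIX); the class and method names are hypothetical. The stored counter therefore has to be one past the largest existing segment number, whereas infos.counter = counter++ stores the maximum itself, so the next segment written would reuse an existing name.

```java
import java.util.List;

// Hypothetical helper illustrating the segment counter arithmetic.
final class SegmentCounter {

  // "_1z.cfs" -> 71: the part between '_' and the extension is base 36.
  static int segmentNumber(String cfsFileName) {
    String digits = cfsFileName.substring(1, cfsFileName.lastIndexOf('.'));
    return Integer.parseInt(digits, Character.MAX_RADIX);
  }

  // The value to store in the counter: one past the largest existing number.
  static int nextCounter(List<String> cfsFileNames) {
    int max = -1;
    for (String name : cfsFileNames) {
      max = Math.max(max, segmentNumber(name));
    }
    return max + 1;
  }

  // Assumed naming scheme for a new segment, derived from the counter.
  static String newSegmentName(int counter) {
    return "_" + Integer.toString(counter, Character.MAX_RADIX);
  }
}
```

For example, with _0.cfs, _a.cfs and _1z.cfs on disk the largest number is 71, so the counter should be 72 and the next segment would be _20; storing 71 instead would make the next segment _1z again.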


-- 
regards,
Volodymyr Bychkoviak



Re: obtaining the number of documents stored in a .cfs file

Posted by Volodymyr Bychkoviak <vb...@i-hypergrid.com>.
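I've used the following code to recover an index. Note: it only works with 
.cfs (compound) segment files.


    // required: SegmentInfos, SegmentInfo and CompoundFileReader are package-visible
    package org.apache.lucene.index;

    import java.io.File;
    import java.io.FilenameFilter;
    import java.io.IOException;

    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.IndexInput;

    public class RecoverIndex { // class name is arbitrary
      public static void main(String[] args) throws IOException {
        String path = args[0]; // path to index (taken from the command line here)
        File file = new File(path);
        Directory directory = FSDirectory.getDirectory(file, false);

        // Only compound segment files (_N.cfs) are considered.
        String[] files = file.list(new FilenameFilter() {
          public boolean accept(File dir, String name) {
            return name.endsWith(".cfs");
          }
        });

        SegmentInfos infos = new SegmentInfos();
        int counter = 0;
        for (int i = 0; i < files.length; i++) {
          String fileName = files[i];
          // Segment names are "_" followed by a base-36 number; track the largest.
          String segmentName = fileName.substring(1, fileName.lastIndexOf('.'));
          int segmentInt = Integer.parseInt(segmentName, Character.MAX_RADIX);
          counter = Math.max(counter, segmentInt);

          segmentName = fileName.substring(0, fileName.lastIndexOf('.'));

          // The .fdx file stores one 8-byte pointer per document,
          // so its length / 8 is the segment's document count.
          Directory fileReader = new CompoundFileReader(directory, fileName);
          int size;
          try {
            IndexInput indexStream = fileReader.openInput(segmentName + ".fdx");
            try {
              size = (int) (indexStream.length() / 8);
            } finally {
              indexStream.close();
            }
          } finally {
            fileReader.close();
          }

          SegmentInfo segmentInfo = new SegmentInfo(segmentName, size, directory);
          infos.addElement(segmentInfo);
        }

        // New segments are named from counter, so it must be larger than
        // every existing segment number: ++counter, not counter++.
        infos.counter = ++counter;

        infos.write(directory);
        directory.close();
      }
    }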
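The .fdx trick relies on the stored-fields index holding one 8-byte pointer into the .fdt file per document, which is how the Lucene 2.0 file format describes it (later format versions may add a header to this file, so the division would no longer be exact). The hedged sketch below, with hypothetical names, derives the count that way and cross-checks it against any norms file of the same segment, following Andrzej's suggestion; the map of entry lengths could come from CompoundFileReader.list() and fileLength().

```java
import java.util.Map;

// Hypothetical helper; assumes the Lucene 2.0 .fdx layout (8 bytes per doc).
final class DocCountCheck {

  // Document count implied by a .fdx length: one long pointer per document.
  static int docCountFromFdx(long fdxLength) {
    if (fdxLength % 8 != 0) {
      throw new IllegalArgumentException("not a multiple of 8: " + fdxLength);
    }
    return (int) (fdxLength / 8);
  }

  // Returns the .fdx-derived count for the segment, failing if any norms
  // entry (segment + ".fN") disagrees. lengths maps entry names to sizes.
  static int checkedDocCount(String segment, Map<String, Long> lengths) {
    Long fdx = lengths.get(segment + ".fdx");
    if (fdx == null) {
      throw new IllegalStateException("no " + segment + ".fdx entry");
    }
    int docs = docCountFromFdx(fdx);
    String normsPrefix = segment + ".f";
    for (Map.Entry<String, Long> e : lengths.entrySet()) {
      String name = e.getKey();
      // ".f" followed only by digits is a norms file; .fdx/.fdt/.fnm are not.
      boolean isNorms = name.startsWith(normsPrefix)
          && name.substring(normsPrefix.length()).matches("[0-9]+");
      if (isNorms && e.getValue() != docs) {
        throw new IllegalStateException(
            name + " has " + e.getValue() + " bytes, expected " + docs);
      }
    }
    return docs;
  }
}
```

If the two counts disagree, the segment is probably damaged, and writing either number into the segments file would be a guess.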


-- 
regards,
Volodymyr Bychkoviak