Posted to dev@lucene.apache.org by Kevin Oliver <ke...@mac.com> on 2004/11/13 00:40:11 UTC

compound files question/patch

While investigating some query-time performance issues, I stumbled upon a small inefficiency in SegmentReader's handling of compound files. Specifically, the openNorms() method takes the directory to use as a parameter, but then contains its own logic to decide between that directory and the directory from its base class (IndexReader). When an index has many field infos (we have about 30), this file-existence check adds significant overhead. The cost is small on a normal file system, but our file system is mounted over NFS, where the check is relatively expensive.

The rest of the class doesn't perform this kind of check for other files. By changing this code to work like the rest of the methods in the class (i.e., just using the passed-in directory), things are noticeably quicker on my end. I don't see any issues with this patch. If people prefer, I can file it in Bugzilla.

367c367
<         // look first for re-written file, then in compound format
---
>         norms.put(fi.name, new Norm(cfsDir.openFile(fileName), fi.number));
368,369d367
<         Directory d = directory().fileExists(fileName) ? directory() : cfsDir;
<         norms.put(fi.name, new Norm(d.openFile(fileName), fi.number));
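The cost being removed can be illustrated with a minimal, self-contained sketch. The Directory interface, Norm handling, and field names below are simplified stand-ins for illustration, not Lucene's actual classes: the original loop probes fileExists() once per field (a stat per call, expensive over NFS), while the patched loop opens straight from the passed-in directory.

```java
import java.util.HashMap;
import java.util.Map;

public class OpenNormsSketch {

    // Simplified stand-in for Lucene's Directory; illustration only.
    interface Directory {
        boolean fileExists(String name); // over NFS, each call costs a round trip
        String openFile(String name);    // returns a handle; a String here for brevity
    }

    static int statCalls = 0;

    static final Directory nfsDir = new Directory() {
        public boolean fileExists(String name) { statCalls++; return false; }
        public String openFile(String name) { return "nfs:" + name; }
    };

    static final Directory cfsDir = new Directory() {
        public boolean fileExists(String name) { return true; }
        public String openFile(String name) { return "cfs:" + name; }
    };

    // Original logic: one fileExists() probe per field before opening.
    static Map<String, String> openNormsOld(String[] fields) {
        Map<String, String> norms = new HashMap<>();
        for (String f : fields) {
            String fileName = "_1.f" + f;
            Directory d = nfsDir.fileExists(fileName) ? nfsDir : cfsDir;
            norms.put(f, d.openFile(fileName));
        }
        return norms;
    }

    // Patched logic: always use the passed-in directory, no probe.
    static Map<String, String> openNormsNew(String[] fields) {
        Map<String, String> norms = new HashMap<>();
        for (String f : fields) {
            norms.put(f, cfsDir.openFile("_1.f" + f));
        }
        return norms;
    }

    public static void main(String[] args) {
        String[] fields = new String[30];
        for (int i = 0; i < 30; i++) fields[i] = "field" + i;

        openNormsOld(fields);
        System.out.println("stat calls (old): " + statCalls); // one per field

        statCalls = 0;
        openNormsNew(fields);
        System.out.println("stat calls (new): " + statCalls); // none
    }
}
```

With 30 fields, the old path issues 30 existence checks on every reader open; the patch eliminates all of them.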




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: compound files question/patch

Posted by Bernhard Messer <bm...@apache.org>.
Kevin,

your idea sounds reasonable to me. Could you open a new entry in Bugzilla
and attach the diff to it? That ensures the patch doesn't get lost
among the thousands of emails, and we will have a look at it when time
permits.

thanks in advance
Bernhard
