You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Simon Willnauer <si...@gmail.com> on 2013/06/13 19:27:21 UTC

CompoundFile handling in Solr after LUCENE-5038

in LUCENE-5038 I changed how MergePolicy decides when to use CFS. Yet,
think in solr land we need to check if everything is like users expect it.
We don't have setUseCompoundFile anymore but only use setNoCFSRatio to make
the decision. I think we need to reset noCFSRatio after setters are applied
in SolrIndexConfig#238 Can somebody with more solr knowledge gimme some
advice, I don't wanna leave is alone as it is but I am happy to follow up
to get a good default.

simon

Re: CompoundFile handling in Solr after LUCENE-5038

Posted by Chris Hostetter <ho...@fucit.org>.
Hey Simon, thanks for calling this out so it didn't slip through the 
cracks...

: in LUCENE-5038 I changed how MergePolicy decides when to use CFS. Yet,
: think in solr land we need to check if everything is like users expect it.

I looked over all the commits in LUCENE-5038, and i *think* the way things 
stand in Solr is (close to) the best possible outcome of the MP+CFS 
changes.

: We don't have setUseCompoundFile anymore but only use setNoCFSRatio to make
: the decision. I think we need to reset noCFSRatio after setters are applied
: in SolrIndexConfig#238 Can somebody with more solr knowledge gimme some

I'm fairly certain we don't want to do that -- If we did, then any user 
who explicitly configured "noCFSRatio" on TieredMergePolicy would have 
that setting blown away by the (legacy) global useCompoundFile setting -- 
which in Solr defaults to false.

It's a little weird, because (as noted near the bottom of 
TestMergePolicyConfig) in the past if you had a config that looked like 
this...

  <indexConfig>
    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnceExplicit">19</int>
      <int name="segmentsPerTier">9</int>
      <double name="noCFSRatio">0.5</double>
    </mergePolicy>
  </indexConfig>

...that "noCFSRatio" setting was essentially ignored unless you also 
explicitly set useCompoundFile (either on the <mergePolicy> or via the 
global <useCompoundFile> syntax).  With your changes as they stand, 
TieredMergePolicy users should get less-suprising behavior with that 
config.

The one hitch is what happens to existing users who might have configs 
that look like this...

  <indexConfig>
    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnceExplicit">19</int>
      <int name="segmentsPerTier">9</int>
      <double name="noCFSRatio">0.5</double>
      <bool name="useCompoundFile">true</bool>
    </mergePolicy>
  </indexConfig>


As things stand, a config like that will get a hard init failure at 
runtime because there is no longer a TieredMergePolicy.setUseCompoundFile 
(analogous to what a java dev would get at compile time).

My gut reaction to this is just to release note it, but since we are 
already doing instanceof checks for TieredMergePolicy and LogMergePolicy 
we might as well remove "useCompoundFile" from the 
mergePolicyInfo.initArgs if we see it (and log an appropriate warning 
about it being ignored and to use noCFSRatio as appropriate, etc...)

I'll open an issue for this and take care of it.


Incidently, while reviewing the changes in LUCENE-5038 i noticed a 
possible perf improvement in MergePolicy.useCompoundFile ...

+  public boolean useCompoundFile(SegmentInfos infos, SegmentInfoPerCommit mergedInfo) throws IOException {
+    if (getNoCFSRatio() == 0.0) {
+      return false;
+    }
+    long mergedInfoSize = size(mergedInfo);
+    if (mergedInfoSize > maxCFSSegmentSize) {
+      return false;
+    }
+    if (getNoCFSRatio() >= 1.0) {
+      return true;
+    }

...why not move that "getNoCFSRatio() >= 1.0" check prior to the 
"size(mergedInfo)" call?




-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org