You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Simon Willnauer <si...@gmail.com> on 2013/06/13 19:27:21 UTC
CompoundFile handling in Solr after LUCENE-5038
in LUCENE-5038 I changed how MergePolicy decides when to use CFS. Yet,
think in solr land we need to check if everything is like users expect it.
We don't have setUseCompoundFile anymore but only use setNoCFSRatio to make
the decision. I think we need to reset noCFSRatio after setters are applied
in SolrIndexConfig#238 Can somebody with more solr knowledge gimme some
advice, I don't wanna leave is alone as it is but I am happy to follow up
to get a good default.
simon
Re: CompoundFile handling in Solr after LUCENE-5038
Posted by Chris Hostetter <ho...@fucit.org>.
Hey Simon, thanks for calling this out so it didn't slip through the
cracks...
: in LUCENE-5038 I changed how MergePolicy decides when to use CFS. Yet,
: think in solr land we need to check if everything is like users expect it.
I looked over all the commits in LUCENE-5038, and i *think* the way things
stand in Solr is (close to) the best possible outcome of the MP+CFS
changes.
: We don't have setUseCompoundFile anymore but only use setNoCFSRatio to make
: the decision. I think we need to reset noCFSRatio after setters are applied
: in SolrIndexConfig#238 Can somebody with more solr knowledge gimme some
I'm fairly certain we don't want to do that -- If we did, then any user
who explicitly configured "noCFSRatio" on TieredMergePolicy would have
that setting blown away by the (legacy) global useCompoundFile setting --
which in Solr defaults to false.
It's a little weird, because (as noted near the bottom of
TestMergePolicyConfig) in the past if you had a config that looked like
this...
<indexConfig>
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
<int name="maxMergeAtOnceExplicit">19</int>
<int name="segmentsPerTier">9</int>
<double name="noCFSRatio">0.5</double>
</mergePolicy>
</indexConfig>
...that "noCFSRatio" setting was essentially ignored unless you also
explicitly set useCompoundFile (either on the <mergePolicy> or via the
global <useCompoundFile> syntax). With your changes as they stand,
TieredMergePolicy users should get less-suprising behavior with that
config.
The one hitch is what happens to existing users who might have configs
that look like this...
<indexConfig>
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
<int name="maxMergeAtOnceExplicit">19</int>
<int name="segmentsPerTier">9</int>
<double name="noCFSRatio">0.5</double>
<bool name="useCompoundFile">true</bool>
</mergePolicy>
</indexConfig>
As things stand, a config like that will get a hard init failure at
runtime because there is no longer a TieredMergePolicy.setUseCompoundFile
(analogous to what a java dev would get at compile time).
My gut reaction to this is just to release note it, but since we are
already doing instanceof checks for TieredMergePolicy and LogMergePolicy
we might as well remove "useCompoundFile" from the
mergePolicyInfo.initArgs if we see it (and log an appropriate warning
about it being ignored and to use noCFSRatio as appropriate, etc...)
I'll open an issue for this and take care of it.
Incidently, while reviewing the changes in LUCENE-5038 i noticed a
possible perf improvement in MergePolicy.useCompoundFile ...
+ public boolean useCompoundFile(SegmentInfos infos, SegmentInfoPerCommit mergedInfo) throws IOException {
+ if (getNoCFSRatio() == 0.0) {
+ return false;
+ }
+ long mergedInfoSize = size(mergedInfo);
+ if (mergedInfoSize > maxCFSSegmentSize) {
+ return false;
+ }
+ if (getNoCFSRatio() >= 1.0) {
+ return true;
+ }
...why not move that "getNoCFSRatio() >= 1.0" check prior to the
"size(mergedInfo)" call?
-Hoss
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org