Posted to solr-user@lucene.apache.org by Summer Shire <sh...@gmail.com> on 2015/03/04 22:15:01 UTC

solr 4.7.2 mergeFactor/ Merge policy issue

Hi All,

I am using Solr 4.7.2. Is there a bug with merging segments down?

I recently added the following to my solrconfig.xml:

  <indexConfig>
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>100</ramBufferSizeMB>
    <maxBufferedDocs>1000</maxBufferedDocs>
    <mergeFactor>5</mergeFactor>
  </indexConfig>


But I do not see any merging of the segments happening. I saw some other
people with the same issue, but there wasn’t much info, except one suggestion
to use the following instead of mergeFactor:

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">5</int>
      <int name="segmentsPerTier">5</int>
    </mergePolicy>
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
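(For readers comparing the two forms: as I understand Solr 4.x's config handling, a <mergeFactor> of N under <indexConfig> is translated into a TieredMergePolicy with both maxMergeAtOnce and segmentsPerTier set to N, so the explicit policy below should behave the same; worth verifying against your exact Solr version:

```xml
<!-- Roughly equivalent to <mergeFactor>5</mergeFactor> in Solr 4.x -->
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">5</int>
  <int name="segmentsPerTier">5</int>
</mergePolicy>
```
)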

Thanks,
Summer

Re: solr 4.7.2 mergeFactor/ Merge policy issue

Posted by Summer Shire <sh...@gmail.com>.
Actually, after every commit a new segment gets created. I don't see them
merging down.

What else could I do to debug this better? Hasn't anyone else tried to merge
their segments down to a specific range :) ?

On Wed, Mar 4, 2015 at 3:12 PM, Erick Erickson <er...@gmail.com>
wrote:

> I _think_, but don't know for sure, that the merging stuff doesn't get
> triggered until you commit, it doesn't "just happen".
>
> Shot in the dark...
>
> Erick
>

Re: solr 4.7.2 mergeFactor/ Merge policy issue

Posted by Dmitry Kan <so...@gmail.com>.
Hi,

I can confirm similar behaviour, but for Solr 4.3.1. We use default values
for the merge-related settings. Even though mergeFactor=10 by default, there
are 13 segments in one core and 30 segments in another. I am not sure this
proves there is a bug in the merging, because it depends on the
TieredMergePolicy. Relevant discussion from the past:
http://lucene.472066.n3.nabble.com/TieredMergePolicy-reclaimDeletesWeight-td4071487.html
Apart from the other policy parameters, you could play with ReclaimDeletesWeight,
in case you'd like to influence how segments with deletes in them are merged.
See
http://stackoverflow.com/questions/18361300/informations-about-tieredmergepolicy
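(As a sketch of what that could look like in solrconfig.xml, relying on Solr 4.x's reflection-based policy initialization; the value 3.0 is an arbitrary example, the TieredMergePolicy default is 2.0:

```xml
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">5</int>
  <int name="segmentsPerTier">5</int>
  <!-- Weigh merges that reclaim deletions more heavily (default 2.0) -->
  <double name="reclaimDeletesWeight">3.0</double>
</mergePolicy>
```
)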


Regarding your attachment: I believe it got cut by the mailing list system,
could you share it via a file sharing system?



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info

Re: solr 4.7.2 mergeFactor/ Merge policy issue

Posted by Summer Shire <sh...@gmail.com>.
Hi All,

Did anyone get a chance to look at my config and the InfoStream File ?

I am very curious to see what you think

thanks,
Summer



Re: solr 4.7.2 mergeFactor/ Merge policy issue

Posted by Summer Shire <sh...@gmail.com>.
Hi All,

Here’s an update on where I am with this.
I enabled infoStream logging and quickly figured out that I needed to get rid
of maxBufferedDocs. So Erick, you were absolutely right on that.
I increased my ramBufferSizeMB to 100MB
and reduced maxMergeAtOnce and segmentsPerTier to 3.
My config looks like this:

<indexConfig>
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>100</ramBufferSizeMB>

    <!--<maxMergeSizeForForcedMerge>9223372036854775807</maxMergeSizeForForcedMerge>-->
    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">3</int>
      <int name="segmentsPerTier">3</int>
    </mergePolicy>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
    <infoStream file="/tmp/INFOSTREAM.txt">true</infoStream>
  </indexConfig>

I am attaching a sample infoStream log file.
In the infoStream logs you can see how segments keep getting added,
and it shows lines like this one (just an example):
allowedSegmentCount=10 vs count=9 (eligible count=9) tooBigCount=0

I looked at TieredMergePolicy.java to see how allowedSegmentCount is calculated:
// Compute max allowed segs in the index
    long levelSize = minSegmentBytes;
    long bytesLeft = totIndexBytes;
    double allowedSegCount = 0;
    while(true) {
      final double segCountLevel = bytesLeft / (double) levelSize;
      if (segCountLevel < segsPerTier) {
        allowedSegCount += Math.ceil(segCountLevel);
        break;
      }
      allowedSegCount += segsPerTier;
      bytesLeft -= segsPerTier * levelSize;
      levelSize *= maxMergeAtOnce;
    }
    int allowedSegCountInt = (int) allowedSegCount;
and minSegmentBytes is calculated as follows:
// Compute total index bytes & print details about the index
    long totIndexBytes = 0;
    long minSegmentBytes = Long.MAX_VALUE;
    for(SegmentInfoPerCommit info : infosSorted) {
      final long segBytes = size(info);
      if (verbose()) {
        String extra = merging.contains(info) ? " [merging]" : "";
        if (segBytes >= maxMergedSegmentBytes/2.0) {
          extra += " [skip: too large]";
        } else if (segBytes < floorSegmentBytes) {
          extra += " [floored]";
        }
        message("  seg=" + writer.get().segString(info) + " size=" + String.format(Locale.ROOT, "%.3f", segBytes/1024/1024.) + " MB" + extra);
      }

      minSegmentBytes = Math.min(segBytes, minSegmentBytes);
      // Accum total byte size
      totIndexBytes += segBytes;
    }
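(To make the budget computation concrete, here is a small standalone sketch of the loop above; this is my own rewrite with made-up segment sizes, not Lucene code, and it omits the floored/too-large handling:

```java
public class AllowedSegCount {
    /**
     * Mirrors TieredMergePolicy's allowed-segment budget loop
     * (simplified; the real code also floors tiny segments and
     * skips segments larger than half of maxMergedSegmentMB).
     */
    static int allowedSegCount(long minSegmentBytes, long totIndexBytes,
                               int segsPerTier, int maxMergeAtOnce) {
        long levelSize = minSegmentBytes;
        long bytesLeft = totIndexBytes;
        double allowed = 0;
        while (true) {
            final double segCountLevel = bytesLeft / (double) levelSize;
            if (segCountLevel < segsPerTier) {
                allowed += Math.ceil(segCountLevel);
                break;
            }
            allowed += segsPerTier;
            bytesLeft -= (long) segsPerTier * levelSize;
            levelSize *= maxMergeAtOnce;
        }
        return (int) allowed;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Ten 1 MB segments, segsPerTier = maxMergeAtOnce = 3:
        // tier 1 allows 3 x 1 MB (7 MB left, next level is 3 MB),
        // then 7/3 ~= 2.33 < 3, so ceil contributes 3 more: budget = 6.
        System.out.println(allowedSegCount(mb, 10 * mb, 3, 3)); // prints 6
    }
}
```

So the budget grows only as fast as whole tiers fill up; until the live segment count exceeds it, like the "allowedSegmentCount=10 vs count=9" line in my log, the policy selects no merge.)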


Any input is welcome.


Re: solr 4.7.2 mergeFactor/ Merge policy issue

Posted by Erick Erickson <er...@gmail.com>.
I would, BTW, either just get rid of <maxBufferedDocs> altogether or
make it much higher, e.g. 100000. I don't think this is really your
problem, but you're creating a lot of segments here.

But I'm kind of at a loss as to what would be different about your setup.
Is there _any_ chance that you have some secondary process looking at
your index that's maintaining open searchers? Any custom code that's
perhaps failing to close searchers? Is this a Unix or Windows system?

And just to be really clear, you're _only_ seeing more segments being
added, right? If you're only counting files in the index directory, it's
_possible_ that merging is happening and you're just seeing new files take
the place of old ones.

Best,
Erick


Re: solr 4.7.2 mergeFactor/ Merge policy issue

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/4/2015 4:12 PM, Erick Erickson wrote:
> I _think_, but don't know for sure, that the merging stuff doesn't get
> triggered until you commit, it doesn't "just happen".
> 
> Shot in the dark...

I believe that new segments are created when the indexing buffer
(ramBufferSizeMB) fills up, even without commits.  I'm pretty sure that
anytime a new segment is created, the merge policy is checked to see
whether a merge is needed.

Thanks,
Shawn


Re: solr 4.7.2 mergeFactor/ Merge policy issue

Posted by Erick Erickson <er...@gmail.com>.
I _think_, but don't know for sure, that the merging stuff doesn't get
triggered until you commit, it doesn't "just happen".

Shot in the dark...

Erick
