Posted to solr-user@lucene.apache.org by Summer Shire <sh...@gmail.com> on 2015/03/04 22:15:01 UTC
solr 4.7.2 mergeFactor/ Merge policy issue
Hi All,
I am using Solr 4.7.2. Is there a bug with respect to merging the segments down?
I recently added the following to my solrconfig.xml:
<indexConfig>
<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>100</ramBufferSizeMB>
<maxBufferedDocs>1000</maxBufferedDocs>
<mergeFactor>5</mergeFactor>
</indexConfig>
But I do not see any merging of the segments happening. I saw that some other
people have had the same issue, but there wasn’t much information, except for one
suggestion to use
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
<int name="maxMergeAtOnce">5</int>
<int name="segmentsPerTier">5</int>
</mergePolicy>
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
instead of mergeFactor.
Thanks,
Summer
Re: solr 4.7.2 mergeFactor/ Merge policy issue
Posted by Summer Shire <sh...@gmail.com>.
Actually, after every commit a new segment gets created. I don't see them
merging down.
What else could I do to debug this better? Hasn't anyone else tried to merge
their segments down to a specific range? :)
On Wed, Mar 4, 2015 at 3:12 PM, Erick Erickson <er...@gmail.com>
wrote:
> I _think_, but don't know for sure, that the merging stuff doesn't get
> triggered until you commit; it doesn't "just happen".
>
> Shot in the dark...
>
> Erick
Re: solr 4.7.2 mergeFactor/ Merge policy issue
Posted by Dmitry Kan <so...@gmail.com>.
Hi,
I can confirm similar behaviour, but for Solr 4.3.1. We use the default values
for merge-related settings. Even though mergeFactor=10
by default, there are 13 segments in one core and 30 segments in another. I
am not sure this proves there is a bug in the merging, because the outcome depends on
how TieredMergePolicy decides what to merge. Relevant discussion from the past:
http://lucene.472066.n3.nabble.com/TieredMergePolicy-reclaimDeletesWeight-td4071487.html
Apart from the other policy parameters, you could experiment with reclaimDeletesWeight
if you'd like to influence how segments with deletes in them get merged.
See
http://stackoverflow.com/questions/18361300/informations-about-tieredmergepolicy
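To illustrate how a weight on deletes can tilt merge selection, here is a minimal Java sketch of the candidate-scoring arithmetic as I recall it from Lucene 4.x's TieredMergePolicy, where a lower score is preferred: skew, times the merged size to the power 0.05, times the live/total bytes ratio raised to reclaimDeletesWeight. This is a from-memory approximation, not the Lucene class; it ignores the special case for merges that hit the maximum segment size, and the class name ScoreSketch, its parameters and the sample sizes are hypothetical.

```java
/**
 * Hypothetical approximation of TieredMergePolicy's merge scoring
 * (recalled from the Lucene 4.x source; not the real class).
 * Lower scores are preferred.
 */
final class ScoreSketch {

    /**
     * @param liveBytes  per-segment size excluding deleted docs, largest first
     * @param totalBytes per-segment size including deleted docs, same order
     * @param floorBytes tiny segments are counted as at least this big
     * @param reclaimDeletesWeight exponent applied to the live/total ratio
     */
    static double score(long[] liveBytes, long[] totalBytes,
                        long floorBytes, double reclaimDeletesWeight) {
        long live = 0, total = 0, liveFloored = 0;
        for (int i = 0; i < liveBytes.length; i++) {
            live += liveBytes[i];
            total += totalBytes[i];
            liveFloored += Math.max(floorBytes, liveBytes[i]);
        }
        // Skew: share of the merged size contributed by the largest segment
        // (1/n for a perfectly balanced merge of n equal segments).
        double skew = Math.max(floorBytes, liveBytes[0]) / (double) liveFloored;
        // Gently prefer smaller merges.
        double score = skew * Math.pow(live, 0.05);
        // Prefer merges that reclaim deleted docs: a ratio below 1 shrinks the score.
        double nonDelRatio = live / (double) total;
        return score * Math.pow(nonDelRatio, reclaimDeletesWeight);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] live = {10 * mb, 10 * mb, 10 * mb};
        long[] clean = {10 * mb, 10 * mb, 10 * mb}; // no deletes
        long[] dirty = {20 * mb, 20 * mb, 20 * mb}; // half of each segment deleted
        for (double w : new double[] {0.0, 2.0}) {
            double ratio = score(live, dirty, 2 * mb, w) / score(live, clean, 2 * mb, w);
            System.out.println("reclaimDeletesWeight=" + w + " -> dirty/clean score ratio " + ratio);
        }
    }
}
```

Under these assumptions, with the weight at 0 the two candidates tie, and at 2 (which I believe is the default) the delete-heavy candidate scores a quarter as much, so it would be preferred.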
Regarding your attachment: I believe it was stripped by the mailing list software.
Could you share it via a file-sharing service?
On Sat, Mar 14, 2015 at 7:36 AM, Summer Shire <sh...@gmail.com> wrote:
> Hi All,
>
> Did anyone get a chance to look at my config and the infoStream file?
--
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
Re: solr 4.7.2 mergeFactor/ Merge policy issue
Posted by Summer Shire <sh...@gmail.com>.
Hi All,
Did anyone get a chance to look at my config and the infoStream file?
I am very curious to see what you think.
thanks,
Summer
> On Mar 6, 2015, at 5:20 PM, Summer Shire <sh...@gmail.com> wrote:
>
> Hi All,
>
> Here’s an update on where I am with this.
> I enabled infoStream logging and quickly figured out that I needed to get rid of maxBufferedDocs, so Erick, you
> were absolutely right about that.
> I increased my ramBufferSize to 100MB
> and reduced maxMergeAtOnce to 3 and segmentsPerTier to 3 as well.
> My config looks like this:
>
> <indexConfig>
> <useCompoundFile>false</useCompoundFile>
> <ramBufferSizeMB>100</ramBufferSizeMB>
>
> <!--<maxMergeSizeForForcedMerge>9223372036854775807</maxMergeSizeForForcedMerge>-->
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
> <int name="maxMergeAtOnce">3</int>
> <int name="segmentsPerTier">3</int>
> </mergePolicy>
> <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
> <infoStream file="/tmp/INFOSTREAM.txt">true</infoStream>
> </indexConfig>
>
> I am attaching a sample infostream log file.
> In the infoStream logs you can see how the segments keep on adding up,
> and it shows (just an example):
> allowedSegmentCount=10 vs count=9 (eligible count=9) tooBigCount=0
>
> I looked at TieredMergePolicy.java to see how allowedSegmentCount is getting calculated:
> // Compute max allowed segs in the index
> long levelSize = minSegmentBytes;
> long bytesLeft = totIndexBytes;
> double allowedSegCount = 0;
> while(true) {
>   final double segCountLevel = bytesLeft / (double) levelSize;
>   if (segCountLevel < segsPerTier) {
>     allowedSegCount += Math.ceil(segCountLevel);
>     break;
>   }
>   allowedSegCount += segsPerTier;
>   bytesLeft -= segsPerTier * levelSize;
>   levelSize *= maxMergeAtOnce;
> }
> int allowedSegCountInt = (int) allowedSegCount;
> and the minSegmentBytes is calculated as follows:
> // Compute total index bytes & print details about the index
> long totIndexBytes = 0;
> long minSegmentBytes = Long.MAX_VALUE;
> for(SegmentInfoPerCommit info : infosSorted) {
>   final long segBytes = size(info);
>   if (verbose()) {
>     String extra = merging.contains(info) ? " [merging]" : "";
>     if (segBytes >= maxMergedSegmentBytes/2.0) {
>       extra += " [skip: too large]";
>     } else if (segBytes < floorSegmentBytes) {
>       extra += " [floored]";
>     }
>     message(" seg=" + writer.get().segString(info) + " size=" + String.format(Locale.ROOT, "%.3f", segBytes/1024/1024.) + " MB" + extra);
>   }
>
>   minSegmentBytes = Math.min(segBytes, minSegmentBytes);
>   // Accum total byte size
>   totIndexBytes += segBytes;
> }
>
>
> Any input is welcome.
>
> <myinfoLog.rtf>
>
>
> thanks,
> Summer
>
>
>> On Mar 5, 2015, at 8:11 AM, Erick Erickson <er...@gmail.com> wrote:
>>
>> I would, BTW, either just get rid of the <maxBufferedDocs> altogether or
>> make it much higher, e.g. 100000. I don't think this is really your
>> problem, but you're creating a lot of segments here.
>>
>> But I'm kind of at a loss as to what would be different about your setup.
>> Is there _any_ chance that you have some secondary process looking at
>> your index that's maintaining open searchers? Any custom code that's
>> perhaps failing to close searchers? Is this a Unix or Windows system?
>>
>> And just to be really clear, you're _only_ seeing more segments being
>> added, right? If you're only counting files in the index directory, it's
>> _possible_ that merging is happening and you're just seeing new files take
>> the place of old ones.
>>
>> Best,
>> Erick
>>
>> On Wed, Mar 4, 2015 at 7:12 PM, Shawn Heisey <ap...@elyograg.org> wrote:
>>> On 3/4/2015 4:12 PM, Erick Erickson wrote:
>>>> I _think_, but don't know for sure, that the merging stuff doesn't get
>>>> triggered until you commit; it doesn't "just happen".
>>>>
>>>> Shot in the dark...
>>>
>>> I believe that new segments are created when the indexing buffer
>>> (ramBufferSizeMB) fills up, even without commits. I'm pretty sure that
>>> anytime a new segment is created, the merge policy is checked to see
>>> whether a merge is needed.
>>>
>>> Thanks,
>>> Shawn
>>>
>
Re: solr 4.7.2 mergeFactor/ Merge policy issue
Posted by Summer Shire <sh...@gmail.com>.
Hi All,
Here’s an update on where I am with this.
I enabled infoStream logging and quickly figured out that I needed to get rid of maxBufferedDocs, so Erick, you
were absolutely right about that.
I increased my ramBufferSize to 100MB
and reduced maxMergeAtOnce to 3 and segmentsPerTier to 3 as well.
My config looks like this:
<indexConfig>
<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>100</ramBufferSizeMB>
<!--<maxMergeSizeForForcedMerge>9223372036854775807</maxMergeSizeForForcedMerge>-->
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
<int name="maxMergeAtOnce">3</int>
<int name="segmentsPerTier">3</int>
</mergePolicy>
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
<infoStream file="/tmp/INFOSTREAM.txt">true</infoStream>
</indexConfig>
I am attaching a sample infostream log file.
In the infoStream logs you can see how the segments keep on adding up,
and it shows (just an example):
allowedSegmentCount=10 vs count=9 (eligible count=9) tooBigCount=0
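To follow that budget line over time instead of eyeballing it, a small scanner can pull every allowedSegmentCount line out of the infoStream file. This is a minimal sketch that only assumes the line format shown above appears somewhere in each log line; the class name MergeBudgetScanner is hypothetical. It labels a line "under budget" when the eligible count is below allowedSegmentCount, which, as I read TieredMergePolicy, is the case where no new merge is requested; lines at or above the allowance are where a merge may be selected.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts TieredMergePolicy budget lines from an infoStream log. */
final class MergeBudgetScanner {

    // Matches the budget line format quoted in this thread, anywhere in a log line.
    private static final Pattern BUDGET = Pattern.compile(
            "allowedSegmentCount=(\\d+) vs count=(\\d+) \\(eligible count=(\\d+)\\) tooBigCount=(\\d+)");

    /** Returns one {allowed, count, eligible, tooBig} row per matching line. */
    static List<int[]> scan(BufferedReader in) throws IOException {
        List<int[]> rows = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = BUDGET.matcher(line);
            if (m.find()) {
                rows.add(new int[] {
                        Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                        Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4))});
            }
        }
        return rows;
    }

    /** Fewer eligible segments than allowed means no merge is expected. */
    static String verdict(int allowed, int eligible) {
        return eligible < allowed ? "under budget" : "at/over budget";
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the infoStream path from the config above.
        String path = args.length > 0 ? args[0] : "/tmp/INFOSTREAM.txt";
        try (BufferedReader in = Files.newBufferedReader(Paths.get(path))) {
            for (int[] r : scan(in)) {
                System.out.printf("allowed=%d count=%d eligible=%d tooBig=%d -> %s%n",
                        r[0], r[1], r[2], r[3], verdict(r[0], r[2]));
            }
        }
    }
}
```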
I looked at TieredMergePolicy.java to see how allowedSegmentCount is getting calculated:
// Compute max allowed segs in the index
long levelSize = minSegmentBytes;
long bytesLeft = totIndexBytes;
double allowedSegCount = 0;
while(true) {
  final double segCountLevel = bytesLeft / (double) levelSize;
  if (segCountLevel < segsPerTier) {
    allowedSegCount += Math.ceil(segCountLevel);
    break;
  }
  allowedSegCount += segsPerTier;
  bytesLeft -= segsPerTier * levelSize;
  levelSize *= maxMergeAtOnce;
}
int allowedSegCountInt = (int) allowedSegCount;
and the minSegmentBytes is calculated as follows:
// Compute total index bytes & print details about the index
long totIndexBytes = 0;
long minSegmentBytes = Long.MAX_VALUE;
for(SegmentInfoPerCommit info : infosSorted) {
  final long segBytes = size(info);
  if (verbose()) {
    String extra = merging.contains(info) ? " [merging]" : "";
    if (segBytes >= maxMergedSegmentBytes/2.0) {
      extra += " [skip: too large]";
    } else if (segBytes < floorSegmentBytes) {
      extra += " [floored]";
    }
    message(" seg=" + writer.get().segString(info) + " size=" + String.format(Locale.ROOT, "%.3f", segBytes/1024/1024.) + " MB" + extra);
  }

  minSegmentBytes = Math.min(segBytes, minSegmentBytes);
  // Accum total byte size
  totIndexBytes += segBytes;
}
Any input is welcome.
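To make the quoted arithmetic concrete, here is a minimal standalone Java sketch of the allowed-segment-count computation. It takes plain byte sizes instead of SegmentInfoPerCommit objects, ignores the grace given to too-large segments, and adds one step the excerpt does not show: as far as I know, findMerges raises minSegmentBytes to the floor size (TieredMergePolicy's floorSegmentMB, 2 MB by default) before the loop runs. The class name and the sample sizes are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical standalone version of the allowed-segment-count arithmetic
 * quoted from TieredMergePolicy.findMerges (Lucene 4.x). Not the Lucene class.
 */
final class AllowedSegCountSketch {

    static int allowedSegCount(long[] segBytes, long floorSegmentBytes,
                               double segsPerTier, int maxMergeAtOnce) {
        long totIndexBytes = 0;
        long minSegmentBytes = Long.MAX_VALUE;
        for (long b : segBytes) {
            minSegmentBytes = Math.min(b, minSegmentBytes);
            totIndexBytes += b;
        }
        // Assumed floor step: tiny segments count as floorSegmentBytes.
        minSegmentBytes = Math.max(floorSegmentBytes, minSegmentBytes);

        // Same loop as the quoted excerpt: each tier allows segsPerTier
        // segments, and each tier's segment size is maxMergeAtOnce times larger.
        long levelSize = minSegmentBytes;
        long bytesLeft = totIndexBytes;
        double allowedSegCount = 0;
        while (true) {
            final double segCountLevel = bytesLeft / (double) levelSize;
            if (segCountLevel < segsPerTier) {
                allowedSegCount += Math.ceil(segCountLevel);
                break;
            }
            allowedSegCount += segsPerTier;
            bytesLeft -= segsPerTier * levelSize; // compound assignment narrows back to long
            levelSize *= maxMergeAtOnce;
        }
        return (int) allowedSegCount;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] nineSmall = new long[9];
        Arrays.fill(nineSmall, mb);
        long[] oneBigEightSmall = new long[9];
        Arrays.fill(oneBigEightSmall, mb);
        oneBigEightSmall[0] = 100 * mb;
        System.out.println("9 x 1 MB: allowed=" + allowedSegCount(nineSmall, 2 * mb, 3, 3));
        System.out.println("100 MB + 8 x 1 MB: allowed=" + allowedSegCount(oneBigEightSmall, 2 * mb, 3, 3));
    }
}
```

With segmentsPerTier = maxMergeAtOnce = 3, nine 1 MB segments give an allowance of 4, so the policy would want to merge. One 100 MB segment plus eight 1 MB segments give an allowance of 10 for 9 segments, the same shape as the log line quoted above: one way such a line can arise without any bug, because a large segment inflates the budget for the small ones.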
Re: solr 4.7.2 mergeFactor/ Merge policy issue
Posted by Erick Erickson <er...@gmail.com>.
I would, BTW, either just get rid of the <maxBufferedDocs> altogether or
make it much higher, e.g. 100000. I don't think this is really your
problem, but you're creating a lot of segments here.
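The effect of maxBufferedDocs can be estimated with simple arithmetic: IndexWriter flushes a new segment when either the RAM buffer or the buffered-document limit is reached, whichever comes first, and a commit also flushes whatever is buffered. The sketch below assumes a fixed, hypothetical RAM cost per document (real documents vary) and uses a negative maxBufferedDocs to mean "disabled"; the class and method names are illustrative only.

```java
/** Hypothetical back-of-envelope model of how often a new segment is flushed. */
final class FlushEstimate {

    /** Docs per flushed segment; whichever limit is hit first triggers the flush. */
    static long docsPerFlush(double ramBufferSizeMB, int maxBufferedDocs,
                             long assumedRamBytesPerDoc) {
        long byRam = (long) (ramBufferSizeMB * 1024 * 1024 / assumedRamBytesPerDoc);
        long limit = maxBufferedDocs < 0 ? byRam : Math.min(byRam, maxBufferedDocs);
        return Math.max(1, limit);
    }

    /** Segments flushed for totalDocs with no commits in between (ceiling division). */
    static long segmentsFlushed(long totalDocs, long docsPerFlush) {
        return (totalDocs + docsPerFlush - 1) / docsPerFlush;
    }

    public static void main(String[] args) {
        long docs = 1_000_000L;
        long perDoc = 10 * 1024; // assumption: ~10 KB of RAM per buffered doc
        long withCap = docsPerFlush(100, 1000, perDoc);
        long ramOnly = docsPerFlush(100, -1, perDoc);
        System.out.println("maxBufferedDocs=1000: " + segmentsFlushed(docs, withCap) + " segments");
        System.out.println("RAM buffer only:      " + segmentsFlushed(docs, ramOnly) + " segments");
    }
}
```

Under that assumption, the 1000-document cap flushes about ten times as many (and ten times smaller) segments as the 100 MB buffer alone, all of which the merge policy then has to deal with.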
But I'm kind of at a loss as to what would be different about your setup.
Is there _any_ chance that you have some secondary process looking at
your index that's maintaining open searchers? Any custom code that's
perhaps failing to close searchers? Is this a Unix or Windows system?
And just to be really clear, you're _only_ seeing more segments being
added, right? If you're only counting files in the index directory, it's
_possible_ that merging is happening and you're just seeing new files take
the place of old ones.
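One way to separate "more files" from "more segments" is to group the index directory's files by segment name. This sketch assumes the usual Lucene naming, where every per-segment file starts with the segment name (_0, _1, ..., _a, ...) followed by '.' or '_' (for example _3.fdt or _3_Lucene41_0.doc), while segments_N, segments.gen and write.lock belong to no segment. It is only an approximation: files of merged-away segments can linger while something still holds them open (the open-searcher question above, especially on Windows), so the authoritative list is the one recorded in the latest segments_N commit point. The class name and default path are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: counts index files per Lucene segment name. */
final class SegmentFileCounter {

    /** "_3.fdt" and "_3_Lucene41_0.doc" map to "_3"; non-segment files map to null. */
    static String segmentOf(String fileName) {
        if (!fileName.startsWith("_")) {
            return null; // segments_N, segments.gen, write.lock, ...
        }
        int end = 1;
        while (end < fileName.length()
                && fileName.charAt(end) != '.' && fileName.charAt(end) != '_') {
            end++;
        }
        return end > 1 ? fileName.substring(0, end) : null;
    }

    /** Segment name -> number of files on disk carrying that name. */
    static Map<String, Integer> filesPerSegment(Path indexDir) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                String seg = segmentOf(file.getFileName().toString());
                if (seg != null) {
                    counts.merge(seg, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the core's data/index directory instead.
        Path dir = Paths.get(args.length > 0 ? args[0] : "data/index");
        Map<String, Integer> counts = filesPerSegment(dir);
        int totalFiles = 0;
        for (int n : counts.values()) {
            totalFiles += n;
        }
        System.out.println(counts.size() + " segment(s) in " + totalFiles + " file(s): " + counts);
    }
}
```

If the segment count stays flat while the file count moves, merging is working and only the files are being replaced.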
Best,
Erick
On Wed, Mar 4, 2015 at 7:12 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 3/4/2015 4:12 PM, Erick Erickson wrote:
>> I _think_, but don't know for sure, that the merging stuff doesn't get
>> triggered until you commit; it doesn't "just happen".
>>
>> Shot in the dark...
>
> I believe that new segments are created when the indexing buffer
> (ramBufferSizeMB) fills up, even without commits. I'm pretty sure that
> anytime a new segment is created, the merge policy is checked to see
> whether a merge is needed.
>
> Thanks,
> Shawn
>
Re: solr 4.7.2 mergeFactor/ Merge policy issue
Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/4/2015 4:12 PM, Erick Erickson wrote:
> I _think_, but don't know for sure, that the merging stuff doesn't get
> triggered until you commit; it doesn't "just happen".
>
> Shot in the dark...
I believe that new segments are created when the indexing buffer
(ramBufferSizeMB) fills up, even without commits. I'm pretty sure that
anytime a new segment is created, the merge policy is checked to see
whether a merge is needed.
Thanks,
Shawn
Re: solr 4.7.2 mergeFactor/ Merge policy issue
Posted by Erick Erickson <er...@gmail.com>.
I _think_, but don't know for sure, that the merging stuff doesn't get
triggered until you commit; it doesn't "just happen".
Shot in the dark...
Erick
On Wed, Mar 4, 2015 at 1:15 PM, Summer Shire <sh...@gmail.com> wrote:
> Hi All,
>
> I am using Solr 4.7.2. Is there a bug with respect to merging the segments down?