Posted to solr-user@lucene.apache.org by Zheng Lin Edwin Yeo <ed...@gmail.com> on 2019/01/28 17:14:47 UTC

Number of segments in collection is more than what is set in TieredMergePolicyFactory

Hi,

We have the following TieredMergePolicyFactory configuration in our
solrconfig.xml:

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="maxMergeAtOnceExplicit">10</int>
  <int name="segmentsPerTier">10</int>
  <int name="floorSegmentMB">10</int>
  <int name="maxMergedSegmentMB">5120</int>
  <double name="noCFSRatio">0.1</double>
  <int name="maxCFSSegmentSizeMB">2048</int>
  <double name="forceMergeDeletesPctAllowed">10.0</double>
</mergePolicyFactory>

However, when we index data into the collection, the number of segments we
get does not match what we configured.
For example, our collection is 13.7 GB. With the above
TieredMergePolicyFactory configuration, we expected to have 3 segments
(13.7 / 5 = 2.74, which rounds up to 3). Instead, we are getting 24
segments in our collection; a screenshot is available at the link below.
https://drive.google.com/file/d/1hjIQVk_L2Bn9MYOmCdf2wKD_f_D2DNV6/view?usp=sharing

What could be the reason that it is not able to merge the segments down
to 3, with each segment being 5 GB?

Regards,
Edwin

Re: Number of segments in collection is more than what is set in TieredMergePolicyFactory

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Shawn,

Thank you for the explanation.

Regards,
Edwin

On Wed, 30 Jan 2019 at 15:18, Shawn Heisey <ap...@elyograg.org> wrote:

> [...]

Re: Number of segments in collection is more than what is set in TieredMergePolicyFactory

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/28/2019 10:14 AM, Zheng Lin Edwin Yeo wrote:
> We have the following TieredMergePolicyFactory configuration in our
> solrconfig.xml:
>
> <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>   <int name="maxMergeAtOnce">10</int>
>   <int name="maxMergeAtOnceExplicit">10</int>
>   <int name="segmentsPerTier">10</int>

These three settings are the really important ones. Except for
maxMergeAtOnceExplicit, you have them at their default values. The
default for maxMergeAtOnceExplicit is 30 ... and you shouldn't lower it
without a really good reason. It mostly comes into play during an
optimize (forceMerge) ... when you lower it, optimizes may take longer
than normal, because Lucene can't merge as many segments at the same
time, so the number of passes required to complete the optimize could
increase.
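
As a rough illustration of the point about passes, here is a simplified Python model. It is an assumption, not Lucene's actual forceMerge scheduler: in every pass, segments are grouped into batches of at most maxMergeAtOnceExplicit and each batch is merged into one segment. The function name force_merge_passes is hypothetical. Using the 24 segments from the original question, a limit of 10 needs two passes to reach a single segment, while the default of 30 needs only one.

```python
import math


def force_merge_passes(num_segments: int, max_merge_at_once_explicit: int) -> int:
    """Hypothetical helper: passes needed to optimize down to one segment.

    Simplified model (assumption, not Lucene's real scheduler): in each pass,
    segments are split into groups of at most max_merge_at_once_explicit and
    each group is merged into a single new segment.
    """
    if max_merge_at_once_explicit < 2:
        raise ValueError("max_merge_at_once_explicit must be at least 2")
    passes = 0
    while num_segments > 1:
        # Each group of up to max_merge_at_once_explicit segments becomes one segment.
        num_segments = math.ceil(num_segments / max_merge_at_once_explicit)
        passes += 1
    return passes


print(force_merge_passes(24, 10))  # 2
print(force_merge_passes(24, 30))  # 1
```

Real merges also depend on segment sizes and on maxMergedSegmentMB, so treat this only as a picture of why a lower limit can mean more passes.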

The most important setting here is segmentsPerTier ... this does not
mean you will never have more than 10 segments in total; it means that
at each tier, Lucene will try to keep the number of segments below 10.
With a large index, you are likely to have 3 or 4 tiers, possibly more.
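
To see why a 13.7 GB index can legitimately hold far more than 3 segments, here is a rough Python model of the segment budget a tiered policy tolerates. It is loosely modeled on the allowed-segment-count computation in Lucene's TieredMergePolicy, but the function name allowed_segment_count and its simplifications are assumptions: it ignores deleted documents, assumes the smallest segment is no larger than floorSegmentMB, and Lucene's exact logic differs between versions. Note that maxMergedSegmentMB only caps the size of a merged segment; it is not a target size that segments are filled up to.

```python
import math


def allowed_segment_count(total_mb: float, floor_mb: float, max_merged_mb: float,
                          segments_per_tier: int, max_merge_at_once: int) -> int:
    """Hypothetical helper: rough number of segments a tiered policy tolerates.

    Simplified model (assumption): no deletions, and the smallest segment is
    no larger than floor_mb, so the lowest tier starts at floor_mb.
    """
    if floor_mb <= 0 or max_merged_mb < floor_mb:
        raise ValueError("need 0 < floor_mb <= max_merged_mb")
    # Each tier's segments are roughly merge_factor times larger than the previous tier's.
    merge_factor = min(max_merge_at_once, segments_per_tier)
    level_size = floor_mb   # typical segment size in the current tier
    mb_left = total_mb      # index size not yet assigned to a tier
    allowed = 0
    while True:
        seg_count_level = mb_left / level_size
        # Last tier: less than a full tier remains, or segments hit the size cap.
        if seg_count_level < segments_per_tier or level_size == max_merged_mb:
            allowed += math.ceil(seg_count_level)
            break
        # A full tier: segments_per_tier segments of level_size each.
        allowed += segments_per_tier
        mb_left -= segments_per_tier * level_size
        level_size = min(max_merged_mb, level_size * merge_factor)
    return max(allowed, segments_per_tier)


# Settings from the original post; 13.7 GB expressed in MB.
print(allowed_segment_count(13.7 * 1024, 10, 5120, 10, 10))  # 31
```

Under these assumptions the model gives four tiers (roughly 10 MB, 100 MB, 1 GB and 5 GB segments) and a budget of about 31 segments, so 24 segments is within budget and the policy has no reason to merge further.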

On an index where I spent a lot of time, my settings for those three
were 35, 105, and 35 respectively. I often had more than 100 segments in
those indexes, and it was behaving correctly.

> What could be the reason that it is not able to merge the segments down
> to 3, with each segment being 5 GB?

It is working as designed, just not as you expected.

Thanks,
Shawn

Re: Number of segments in collection is more than what is set in TieredMergePolicyFactory

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi,

Does anyone have any insights into this?

Thank you in advance.

Regards,
Edwin

On Tue, 29 Jan 2019 at 01:14, Zheng Lin Edwin Yeo <ed...@gmail.com>
wrote:

> [...]