Posted to solr-user@lucene.apache.org by Zheng Lin Edwin Yeo <ed...@gmail.com> on 2016/01/14 02:44:32 UTC

Setting of ramBufferSizeMB

Hi,

I would like to check: if I have made the following settings for
ramBufferSizeMB, and I am using TieredMergePolicy, should each segment
be at least 320MB in size?


    <!-- ramBufferSizeMB sets the amount of RAM that may be used by Lucene
         indexing for buffering added documents and deletions before they are
         flushed to the Directory.
         maxBufferedDocs sets a limit on the number of documents buffered
         before flushing.
         If both ramBufferSizeMB and maxBufferedDocs are set, then
         Lucene will flush based on whichever limit is hit first.
         The default is 100 MB.  -->
    <ramBufferSizeMB>320</ramBufferSizeMB>
    <!--<maxBufferedDocs>1000</maxBufferedDocs>-->


    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
      <double name="maxMergedSegmentMB">10240</double>
    </mergePolicy>
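In Solr 5.x these settings live inside the `<indexConfig>` section of solrconfig.xml. As a rough illustration of the "whichever limit is hit first" rule described in the comment above, here is a minimal sketch with both flush triggers enabled. The 1000-document value is simply the commented-out one from the post, not a recommendation.

```xml
<indexConfig>
  <!-- Flush the in-memory buffer to a new segment once it holds about
       320 MB of buffered documents and deletions... -->
  <ramBufferSizeMB>320</ramBufferSizeMB>
  <!-- ...or once 1000 documents have been buffered, whichever comes first.
       With small documents the document count will usually trigger first,
       producing segments far smaller than 320 MB. -->
  <maxBufferedDocs>1000</maxBufferedDocs>
</indexConfig>
```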


I have these settings in my solrconfig.xml, but when I check the segment
sizes on the Segments info screen of the Admin UI, I see quite a number
of segments at the bottom that are much smaller than 320MB. Is that the
expected behaviour, or is my ramBufferSizeMB not working correctly?

I am using Solr 5.4.0.


Regards,
Edwin

Re: Setting of ramBufferSizeMB

Posted by Erick Erickson <er...@gmail.com>.
Yep. Here's Mike's classic video:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

The third visualization down, "TieredMergePolicy", shows the default merge policy.
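One detail worth keeping in mind while watching that visualization: maxMergedSegmentMB is a ceiling, not a target. TieredMergePolicy prefers to merge segments of roughly similar size, so a batch of small flushed segments is typically merged into a somewhat larger segment first, and segments only approach the 10GB cap after many rounds of merging. The sketch below annotates the settings from this thread; `floorSegmentMB` is an extra TieredMergePolicy parameter that does not appear in the thread, shown at Lucene's default of 2 MB (worth double-checking against the Lucene 5.4 javadocs).

```xml
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <!-- At most 10 segments are combined in a single normal merge. -->
  <int name="maxMergeAtOnce">10</int>
  <!-- Roughly 10 similar-sized segments may accumulate per tier before
       they are merged into one larger segment. -->
  <int name="segmentsPerTier">10</int>
  <!-- Approximate upper bound (in MB) on a merged segment's size, not a
       target: segments only near ~10 GB after many rounds of merging. -->
  <double name="maxMergedSegmentMB">10240</double>
  <!-- Not in the original post: segments smaller than this are treated as
       this size when selecting merges, which keeps frequent small flushes
       from leaving a long tail of tiny segments. 2 MB is the default. -->
  <double name="floorSegmentMB">2</double>
</mergePolicy>
```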

Best,
Erick

On Wed, Jan 13, 2016 at 6:52 PM, Zheng Lin Edwin Yeo
<ed...@gmail.com> wrote:
> Hi Erick,
>
> Thanks for your reply.
>
> So the small segments that I found are probably due to a commit happening
> during that time?
>
> I also found that those small segments were created during the last
> indexing run. If I start another batch of indexing, those small segments
> will probably get merged together to form a 10GB segment, as I have set
> maxMergedSegmentMB to 10240MB. Then there will be other new small
> segments formed from the latest batch of indexing. Is that the way it
> works?
>
> Regards,
> Edwin

Re: Setting of ramBufferSizeMB

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi Erick,

Thanks for your reply.

So the small segments that I found are probably due to a commit happening
during that time?

I also found that those small segments were created during the last
indexing run. If I start another batch of indexing, those small segments
will probably get merged together to form a 10GB segment, as I have set
maxMergedSegmentMB to 10240MB. Then there will be other new small
segments formed from the latest batch of indexing. Is that the way it
works?

Regards,
Edwin


On 14 January 2016 at 10:38, Erick Erickson <er...@gmail.com> wrote:

> ramBufferSizeMB is a _limit_ that flushes the buffer when
> it is reached (actually, I think, it indexes a doc _then_
> checks the size and if it's > the setting, flushes the
> buffer. So technically you can exceed the buffer size by
> your biggest doc's addition to the index).
>
> But I digress. This is a _limit_. If a commit happens (either
> an autocommit or client-initiated commit or a commitWithin)
> then the segment is flushed without regard to ramBufferSizeMB.
>
> Best,
> Erick

Re: Setting of ramBufferSizeMB

Posted by Erick Erickson <er...@gmail.com>.
ramBufferSizeMB is a _limit_ that flushes the buffer when
it is reached (actually, I think, it indexes a doc _then_
checks the size and if it's > the setting, flushes the
buffer. So technically you can exceed the buffer size by
your biggest doc's addition to the index).

But I digress. This is a _limit_. If a commit happens (either
an autocommit or client-initiated commit or a commitWithin)
then the segment is flushed without regard to ramBufferSizeMB.
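To see where such commits can come from, it helps to look at the `<updateHandler>` section of solrconfig.xml. The values below are modelled on the Solr 5.x example configuration (a hard commit every 15 seconds without opening a searcher, automatic soft commits disabled); treat the exact numbers as an assumption and compare them with your own file. With settings like these, each hard commit flushes whatever is in the buffer at that moment, so a segment may hold only about 15 seconds' worth of indexing, which can be far below ramBufferSizeMB. Client-side commits and commitWithin have the same effect.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit at most 15 s after the first uncommitted update.
       Every hard commit flushes the current RAM buffer to a new segment,
       however small, regardless of ramBufferSizeMB. -->
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <!-- Make changes durable without opening a new searcher. -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- A maxTime of -1 disables automatic soft commits. -->
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>
</updateHandler>
```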

Best,
Erick

On Wed, Jan 13, 2016 at 5:44 PM, Zheng Lin Edwin Yeo
<ed...@gmail.com> wrote:
> Hi,
>
> I would like to check: if I have made the following settings for
> ramBufferSizeMB, and I am using TieredMergePolicy, should each segment
> be at least 320MB in size?
>
>
>     <!-- ramBufferSizeMB sets the amount of RAM that may be used by Lucene
>          indexing for buffering added documents and deletions before they are
>          flushed to the Directory.
>          maxBufferedDocs sets a limit on the number of documents buffered
>          before flushing.
>          If both ramBufferSizeMB and maxBufferedDocs are set, then
>          Lucene will flush based on whichever limit is hit first.
>          The default is 100 MB.  -->
>         <ramBufferSizeMB>320</ramBufferSizeMB>
>         <!--<maxBufferedDocs>1000</maxBufferedDocs>-->
>
>
>         <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>           <int name="maxMergeAtOnce">10</int>
>           <int name="segmentsPerTier">10</int>
>           <double name="maxMergedSegmentMB">10240</double>
>         </mergePolicy>
>
>
> I have these settings in my solrconfig.xml, but when I check the segment
> sizes on the Segments info screen of the Admin UI, I see quite a number
> of segments at the bottom that are much smaller than 320MB. Is that the
> expected behaviour, or is my ramBufferSizeMB not working correctly?
>
> I am using Solr 5.4.0.
>
>
> Regards,
> Edwin