Posted to solr-user@lucene.apache.org by Manohar Sripada <ma...@gmail.com> on 2015/05/14 06:01:47 UTC

QQ on segments during indexing.

I have a question on segment creation on disk during indexing.

In my solrconfig.xml, I have commented out maxBufferedDocs and
ramBufferSizeMB. I am controlling the flushing of data to disk using
autoCommit's maxDocs and maxTime.

Here, maxDocs is set to 50000 and will be hit first, so a commit of data
to disk happens every 50000 docs. My question is: will a new segment be
created when this commit happens?

In the wiki
<https://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor>, it is
mentioned that new segment creation is determined by the
maxBufferedDocs parameter. As I have commented out this parameter, how is
new segment creation determined?

Thanks,
Manohar

Re: QQ on segments during indexing.

Posted by Manohar Sripada <ma...@gmail.com>.
Thanks Shawn. In my case, the documents are small, so indexing will surely
reach 50k docs before the 100MB buffer fills up.

Thanks,
Manohar

On Thu, May 14, 2015 at 10:49 AM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 5/13/2015 10:01 PM, Manohar Sripada wrote:
> > I have a question on segment creation on disk during indexing.
> >
> > In my solrconfig.xml, I have commented maxBufferedDocs and ramBufferSizeMB.
> > I am controlling the flushing of data to disk using autoCommit's maxDocs
> > and maxTime.
> >
> > Here, maxDocs is set to 50000 and will be hit first, so that commit of data
> > to disk happens every 50000 docs. So, my question here is will it create a
> > new segment when this commit happens?
> >
> > In the wiki
> > <https://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor>, it is
> > mentioned that a new segment creation is determined based on
> > maxBufferedDocs parameter. As I have commented this parameter, how a new
> > segment creation is determined?
>
> In recent Solr versions, the ramBufferSizeMB setting defaults to 100 and
> maxBufferedDocs defaults to -1.  A setting of -1 on maxBufferedDocs
> means that the number of docs doesn't matter, it will use
> ramBufferSizeMB unless a commit happens before the buffer fills up.  A
> commit does trigger a segment flush, although if it's a soft commit, the
> situation might be more complicated.
>
> Unless the docs are very small, I would expect a 100MB buffer to fill up
> before you reach 50000 docs.  It's been a while since I watched index
> segments get created, but if I remember correctly, the amount of space
> required in the RAM buffer to index documents is more than the size of
> the segment that eventually gets flushed to disk.
>
> Thanks,
> Shawn
>
>

Re: QQ on segments during indexing.

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/13/2015 10:01 PM, Manohar Sripada wrote:
> I have a question on segment creation on disk during indexing.
> 
> In my solrconfig.xml, I have commented maxBufferedDocs and ramBufferSizeMB.
> I am controlling the flushing of data to disk using autoCommit's maxDocs
> and maxTime.
> 
> Here, maxDocs is set to 50000 and will be hit first, so that commit of data
> to disk happens every 50000 docs. So, my question here is will it create a
> new segment when this commit happens?
> 
> In the wiki
> <https://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor>, it is
> mentioned that a new segment creation is determined based on
> maxBufferedDocs parameter. As I have commented this parameter, how a new
> segment creation is determined?

In recent Solr versions, the ramBufferSizeMB setting defaults to 100 and
maxBufferedDocs defaults to -1.  A setting of -1 on maxBufferedDocs
means that the number of docs doesn't matter: Solr flushes a segment when
the buffer reaches ramBufferSizeMB, unless a commit happens before the
buffer fills up.  A commit does trigger a segment flush, although if it's
a soft commit, the situation might be more complicated.
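
For illustration only, the setup described in this thread might look roughly like the solrconfig.xml fragment below. This is a sketch, not the poster's actual file: the maxTime value of 60000 ms is a hypothetical placeholder (the thread never states it), the surrounding elements are trimmed, and the two buffer settings are shown commented out so that the defaults mentioned above apply.

```xml
<config>
  <indexConfig>
    <!-- Both settings left commented out, as in the original question.
         Per the reply above, recent Solr versions then use a 100 MB RAM
         buffer and no doc-count limit (maxBufferedDocs of -1). -->
    <!-- <ramBufferSizeMB>100</ramBufferSizeMB> -->
    <!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <!-- From the question: commit after 50000 added docs. -->
      <maxDocs>50000</maxDocs>
      <!-- Hypothetical value in milliseconds; not given in the thread. -->
      <maxTime>60000</maxTime>
    </autoCommit>
  </updateHandler>
</config>
```

With a fragment like this, a segment would be flushed either when the RAM buffer fills or when an autoCommit fires, whichever happens first.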

Unless the docs are very small, I would expect a 100MB buffer to fill up
before you reach 50000 docs.  It's been a while since I watched index
segments get created, but if I remember correctly, the amount of space
required in the RAM buffer to index documents is more than the size of
the segment that eventually gets flushed to disk.
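
As a rough back-of-the-envelope check of which limit is reached first, the break-even point can be worked out as the average RAM buffer use per document. This assumes the 100MB default mentioned above, counts 1MB as 1024 * 1024 bytes, and uses the 50000-doc autoCommit from the original question:

```latex
\[
\frac{100 \times 1024 \times 1024\ \text{bytes}}{50000\ \text{docs}}
  = \frac{104857600}{50000}
  \approx 2097\ \text{bytes per doc}
\]
```

So if each document takes more than about 2KB of buffer space on average, the RAM buffer would fill before 50000 docs are reached; below that, the 50000-doc commit would come first. Note that this is space used in the RAM buffer, which may be larger than the document's share of the flushed segment on disk.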

Thanks,
Shawn