Posted to solr-user@lucene.apache.org by Patrick Recchia <pa...@gmail.com> on 2018/05/02 10:54:44 UTC

Too many commits

Hello,

I'm seeing way too many commits on our solr cluster, and I don't know why.

Here is the landscape:
- Each collection we create (one per day) has 10 shards with 2 replicas
each.
- We send live data, 2B records/day, so on average 200M records/shard per
day, for a size of approx. 180GB per shard per day.
- At peak hours that makes approx. 10M records/hour, so approx. 150000
records/minute, for a size of ~115MB/minute?

- IndexConfig is set to autoCommit every minute:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>

(solr.autoCommit.maxTime is not set)

There is nothing else customized (when it comes to IndexWriter, at least)
within solrconfig.xml

The data is sent without commit, but with commitWithin=500000 ms.
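
For illustration, an update sent this way in Solr's XML update format might
look like the sketch below. The field names and values are hypothetical; only
the commitWithin attribute (in milliseconds) and the absence of an explicit
commit reflect the setup described above.

```xml
<!-- Hypothetical document: field names and values are placeholders -->
<add commitWithin="500000">
  <doc>
    <field name="id">record-0001</field>
    <field name="message">example payload</field>
  </doc>
</add>
<!-- No <commit/> message follows; visibility is left to commitWithin/autoCommit -->
```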

All that said, I would have expected a rate of about 1 segment created per
minute, of about 100MB.

Instead, I see a lot of very small segments (from a few KB to a few MB)
being created at a very high rate.

And I have no idea why this would happen.
Where can I look to explain such a rate of segments being written?





Patrick Recchia

Re: Too many commits

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/2/2018 11:45 AM, Patrick Recchia wrote:
> Is there any logging I can turn on to know when a commit happens and/or
> when a segment is flushed?

The normal INFO-level logging that Solr ships with will log all
commits.  It probably doesn't log segment flushes unless they happen as
a result of a commit, though.  The infoStream logging would have that
information.
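
As a sketch of how infoStream is typically switched on: in solrconfig.xml it
is an element inside indexConfig. The output should end up in the normal Solr
log, but which logger and level it appears under is an assumption worth
checking against the documentation for the Solr version in use.

```xml
<indexConfig>
  <!-- Very verbose: reports IndexWriter internals such as flushes and merges -->
  <infoStream>true</infoStream>
</indexConfig>
```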

Your autoCommit settings are ensuring that commitWithin is never going
to actually cause a commit.  Your interval for autoCommit is 60000 (one
minute), commitWithin is 500000 (a little over eight minutes).  The
autoCommit has openSearcher set to true, so there will always be a
commit with a new searcher occurring within one minute after an update
is sent, and commitWithin will never be needed.

Here's what I think I would try:  On autoCommit, set openSearcher to
false.  If you want to have less than an eight minute window for
document visibility, then reduce commitWithin to 120000.  Increase
ramBufferSizeMB to 256 or 512, which might require an increase in heap
size as well.  Instead of using commitWithin, you could configure
autoSoftCommit with a maxTime of 120000.
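
The commit settings suggested above could be sketched in solrconfig.xml
roughly as follows. This assumes the autoSoftCommit route rather than
commitWithin, and it omits the other children that a real updateHandler
section normally has (such as updateLog). The solr.autoSoftCommit.maxTime
property name follows the convention of the stock configs and is an
assumption for this setup.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit every 60 s: makes data durable, does not open a searcher -->
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit every 120 s: controls when new documents become visible -->
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:120000}</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With this split, durability and visibility are tuned separately: the hard
commit interval stays short, and the two-minute soft commit replaces the
eight-minute commitWithin window.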

Here's some additional info about commits:

https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

The title says "SolrCloud" but the concepts are equally applicable when
not running in cloud mode.

Thanks,
Shawn


Re: Too many commits

Posted by Erick Erickson <er...@gmail.com>.
You can turn on "infoStream", but that is _very_ voluminous. The
regular Solr logs at INFO level should show commits, though.




Re: Too many commits

Posted by Patrick Recchia <pa...@gmail.com>.
Shawn,
thank you very much for your answer.


On Wed, May 2, 2018 at 6:27 PM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 5/2/2018 4:54 AM, Patrick Recchia wrote:
> > I'm seeing way too many commits on our solr cluster, and I don't know
> > why.
>
> Are you sure there are commits happening?  Do you have logs actually
> saying that a commit is occurring?  The creation of a new segment does
> not necessarily mean a commit happened -- this can happen even without a
> commit.
>

You're right, I assumed a new segment would be created only as part of a
commit; but I realize now that there can be other situations.

Is there any logging I can turn on to know when a commit happens and/or
when a segment is flushed?

I would be very interested in that.
I've already enabled InfoStream logging from the IndexWriter, but have
not yet found anything there that helps me understand this.



> > - IndexConfig is set to autoCommit every minute:
> >
> > <autoCommit>
> >   <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
> >   <openSearcher>true</openSearcher>
> > </autoCommit>
> >
> > (solr.autoCommit.maxTime is not set)
>
> It's recommended to set openSearcher to false on autoCommit.  Do you
> have autoSoftCommit configured?
>

autoSoftCommit is left at its default '-1' (which means infinity, I
suppose).



>
> > There is nothing else customized (when it comes to IndexWriter, at least)
> > within solrconfig.xml
> >
> > The data is sent without commit, but with commitWithin=500000 ms.
> >
> > All that said, I would have expected a rate of about 1 segment created
> > per minute, of about 100MB.
>
> One of the events that can cause a new segment to be flushed is the ram
> buffer filling up.  Solr defaults to a ramBufferSizeMB value of 100.
> But that does not translate to a segment size of 100MB -- it's merely
> the size of the ram buffer that Lucene uses for all the work related to
> building a segment.  A segment resulting from a full memory buffer is
> going to be smaller than the buffer.  I do not know how MUCH smaller, or
> what causes variations in that size.
>
> The general advice is to leave the buffer size alone.  But with the high
> volume you've got, you might want to increase it so segments are not
> flushed as frequently.  Be aware that increasing it will have an impact
> on how much heap memory gets used.  Every Solr core (shard replica in
> SolrCloud terminology) that does indexing is going to need one of these
> ram buffers.
>

I will definitely investigate ramBufferSizeMB, and look through the
Lucene code to see when a segment is flushed.

Again, many thanks.
Patrick

Re: Too many commits

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/2/2018 4:54 AM, Patrick Recchia wrote:
> I'm seeing way too many commits on our solr cluster, and I don't know why.

Are you sure there are commits happening?  Do you have logs actually
saying that a commit is occurring?  The creation of a new segment does
not necessarily mean a commit happened -- this can happen even without a
commit.

> - IndexConfig is set to autoCommit every minute:
>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
>   <openSearcher>true</openSearcher>
> </autoCommit>
>
> (solr.autoCommit.maxTime is not set)

It's recommended to set openSearcher to false on autoCommit.  Do you
have autoSoftCommit configured?

> There is nothing else customized (when it comes to IndexWriter, at least)
> within solrconfig.xml
>
> The data is sent without commit, but with commitWithin=500000 ms.
>
> All that said, I would have expected a rate of about 1 segment created per
> minute, of about 100MB.

One of the events that can cause a new segment to be flushed is the ram
buffer filling up.  Solr defaults to a ramBufferSizeMB value of 100. 
But that does not translate to a segment size of 100MB -- it's merely
the size of the ram buffer that Lucene uses for all the work related to
building a segment.  A segment resulting from a full memory buffer is
going to be smaller than the buffer.  I do not know how MUCH smaller, or
what causes variations in that size.

The general advice is to leave the buffer size alone.  But with the high
volume you've got, you might want to increase it so segments are not
flushed as frequently.  Be aware that increasing it will have an impact
on how much heap memory gets used.  Every Solr core (shard replica in
SolrCloud terminology) that does indexing is going to need one of these
ram buffers.
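
A minimal sketch of such a change in solrconfig.xml, assuming 256 as a
starting value to experiment with rather than a tuned figure:

```xml
<indexConfig>
  <!-- Solr's default is 100; a larger buffer means fewer flushes,
       at the cost of more heap for every core that is indexing -->
  <ramBufferSizeMB>256</ramBufferSizeMB>
</indexConfig>
```

With, say, four indexing cores in one JVM (a hypothetical count), the buffers
alone could claim up to about 4 x 256MB = 1GB of heap.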

Thanks,
Shawn


Re: Too many commits

Posted by Erick Erickson <er...@gmail.com>.
Two possibilities:
1> You have multiple replicas in the same JVM and are seeing commits
happen with all of them.

2> ramBufferSizeMB. When you index docs, segments are flushed when the
in-memory structures exceed this limit. Is this perhaps what you're
seeing?

Best,
Erick
