Posted to dev@lucene.apache.org by Shawn Heisey <ap...@elyograg.org> on 2018/10/10 06:28:46 UTC

Does ConcurrentMergeScheduler actually do smaller merges first?

Before I open an issue, I would like to double-check my sanity, see if 
an issue is needed.

I have noticed that the javadoc for ConcurrentMergeScheduler says that 
it schedules smaller merges before larger merges.  In the past, I have 
seen evidence suggesting this is not actually the case, that it prefers 
larger merges first.

---- background ----

When importing millions of rows from a database using Solr's dataimport 
handler, the index will be merged quite frequently while that indexing 
occurs.  Eventually, it reaches a point where there are multiple merges 
scheduled simultaneously, so the ongoing indexing thread will be 
paused until the number of merges drops below maxMergeCount.

If the smallest merge were being done first, I don't think I would 
have observed this behavior.  What I would see happen in the 
past is that when a large merge gets scheduled, indexing is paused long 
enough for the database connection to time out and be disconnected, so 
when the import tries to resume indexing, it can't -- the source 
database connection is gone.  For MySQL databases, this timeout takes 
about ten minutes to happen. If the smallest merge had completed first, 
the count would have decreased long before the database connection could 
time out, and indexing would have resumed with no problems.
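
The reasoning above can be checked with a toy model. The sketch below is not Lucene code. It assumes merges run strictly one at a time, that indexing resumes as soon as the number of outstanding merges drops below maxMergeCount, and it uses made-up merge durations (30 seconds, 4 minutes, 25 minutes). The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Toy model of an indexing stall. Not Lucene code; names and numbers are hypothetical. */
final class MergeStallSketch {

  /**
   * Seconds until the number of outstanding merges drops below maxMergeCount,
   * assuming merges run strictly one at a time in the given order.
   */
  static long stallSeconds(long[] durationsInRunOrder, int maxMergeCount) {
    int outstanding = durationsInRunOrder.length;
    long elapsed = 0;
    for (long duration : durationsInRunOrder) {
      if (outstanding < maxMergeCount) {
        break; // indexing may resume
      }
      elapsed += duration;
      outstanding--;
    }
    return elapsed;
  }

  /** Returns a copy of the durations ordered shortest to longest. */
  static long[] smallestFirst(long[] durations) {
    long[] sorted = durations.clone();
    Arrays.sort(sorted);
    return sorted;
  }

  /** Returns a copy of the durations ordered longest to shortest. */
  static long[] largestFirst(long[] durations) {
    long[] sorted = smallestFirst(durations);
    for (int i = 0, j = sorted.length - 1; i < j; i++, j--) {
      long tmp = sorted[i];
      sorted[i] = sorted[j];
      sorted[j] = tmp;
    }
    return sorted;
  }

  public static void main(String[] args) {
    long[] durations = {1500, 30, 240}; // made-up merge durations, in seconds
    int maxMergeCount = 2;
    long timeout = 600; // roughly the ten-minute timeout described above

    long small = stallSeconds(smallestFirst(durations), maxMergeCount);
    long large = stallSeconds(largestFirst(durations), maxMergeCount);
    System.out.println("smallest first: stalled " + small + "s, times out: " + (small > timeout));
    System.out.println("largest first:  stalled " + large + "s, times out: " + (large > timeout));
  }
}
```

With these made-up numbers, the smallest-first order stalls for 270 seconds and the largest-first order for 1740 seconds, so only the latter exceeds a 600 second timeout. If all three merges are large (say 12, 15, and 25 minutes), the same model stalls for 1620 seconds even when the smallest runs first.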

---- end background ----

The way that I have fixed this problem in the past is to increase 
maxMergeCount to 6.  When that's done, the incoming thread never gets 
paused, and the database connection doesn't time out.

I can see that the default for maxMergeCount was changed from 2 to 6 in 
2014 by LUCENE-6119.  So 5.0 and later probably do not have the 
problems I encountered as long as the scheduler is left at defaults ... 
but I suspect that the running order of merges goes larger to smaller, 
contrary to javadoc.  The code is pretty dense and I haven't completely 
deciphered it yet.

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Does ConcurrentMergeScheduler actually do smaller merges first?

Posted by Shawn Heisey <ap...@elyograg.org>.

On 10/10/2018 5:40 PM, Shawn Heisey wrote:
> somebody who's intimately familiar with that code could decipher it a 
> lot faster than I can. 

I went ahead and built a test class mirroring the sorting code I see in 
ConcurrentMergeScheduler (master branch), and it looks like current code 
does indeed behave as advertised.  Here's the code I built (paste has a 
one month expiration time):

https://apaste.info/CFJT

The output of that class is this, exactly what I was hoping to see:

Index: 0, Size: 4725, Pause: true
Index: 1, Size: 3725, Pause: true
Index: 2, Size: 2725, Pause: true
Index: 3, Size: 1725, Pause: true
Index: 4, Size: 725, Pause: true
Index: 5, Size: 525, Pause: false
Index: 6, Size: 25, Pause: false

The code could use more comments documenting its operation, but it does 
look like it's correct, at least in the master branch.
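
The paste is not reproduced here. As a rough illustration of logic that yields the listing above: sort the merges largest first, then pause every merge above a size threshold except the smallest maxThreadCount of them. The threshold of 50, the single merge thread, and the input sizes are assumptions chosen to reproduce that output. The class and method names are hypothetical, and this is a sketch, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of a "pause the largest merges" decision. Not Lucene code. */
final class MergePauseSketch {

  // Assumed threshold: merges at or below this size are never paused.
  static final long MIN_BIG_MERGE = 50;

  /** Sorts sizes largest first and reports which merges would be paused. */
  static List<String> describe(long[] sizes, int maxThreadCount) {
    List<Long> sorted = new ArrayList<>();
    for (long size : sizes) {
      sorted.add(size);
    }
    sorted.sort(Comparator.reverseOrder());

    // Only merges above the threshold compete for the running slots.
    long bigMergeCount = sorted.stream().filter(s -> s > MIN_BIG_MERGE).count();

    List<String> lines = new ArrayList<>();
    for (int i = 0; i < sorted.size(); i++) {
      long size = sorted.get(i);
      // Pause the largest big merges; the smallest maxThreadCount of them keep running.
      boolean pause = size > MIN_BIG_MERGE && i < bigMergeCount - maxThreadCount;
      lines.add("Index: " + i + ", Size: " + size + ", Pause: " + pause);
    }
    return lines;
  }

  public static void main(String[] args) {
    long[] sizes = {725, 25, 4725, 525, 2725, 1725, 3725}; // deliberately unsorted
    describe(sizes, 1).forEach(System.out::println);
  }
}
```

Under those assumptions the largest merges are the ones paused, which means the smaller merges get to run first, in line with the javadoc.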

Looking over the commit history for the file, nothing jumped out at me 
as being a change that might have reversed the sort order, but I can say 
that Solr 1.4.x (Lucene 2.9) is where I saw the problem, and I'm 
reasonably certain that some of the reports I handled on the mailing 
list were on version 4.x.  I cannot confirm the version on more recent 
reports without checking list history.

Thanks,
Shawn


Re: Does ConcurrentMergeScheduler actually do smaller merges first?

Posted by Shawn Heisey <ap...@elyograg.org>.

On 10/10/2018 11:52 AM, Michael Sokolov wrote:
> If maxMergeCount was 2, you could get into a situation with three 
> large merges I think; the largest would be paused, but the others 
> could still take > 10 mins to complete. Are you sure that your 
> observation is at odds with what the document says the scheduler is doing?

I haven't done extremely comprehensive checking, and it has been a 
number of years now.  When I was looking, what appeared to be happening 
was three merges scheduled.  The smallest one I would expect to complete 
in seconds, or certainly within a few minutes. The largest one was 
probably at the merge policy's 5GB max segment size, and a merge of that 
size would definitely take longer than ten minutes.  I no longer have 
access to those indexes, so I can't investigate directly.

There are still new reports on solr-user of database connections failing 
while importing millions of rows, even recently.  I have NOT heard about 
anyone applying my fix (set maxMergeCount to 6) and still seeing 
failures, but I suppose that might have happened.

It is the recent reports of the problem that have prompted me to 
investigate deeper and start this thread.  I believe that the merge 
scheduler SHOULD handle smaller merges first, just like the javadocs 
indicate, but I have seen evidence (at least in the past) that it's not 
actually doing so.  My look at the code today seems to indicate that it 
is sorting large merges first, but somebody who's intimately familiar 
with that code could decipher it a lot faster than I can.

Thanks,
Shawn


Re: Does ConcurrentMergeScheduler actually do smaller merges first?

Posted by Michael Sokolov <ms...@gmail.com>.

If maxMergeCount was 2, you could get into a situation with three large
merges I think; the largest would be paused, but the others could still
take > 10 mins to complete. Are you sure that your observation is at odds
with what the document says the scheduler is doing?

On Wed, Oct 10, 2018 at 2:28 AM Shawn Heisey <ap...@elyograg.org> wrote:

> Before I open an issue, I would like to double-check my sanity, see if
> an issue is needed.
>
> I have noticed that the javadoc for ConcurrentMergeScheduler says that
> it schedules smaller merges before larger merges.  In the past, I have
> seen evidence suggesting this is not actually the case, that it prefers
> larger merges first.
>
> ---- background ----
>
> When importing millions of rows from a database using Solr's dataimport
> handler, the index will be merged quite frequently while that indexing
> occurs.  Eventually, it reaches a point where there are multiple merges
> scheduled simultaneously, so the ongoing indexing thread will be
> paused until the number of merges drops below maxMergeCount.
>
> If the smallest merge were being done first, I don't think I would
> have observed this behavior.  What I would see happen in the
> past is that when a large merge gets scheduled, indexing is paused long
> enough for the database connection to time out and be disconnected, so
> when the import tries to resume indexing, it can't -- the source
> database connection is gone.  For MySQL databases, this timeout takes
> about ten minutes to happen. If the smallest merge had completed first,
> the count would have decreased long before the database connection could
> time out, and indexing would have resumed with no problems.
>
> ---- end background ----
>
> The way that I have fixed this problem in the past is to increase
> maxMergeCount to 6.  When that's done, the incoming thread never gets
> paused, and the database connection doesn't time out.
>
> I can see that the default for maxMergeCount was changed from 2 to 6 in
> 2014 by LUCENE-6119.  So 5.0 and later probably do not have the
> problems I encountered as long as the scheduler is left at defaults ...
> but I suspect that the running order of merges goes larger to smaller,
> contrary to javadoc.  The code is pretty dense and I haven't completely
> deciphered it yet.
>
> Thanks,
> Shawn