You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/05/16 18:10:12 UTC

[GitHub] [lucene] risdenk opened a new pull request, #895: LUCENE-10576: ConcurrentMergeScheduler maxThreadCount calculation is artificially low

risdenk opened a new pull request, #895:
URL: https://github.com/apache/lucene/pull/895

   # Description
   
   ConcurrentMergeScheduler tries to calculate number of max threads. This is artificially low and capped at 4 due to the logic in the calculation.
   
   # Solution
   
   This updates the logic to `Math.max(4, coreCount / 2)` which sets a minimum to 4 threads (the previous lower limit) and max of `coreCount / 2`.
   
   # Tests
   
   No new tests were added.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability.
   - [ ] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #895: LUCENE-10576: ConcurrentMergeScheduler maxThreadCount calculation is artificially low

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #895:
URL: https://github.com/apache/lucene/pull/895#issuecomment-1128736566

   The current calculation makes sense to me. Merge policies like to organize segments into tiers, where the number of segments on each tier is typically also the number of segments that can be merged together. So it doesn't make much sense to perform multiple merges on the same tier concurrently. The way I'm reading the current formula is that we are scaling the number of merge threads with the number of processors, but stop at 4 anyway because it already allows Lucene to perform merges on 4 different tiers concurrently, which is already a lot given that tiers have exponential sizes, and that TieredMergePolicy has a max merged segment size of 5GB.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] risdenk commented on pull request #895: LUCENE-10576: ConcurrentMergeScheduler maxThreadCount calculation is artificially low

Posted by GitBox <gi...@apache.org>.
risdenk commented on PR #895:
URL: https://github.com/apache/lucene/pull/895#issuecomment-1128011527

   So just to clarify - today (and ever since spins = false either detected or default) - the minimum has been 4 threads. 
   
   I'm not sure I understand "I'd just change the upper limit. But there should be a cap." @uschindler.
   
   are you saying something like:
   
   `Math.max(1, coreCount / 2)` - so the minimum is 1 but still variable based on coreCount.
   
   or 
   
   `Math.max(4, Math.min(10, coreCount / 2)` - so there is a maximum of 10, but minimum of 4 based on `coreCount / 2`.
   
   or
   
   `Math.max(1, Math.min(10, coreCount / 2)` - so there is a maximum of 10, but there is a minimum of 1 based on `coreCount / 2`
   
   or something completely different.
   
   Note: the 10 above is just some arbitrary max just to illustrate. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #895: LUCENE-10576: ConcurrentMergeScheduler maxThreadCount calculation is artificially low

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #895:
URL: https://github.com/apache/lucene/pull/895#issuecomment-1128002097

   I agree with @uschindler . I only have 2 cores.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] risdenk commented on pull request #895: LUCENE-10576: ConcurrentMergeScheduler maxThreadCount calculation is artificially low

Posted by GitBox <gi...@apache.org>.
risdenk commented on PR #895:
URL: https://github.com/apache/lucene/pull/895#issuecomment-1127982468

   I'm not sure this is a great change since it could drastically increase the number of max threads and therefore merges allowed - especially with some of the high CPU core boxes. However, the limit of 4 threads today seems to be wrong at least how the logic is written. This PR is just an example of simplifying to make it more clear. I don't have any benchmarks or specifics on why to make this change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #895: LUCENE-10576: ConcurrentMergeScheduler maxThreadCount calculation is artificially low

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #895:
URL: https://github.com/apache/lucene/pull/895#issuecomment-1128502847

   What I wanted to add: Merging is mostly an IO thing. More cores would not necessarily make it faster (your SSD  has a limited amount of parallelism). It may be better on different indexes placed on different SSDs, but those would have separate merge schedulers anyways. If you look a few lines up in code: If its a harddisk and spins the maximum number of threads is 1.
   
   P.S.: When we really want to change this, the documentation (javadocs) needs update, too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #895: LUCENE-10576: ConcurrentMergeScheduler maxThreadCount calculation is artificially low

Posted by GitBox <gi...@apache.org>.
rmuir commented on PR #895:
URL: https://github.com/apache/lucene/pull/895#issuecomment-1128016323

   I don't think the formula should be changed personally. If you want to increase the default limit of `4` to say `6` for some good reason, I'm not entirely opposed to that, but the formula looks correct.
   
   But I think we need justification for the change. as background merging should be... background for most people, I feel like the simple bounded default up to 4 is good. User can always override by calling `setMaxMergesAndThreads` themselves.
   This is just a default value. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #895: LUCENE-10576: ConcurrentMergeScheduler maxThreadCount calculation is artificially low

Posted by GitBox <gi...@apache.org>.
uschindler commented on PR #895:
URL: https://github.com/apache/lucene/pull/895#issuecomment-1128022492

   Hi,
   The lower limit should definitely stay at 1. I tend to agree to change the upper limit, but the formula as is if correct.
   As Robert said, that's a background thing. I tend to think that the /2 is wrong, so maybe set maximum threads to 6 or 8 but done spend more than a third of all cores.
   This all needs some benchmarks with SSD and also without.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] risdenk commented on pull request #895: LUCENE-10576: ConcurrentMergeScheduler maxThreadCount calculation is artificially low

Posted by GitBox <gi...@apache.org>.
risdenk commented on PR #895:
URL: https://github.com/apache/lucene/pull/895#issuecomment-1128827348

   Fair enough - appreciate all the comments and additional context I couldn't find in the linked jiras. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] risdenk closed pull request #895: LUCENE-10576: ConcurrentMergeScheduler maxThreadCount calculation is artificially low

Posted by GitBox <gi...@apache.org>.
risdenk closed pull request #895: LUCENE-10576: ConcurrentMergeScheduler maxThreadCount calculation is artificially low
URL: https://github.com/apache/lucene/pull/895


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org