Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2022/11/24 01:33:56 UTC

[GitHub] [solr] ooasis opened a new pull request, #1189: SOLR-16561: Use autoSoftCommmitMaxTime as preferred poll interval of …

ooasis opened a new pull request, #1189:
URL: https://github.com/apache/solr/pull/1189

   # Description
   
   TLOG/PULL replicas use _IndexFetcher_ to fetch segment files from leaders. Once new segment files are downloaded and merged into the existing index, a new Searcher is opened so that the updated data becomes visible to clients. The poll interval is determined by the following code in _ReplicateFromLeader_:
   
   ```java
   if (uinfo.autoCommmitMaxTime != -1) {
     pollIntervalStr = toPollIntervalStr(uinfo.autoCommmitMaxTime / 2);
   } else if (uinfo.autoSoftCommmitMaxTime != -1) {
     pollIntervalStr = toPollIntervalStr(uinfo.autoSoftCommmitMaxTime / 2);
   }
   ```
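   The snippet above can be exercised in isolation with the minimal sketch below. It assumes that `-1` means "not configured" (as the snippet implies) and that `toPollIntervalStr` converts milliseconds into an `hours:minutes:seconds` string, truncating to whole seconds. The class `CurrentPollInterval`, the method `select`, the plain `int` parameters standing in for the `uinfo` fields, and that exact string format are hypothetical stand-ins, not Solr's actual code.

   ```java
   /** Hypothetical sketch of the current poll-interval selection in ReplicateFromLeader. */
   public class CurrentPollInterval {

     // Assumption: converts milliseconds to "h:m:s", truncating to whole seconds.
     static String toPollIntervalStr(int ms) {
       int sec = ms / 1000;
       return (sec / 3600) + ":" + (sec % 3600 / 60) + ":" + (sec % 60);
     }

     // Mirrors the quoted snippet: the hard-commit time wins, and half of it is used.
     // Returns null when neither is configured, i.e. the existing default stays in place.
     static String select(int autoCommitMaxTime, int autoSoftCommitMaxTime) {
       if (autoCommitMaxTime != -1) {
         return toPollIntervalStr(autoCommitMaxTime / 2);
       } else if (autoSoftCommitMaxTime != -1) {
         return toPollIntervalStr(autoSoftCommitMaxTime / 2);
       }
       return null;
     }

     public static void main(String[] args) {
       // Hard commit every 15 s, soft commit every hour.
       System.out.println(select(15000, 3600000)); // prints 0:0:7
     }
   }
   ```

   Note that a configured hard-commit time always takes precedence, so the soft-commit setting is only consulted when `autoCommit` has no `maxTime`.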
   
   In a typical config for replication with TLOG/PULL replicas, where data visibility is less important (a trade-off made to avoid NRT replicas), we set a short hard-commit interval to persist changes and a long soft-commit interval to control when changes become visible.
   
   ```xml
   <autoCommit>
     <maxTime>15000</maxTime>
     <openSearcher>false</openSearcher>
   </autoCommit>
   <autoSoftCommit>
     <maxTime>3600000</maxTime>
   </autoSoftCommit>
   ```
   
   With the above config, the poll interval is 15000 ms / 2 = 7500 ms, i.e. about 7 seconds. This leads to frequent opening of new Searchers, which has a large impact on realtime user queries, especially when a new Searcher takes a long time to warm up. It also makes changes visible on followers before they are visible on leaders.
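   To put "frequent" into numbers, the sketch below does the back-of-the-envelope arithmetic for the example config. It is an upper bound, assuming every poll finds new segment files on the leader and therefore opens a new Searcher; it is not a measurement. The class and method names are hypothetical.

   ```java
   /** Back-of-the-envelope count of polls per hour for a given poll interval. */
   public class PollsPerHour {

     static final int MS_PER_HOUR = 3_600_000;

     // Upper bound on Searcher openings per hour on a follower,
     // assuming every poll finds new segment files on the leader.
     static int maxPollsPerHour(int pollIntervalMs) {
       return MS_PER_HOUR / pollIntervalMs;
     }

     public static void main(String[] args) {
       System.out.println(maxPollsPerHour(7_000));     // ~7 s interval: prints 514
       System.out.println(maxPollsPerHour(3_600_000)); // 1 h soft-commit interval: prints 1
     }
   }
   ```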
   
   Polling for new segment files is mostly about visibility, since TLOG replicas still receive updates to their tlog files via the UpdateHandler (this is my understanding). It therefore seems more appropriate to use _autoSoftCommmitMaxTime_ as the poll interval.
   
   # Solution
   
   I propose the change below, where autoSoftCommmitMaxTime is chosen as the preferred interval and the full interval is used instead of half of it. This makes the poll interval much longer and brings the visibility order more in line with the eventual consistency pattern.
   
   ```java
   if (uinfo.autoSoftCommmitMaxTime != -1) {
     pollIntervalStr = toPollIntervalStr(uinfo.autoSoftCommmitMaxTime);
   } else if (uinfo.autoCommmitMaxTime != -1) {
     pollIntervalStr = toPollIntervalStr(uinfo.autoCommmitMaxTime);
   }
   ```
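   The proposed ordering can be checked the same way with the minimal sketch below. As before, it assumes `-1` means "not configured" and that `toPollIntervalStr` produces an `hours:minutes:seconds` string truncated to whole seconds; `ProposedPollInterval`, `select`, and the string format are hypothetical stand-ins rather than Solr's actual code.

   ```java
   /** Hypothetical sketch of the proposed poll-interval selection (soft commit preferred). */
   public class ProposedPollInterval {

     // Assumption: converts milliseconds to "h:m:s", truncating to whole seconds.
     static String toPollIntervalStr(int ms) {
       int sec = ms / 1000;
       return (sec / 3600) + ":" + (sec % 3600 / 60) + ":" + (sec % 60);
     }

     // The soft-commit time wins, and the full interval is used (no division by 2).
     // Returns null when neither is configured, i.e. the existing default stays in place.
     static String select(int autoCommitMaxTime, int autoSoftCommitMaxTime) {
       if (autoSoftCommitMaxTime != -1) {
         return toPollIntervalStr(autoSoftCommitMaxTime);
       } else if (autoCommitMaxTime != -1) {
         return toPollIntervalStr(autoCommitMaxTime);
       }
       return null;
     }

     public static void main(String[] args) {
       System.out.println(select(15000, 3600000)); // prints 1:0:0
       System.out.println(select(15000, -1));      // prints 0:0:15
     }
   }
   ```

   With the example config, this moves the follower's poll from roughly every 7 seconds to once an hour, matching the soft-commit interval on the leader.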
   
   # Tests
   
   The difference can only be tested with a proper replication config and controlled indexing and user query load. I have tried the change in my environment, and it showed much less impact on realtime queries than in previous tests.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

For additional commands, e-mail: issues-help@solr.apache.org