You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by "psalagnac (via GitHub)" <gi...@apache.org> on 2024/02/05 19:49:40 UTC

[PR] SOLR-17149: Fix backup/restore for large collections. [solr]

psalagnac opened a new pull request, #2243:
URL: https://github.com/apache/solr/pull/2243

   https://issues.apache.org/jira/browse/SOLR-17149
   
   <!--
   _(If you are a project committer then you may remove some/all of the following template.)_
   
   Before creating a pull request, please file an issue in the ASF Jira system for Solr:
   
   * https://issues.apache.org/jira/projects/SOLR
   
   For something minor (i.e. that wouldn't be worth putting in release notes), you can skip JIRA. 
   To create a Jira issue, you will need to create an account there first.
   
   The title of the PR should reference the Jira issue number in the form:
   
   * SOLR-####: <short description of problem or changes>
   
   SOLR must be fully capitalized. A short description helps people scanning pull requests for items they can work on.
   
   Properly referencing the issue in the title ensures that Jira is correctly updated with code review comments and commits. -->
   
   
   # Description
   
   A regression was introduced with [SOLR-16879](https://issues.apache.org/jira/browse/SOLR-16879) that makes backup/restore failing when there are more than 5 shards per node.
   
   # Solution
   
   Replace the bounded queue for async core admin tasks by an unbounded one.
   
   # Tests
   
   No unit tests added.
   Manually checked we don't have any tasks rejected by the executor.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/solr/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


Re: [PR] SOLR-17149: Fix backup/restore for large collections. [solr]

Posted by "psalagnac (via GitHub)" <gi...@apache.org>.
psalagnac commented on code in PR #2243:
URL: https://github.com/apache/solr/pull/2243#discussion_r1478841079


##########
solr/core/src/java/org/apache/solr/core/CoreContainer.java:
##########
@@ -412,7 +412,8 @@ public CoreContainer(NodeConfig config, CoresLocator locator, boolean asyncSolrC
         new OrderedExecutor(
             cfg.getReplayUpdatesThreads(),
             ExecutorUtil.newMDCAwareCachedThreadPool(

Review Comment:
   I did not change the behavior here, but to be honest, I don't get why we create an executor with a bounded queue that have same capacity than the number of threads. By default, number of threads is number of processors available to the JVM.
   
   If there is an agreement this queue should be unbounded too, I may remove the capacity parameter and always use `Integer.MAX_VALUE` for the queue capacity.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


Re: [PR] SOLR-17149: Fix backup/restore for large collections. [solr]

Posted by "gerlowskija (via GitHub)" <gi...@apache.org>.
gerlowskija commented on code in PR #2243:
URL: https://github.com/apache/solr/pull/2243#discussion_r1479783759


##########
solr/core/src/java/org/apache/solr/core/CoreContainer.java:
##########
@@ -412,7 +412,8 @@ public CoreContainer(NodeConfig config, CoresLocator locator, boolean asyncSolrC
         new OrderedExecutor(
             cfg.getReplayUpdatesThreads(),
             ExecutorUtil.newMDCAwareCachedThreadPool(

Review Comment:
   > I may remove the capacity parameter and always use Integer.MAX_VALUE for the queue capacity.
   
   I suspect you're right that this was an oversight or mistake.  But I'd file a separate ticket for changing this if you're interested.  It's likely a safe change, but it seems like something we'd want to "bake" for longer before it goes into a release.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


Re: [PR] SOLR-17149: Fix backup/restore for large collections. [solr]

Posted by "gerlowskija (via GitHub)" <gi...@apache.org>.
gerlowskija commented on PR #2243:
URL: https://github.com/apache/solr/pull/2243#issuecomment-1929966138

   Gah - that must've been it!  Thanks for catching my mistake.  I've pushed my test-changes, and run 'test'+'check' locally.  So I'll merge and backport.  Thanks for the quick fix @psalagnac !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


Re: [PR] SOLR-17149: Fix backup/restore for large collections. [solr]

Posted by "psalagnac (via GitHub)" <gi...@apache.org>.
psalagnac commented on PR #2243:
URL: https://github.com/apache/solr/pull/2243#issuecomment-1929975215

   Thanks for the test. Looks good to me.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


Re: [PR] SOLR-17149: Fix backup/restore for large collections. [solr]

Posted by "gerlowskija (via GitHub)" <gi...@apache.org>.
gerlowskija merged PR #2243:
URL: https://github.com/apache/solr/pull/2243


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


Re: [PR] SOLR-17149: Fix backup/restore for large collections. [solr]

Posted by "psalagnac (via GitHub)" <gi...@apache.org>.
psalagnac commented on code in PR #2243:
URL: https://github.com/apache/solr/pull/2243#discussion_r1479965597


##########
solr/core/src/java/org/apache/solr/core/CoreContainer.java:
##########
@@ -412,7 +412,8 @@ public CoreContainer(NodeConfig config, CoresLocator locator, boolean asyncSolrC
         new OrderedExecutor(
             cfg.getReplayUpdatesThreads(),
             ExecutorUtil.newMDCAwareCachedThreadPool(

Review Comment:
   Thanks, makes sense.
   I'll fill another ticket for that later so this is not in 9.5.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


Re: [PR] SOLR-17149: Fix backup/restore for large collections. [solr]

Posted by "psalagnac (via GitHub)" <gi...@apache.org>.
psalagnac commented on PR #2243:
URL: https://github.com/apache/solr/pull/2243#issuecomment-1929955744

   @gerlowskija Not sure why you got denied. I already checked the _"Allow edits and access to secrets by maintainers"_ box.
   
   Can this be because of branch name case? (`SOLR-17149` vs `solr-17149`)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


Re: [PR] SOLR-17149: Fix backup/restore for large collections. [solr]

Posted by "gerlowskija (via GitHub)" <gi...@apache.org>.
gerlowskija commented on PR #2243:
URL: https://github.com/apache/solr/pull/2243#issuecomment-1929779151

   Hi @psalagnac - this LGTM, generally.
   
   The one thing missing IMO is some modifications to `AbstractIncrementalBackupTest` to cover the many-shards case.  I took a stab at these locally and have code ready to share, put it looks like your branch isn't configured to allow others to push to it:
   
   ```
   $ git push psalagnac SOLR-17149:SOLR-17149
   Enumerating objects: 84, done.
   Counting objects: 100% (84/84), done.
   Delta compression using up to 28 threads
   Compressing objects: 100% (32/32), done.
   Writing objects: 100% (51/51), 13.84 KiB | 4.61 MiB/s, done.
   Total 51 (delta 22), reused 0 (delta 0), pack-reused 0
   remote: Resolving deltas: 100% (22/22), completed with 19 local objects.
   To github.com:psalagnac/solr.git
    ! [remote rejected]         SOLR-17149 -> SOLR-17149 (**permission denied**)
   error: failed to push some refs to 'github.com:psalagnac/solr.git'
   ```
   
   
   If you're willing to accept my test changes, could you take a look at your branch settings and let me know when they're updated?  Or alternatively, if you'd rather make the changes yourself, that's fine too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org