You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2022/09/21 04:28:40 UTC

[GitHub] [solr] patsonluk opened a new pull request, #1032: SOLR-16412 : Race condition in SizeLimitedDistributedMap for cleanup

patsonluk opened a new pull request, #1032:
URL: https://github.com/apache/solr/pull/1032

   https://issues.apache.org/jira/browse/SOLR-16412
   
   # Description
   
   Details of the issue is described in the JIRA issue linked above.
   
   Although we could enforce synchronization to prevent threads from purging the same set of child nodes, it might not be desirable to add extra blocking. 
   
   Instead, we should be more forgiving `SizeLimitedDistributedMap#shrinkIfNeeded` if any items/nodes in the map no longer exist. 
   
   # Solution
   
   Catch `KeeperEx1ception.NoNodeException` in `SizeLimitedDistributedMap#shrinkIfNeeded` if such node is already deleted, which is likely triggered by concurrent `shrinkIfNeeded` call.
   
   # Tests
   
   Added unit test case `TestSizeLimitedDistributedMap#testConcurrentCleanup`, it's shown that without the current fix, the exception will be thrown
   
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] patsonluk commented on pull request #1032: SOLR-16412 : Race condition in SizeLimitedDistributedMap for cleanup

Posted by GitBox <gi...@apache.org>.
patsonluk commented on PR #1032:
URL: https://github.com/apache/solr/pull/1032#issuecomment-1289523323

   @risdenk would you mind to take a second look please? Many thanks! 🙇🏼 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] patsonluk commented on pull request #1032: SOLR-16412 : Race condition in SizeLimitedDistributedMap for cleanup

Posted by GitBox <gi...@apache.org>.
patsonluk commented on PR #1032:
URL: https://github.com/apache/solr/pull/1032#issuecomment-1289822367

   Can someone with write access please merge if it's all good? 😊 
   cc @noblepaul @chatman 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1032: SOLR-16412 : Race condition in SizeLimitedDistributedMap for cleanup

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1032:
URL: https://github.com/apache/solr/pull/1032#discussion_r1003701865


##########
solr/core/src/java/org/apache/solr/cloud/SizeLimitedDistributedMap.java:
##########
@@ -91,9 +91,13 @@ protected boolean lessThan(Long a, Long b) {
       for (String child : children) {
         Long id = childToModificationZxid.get(child);
         if (id != null && id <= topElementMzxId) {
-          zookeeper.delete(dir + "/" + child, -1, true);
-          if (onOverflowObserver != null)
-            onOverflowObserver.onChildDelete(child.substring(PREFIX.length()));
+          try {
+            zookeeper.delete(dir + "/" + child, -1, true);
+            if (onOverflowObserver != null)
+              onOverflowObserver.onChildDelete(child.substring(PREFIX.length()));
+          } catch (KeeperException.NoNodeException e) {

Review Comment:
   nit: naming the exception `ignore` or `ignored` I think makes IDEs not care that the exception isn't used. It doesn't change the behavior at all just clear we meant to ignore the exception.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] sonatype-lift[bot] commented on pull request #1032: SOLR-16412 : Race condition in SizeLimitedDistributedMap for cleanup

Posted by GitBox <gi...@apache.org>.
sonatype-lift[bot] commented on PR #1032:
URL: https://github.com/apache/solr/pull/1032#issuecomment-1283253466

   :warning: **312 God Classes** were detected by Lift in this project. [Visit the Lift web console](https://lift.sonatype.com/results/github.com/apache/solr/01GFPV4REF6BRMYYBW4EVPECFJ?tab=technical-debt&utm_source=github.com&utm_campaign=lift-comment&utm_content=apache\%20solr) for more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1032: SOLR-16412 : Race condition in SizeLimitedDistributedMap for cleanup

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1032:
URL: https://github.com/apache/solr/pull/1032#discussion_r998862676


##########
solr/core/src/test/org/apache/solr/cloud/TestSizeLimitedDistributedMap.java:
##########
@@ -67,6 +74,58 @@ public void testCleanup() throws Exception {
     }
   }
 
+  public void testConcurrentCleanup() throws Exception {
+    final Set<String> expectedKeys = new HashSet<>();
+    final List<String> deletedItems = new LinkedList<>();
+    int numResponsesToStore = TEST_NIGHTLY ? Overseer.NUM_RESPONSES_TO_STORE : 100;
+
+    try (SolrZkClient zkClient = new SolrZkClient(zkServer.getZkHost(), 10000)) {
+      String path = getAndMakeInitialPath(zkClient);
+      DistributedMap map =
+          new SizeLimitedDistributedMap(
+              zkClient, path, numResponsesToStore, (element) -> deletedItems.add(element));
+      // fill the map to limit first
+      for (int i = 0; i < numResponsesToStore; i++) {
+        map.put("xyz_" + i, new byte[0]);
+      }
+
+      // add more elements concurrently to trigger cleanup
+      final int THREAD_COUNT = Math.min(100, numResponsesToStore);
+      List<Callable<Object>> callables = new ArrayList<>();
+      for (int i = 0; i < THREAD_COUNT; i++) {
+        final String key = "xyz_" + (numResponsesToStore + 1);
+        expectedKeys.add(key);
+        callables.add(
+            () -> {
+              map.put(key, new byte[0]);
+              return null;
+            });
+      }
+
+      ExecutorService executorService =
+          ExecutorUtil.newMDCAwareFixedThreadPool(
+              THREAD_COUNT, new SolrNamedThreadFactory("test-concurrent-cleanup"));
+      List<Future<Object>> futures = new ArrayList<>();
+      for (Callable<Object> callable : callables) {
+        futures.add(executorService.submit(callable));
+      }
+      try {
+        for (Future<Object> future : futures) {
+          future.get(); // none of them should throw exception
+        }
+        for (String expectedKey : expectedKeys) {
+          assertTrue(map.contains(expectedKey));
+        }
+        // there's no guarantees on exactly how many elements will be removed, but it should at
+        // least NOT throw exception
+        assertTrue(!deletedItems.isEmpty());
+      } finally {
+        executorService.shutdown();

Review Comment:
   Use ExecutorUtil to do this. It will handle things nicely.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #1032: SOLR-16412 : Race condition in SizeLimitedDistributedMap for cleanup

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #1032:
URL: https://github.com/apache/solr/pull/1032#discussion_r998863051


##########
solr/core/src/test/org/apache/solr/cloud/TestSizeLimitedDistributedMap.java:
##########
@@ -67,6 +74,58 @@ public void testCleanup() throws Exception {
     }
   }
 
+  public void testConcurrentCleanup() throws Exception {
+    final Set<String> expectedKeys = new HashSet<>();
+    final List<String> deletedItems = new LinkedList<>();
+    int numResponsesToStore = TEST_NIGHTLY ? Overseer.NUM_RESPONSES_TO_STORE : 100;
+
+    try (SolrZkClient zkClient = new SolrZkClient(zkServer.getZkHost(), 10000)) {
+      String path = getAndMakeInitialPath(zkClient);
+      DistributedMap map =
+          new SizeLimitedDistributedMap(
+              zkClient, path, numResponsesToStore, (element) -> deletedItems.add(element));
+      // fill the map to limit first
+      for (int i = 0; i < numResponsesToStore; i++) {
+        map.put("xyz_" + i, new byte[0]);
+      }
+
+      // add more elements concurrently to trigger cleanup
+      final int THREAD_COUNT = Math.min(100, numResponsesToStore);
+      List<Callable<Object>> callables = new ArrayList<>();
+      for (int i = 0; i < THREAD_COUNT; i++) {
+        final String key = "xyz_" + (numResponsesToStore + 1);
+        expectedKeys.add(key);
+        callables.add(
+            () -> {
+              map.put(key, new byte[0]);
+              return null;
+            });
+      }
+
+      ExecutorService executorService =
+          ExecutorUtil.newMDCAwareFixedThreadPool(
+              THREAD_COUNT, new SolrNamedThreadFactory("test-concurrent-cleanup"));
+      List<Future<Object>> futures = new ArrayList<>();
+      for (Callable<Object> callable : callables) {
+        futures.add(executorService.submit(callable));
+      }
+      try {
+        for (Future<Object> future : futures) {
+          future.get(); // none of them should throw exception
+        }
+        for (String expectedKey : expectedKeys) {
+          assertTrue(map.contains(expectedKey));
+        }
+        // there's no guarantees on exactly how many elements will be removed, but it should at
+        // least NOT throw exception
+        assertTrue(!deletedItems.isEmpty());
+      } finally {
+        executorService.shutdown();

Review Comment:
   Specifically `ExecutorUtil#shutdownAndAwaitTermination` or `ExecutorUtil#shutdownNowAndAwaitTermination`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] patsonluk commented on a diff in pull request #1032: SOLR-16412 : Race condition in SizeLimitedDistributedMap for cleanup

Posted by GitBox <gi...@apache.org>.
patsonluk commented on code in PR #1032:
URL: https://github.com/apache/solr/pull/1032#discussion_r1003888379


##########
solr/core/src/java/org/apache/solr/cloud/SizeLimitedDistributedMap.java:
##########
@@ -91,9 +91,13 @@ protected boolean lessThan(Long a, Long b) {
       for (String child : children) {
         Long id = childToModificationZxid.get(child);
         if (id != null && id <= topElementMzxId) {
-          zookeeper.delete(dir + "/" + child, -1, true);
-          if (onOverflowObserver != null)
-            onOverflowObserver.onChildDelete(child.substring(PREFIX.length()));
+          try {
+            zookeeper.delete(dir + "/" + child, -1, true);
+            if (onOverflowObserver != null)
+              onOverflowObserver.onChildDelete(child.substring(PREFIX.length()));
+          } catch (KeeperException.NoNodeException e) {

Review Comment:
   👍🏼  changed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] noblepaul commented on pull request #1032: SOLR-16412 : Race condition in SizeLimitedDistributedMap for cleanup

Posted by GitBox <gi...@apache.org>.
noblepaul commented on PR #1032:
URL: https://github.com/apache/solr/pull/1032#issuecomment-1291267801

   thanks @patsonluk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] noblepaul merged pull request #1032: SOLR-16412 : Race condition in SizeLimitedDistributedMap for cleanup

Posted by GitBox <gi...@apache.org>.
noblepaul merged PR #1032:
URL: https://github.com/apache/solr/pull/1032


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] patsonluk commented on pull request #1032: SOLR-16412 : Race condition in SizeLimitedDistributedMap for cleanup

Posted by GitBox <gi...@apache.org>.
patsonluk commented on PR #1032:
URL: https://github.com/apache/solr/pull/1032#issuecomment-1283256758

   @noblepaul Can have a review on this please? 😊 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org