Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2022/05/06 16:53:05 UTC

[GitHub] [solr] magibney opened a new pull request, #842: SOLR-16046: fix thread leaks from non-blocking ZooKeeper.close()

magibney opened a new pull request, #842:
URL: https://github.com/apache/solr/pull/842

   See: [SOLR-16046](https://issues.apache.org/jira/browse/SOLR-16046)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] risdenk commented on a diff in pull request #842: SOLR-16046: fix thread leaks from non-blocking ZooKeeper.close()

Posted by GitBox <gi...@apache.org>.
risdenk commented on code in PR #842:
URL: https://github.com/apache/solr/pull/842#discussion_r868751831


##########
solr/solrj/src/java/org/apache/solr/common/cloud/ConnectionManager.java:
##########
@@ -210,7 +210,7 @@ public void update(ZooKeeper keeper) {
 
                 private void closeKeeper(ZooKeeper keeper) {
                   try {
-                    keeper.close();
+                    SolrZkClient.closeAsync(keeper);

Review Comment:
   Do we actually need to `closeAsync`? Can we always wait to close?



##########
solr/modules/hadoop-auth/src/test/org/apache/solr/security/hadoop/TestZkAclsWithHadoopAuth.java:
##########
@@ -106,6 +107,10 @@ public void testZkAcls() throws Exception {
       String zkHost = cluster.getSolrClient().getClusterStateProvider().getQuorumHosts();
       String zkChroot = zkHost.contains("/") ? zkHost.substring(zkHost.indexOf("/")) : null;
       walkZkTree(keeper, zkChroot, "/");
+    } finally {
+      // NOTE: cannot use try-with-resources, because `ZooKeeper.close()` without shutdownTimeout
+      // does not join on (and can leak) connection threads.

Review Comment:
   I'm not sure this logic belongs in `SolrZkClient`. We should use try-with-resources if possible. Can we extend `ZooKeeper` with a `SolrZooKeeper` (or something similar) that always closes ZooKeeper the way we want? That way we keep the benefit of try-with-resources and still get ZooKeeper to close cleanly.
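
For illustration only, the subclass idea in the comment above could look roughly like the sketch below. It does not use the real `ZooKeeper` class: `LeakyClient` and `JoiningClient` are hypothetical stand-ins, and the 50 ms teardown delay and the timeout value are assumptions chosen to make the behavior visible. The point is only that overriding `close()` to wait (with a bound) for the worker thread keeps try-with-resources usable.

```java
// Hypothetical stand-in for a client whose close() only signals its worker
// thread and returns immediately, without waiting for the thread to exit.
class LeakyClient implements AutoCloseable {
  protected final Thread worker;
  private volatile boolean running = true;

  LeakyClient() {
    worker = new Thread(() -> {
      while (running) {
        pause(10); // simulated event loop
      }
      pause(50); // simulated teardown work after the close signal
    }, "client-worker");
    worker.start();
  }

  private static void pause(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  @Override
  public void close() {
    running = false; // non-blocking: the worker may outlive this call
  }

  boolean workerAlive() {
    return worker.isAlive();
  }
}

// Hypothetical analogue of the proposed "SolrZooKeeper": same API, but
// close() also waits, up to a timeout, for the worker thread to exit.
class JoiningClient extends LeakyClient {
  private final long timeoutMs;

  JoiningClient(long timeoutMs) {
    if (timeoutMs <= 0) {
      // Thread.join(0) means "wait forever", which this sketch wants to avoid.
      throw new IllegalArgumentException("timeoutMs must be positive");
    }
    this.timeoutMs = timeoutMs;
  }

  @Override
  public void close() {
    super.close();
    try {
      worker.join(timeoutMs); // bounded wait for the worker to finish
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
    }
  }
}

class CloseDemo {
  public static void main(String[] args) {
    LeakyClient leaky = new LeakyClient();
    leaky.close();
    // Usually still alive here, because close() did not wait for teardown.
    System.out.println("leaky worker alive after close: " + leaky.workerAlive());

    JoiningClient joining = new JoiningClient(5_000);
    try (JoiningClient c = joining) {
      // use the client
    }
    System.out.println("joining worker alive after close: " + joining.workerAlive());
  }
}
```

Whether a real subclass would block unconditionally or only up to a shutdown timeout is exactly the open question in this review; the sketch picks a bounded wait as one possible compromise.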





[GitHub] [solr] magibney commented on pull request #842: SOLR-16046: fix thread leaks from non-blocking ZooKeeper.close()

Posted by GitBox <gi...@apache.org>.
magibney commented on PR #842:
URL: https://github.com/apache/solr/pull/842#issuecomment-1122407472

   Thanks @risdenk! The "SolrZooKeeper" approach and the always-synchronous close() are codependent, iiuc. My hesitation about going fully "blocking close()" across the board is that a blocking method introduces new possibilities for actual _deadlock_. I also have an [increasing sense that this may be chasing an issue of very little practical significance](https://issues.apache.org/jira/browse/SOLR-15660?focusedCommentId=17533066#comment-17533066). Taken together, I'm not sure it's worth the risk to make any change at all here unless we're reasonably certain that the possibility of deadlock is adequately covered by existing tests, and/or we anticipate more substantial benefits than just "avoiding test failures for short-lived thread leaks" -- ThreadLeakLinger would avoid test failures, and may well be entirely appropriate in this case.
   
   The way I'm thinking about this now: this is a "real" issue, but not necessarily a significant one, and the fix carries its own risks (arguably more significant -- deadlock -- than the risks of the existing issue). I'd like to get some consensus on whether a synchronous approach to close() has any benefits (e.g., avoiding Zk reconnect storms/thrashing?).
   
   If there's consensus that making ZooKeeper.close() synchronous across the board is likely the correct way to go, I'm fine with that (probably following the SolrZooKeeper approach you suggested). We could let it bake for a while before porting to a release branch.
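
As a hedged illustration of the deadlock concern above: one way a blocking close can hang (an assumed scenario, not one identified in this thread) is when close is invoked from the very thread it then waits on, for example from a callback running on the client's own event thread. The helper below is hypothetical (`GuardedCloser` and `joinSafely` are not Solr or ZooKeeper APIs) and shows two common mitigations: refuse to join the current thread, and bound the wait with a timeout.

```java
// Hypothetical helper, not part of Solr or ZooKeeper.
class GuardedCloser {

  /**
   * Waits up to timeoutMs for the worker to exit.
   * Returns true only if the worker is confirmed to have stopped.
   */
  static boolean joinSafely(Thread worker, long timeoutMs) {
    if (timeoutMs <= 0) {
      // Thread.join(0) waits forever, which would reintroduce the hang.
      throw new IllegalArgumentException("timeoutMs must be positive");
    }
    if (Thread.currentThread() == worker) {
      // Joining ourselves can never succeed: we cannot exit while waiting
      // for our own exit. Skip the wait and report "not confirmed stopped".
      return false;
    }
    try {
      worker.join(timeoutMs); // bounded wait
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
    }
    return !worker.isAlive();
  }
}
```

A bounded wait trades a possible hang for a possible short-lived leak when the timeout expires, so it narrows the risk rather than removing it.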




Re: [PR] SOLR-16046: fix thread leaks from non-blocking ZooKeeper.close() [solr]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #842:
URL: https://github.com/apache/solr/pull/842#issuecomment-1953292457

   This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the dev@solr.apache.org mailing list. Thank you for your contribution!

