You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by "Michael Gibney (Jira)" <ji...@apache.org> on 2022/05/09 12:53:00 UTC
[jira] [Comment Edited] (SOLR-15660) Remove universal 10 second test thread leak linger.

    [ https://issues.apache.org/jira/browse/SOLR-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533066#comment-17533066 ] 

Michael Gibney edited comment on SOLR-15660 at 5/9/22 12:52 PM:
----------------------------------------------------------------

There's several issues being conflated here I think. The stack trace that Jan and Gus refer to above _is_ able to be addressed I believe; I've taken an initial stab at doing so in [apache/solr#842|https://github.com/apache/solr/pull/842]. But I think it's impossible to entirely get around ThreadLeakLinger, and probably a mistake to try to do so, due to the [number of errors we're seeing now|https://lists.apache.org/list?builds@solr.apache.org:dfr=2022-4-1\|dto=2022-5-31:java.base@11.0.12/java.util.concurrent.ThreadPoolExecutor.tryTerminate] that are really _completely_ spurious.

There's some high-level discussion of ThreadLeakLinger in MAHOUT-1345 that makes it clear even if we fixed all the actual thread leaks, we'd still be getting "leaked" threads detected spuriously [between awaitTermination.signalAll and actual thread death|https://github.com/openjdk/jdk17u/blob/20f3576cd1bbe516360b0d9f7deaacdad94df4d7/src/java.base/share/classes/java/util/concurrent/ThreadPoolExecutor.java#L728-L733].

Universal thread leak linger of 10s is probably overkill, granted; and may (\?) have masked a bunch of actual issues. But I'd argue that universal thread leak linger of perhaps 1s could be viewed as a small price to pay for avoiding tons of spurious failures and avoiding the need for every developer to be intimately familiar with the issues surrounding (perfectly normal) delayed thread death in Executors.

Notably though, even the TestLeaderElectionZkExpiry test failure, though "real" in a sense, may not have been worth the trouble to address. Yes it was surprising to me that ZooKeeper.close() doesn't block until the connection threads die. But although we can tighten that up, I'd argue it represents a transient/aesthetic resource leak, not a practically significant one. And the unavoidable Executor "resource leak" is a game changer from my perspective. I mean, 10ms might _indeed_ be enough time to avoid these spurious errors. Evidently _0s_ is usually enough time to avoid such errors :)


was (Author: mgibney):
There's several issues being conflated here I think. The stack trace that Jan and Gus refer to above _is_ able to be addressed I believe; I've taken an initial stab at doing so in [apache/solr#842|https://github.com/apache/solr/pull/842]. But I think it's impossible to entirely get around ThreadLeakLinger, and probably a mistake to try to do so, due to the [number of errors we're seeing now|https://lists.apache.org/list?builds@solr.apache.org:dfr=2022-4-1] that are really _completely_ spurious.

There's some high-level discussion of ThreadLeakLinger in MAHOUT-1345 that makes it clear even if we fixed all the actual thread leaks, we'd still be getting "leaked" threads detected spuriously [between awaitTermination.signalAll and actual thread death|https://github.com/openjdk/jdk17u/blob/20f3576cd1bbe516360b0d9f7deaacdad94df4d7/src/java.base/share/classes/java/util/concurrent/ThreadPoolExecutor.java#L728-L733].

Universal thread leak linger of 10s is probably overkill, granted; and may (?) have masked a bunch of actual issues. But I'd argue that universal thread leak linger of perhaps 1s could be viewed as a small price to pay for avoiding tons of spurious failures and avoiding the need for every developer to be intimately familiar with the issues surrounding (perfectly normal) delayed thread death in Executors.

Notably though, even the TestLeaderElectionZkExpiry test failure, though "real" in a sense, may not have been worth the trouble to address. Yes it was surprising to me that ZooKeeper.close() doesn't block until the connection threads die. But although we can tighten that up, I'd argue it represents a transient/aesthetic resource leak, not a practically significant one. And the unavoidable Executor "resource leak" is a game changer from my perspective. I mean, 10ms might _indeed_ be enough time to avoid these spurious errors. Evidently _0s_ is usually enough time to avoid such errors :)

> Remove universal 10 second test thread leak linger.
> ---------------------------------------------------
>
>                 Key: SOLR-15660
>                 URL: https://issues.apache.org/jira/browse/SOLR-15660
>             Project: Solr
>          Issue Type: Test
>          Components: Tests
>            Reporter: Mark Robert Miller
>            Assignee: Mark Robert Miller
>            Priority: Minor
>             Fix For: 9.0
>
>         Attachments: screenshot-1.png
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org