You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org> on 2019/03/29 08:17:00 UTC

[jira] [Commented] (SOLR-13352) possible deadlock/threadleak from OverseerTriggerThread/AutoScalingWatcher during close()

    [ https://issues.apache.org/jira/browse/SOLR-13352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804679#comment-16804679 ] 

Shalin Shekhar Mangar commented on SOLR-13352:
----------------------------------------------

Thanks Hoss. Your fix looks good to me.

Re: the code comment:
{quote}
// this can throw InterruptedException and we don't want to unlock if it did, so we keep this outside
// of the try/finally block
{quote}
and your comment:
{quote}
if lockInterruptibly() throws InterruptedException, that means we never acquired the lock – so we shouldn't call unlock in the outer try/finally anyway
{quote}

Aren't they equivalent statements? It is saying that the reason why the lockInterruptibly() call is not inside the try-finally block is because it can throw InterruptedException and in that case unlock should not be called.

I mean if it is obvious to everyone why lockInterruptibly() should never be inside a try-finally block then fine, we can remove the code comment but it might not be obvious to me if I come back to this code in a year or two. Feel free to re-word it to make it more clear.

> possible deadlock/threadleak from OverseerTriggerThread/AutoScalingWatcher during close()
> -----------------------------------------------------------------------------------------
>
>                 Key: SOLR-13352
>                 URL: https://issues.apache.org/jira/browse/SOLR-13352
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>            Priority: Major
>         Attachments: SOLR-13352.patch, sarowe_Lucene-Solr-tests-master_20462.log.txt
>
>
> A recent jenkins failure in TestSimTriggerIntegration lead me to what appears to be a "lock leak" situation in OverseerTriggerThread in how the "updateLock" object is dealt with in the event that the OverseerTriggerThread is closed.
> It's possible that this only affects tests using the SimCloudManager when calling "simRestartOverseer" -- but 
> I _believe_ this can lead also lead to an actual deadlock / threadleak situation in a thread running AutoScalingWatcher (that hold a refrefrences to OverseerTriggerThread and every object reachable from it) when the OverseerTriggerThread is closed as part of a real Solr shutdown ... which i think would cause the JVM to stall untill externally killed.
> ----
> If my analysis of the test failure (to follow in comment) is correct, then even even if this bug isn't likely to affect real world solr instances (and only surfaces because of how OverseerTriggerThread is used in SimCloudManager) the fix to OverseerTriggerThread is a trivial change to follow locking best practices (patch to follow)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org