Posted to dev@yunikorn.apache.org by "Wilfred Spiegelenburg (Jira)" <ji...@apache.org> on 2021/03/15 03:48:00 UTC

[jira] [Created] (YUNIKORN-572) terminated app move causes deadlock

Wilfred Spiegelenburg created YUNIKORN-572:
----------------------------------------------

             Summary: terminated app move causes deadlock
                 Key: YUNIKORN-572
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-572
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: core - scheduler
            Reporter: Wilfred Spiegelenburg
            Assignee: Wilfred Spiegelenburg


PR #250 for the placeholder cleanup introduced a possible deadlock.

When moving a terminated app from the queue, the queue is unlinked from the app. Before that happens we make sure that the queue tracking is up to date, which requires an application lock.

The move used to take a lock on the partition and do all its work in one go. With the change to remove the queue from the app, we now lock the app inside the partition lock. Since the app remains part of the active list until the move is done, scheduling might check the app at the same time. This could lead to a deadlock.
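
The lock ordering problem described above can be sketched in Go as follows. This is only an illustration, not the YuniKorn code: the Application and Partition types, their fields, and simulateInversion are hypothetical, and the sketch assumes the scheduling path holds the app lock when it needs the partition lock. Both paths are replayed step by step in a single goroutine, and TryLock (Go 1.18+) is used to report where a real Lock call would block.

```go
package main

import (
	"fmt"
	"sync"
)

// Application and Partition are hypothetical stand-ins, not the YuniKorn types.
type Application struct {
	sync.Mutex
	queue string
}

type Partition struct {
	sync.Mutex
	active map[string]*Application
}

// simulateInversion replays the move path and the (assumed) scheduling path
// in one goroutine. TryLock returns false where a real Lock would block.
func simulateInversion() (moverBlocked, schedulerBlocked bool) {
	app := &Application{queue: "root.default"}
	part := &Partition{active: map[string]*Application{"app-1": app}}

	// Scheduling path (assumption): checks the still-active app under its lock.
	app.Lock()
	// Move path: takes the partition lock first ...
	part.Lock()
	// ... then wants the app lock inside it to unlink the queue.
	moverBlocked = !app.TryLock()
	// Scheduling path (assumption): now needs the partition lock.
	schedulerBlocked = !part.TryLock()

	// Release the locks that were actually acquired so the sketch exits.
	part.Unlock()
	app.Unlock()
	return moverBlocked, schedulerBlocked
}

func main() {
	m, s := simulateInversion()
	fmt.Printf("mover blocked: %v, scheduler blocked: %v\n", m, s)
}
```

When both values are true, each path is waiting on a lock the other holds, which is the deadlock condition. With blocking Lock calls in two goroutines, neither side would ever proceed.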



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
