Posted to issues@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2019/06/07 03:15:00 UTC
[jira] [Assigned] (MESOS-9808) libprocess can deadlock on termination (cleanup() vs use() + terminate())
[ https://issues.apache.org/jira/browse/MESOS-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Mahler reassigned MESOS-9808:
--------------------------------------
Assignee: Benjamin Mahler
> libprocess can deadlock on termination (cleanup() vs use() + terminate())
> -------------------------------------------------------------------------
>
> Key: MESOS-9808
> URL: https://issues.apache.org/jira/browse/MESOS-9808
> Project: Mesos
> Issue Type: Bug
> Reporter: Andrei Sekretenko
> Assignee: Benjamin Mahler
> Priority: Major
> Labels: foundations
> Attachments: deadlock_stacks.txt, deadlock_stacks_filtered.txt, deadlock_stacks_with_fix.txt
>
>
> Using process::loop() together with the common libprocess usage pattern (a Process wrapper plus dispatching) is prone to deadlock on libprocess termination if the code does not wait for the loop to exit before terminating.
> *The deadlock itself is not directly caused by the process::loop(), though.*
> It occurs in the following setup with two processes (let's name them A and B).
> Thread 1 tries to clean up process A. It locks processes_mutex and hangs here:
> [https://github.com/apache/mesos/blob/663bfa68b6ab68f4c28ed6a01ac42ac2ad23ac07/3rdparty/libprocess/src/process.cpp#L3079]
> waiting for process A to have no strong references.
> Thread 2 begins by creating a ProcessReference in ProcessManager::deliver(UPID&) called for process A: [https://github.com/apache/mesos/blob/663bfa68b6ab68f4c28ed6a01ac42ac2ad23ac07/3rdparty/libprocess/src/process.cpp#L2799]
> and ends up waiting for processes_mutex in ProcessManager::terminate() for process B:
> [https://github.com/apache/mesos/blob/663bfa68b6ab68f4c28ed6a01ac42ac2ad23ac07/3rdparty/libprocess/src/process.cpp#L3155]
> -----------------
> In the observed case, terminate() for process B was triggered by a destructor of a process-wrapping object owned by a libprocess loop executing on A.
> I'm attaching the stacks captured at the deadlock. The stacks of the threads that block one another are in [^deadlock_stacks_filtered.txt]. Note frame #1 in Thread 5 (waiting for all references to expire) and frames #48 and #8 in Thread 19 (creating a reference and waiting for processes_mutex).
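
The wait cycle described above can be sketched in standalone C++ without libprocess. This is only an illustrative model, not libprocess code: the function terminate_acquires_mutex, the atomic counter references_to_a and the std::timed_mutex standing in for processes_mutex are all hypothetical names. A timeout replaces the real indefinite wait so that the sketch can exit, and the "reference dropped first" branch is there only for contrast; it is not meant to represent the fix applied in Mesos.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Returns true if the simulated terminate(B) acquired the mutex, false if it
// timed out (the point where the real code would wait forever).
bool terminate_acquires_mutex(bool hold_reference_during_terminate,
                              std::chrono::milliseconds timeout) {
  std::timed_mutex processes_mutex;      // hypothetical stand-in for processes_mutex
  std::atomic<int> references_to_a{1};   // thread 2 already holds a reference to A
  std::atomic<bool> cleanup_holds_mutex{false};
  bool acquired = false;

  // Thread 1: "cleanup(A)" locks the mutex, then waits until nothing
  // references A any more, still holding the mutex.
  std::thread thread1([&] {
    std::lock_guard<std::timed_mutex> lock(processes_mutex);
    cleanup_holds_mutex = true;
    while (references_to_a.load() > 0) {
      std::this_thread::yield();
    }
  });

  // Thread 2: holds a reference to A and runs "terminate(B)", which needs
  // the same mutex.
  std::thread thread2([&] {
    while (!cleanup_holds_mutex.load()) {
      std::this_thread::yield();
    }
    if (!hold_reference_during_terminate) {
      references_to_a.fetch_sub(1);  // contrast case: reference released first
    }
    acquired = processes_mutex.try_lock_for(timeout);
    if (acquired) {
      processes_mutex.unlock();
    }
    if (hold_reference_during_terminate) {
      // The real code never reaches this point; the timeout above is what
      // lets this sketch release the reference and finish.
      references_to_a.fetch_sub(1);
    }
  });

  thread1.join();
  thread2.join();
  return acquired;
}
```

Calling terminate_acquires_mutex(true, std::chrono::milliseconds(200)) models the reported ordering: thread 1 holds the mutex until the reference expires, thread 2 keeps the reference until it gets the mutex, so the lock attempt times out and the call returns false. Passing false for the first argument releases the reference before the lock attempt, thread 1 finishes, and the call returns true.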
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)