Posted to issues@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2019/06/07 03:15:00 UTC

[jira] [Assigned] (MESOS-9808) libprocess can deadlock on termination (cleanup() vs use() + terminate())

     [ https://issues.apache.org/jira/browse/MESOS-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler reassigned MESOS-9808:
--------------------------------------

    Assignee: Benjamin Mahler

> libprocess can deadlock on termination (cleanup() vs use() + terminate())
> -------------------------------------------------------------------------
>
>                 Key: MESOS-9808
>                 URL: https://issues.apache.org/jira/browse/MESOS-9808
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Andrei Sekretenko
>            Assignee: Benjamin Mahler
>            Priority: Major
>              Labels: foundations
>         Attachments: deadlock_stacks.txt, deadlock_stacks_filtered.txt, deadlock_stacks_with_fix.txt
>
>
> Using process::loop() together with the common libprocess pattern (a Process wrapper plus dispatching) is prone to deadlock during libprocess termination if the code does not wait for the loop to exit before terminating.
> *The deadlock itself is not directly caused by process::loop(), though.*
> It occurs in the following setup with two processes (let's call them A and B):
> Thread 1 tries to clean up process A. It locks processes_mutex and hangs here, waiting for process A to have no strong references:
>  [https://github.com/apache/mesos/blob/663bfa68b6ab68f4c28ed6a01ac42ac2ad23ac07/3rdparty/libprocess/src/process.cpp#L3079]
> Thread 2 begins by creating a ProcessReference in ProcessManager::deliver(UPID&), called for process A:
>  [https://github.com/apache/mesos/blob/663bfa68b6ab68f4c28ed6a01ac42ac2ad23ac07/3rdparty/libprocess/src/process.cpp#L2799]
> and ends up waiting for processes_mutex in ProcessManager::terminate() for process B:
>  [https://github.com/apache/mesos/blob/663bfa68b6ab68f4c28ed6a01ac42ac2ad23ac07/3rdparty/libprocess/src/process.cpp#L3155]
> -----------------
>  In the observed case, terminate() for process B was triggered by a destructor of a process-wrapping object owned by a libprocess loop executing on A.
> I'm attaching the stacks captured at the deadlock. The stacks of the threads that block one another are in [^deadlock_stacks_filtered.txt]. Note frame #1 in Thread 5 (waiting for all references to expire) and frames #48 and #8 in Thread 19 (creating a reference, then waiting for processes_mutex).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)