Posted to issues@mesos.apache.org by "Joris Van Remoortere (JIRA)" <ji...@apache.org> on 2016/02/27 02:26:18 UTC

[jira] [Commented] (MESOS-4711) Race condition in libevent poll implementation causes crash

    [ https://issues.apache.org/jira/browse/MESOS-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170239#comment-15170239 ] 

Joris Van Remoortere commented on MESOS-4711:
---------------------------------------------

{code}
commit 16aa038949741f4dc6bf43423dc0340f869605ce
Author: Alexander Rojas <al...@mesosphere.io>
Date:   Fri Feb 26 17:17:50 2016 -0800

    Removed race condition from libevent based poll implementation.
    
    Under certain circumstances, the future returned by poll is discarded
    right after the event is triggered. This causes the event callback to be
    called before the discard callback, which results in an abort signal
    being raised by the libevent library.
    
    Review: https://reviews.apache.org/r/43799/
{code}

> Race condition in libevent poll implementation causes crash
> -----------------------------------------------------------
>
>                 Key: MESOS-4711
>                 URL: https://issues.apache.org/jira/browse/MESOS-4711
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>    Affects Versions: 0.28.0
>         Environment: CentOS 6.7 running in VirtualBox
>            Reporter: Alexander Rojas
>            Assignee: Alexander Rojas
>              Labels: mesosphere
>             Fix For: 0.28.0, 0.27.2
>
>
> The issue first arose in MESOS-3271, but can be reproduced every time by using the mentioned environment and running:
> {noformat}
> sudo ./bin/mesos-tests.sh --gtest_filter="MemoryPressureMesosTest.CGROUPS_ROOT_SlaveRecovery" --gtest_repeat=1000
> {noformat}
> The problem can be traced back to [{{libevent_poll.cpp}}|https://github.com/apache/mesos/blob/3539b7a0e15b594148308319bf052d28b1429b98/3rdparty/libprocess/src/libevent_poll.cpp]. If the event is triggered and the future associated with the event is discarded at nearly the same time, [{{pollCallback()}}|https://github.com/apache/mesos/blob/3539b7a0e15b594148308319bf052d28b1429b98/3rdparty/libprocess/src/libevent_poll.cpp#L33] can start executing just early enough to finish before [{{pollDiscard()}}|https://github.com/apache/mesos/blob/3539b7a0e15b594148308319bf052d28b1429b98/3rdparty/libprocess/src/libevent_poll.cpp#L53] executes. When that happens, {{pollCallback()}} deletes the poll object and {{pollDiscard()}} is left with a dangling pointer, which crashes when it executes the line {{event_active(ev, EV_READ, 0);}}.
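
The ordering described above can be modelled without libevent. What follows is only a minimal sketch, not the code from the review: {{Poll}}, {{Loop}} and {{runScenario()}} are hypothetical names invented for illustration, and the use of {{std::weak_ptr}} is an assumption about one way to make the discard path safe. The idea is that the discard handler holds a non-owning reference and checks whether the poll object still exists before touching it, instead of dereferencing a raw pointer.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical stand-in for the per-poll state; not the real libprocess type.
struct Poll {
  bool activated = false;
};

// Hypothetical single-threaded event loop that runs queued callbacks in
// FIFO order, so the callback ordering can be controlled deterministically.
struct Loop {
  std::queue<std::function<void()>> pending;

  void run() {
    while (!pending.empty()) {
      std::function<void()> f = std::move(pending.front());
      pending.pop();
      f();
    }
  }
};

// Queues an "event callback" and a "discard callback" in the requested
// order. Returns true if the discard callback touched the poll object,
// and false if it found the object already destroyed and backed off.
bool runScenario(bool callbackFirst) {
  Loop loop;
  auto poll = std::make_shared<Poll>();
  std::weak_ptr<Poll> weak = poll;  // Non-owning handle for the discard path.
  assert(poll.use_count() == 1);    // The weak handle adds no ownership.
  bool touched = false;

  // Models the event callback: it releases the only owning reference,
  // which destroys the poll object.
  auto callback = [&poll]() { poll.reset(); };

  // Models the discard callback: it only uses the object if it can still
  // obtain ownership, rather than trusting a possibly dangling raw pointer.
  auto discard = [weak, &touched]() {
    if (std::shared_ptr<Poll> p = weak.lock()) {
      p->activated = true;
      touched = true;
    }
  };

  if (callbackFirst) {
    loop.pending.push(callback);
    loop.pending.push(discard);
  } else {
    loop.pending.push(discard);
    loop.pending.push(callback);
  }

  loop.run();
  return touched;
}
```

Calling {{runScenario(true)}} reproduces the problematic ordering from the description: the event callback destroys the object first, and the discard callback sees an expired handle and does nothing. With a raw pointer in its place, that same ordering would be a use-after-free. {{runScenario(false)}} covers the ordinary case in which the discard runs while the object is still alive.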



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)