Posted to issues@mesos.apache.org by "Alexander Rukletsov (JIRA)" <ji...@apache.org> on 2017/10/20 23:29:00 UTC

[jira] [Commented] (MESOS-8096) Enqueueing events in MockHTTPScheduler can lead to segfaults.

    [ https://issues.apache.org/jira/browse/MESOS-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213484#comment-16213484 ] 

Alexander Rukletsov commented on MESOS-8096:
--------------------------------------------

There are at least two races in the test code around the v1 scheduler/executor and the driver libraries. Only the scheduler case is described below; the executor case is the same, except that it has one more race, which was already fixed for the scheduler in MESOS-4029.

h4. The scheduler might not be fully constructed before the driver library starts to use it.
When we [initialize|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2159-L2165] the scheduler driver library, we pass an [{{events}} callback|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2176], which [uses|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2211] a [member variable|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2216] of the library test wrapper. The scheduler library can start invoking the callback as soon as the library itself is constructed, even before the library test wrapper is [fully initialized|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2180-L2181]. This leads to segfaults when a {{SUBSCRIBED}} event is passed to the not-yet-initialized scheduler.
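
To illustrate the ordering problem in isolation, here is a minimal sketch that does not use any Mesos code. {{Library}} and {{Wrapper}} are hypothetical stand-ins, and the library calls back synchronously from its constructor to model the earliest moment at which a real library could deliver an event. The sketch relies on the C++ rule that non-static members are initialized in declaration order: the state the callback touches is declared before the library, so it is ready by the time the callback can fire. This is one possible way to avoid the race, not necessarily the fix chosen for this ticket.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a driver library. It delivers an event from
// inside its own constructor, modelling the earliest moment at which a
// real library could invoke the callback it was given.
class Library
{
public:
  explicit Library(std::function<void(const std::string&)> callback)
  {
    callback("SUBSCRIBED");
  }
};

// Hypothetical test wrapper. C++ initializes non-static data members in
// declaration order, so `received` is fully constructed before `library`
// gets a chance to call back into `this`.
class Wrapper
{
public:
  Wrapper()
    : library([this](const std::string& event) { received.push_back(event); })
  {}

  const std::vector<std::string>& events() const { return received; }

private:
  std::vector<std::string> received; // State used by the callback: first.
  Library library;                   // Library that may call back: last.
};
```

If the two members were declared in the opposite order, the callback would write to a vector that has not been constructed yet, which is undefined behavior and the same kind of failure as described above.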

h4. The scheduler library might be destroyed while the scheduler still uses it.
This is not fully fixed by MESOS-4029; see {{"AsyncExecutorProcess-badrun-3"}}. Passing the scheduler driver's {{this}} to a scheduler [here|https://github.com/apache/mesos/blob/a85a22baa32f66ecaa13c4602a195d57f6abf9be/src/tests/mesos.hpp#L2177] is unsafe: it does not guarantee that the library is still alive when the scheduler uses the pointer.
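
As a sketch of one way to make such a lifetime dependency explicit (an assumption for illustration, not the approach taken in Mesos), the scheduler could hold a {{std::weak_ptr}} instead of a raw pointer and check it before every use. {{Library}}, {{Scheduler}}, {{connected}} and {{subscribe}} below are hypothetical names and do not mirror the real v1 API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the driver library; it only records calls.
class Library
{
public:
  void send(const std::string& call) { sent.push_back(call); }

  std::vector<std::string> sent;
};

// Hypothetical scheduler that keeps a weak reference to the library
// rather than a raw pointer, so it can detect that the library is gone.
class Scheduler
{
public:
  void connected(const std::shared_ptr<Library>& library)
  {
    driver = library;
  }

  // Returns false instead of dereferencing a dangling pointer when the
  // library has already been destroyed.
  bool subscribe()
  {
    std::shared_ptr<Library> library = driver.lock();
    if (!library) {
      return false;
    }

    library->send("SUBSCRIBE");
    return true;
  }

private:
  std::weak_ptr<Library> driver;
};
```

With a raw pointer, the second {{subscribe()}} call after the library is destroyed would be a use-after-free; with the weak reference it becomes a detectable no-op.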

> Enqueueing events in MockHTTPScheduler can lead to segfaults.
> -------------------------------------------------------------
>
>                 Key: MESOS-8096
>                 URL: https://issues.apache.org/jira/browse/MESOS-8096
>             Project: Mesos
>          Issue Type: Bug
>          Components: scheduler driver, test
>         Environment: Fedora 23, Ubuntu 14.04, Ubuntu 16
>            Reporter: Alexander Rukletsov
>            Assignee: Alexander Rukletsov
>              Labels: flaky-test, mesosphere
>         Attachments: AsyncExecutorProcess-badrun-1.txt, AsyncExecutorProcess-badrun-2.txt, AsyncExecutorProcess-badrun-3.txt
>
>
> Various tests segfault for an as yet unknown reason. Comparing the attached logs hints that the problem might be in the scheduler's event queue.


