Posted to user@mesos.apache.org by James Vanns <jv...@gmail.com> on 2015/06/09 17:58:21 UTC

Threading model of mesos API (C++)

Hi. I'm toying with the mesos scheduler (C++) API and running into
unexpected race conditions. I have *not* synchronised access to attributes
of my Scheduler-derived class. Is the mesos library code threaded and
network communication asynchronous? What it *looks like* I'm seeing is my
statusUpdate() callback being executed before the return of
resourceOffers(). Naturally I call driver->launchTasks() inside
resourceOffers(). This is intermittent but generally triggered by tasks
that report status changes very quickly, e.g. a task that fails instantly.

Can anyone point me in the right direction of any online API docs that
explain how callbacks are invoked? Distributed over a pool of worker
threads?

Also, are the state transitions documented? E.g.
mesos::TASK_STAGING -> mesos::TASK_STARTING -> etc.

Cheers,

Jim

--
Senior Code Pig
Industrial Light & Magic

Re: Threading model of mesos API (C++)

Posted by James Vanns <jv...@gmail.com>.
Excellent. Thank you both for your time and efforts and, most importantly,
for clarifying the behavior :)

Jim

Re: Threading model of mesos API (C++)

Posted by Alexander Gallego <ag...@concord.io>.
Jim,

Ben is correct.

Here is the test (I tested with Mesos 0.21):

https://gist.github.com/senior7515/bc79371324f1e50598a7

The synchronization I had was there because my scheduler also services
requests to do more work, and those requests go through a queue shared
between resourceOffers(...) and my request handler.

Here is a screenshot of the gist running on my machine with about 16 things
being run.

[screenshot not preserved in the plain-text archive]

- Alex


On Wed, Jun 10, 2015 at 12:03 PM, James Vanns <jv...@gmail.com> wrote:

> You are a star, Alex. Thank you :)
>
> Jim



-- 





Sincerely,
Alexander Gallego
Co Founder & CTO

Re: Threading model of mesos API (C++)

Posted by James Vanns <jv...@gmail.com>.
You are a star, Alex. Thank you :)

Jim


On 10 June 2015 at 15:15, Alexander Gallego <ag...@concord.io> wrote:

> Jim,
>
> Let me prototype something small today. After reading my scheduler (in
> C++), I do have comments and synchronization on some state variables, but
> that might have to do with the more complex async code base I manage.
>
> I'll get back to you.
>
> - alex

Re: Threading model of mesos API (C++)

Posted by Alexander Gallego <ag...@concord.io>.
Jim,

Let me prototype something small today. After reading my scheduler (in C++),
I do have comments and synchronization on some state variables, but that
might have to do with the more complex async code base I manage.

I'll get back to you.

- alex


On Wed, Jun 10, 2015 at 6:15 AM, James Vanns <jv...@gmail.com> wrote:

> Thanks for the responses, guys. That link to the 'detailed description'
> will be handy - I've not come across that before. I do now have another
> question though! Aren't these two statements contradictory?
>
> Alex:
> "you launch a task, before the method returns (say you do some blocking
> stuff after, like sync update zookeeper), you might get a statusUpdate()
> callback."
> Ben:
> "Methods will not be invoked concurrently, and each method must complete
> before the next is called."
>
> ??
>
> Jim

Re: Threading model of mesos API (C++)

Posted by James Vanns <jv...@gmail.com>.
Thanks for the responses, guys. That link to the 'detailed description'
will be handy - I've not come across that before. I do now have another
question though! Aren't these two statements contradictory?

Alex:
"you launch a task, before the method returns (say you do some blocking
stuff after, like sync update zookeeper), you might get a statusUpdate()
callback."
Ben:
"Methods will not be invoked concurrently, and each method must complete
before the next is called."

??

Jim


On 10 June 2015 at 02:22, Benjamin Mahler <be...@gmail.com> wrote:

> If that's really what you're seeing, it is a bug and a very surprising
> one, so please provide evidence :)
>
> See the "detailed description" here:
> http://mesos.apache.org/api/latest/c++/classmesos_1_1Scheduler.html
>
> The scheduler driver will serially invoke methods on your Scheduler
> implementation. Methods will not be invoked concurrently, and each method
> must complete before the next is called.
>
> So, we recommend that you don't block inside the callbacks. Otherwise,
> you're blocking the driver as well and your own ability to continue
> processing callbacks.

Re: Threading model of mesos API (C++)

Posted by Benjamin Mahler <be...@gmail.com>.
If that's really what you're seeing, it is a bug and a very surprising one,
so please provide evidence :)

See the "detailed description" here:
http://mesos.apache.org/api/latest/c++/classmesos_1_1Scheduler.html

The scheduler driver will serially invoke methods on your Scheduler
implementation. Methods will not be invoked concurrently, and each method
must complete before the next is called.

So, we recommend that you don't block inside the callbacks. Otherwise,
you're blocking the driver as well and your own ability to continue
processing callbacks.

On Tue, Jun 9, 2015 at 8:58 AM, James Vanns <jv...@gmail.com> wrote:

> Hi. I'm toying with the mesos scheduler (C++) API and running into
> unexpected race conditions. I have *not* synchronised access to attributes
> of my Scheduler-derived class. Is the mesos library code threaded and
> network communication asynchronous? What it *looks like* I'm seeing is my
> statusUpdate() callback being executed before the return of
> resourceOffers(). Naturally I call driver->launchTasks() inside
> resourceOffers(). This is intermittent but generally triggered by tasks
> that report status changes very quickly; eg. a task that fails instantly.
>
> Can anyone point me in the right direction of any online API docs that
> explain how callbacks are invoked? Distributed over a pool of worker
> threads?
>
> Also are the state transitions documented? Eg.
> mesos::TASK_STAGING -> mesos::TASK_STARTING -> etc.
>
> Cheers,
>
> Jim

Re: Threading model of mesos API (C++)

Posted by Alexander Gallego <ag...@concord.io>.
Jim,

You do need to do your own synchronization.

It's basically possible for ANY callback to call your code in any order.
The API does not guarantee ordering.

For example, say you launch a task, before the method returns (say you do
some blocking stuff after, like sync update zookeeper), you might get a
statusUpdate() callback.

Effectively, you have 3 lists to maintain:

1.  a list of pending tasks (not launched, not enough resources yet)
2.  a list of tasks that have launched but have not yet received a
'statusUpdate()'
3.  a persistent list (ZooKeeper, MySQL, whatever) of tasks that have
confirmed they are running, if you need high availability.






On Tue, Jun 9, 2015 at 12:02 PM, James Vanns <jv...@gmail.com> wrote:

> Replying to my own thread here ;) It is also possible that
> resourceOffers() is called more than once before the first statusUpdate()
> is received. That is more likely.
>
> Some links to state transitions, and any threading model would be handy
> though. Or at least how and when callbacks are invoked. Just so I know what
> I need to protect access to ;)
>
> Cheers,
>
> Jim

Re: Threading model of mesos API (C++)

Posted by James Vanns <jv...@gmail.com>.
Replying to my own thread here ;) It is also possible that resourceOffers()
is called more than once before the first statusUpdate() is received. That
is more likely.

Some links to state transitions, and any threading model would be handy
though. Or at least how and when callbacks are invoked. Just so I know what
I need to protect access to ;)

Cheers,

Jim


On 9 June 2015 at 16:58, James Vanns <jv...@gmail.com> wrote:

> Hi. I'm toying with the mesos scheduler (C++) API and running into
> unexpected race conditions. I have *not* synchronised access to attributes
> of my Scheduler-derived class. Is the mesos library code threaded and
> network communication asynchronous? What it *looks like* I'm seeing is my
> statusUpdate() callback being executed before the return of
> resourceOffers(). Naturally I call driver->launchTasks() inside
> resourceOffers(). This is intermittent but generally triggered by tasks
> that report status changes very quickly; eg. a task that fails instantly.
>
> Can anyone point me in the right direction of any online API docs that
> explain how callbacks are invoked? Distributed over a pool of worker
> threads?
>
> Also are the state transitions documented? Eg.
> mesos::TASK_STAGING -> mesos::TASK_STARTING -> etc.
>
> Cheers,
>
> Jim