Posted to user@mesos.apache.org by Itamar Ostricher <it...@yowza3d.com> on 2015/03/19 16:19:23 UTC

Is launchTasks() with multiple offers limited to a single slave?

Hi,

According to the Python interface docstring
<https://github.com/apache/mesos/blob/master/src/python/interface/src/mesos/interface/__init__.py#L184-L193>,
launchTasks() may be called with a set of tasks.

In our framework, we thought this is used to issue a single RPC for
launching many tasks onto many offers (potentially from many slaves), as an
optimization (e.g., less communication overhead).

But, when running with multiple slaves, we saw that tasks are lost when
they are assigned to different slaves with the same launchTasks() call.

Reading the docstring of launchTasks carefully, I still couldn't figure out
that this is the intended behavior, so I'm here to verify that.
If that's by design, it should be stated clearly in the docstring (I'd be
happy to provide a documentation pull request for this).

Now, if this *is* the intended behavior, it raises the question - why does
launchTasks() support a set of tasks? doesn't mesos already aggregate
resources from the same slave to a single offer?

Thanks,
- Itamar.

Re: Is launchTasks() with multiple offers limited to a single slave?

Posted by Itamar Ostricher <it...@yowza3d.com>.

Thanks Michael!

On Mon, Mar 23, 2015 at 7:59 AM, Michael Park <mc...@gmail.com> wrote:

> Hi Itamar,
>
> Thanks for the patch! It looks like Niklas and Jie have looked at the patch
> and I'm sure they'll commit it soon; if not, I'll nudge them :)
>
Great :-)

>
>> 2. I would imagine there could be a "cleaner" way (from a framework author
>> perspective) to do that, by setting a policy or a filter or something, that
>> communicates to the master that the scheduler would like to receive only
>> offers that meet some criteria (e.g. min_cpu, min_mem, etc.). effectively,
>> moving the complexity of holding on to offers from the framework to the
>> master.
>>
>
> I don't think the ability to request "only offer me resources that
> contain: [(cpus, 8), (mem, 1048)]" can actually replace the current
> ability of the frameworks, since the framework can currently build up the
> resources while holding onto the ones it was given. For example, suppose
> [(cpus, 2), (mem, 1048)] became available but the mesos master doesn't
> offer it to the framework because it does not meet the requirement, and
> instead offers it to another framework. Then [(cpus, 6)] becomes available
> and it still doesn't meet the requirement, so it gets offered to yet
> another framework. In order to avoid this situation, the framework would
> have to request something like "offer me at least [(cpus, 8), (mem,
> 1048)] once you have built it up". I think we could support a mechanism
> like this via some form of "resource request".
>

I think your example is an excellent one to explain why I think the
"policy" approach can work better!
If framework A indeed requires [(cpus,8),(mem,1048)], and framework B is
happy with [(cpus,2),(mem,1048)], then with the current approach framework
A might want to hold on to [(cpus,2),(mem,1048)] until it gets another
[(cpus,6)] from the same slave. But the master may see that framework A is
not responding (because it's "holding on"), and offer the same
[(cpus,2),(mem,1048)] to framework B. Since framework B may immediately
accept, the offer will be rescinded from framework A.
This may make it difficult for framework A to get the resources it needs,
unless it has a way to tell the master that it wants [(cpus,8),(mem,1048)].
(unless I misunderstood how the master handles offering multiple resources
to multiple frameworks)

> It may also be satisfied by a reservation of some form. Some of these
> mechanisms such as "offer reservations" are described in MESOS-1791
> <https://issues.apache.org/jira/browse/MESOS-1791>.
>
>> Is such a thing possible in mesos?
>
>
> Currently no, but it will be coming!
>
Great!

>
>> Was it an explicit design decision to keep such logic at framework level?
>
>
> You may have noticed that there already exists a *Request* message in
> *mesos.proto* which currently does nothing. So while the logic lives at
> the framework-level right now, I don't think it was an explicit design
> decision to keep it there in the long run.
>
> MPark.
>

Re: Is launchTasks() with multiple offers limited to a single slave?

Posted by Michael Park <mc...@gmail.com>.

Hi Itamar,

Thanks for the patch! It looks like Niklas and Jie have looked at the patch
and I'm sure they'll commit it soon; if not, I'll nudge them :)

> 2. I would imagine there could be a "cleaner" way (from a framework author
> perspective) to do that, by setting a policy or a filter or something, that
> communicates to the master that the scheduler would like to receive only
> offers that meet some criteria (e.g. min_cpu, min_mem, etc.). effectively,
> moving the complexity of holding on to offers from the framework to the
> master.
>

I don't think the ability to request "only offer me resources that
contain: [(cpus, 8), (mem, 1048)]" can actually replace the current
ability of the frameworks, since the framework can currently build up the
resources while holding onto the ones it was given. For example, suppose
[(cpus, 2), (mem, 1048)] became available but the mesos master doesn't
offer it to the framework because it does not meet the requirement, and
instead offers it to another framework. Then [(cpus, 6)] becomes available
and it still doesn't meet the requirement, so it gets offered to yet
another framework. In order to avoid this situation, the framework would
have to request something like "offer me at least [(cpus, 8), (mem,
1048)] once you have built it up". I think we could support a mechanism
like this via some form of "resource request".
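
A minimal sketch of the framework-side build-up described above, assuming
offers are reduced to plain dictionaries of resource amounts. OfferHolder,
its methods, and the slave/offer ids are hypothetical names for illustration
only; they are not part of the Mesos API.

```python
from collections import defaultdict

# The requirement used in the example above.
REQUIRED = {"cpus": 8, "mem": 1048}


class OfferHolder:
    """Holds offers per slave until their summed resources meet a requirement."""

    def __init__(self, required):
        self.required = required
        self.held = defaultdict(dict)  # slave_id -> {offer_id: resources}

    def add(self, slave_id, offer_id, resources):
        """Record an offer; return the held offer ids once the slave has enough."""
        self.held[slave_id][offer_id] = resources
        totals = defaultdict(float)
        for res in self.held[slave_id].values():
            for name, amount in res.items():
                totals[name] += amount
        if all(totals[name] >= need for name, need in self.required.items()):
            # Enough resources on this slave: hand back every offer held for it.
            return list(self.held.pop(slave_id))
        return None

    def rescind(self, offer_id):
        """Forget an offer that the master has taken back."""
        for offers in self.held.values():
            offers.pop(offer_id, None)


holder = OfferHolder(REQUIRED)
first = holder.add("s1", "o1", {"cpus": 2, "mem": 1048})  # not enough yet
second = holder.add("s1", "o2", {"cpus": 6})              # now 8 cpus in total
```

Here the first call returns None and the second returns both offer ids, which
could then be passed together to a single launchTasks call for that slave.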

It may also be satisfied by a reservation of some form. Some of these
mechanisms such as "offer reservations" are described in MESOS-1791
<https://issues.apache.org/jira/browse/MESOS-1791>.

> Is such a thing possible in mesos?


Currently no, but it will be coming!

> Was it an explicit design decision to keep such logic at framework level?


You may have noticed that there already exists a *Request* message in
*mesos.proto* which currently does nothing. So while the logic lives at the
framework-level right now, I don't think it was an explicit design decision
to keep it there in the long run.

MPark.

Re: Is launchTasks() with multiple offers limited to a single slave?

Posted by Itamar Ostricher <it...@yowza3d.com>.

Thanks for the replies!

@Sharma - yes, I was talking about multiple tasks on multiple slaves, with
each task assigned to a single slave. Indeed, our framework has a thread
that puts lots of (mostly) small tasks into a queue, so the thread that
handles resourceOffers can pop tasks from the queue to fill up the offers.
I agree that this is not limiting. It was just surprising to get all these
TASK_LOST statuses, given the lacking docstring.
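
A rough sketch of the queue-filling step described above, assuming each
pending task and each offer is reduced to cpus and mem amounts. fill_offer
and the task dictionaries are hypothetical, and the locking needed between
the producer thread and the resourceOffers thread is left out.

```python
from collections import deque


def fill_offer(queue, cpus, mem):
    """Pop tasks from the front of the queue while they still fit the offer."""
    chosen = []
    while queue and queue[0]["cpus"] <= cpus and queue[0]["mem"] <= mem:
        task = queue.popleft()
        cpus -= task["cpus"]
        mem -= task["mem"]
        chosen.append(task)
    return chosen


pending = deque([
    {"id": "t1", "cpus": 1, "mem": 256},
    {"id": "t2", "cpus": 1, "mem": 256},
    {"id": "t3", "cpus": 4, "mem": 512},
])

# One offer with 2 cpus and 1024 MB: t1 and t2 fit, t3 stays queued.
launched = fill_offer(pending, cpus=2, mem=1024)
```

The tasks chosen for one offer all target that offer's slave, so they can go
into one launchTasks call without mixing slaves.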

@Michael - I did my best following the "contributing code" (or
documentation in this case) guidelines. hope it's OK. opened a JIRA issue
<https://issues.apache.org/jira/browse/MESOS-2525> and a review request
<https://reviews.apache.org/r/32306/>.

@Adam - wow, that's fascinating! I would love to get some validation about
this point - can anyone say how the actual messages from scheduler to
master are handled by mesos in case of multiple calls to launchTasks in the
context of a single resourceOffers invocation? can I simply not think about
optimizing for network when calling launchTasks?

@All
So, if I understand correctly now, the multiple offers feature is meant to
allow a scheduler to hold on to offers (across resourceOffers calls), as
long as they are not rescinded, and eventually launch a "big task" that
uses the sum of all collected offers from a specific slave?
If this is the case:
1. cool :-)
2. I would imagine there could be a "cleaner" way (from a framework author
perspective) to do that, by setting a policy or a filter or something, that
communicates to the master that the scheduler would like to receive only
offers that meet some criteria (e.g. min_cpu, min_mem, etc.). effectively,
moving the complexity of holding on to offers from the framework to the
master.
Is such a thing possible in mesos? Was it an explicit design decision to
keep such logic at framework level?

Thanks,
- Itamar.

On Fri, Mar 20, 2015 at 10:24 AM, Adam Bordelon <ad...@mesosphere.io> wrote:

> Keep in mind that you can call launchTasks() multiple times (once per
> slave) within the same resourceOffers callback in your scheduler, and due
> to the actor nature of libprocess, they will all be sent at the same time
> when resourceOffers returns to the SchedulerDriver. I'm not familiar enough
> with the internals of libprocess to know if/how it batches all of those
> messages together when transferring them to the master's libprocess actor,
> but it appears they are split again by the time they reach the master's
> mailbox, since the master will get one launchTasks callback per original
> launchTasks call.
>
> On Thu, Mar 19, 2015 at 12:03 PM, Michael Park <mc...@gmail.com> wrote:
>
>> Hi Itamar,
>>
>> Wow, thanks for bringing this up!
>>
>> The intended behavior is for *launchTasks* to take a set of tasks to be
>> launched on a *single* slave. This means that the multiple offers passed
>> to *launchTasks* must be from the *same* slave. The Python documentation
>> absolutely should state this explicitly as it does for acceptOffers
>> <https://github.com/apache/mesos/blob/master/src/python/interface/src/mesos/interface/__init__.py#L204-L213> as
>> well as C++ *launchTasks
>> <https://github.com/apache/mesos/blob/2985ae05634038b70f974bbfed6b52fe47231418/include/mesos/scheduler.hpp#L226-L237>*.
>> If you would create a review request for this that would be awesome!
>>
>>> Now, if this *is* the intended behavior, it raises the question - why
>>> does launchTasks() support a set of tasks? doesn't mesos already aggregate
>>> resources from the same slave to a single offer?
>>
>>
>> The primary use case of this feature is to allow frameworks to hold onto
>> offers and use them in conjunction with other offers from the same slave
>> later on.
>>
>> MPark.
>>
>> On 19 March 2015 at 14:28, Sharma Podila <sp...@netflix.com> wrote:
>>
>>> I will assume that you are not talking of the case that a task actually
>>> is being launched on multiple slaves, since a task can only be launched on
>>> one slave with existing concepts.
>>>
>>> Yes, that call is for one or more tasks on a single slave. That call
>>> (since 0.18, I believe) also takes multiple offers of the same slave, which
>>> can happen due to tasks finishing at different times on the host.
>>>
>>> I have seen discussion on batching status updates/acks. But, not on
>>> batching launching of tasks across multiple slaves. From a user
>>> perspective, I'd imagine that this should be possible. It would be useful
>>> for frameworks with high rate of task dispatching.
>>>
>>> I suspect (purely my opinion) that this model may have come up in the
>>> beginning when most frameworks were scheduling one task at a time before
>>> moving to the next pending task. My framework, for example, runs a
>>> scheduling loop/iteration and comes up with schedules for multiple tasks
>>> across one or more slaves. I would find it useful as well to batch up task
>>> launches across multiple hosts.
>>>
>>> That said, I haven't found the existing method to be limiting in
>>> performance/latency for our needs at this time.

Re: Is launchTasks() with multiple offers limited to a single slave?

Posted by Adam Bordelon <ad...@mesosphere.io>.

Keep in mind that you can call launchTasks() multiple times (once per
slave) within the same resourceOffers callback in your scheduler, and due
to the actor nature of libprocess, they will all be sent at the same time
when resourceOffers returns to the SchedulerDriver. I'm not familiar enough
with the internals of libprocess to know if/how it batches all of those
messages together when transferring them to the master's libprocess actor,
but it appears they are split again by the time they reach the master's
mailbox, since the master will get one launchTasks callback per original
launchTasks call.
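
A minimal sketch of the once-per-slave pattern, assuming offers and tasks are
reduced to simple records. Offer, Task, RecordingDriver, and launch_per_slave
are hypothetical stand-ins so the example runs without Mesos; a real scheduler
would pass the protobuf messages to the driver it was given in resourceOffers.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-ins for the protobuf messages; only the fields needed
# for grouping by slave are modelled here.
Offer = namedtuple("Offer", ["offer_id", "slave_id"])
Task = namedtuple("Task", ["task_id", "slave_id"])


class RecordingDriver:
    """Fake driver that records each launchTasks call instead of sending it."""

    def __init__(self):
        self.calls = []

    def launchTasks(self, offer_ids, tasks):
        self.calls.append((list(offer_ids), list(tasks)))


def launch_per_slave(driver, offers, tasks):
    """Issue one launchTasks call per slave, never mixing slaves in a call."""
    offers_by_slave = defaultdict(list)
    for offer in offers:
        offers_by_slave[offer.slave_id].append(offer.offer_id)

    tasks_by_slave = defaultdict(list)
    for task in tasks:
        tasks_by_slave[task.slave_id].append(task)

    for slave_id, offer_ids in offers_by_slave.items():
        slave_tasks = tasks_by_slave.get(slave_id)
        if slave_tasks:
            driver.launchTasks(offer_ids, slave_tasks)
        # Declining offers that received no tasks is left out of this sketch.


driver = RecordingDriver()
offers = [Offer("o1", "s1"), Offer("o2", "s2"), Offer("o3", "s1")]
tasks = [Task("t1", "s1"), Task("t2", "s2"), Task("t3", "s1")]
launch_per_slave(driver, offers, tasks)
```

With the sample data above, the fake driver records two calls: one carrying
offers o1 and o3 for slave s1, and one carrying offer o2 for slave s2.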

On Thu, Mar 19, 2015 at 12:03 PM, Michael Park <mc...@gmail.com> wrote:

> Hi Itamar,
>
> Wow, thanks for bringing this up!
>
> The intended behavior is for *launchTasks* to take a set of tasks to be
> launched on a *single* slave. This means that the multiple offers passed
> to *launchTasks* must be from the *same* slave. The Python documentation
> absolutely should state this explicitly as it does for acceptOffers
> <https://github.com/apache/mesos/blob/master/src/python/interface/src/mesos/interface/__init__.py#L204-L213> as
> well as C++ *launchTasks
> <https://github.com/apache/mesos/blob/2985ae05634038b70f974bbfed6b52fe47231418/include/mesos/scheduler.hpp#L226-L237>*.
> If you would create a review request for this that would be awesome!
>
>> Now, if this *is* the intended behavior, it raises the question - why does
>> launchTasks() support a set of tasks? doesn't mesos already aggregate
>> resources from the same slave to a single offer?
>
>
> The primary use case of this feature is to allow frameworks to hold onto
> offers and use them in conjunction with other offers from the same slave
> later on.
>
> MPark.
>
> On 19 March 2015 at 14:28, Sharma Podila <sp...@netflix.com> wrote:
>
>> I will assume that you are not talking of the case that a task actually
>> is being launched on multiple slaves, since a task can only be launched on
>> one slave with existing concepts.
>>
>> Yes, that call is for one or more tasks on a single slave. That call
>> (since 0.18, I believe) also takes multiple offers of the same slave, which
>> can happen due to tasks finishing at different times on the host.
>>
>> I have seen discussion on batching status updates/acks. But, not on
>> batching launching of tasks across multiple slaves. From a user
>> perspective, I'd imagine that this should be possible. It would be useful
>> for frameworks with high rate of task dispatching.
>>
>> I suspect (purely my opinion) that this model may have come up in the
>> beginning when most frameworks were scheduling one task at a time before
>> moving to the next pending task. My framework, for example, runs a
>> scheduling loop/iteration and comes up with schedules for multiple tasks
>> across one or more slaves. I would find it useful as well to batch up task
>> launches across multiple hosts.
>>
>> That said, I haven't found the existing method to be limiting in
>> performance/latency for our needs at this time.

Re: Is launchTasks() with multiple offers limited to a single slave?

Posted by Michael Park <mc...@gmail.com>.

Hi Itamar,

Wow, thanks for bringing this up!

The intended behavior is for *launchTasks* to take a set of tasks to be
launched on a *single* slave. This means that the multiple offers passed to
*launchTasks* must be from the *same* slave. The Python documentation
absolutely should state this explicitly as it does for acceptOffers
<https://github.com/apache/mesos/blob/master/src/python/interface/src/mesos/interface/__init__.py#L204-L213>
as well as C++ *launchTasks
<https://github.com/apache/mesos/blob/2985ae05634038b70f974bbfed6b52fe47231418/include/mesos/scheduler.hpp#L226-L237>*.
If you would create a review request for this that would be awesome!

> Now, if this *is* the intended behavior, it raises the question - why does
> launchTasks() support a set of tasks? doesn't mesos already aggregate
> resources from the same slave to a single offer?


The primary use case of this feature is to allow frameworks to hold onto
offers and use them in conjunction with other offers from the same slave
later on.

MPark.

On 19 March 2015 at 14:28, Sharma Podila <sp...@netflix.com> wrote:

> I will assume that you are not talking of the case that a task actually is
> being launched on multiple slaves, since a task can only be launched on one
> slave with existing concepts.
>
> Yes, that call is for one or more tasks on a single slave. That call
> (since 0.18, I believe) also takes multiple offers of the same slave, which
> can happen due to tasks finishing at different times on the host.
>
> I have seen discussion on batching status updates/acks. But, not on
> batching launching of tasks across multiple slaves. From a user
> perspective, I'd imagine that this should be possible. It would be useful
> for frameworks with high rate of task dispatching.
>
> I suspect (purely my opinion) that this model may have come up in the
> beginning when most frameworks were scheduling one task at a time before
> moving to the next pending task. My framework, for example, runs a
> scheduling loop/iteration and comes up with schedules for multiple tasks
> across one or more slaves. I would find it useful as well to batch up task
> launches across multiple hosts.
>
> That said, I haven't found the existing method to be limiting in
> performance/latency for our needs at this time.

Re: Is launchTasks() with multiple offers limited to a single slave?

Posted by Sharma Podila <sp...@netflix.com>.

I will assume that you are not talking of the case that a task actually is
being launched on multiple slaves, since a task can only be launched on one
slave with existing concepts.

Yes, that call is for one or more tasks on a single slave. That call (since
0.18, I believe) also takes multiple offers of the same slave, which can
happen due to tasks finishing at different times on the host.

I have seen discussion on batching status updates/acks. But, not on
batching launching of tasks across multiple slaves. From a user
perspective, I'd imagine that this should be possible. It would be useful
for frameworks with high rate of task dispatching.

I suspect (purely my opinion) that this model may have come up in the
beginning when most frameworks were scheduling one task at a time before
moving to the next pending task. My framework, for example, runs a
scheduling loop/iteration and comes up with schedules for multiple tasks
across one or more slaves. I would find it useful as well to batch up task
launches across multiple hosts.

That said, I haven't found the existing method to be limiting in
performance/latency for our needs at this time.
