Posted to user@mesos.apache.org by Douglas Voet <dv...@broadinstitute.org> on 2015/01/16 04:23:36 UTC

implementing data locality via mesos resource offers

Hello,

I am evaluating Mesos for running analyses of many large files. I only
want to download a file to a small subset of my nodes and route the
related processing there. The Mesos paper talks about using resource
offers as a mechanism to achieve data locality, but I can't find any
reference to how one might do this in the documentation. How would a Mesos
slave know what data is available, keeping in mind that this might change
over time? How can I configure a slave to include this information in
resource offers?

Thanks in advance for any pointers.

-Doug

Re: implementing data locality via mesos resource offers

Posted by Adam Bordelon <ad...@mesosphere.io>.

See also the upcoming persistence primitives. Using persistent volumes
(MESOS-1554 <https://issues.apache.org/jira/browse/MESOS-1554>) and dynamic
reservations (MESOS-2018 <https://issues.apache.org/jira/browse/MESOS-2018>),
you can launch a task from one framework which creates a persistent volume
on a slave, stores its output data there, and then completes. The volume
will persist after the task exits, and its associated resources will be
offered to a framework sharing the same role. In this way, you can
guarantee data locality, and even share data between different frameworks
by placing them under the same role.

Re: implementing data locality via mesos resource offers

Posted by Sharma Podila <sp...@netflix.com>.

Hi Tim,

Sure, here are some preliminary thoughts.

In a Mesos cluster that has only one framework, it would suffice for the
scheduler to follow this strategy:

- when assigning a task that needs data locality, assign from an offer
from a host that has the data
- when assigning a task that does not need data locality, do not assign
from an offer from a host that has/had another task which produced data
needed by others for data locality

This strategy would naturally cluster hosts into two groups: one in which
hosts are used for data locality and another in which hosts run tasks that
don't need data locality. Or into multiple groups, if different hosts hold
different data.
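
The two rules above can be sketched in a few lines of Python. This is only
an illustration under stated assumptions: Offer, pick_offer, and the
locations map are hypothetical names, not part of the Mesos API, and a real
scheduler would read the hostname and resources from the offers it receives.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Offer:
    """Hypothetical stand-in for a Mesos resource offer."""
    offer_id: str
    hostname: str
    cpus: float


def pick_offer(offers: List[Offer], cpus: float, dataset: Optional[str],
               locations: Dict[str, Set[str]]) -> Optional[Offer]:
    """Return the first offer allowed by the two rules, or None.

    `locations` maps a dataset name to the hosts that hold it. `dataset` is
    None for a task that does not need data locality.
    """
    # Hosts holding any locality data are kept free for locality tasks.
    data_hosts = set().union(*locations.values())
    for offer in offers:
        if offer.cpus < cpus:
            continue
        if dataset is not None:
            # Rule 1: a locality task only runs where its data lives.
            if offer.hostname in locations.get(dataset, set()):
                return offer
        elif offer.hostname not in data_hosts:
            # Rule 2: other tasks stay off hosts that hold locality data.
            return offer
    return None


offers = [Offer("o1", "h1", 2.0), Offer("o2", "h2", 2.0)]
locations = {"genome_a": {"h1"}}
local_choice = pick_offer(offers, 2.0, "genome_a", locations)
other_choice = pick_offer(offers, 2.0, None, locations)
```

Here the locality task lands on h1 and the other task on h2. When no offer
fits, the function returns None and the scheduler would simply wait for a
later offer.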

Now, if there were multiple frameworks in the cluster, we would need new
support in Mesos to ensure the above strategy works. The Mesos allocator
would need to do the following:

- when giving out offers to framework A, prefer hosts that had other tasks
running (or previously run) from framework A.

As an example, say we have two frameworks, A and B, and 4 hosts, h1, h2,
h3, and h4, each with 4 cores. If A and B are assigned resources 1:1, that
is 8 cores each. Say that currently 2 cores from each of the 4 hosts are
offered to each of frameworks A and B. A variety of reasons could have
resulted in such a split.

Now, say framework A launches a task that uses 2 cores, using its offer on
host h1. Framework A then has no remaining offer on h1, so it cannot launch
another task there to achieve data locality. To keep resource allocation
at 1:1 and help data locality, it would be nice if Mesos did the following:

- rescind 2-core offer on h1 from framework B
- rescind 2-core offer on h2 from framework A
- send 2-core offer on h1 to framework A
- send 2-core offer on h2 to framework B

This would need to be done only if framework A indicated, when launching
its task on h1, that this is a task that produces data for locality
purposes.
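
The swap in this example can be worked through with a small planning
function. It is a sketch of the proposed behaviour only, not something the
Mesos allocator does: plan_locality_swap and the shape of the outstanding
map are hypothetical, and the function just lists the steps without
contacting Mesos.

```python
def plan_locality_swap(outstanding, framework, data_host):
    """Plan offer swaps that move `framework`'s offers onto `data_host`.

    `outstanding` maps host -> {framework: offered cores}. Returns a list of
    (action, framework, host, cores) tuples.
    """
    steps = []
    used_donors = set()
    for other, cores in sorted(outstanding.get(data_host, {}).items()):
        if other == framework:
            continue
        # Find another host where `framework` holds an equal-sized offer,
        # so that both frameworks keep the same total after the trade.
        donor = next(
            (h for h in sorted(outstanding)
             if h != data_host and h not in used_donors
             and outstanding[h].get(framework) == cores),
            None,
        )
        if donor is None:
            continue  # nothing of equal size to trade; leave offers alone
        used_donors.add(donor)
        steps += [
            ("rescind", other, data_host, cores),
            ("rescind", framework, donor, cores),
            ("offer", framework, data_host, cores),
            ("offer", other, donor, cores),
        ]
    return steps


# Outstanding offers after A used its 2 cores on h1.
outstanding = {
    "h1": {"B": 2},
    "h2": {"A": 2, "B": 2},
    "h3": {"A": 2, "B": 2},
    "h4": {"A": 2, "B": 2},
}
steps = plan_locality_swap(outstanding, "A", "h1")
```

For the four-host example, steps contains the same four actions listed
above, in the same order, with h2 chosen as the host that A gives up.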

Similarly, other scenarios and other resource types can be dealt with in
this new strategy.





Re: implementing data locality via mesos resource offers

Posted by Douglas Voet <dv...@broadinstitute.org>.

Thanks for the comments, Tim and Sharma. Handling it on the scheduler end
is an interesting suggestion that might better manage the lifecycle of
this data (when to put it on a slave and when to remove it). However, the
resource offer approach is a nice way to handle the routing.

Re: implementing data locality via mesos resource offers

Posted by Tim Chen <ti...@mesosphere.io>.

Hi Sharma,

You're correct, and that's how most schedulers handle this: they manage
the locality information themselves.

We've been considering primitives to help on this front though, so if you
have any input on how Mesos itself could help manage locality, let us
know.

Tim

Re: implementing data locality via mesos resource offers

Posted by Sharma Podila <sp...@netflix.com>.

Using the attributes would be the simplest way, if the slave were to
support dynamic updates of the attributes. The JIRA that Tim references
would be nice! Otherwise one would have to resort to something like a
wrapper script around the mesos-slave process that detects new data
availability and restarts mesos-slave with new attributes on the command
line. Restarts may be OK when slaves are configured to checkpoint state
and recover it upon restart.
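
A rough sketch of the command-building part of such a wrapper follows. It
assumes that the --attributes flag takes semicolon-separated name:value
pairs and that --master names the master to register with. The
data_<name>:present naming scheme, the function names, and the master
address are made up for illustration, and the sketch only builds the
command; it does not start or stop any process.

```python
def attributes_flag(datasets):
    """Build an --attributes flag from dataset names, or '' if there are none."""
    pairs = ";".join(f"data_{name}:present" for name in sorted(datasets))
    return f"--attributes={pairs}" if pairs else ""


def slave_command(master, datasets):
    """Return the argument list the wrapper would use to start mesos-slave."""
    cmd = ["mesos-slave", f"--master={master}"]
    flag = attributes_flag(datasets)
    if flag:
        cmd.append(flag)
    return cmd


def needs_restart(advertised, present):
    """A restart is only needed when the set of local datasets has changed."""
    return set(advertised) != set(present)


# Hypothetical master address and dataset names.
cmd = slave_command("zk://zk1:2181/mesos", {"genome_b", "genome_a"})
```

The wrapper would compare the datasets it last advertised with what is on
disk now, and only restart the slave with the new command when
needs_restart returns True.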

Another possibility in the interim would be for the framework scheduler to
launch the task that does the download of the file(s) to the small subset
of nodes. Then, the scheduler can maintain this state information and
assign the tasks based on that. This has the additional advantage of
maintaining the list of that subset of nodes in a more dynamic way, if that
is useful to you.

In general, I am a fan of achieving data locality via the scheduler's state
info. In a more generic scenario, the data would be created dynamically by
tasks previously run (instead of just an initial download), and locality
for such data is therefore more easily handled by the scheduler.
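
As a minimal sketch of the state a scheduler could keep for this, assuming
it sees a status update for every download task it launches: the
DataLocations class and its method names are hypothetical, and only the
task state names (TASK_FINISHED, TASK_FAILED, TASK_LOST, TASK_KILLED) come
from Mesos.

```python
from collections import defaultdict


class DataLocations:
    """Tracks which hosts hold which dataset, from the scheduler's view."""

    def __init__(self):
        self._hosts = defaultdict(set)   # dataset -> hosts that have it
        self._pending = {}               # task id -> (dataset, hostname)

    def download_launched(self, task_id, dataset, hostname):
        """Record a download task that was just launched on `hostname`."""
        self._pending[task_id] = (dataset, hostname)

    def status_update(self, task_id, state):
        """Apply a task status update for a previously launched download."""
        entry = self._pending.get(task_id)
        if entry is None:
            return  # not a download task we are tracking
        if state == "TASK_FINISHED":
            dataset, hostname = self._pending.pop(task_id)
            self._hosts[dataset].add(hostname)
        elif state in ("TASK_FAILED", "TASK_LOST", "TASK_KILLED"):
            self._pending.pop(task_id)  # data never arrived on that host

    def host_lost(self, hostname):
        """Forget a host, for example after its slave is lost."""
        for hosts in self._hosts.values():
            hosts.discard(hostname)

    def hosts_for(self, dataset):
        """Return the hosts currently known to hold `dataset`."""
        return frozenset(self._hosts.get(dataset, ()))


state = DataLocations()
state.download_launched("t1", "genome_a", "h1")
state.download_launched("t2", "genome_a", "h2")
state.status_update("t1", "TASK_FINISHED")
state.status_update("t2", "TASK_FAILED")
```

After these updates only h1 is recorded as holding genome_a, so later
tasks that need that data would be matched against offers from h1.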

Re: implementing data locality via mesos resource offers

Posted by Tim Chen <ti...@mesosphere.io>.

Hi Douglas,

The simplest way that Mesos supports is to add attributes via CLI flags
when you launch a Mesos slave. When this slave's resources are offered,
the offer will also include all the attributes you've tagged it with.
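
On the scheduler side, using those attributes comes down to filtering the
offers it receives. The sketch below models each offer as a plain dict with
an "attributes" mapping, which is a simplification of how offers actually
carry attributes; the data_genome_a:present attribute and the function name
are hypothetical.

```python
def offers_with_attribute(offers, name, value):
    """Keep only offers whose slave advertises attribute `name` == `value`."""
    return [o for o in offers if o.get("attributes", {}).get(name) == value]


# Simplified offers: h1 was started with a data attribute, h2 was not.
offers = [
    {"id": "o1", "hostname": "h1",
     "attributes": {"data_genome_a": "present"}},
    {"id": "o2", "hostname": "h2", "attributes": {"rack": "r2"}},
]
matching = offers_with_attribute(offers, "data_genome_a", "present")
```

Only the offer from h1 matches, so a task that needs that data would be
launched against it and the offer from h2 would be declined or used for
other work.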

This is currently static information set at launch, and I believe there
are JIRA tickets to make it dynamic (updatable at runtime).

Tim
