Posted to user@mesos.apache.org by "Du, Fan" <fa...@intel.com> on 2016/06/06 09:17:36 UTC

Rack awareness support for Mesos

Hi, Mesos folks

I've been thinking about rack awareness support in Mesos for a while.
Data locality, fault tolerance, and better task placement are of common
interest to many data center applications. I created MESOS-5545 to track the story,
and here is the initial design doc [1] to support rack awareness in Mesos.

Looking forward to hearing any comments from end users and other developers.
Thanks!

[1]: https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing

Re: Rack awareness support for Mesos

Posted by Charles Allen <ch...@metamarkets.com>.

There are a lot of things in Mesos which require a-priori communication
between an agent and a framework in order to properly set resource usage
expectations (example: what does 1 cpu mean?). I'm not seeing how having
customizations in core mesos per "way of looking at resources" is scalable
and future-proof.

On Mon, Jun 6, 2016 at 8:48 AM Jörg Schad <jo...@mesosphere.io> wrote:

> Hi,
> thanks for your idea and design doc!
> Just a few thoughts:
> a) The scheduling part would be implemented in a framework scheduler and
> not the Mesos Core, or?
> b) As mentioned by James, this needs to be very flexible (and not
> necessarily based on network structure), afaik people are using labels on
> the agents to identify different fault domains which can then be
> interpreted by framework scheduler. Maybe it would make sense (instead of
> identifying the network structure) to come up with a common label naming
> scheme which can be understood by all/different frameworks.
>
> Looking forward to your thoughts on this!
>
> On Mon, Jun 6, 2016 at 3:27 PM, james <ga...@verizon.net> wrote:
>
>> Hello,
>>
>>
>> @Stephen::I guess Stephen is bringing up the 'security' aspect of who
>> get's access to the information, particularly cluster/cloud devops,
>> customers or interlopers....?
>>
>>
>> @Fan:: As a consultant, most of my customers either have  or are planning
>> hybrid installations, where some codes run on a local cluster or using 'the
>> cloud' for dynamic load requirements. I would think your proposed scheme
>> needs to be very flexible, both in application to a campus or Metropolitan
>> Area Network, if not massively distributed around the globe. What about
>> different resouce types (racks of arm64, gpu centric hardware, DSPs, FPGA
>> etc etc. Hardware diversity bring many
>> benefits to the cluster/cloud capabilities.
>>
>>
>> This also begs the quesion of hardware management (boot/config/online)
>> of the various hardware, such as is built into coreOS. Are several
>> applications going to be supported? Standards track? Just Mesos DC/OS
>> centric?
>>
>>
>> TIMING DATA:: This is the main issue I see. Once you start 'vectoring
>> in resources' you need to add timing (latency) data to encourage robust
>> and diversified use of of this data. For HPC, this could be very valuable
>> for rDMA abusive algorithms where memory constrained workloads not only
>> need the knowledge of additional nearby memory resources, but
>> the approximated (based on previous data collected) latency and bandwidth
>> constraints to use those additional resources.
>>
>>
>> Great idea. I do like it very much.
>>
>> hth,
>> James
>>
>>
>>
>> On 06/06/2016 05:06 AM, Stephen Gran wrote:
>>
>>> Hi,
>>>
>>> This looks potentially interesting.  How does it work in a public cloud
>>> deployment scenario?  I assume you would just have to disable this
>>> feature, or not enable it?
>>>
>>> Cheers,
>>>
>>> On 06/06/16 10:17, Du, Fan wrote:
>>>
>>>> Hi, Mesos folks
>>>>
>>>> I’ve been thinking about Mesos rack awareness support for a while,
>>>>
>>>> it’s a common interest for lots of data center applications to provide
>>>> data locality,
>>>>
>>>> fault tolerance and better task placement. Create MESOS-5545 to track
>>>> the story,
>>>>
>>>> and here is the initial design doc [1] to support rack awareness in
>>>> Mesos.
>>>>
>>>> Looking forward to hear any comments from end user and other developers,
>>>>
>>>> Thanks!
>>>>
>>>> [1]:
>>>>
>>>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>>>>
>>>>
>>>
>>
>

RE: Rack awareness support for Mesos

Posted by Aaron Carey <ac...@ilm.com>.

Would this perhaps make sense as a Mesos module that can automatically assign labels to the agents, rather than something in the core itself?

--

Aaron Carey
Production Engineer - Cloud Pipeline
Industrial Light & Magic
London
020 3751 9150

________________________________________
From: Du, Fan [fan.du@intel.com]
Sent: 07 June 2016 16:16
To: Jörg Schad; user@mesos.apache.org
Subject: Re: Rack awareness support for Mesos

On 2016/6/6 23:48, Jörg Schad wrote:
> Hi,
> thanks for your idea and design doc!
> Just a few thoughts:
> a) The scheduling part would be implemented in a framework scheduler and
> not the Mesos Core, or?

I'm not sure which level of scheduling part do you indicate,
For the "Future" section of proposal?, It's Mesos allocation logic.
And how to use rack information to implement advanced features (fault
tolerance,
data locality) is up to the framework scheduling part.

> b) As mentioned by James, this needs to be very flexible (and not
> necessarily based on network structure),

The proposed network topology detection is modular, to fit into Ethernet,
Infiniband, or other network implementation. And yes, user can statically
configure /etc/mesos/rack_id to manipulate the logical network topology
easily.


>afaik people are using labels
> on the agents to identify different fault domains which can then be
> interpreted by framework scheduler. Maybe it would make sense (instead
> of identifying the network structure) to come up with a common label
> naming scheme which can be understood by all/different frameworks.

I'm not convinced here why still using labels,
Based on what information to label the agents? IMO, cluster operator
still needs something like lldp to find out the network topology,
every cluster operator will need to do it by his own, and it's better
to abstract the logical inside Mesos to provide common interface to
frameworks.

Honestly speaking, I don't follow the argument here for the labels.
The proposal is designed to do it *automatically* to reduce maintenance
effort.

> Looking forward to your thoughts on this!
>
> On Mon, Jun 6, 2016 at 3:27 PM, james <garftd@verizon.net
> <ma...@verizon.net>> wrote:
>
>     Hello,
>
>
>     @Stephen::I guess Stephen is bringing up the 'security' aspect of
>     who get's access to the information, particularly cluster/cloud
>     devops, customers or interlopers....?
>
>
>     @Fan:: As a consultant, most of my customers either have  or are
>     planning hybrid installations, where some codes run on a local
>     cluster or using 'the cloud' for dynamic load requirements. I would
>     think your proposed scheme needs to be very flexible, both in
>     application to a campus or Metropolitan Area Network, if not
>     massively distributed around the globe. What about different resouce
>     types (racks of arm64, gpu centric hardware, DSPs, FPGA etc etc.
>     Hardware diversity bring many
>     benefits to the cluster/cloud capabilities.
>
>
>     This also begs the quesion of hardware management (boot/config/online)
>     of the various hardware, such as is built into coreOS. Are several
>     applications going to be supported? Standards track? Just Mesos DC/OS
>     centric?
>
>
>     TIMING DATA:: This is the main issue I see. Once you start 'vectoring
>     in resources' you need to add timing (latency) data to encourage robust
>     and diversified use of of this data. For HPC, this could be very
>     valuable for rDMA abusive algorithms where memory constrained
>     workloads not only need the knowledge of additional nearby memory
>     resources, but
>     the approximated (based on previous data collected) latency and
>     bandwidth constraints to use those additional resources.
>
>
>     Great idea. I do like it very much.
>
>     hth,
>     James
>
>
>
>     On 06/06/2016 05:06 AM, Stephen Gran wrote:
>
>         Hi,
>
>         This looks potentially interesting.  How does it work in a
>         public cloud
>         deployment scenario?  I assume you would just have to disable this
>         feature, or not enable it?
>
>         Cheers,
>
>         On 06/06/16 10:17, Du, Fan wrote:
>
>             Hi, Mesos folks
>
>             I’ve been thinking about Mesos rack awareness support for a
>             while,
>
>             it’s a common interest for lots of data center applications
>             to provide
>             data locality,
>
>             fault tolerance and better task placement. Create MESOS-5545
>             to track
>             the story,
>
>             and here is the initial design doc [1] to support rack
>             awareness in Mesos.
>
>             Looking forward to hear any comments from end user and other
>             developers,
>
>             Thanks!
>
>             [1]:
>             https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>
>
>
>

Re: Rack awareness support for Mesos

Posted by Jeff Schroeder <je...@computer.org>.

On Tuesday, June 7, 2016, Du, Fan <fa...@intel.com> wrote:

>
>
> On 2016/6/6 23:48, Jörg Schad wrote:
>
>> Hi,
>> thanks for your idea and design doc!
>> Just a few thoughts:
>> a) The scheduling part would be implemented in a framework scheduler and
>> not the Mesos Core, or?
>>
>
> I'm not sure which level of scheduling part do you indicate,
> For the "Future" section of proposal?, It's Mesos allocation logic.
> And how to use rack information to implement advanced features (fault
> tolerance,
> data locality) is up to the framework scheduling part.
>
> b) As mentioned by James, this needs to be very flexible (and not
>> necessarily based on network structure),
>>
>
> The proposed network topology detection is modular, to fit into Ethernet,
> Infiniband, or other network implementation. And yes, user can statically
> configure /etc/mesos/rack_id to manipulate the logical network topology
> easily.
>
>
> afaik people are using labels
>> on the agents to identify different fault domains which can then be
>> interpreted by framework scheduler. Maybe it would make sense (instead
>> of identifying the network structure) to come up with a common label
>> naming scheme which can be understood by all/different frameworks.
>>
>
> I'm not convinced here why still using labels,
> Based on what information to label the agents? IMO, cluster operator
> still needs something like lldp to find out the network topology,
> every cluster operator will need to do it by his own, and it's better
> to abstract the logical inside Mesos to provide common interface to
> frameworks.


LLDP is Ethernet-specific, however. To go into Mesos, it would need to be
higher level, as there are people who run Mesos with InfiniBand or perhaps
an exotic custom networking fabric (Cray and IBM bits come to mind) who
might want to take advantage of this functionality. Labels are more
generic, but also more flexible in that regard.


-- 
Text by Jeff, typos by iPhone

Re: Rack awareness support for Mesos

Posted by "Du, Fan" <fa...@intel.com>.


On 2016/6/6 23:48, Jörg Schad wrote:
> Hi,
> thanks for your idea and design doc!
> Just a few thoughts:
> a) The scheduling part would be implemented in a framework scheduler and
> not the Mesos Core, or?

I'm not sure which level of scheduling you are referring to.
If you mean the "Future" section of the proposal, that is Mesos allocation logic.
How to use rack information to implement advanced features (fault
tolerance, data locality) is up to the framework scheduler.

> b) As mentioned by James, this needs to be very flexible (and not
> necessarily based on network structure),

The proposed network topology detection is modular, so it can fit Ethernet,
InfiniBand, or other network implementations. And yes, users can statically
configure /etc/mesos/rack_id to manipulate the logical network topology
easily.
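
A minimal sketch of the lookup order described above: a static override
in /etc/mesos/rack_id, then pluggable per-fabric detectors. Only the file
path comes from this message. The function names and the idea of passing
detectors as callables are assumptions for illustration, not code from
the design doc or from Mesos.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional

# Path taken from the message above; everything else is hypothetical.
RACK_ID_FILE = Path("/etc/mesos/rack_id")

# A detector probes one kind of fabric (for example LLDP on Ethernet)
# and returns a rack identifier, or None if it cannot tell.
Detector = Callable[[], Optional[str]]


def read_static_rack_id(path: Path = RACK_ID_FILE) -> Optional[str]:
    """Return the operator-supplied rack ID, or None if absent or empty."""
    try:
        value = path.read_text().strip()
    except OSError:
        return None
    return value or None


def resolve_rack_id(
    detectors: Iterable[Detector],
    static_path: Path = RACK_ID_FILE,
) -> Optional[str]:
    """Prefer the static file; otherwise ask each detector in order."""
    static = read_static_rack_id(static_path)
    if static is not None:
        return static
    for detect in detectors:
        rack = detect()
        if rack:
            return rack
    return None
```

With this shape, an operator in a public cloud could leave the detector
list empty and rely on the static file alone, or get no rack ID at all.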


>afaik people are using labels
> on the agents to identify different fault domains which can then be
> interpreted by framework scheduler. Maybe it would make sense (instead
> of identifying the network structure) to come up with a common label
> naming scheme which can be understood by all/different frameworks.

I'm not convinced that labels are still the way to go.
What information would the labels on the agents be based on? IMO, the cluster
operator still needs something like LLDP to find out the network topology.
Every cluster operator would have to do that on their own, so it's better
to abstract the logic inside Mesos and provide a common interface to
frameworks.

Honestly speaking, I don't follow the argument for labels here.
The proposal is designed to do this *automatically* to reduce maintenance
effort.

> Looking forward to your thoughts on this!
>
> On Mon, Jun 6, 2016 at 3:27 PM, james <garftd@verizon.net
> <ma...@verizon.net>> wrote:
>
>     Hello,
>
>
>     @Stephen::I guess Stephen is bringing up the 'security' aspect of
>     who get's access to the information, particularly cluster/cloud
>     devops, customers or interlopers....?
>
>
>     @Fan:: As a consultant, most of my customers either have  or are
>     planning hybrid installations, where some codes run on a local
>     cluster or using 'the cloud' for dynamic load requirements. I would
>     think your proposed scheme needs to be very flexible, both in
>     application to a campus or Metropolitan Area Network, if not
>     massively distributed around the globe. What about different resouce
>     types (racks of arm64, gpu centric hardware, DSPs, FPGA etc etc.
>     Hardware diversity bring many
>     benefits to the cluster/cloud capabilities.
>
>
>     This also begs the quesion of hardware management (boot/config/online)
>     of the various hardware, such as is built into coreOS. Are several
>     applications going to be supported? Standards track? Just Mesos DC/OS
>     centric?
>
>
>     TIMING DATA:: This is the main issue I see. Once you start 'vectoring
>     in resources' you need to add timing (latency) data to encourage robust
>     and diversified use of of this data. For HPC, this could be very
>     valuable for rDMA abusive algorithms where memory constrained
>     workloads not only need the knowledge of additional nearby memory
>     resources, but
>     the approximated (based on previous data collected) latency and
>     bandwidth constraints to use those additional resources.
>
>
>     Great idea. I do like it very much.
>
>     hth,
>     James
>
>
>
>     On 06/06/2016 05:06 AM, Stephen Gran wrote:
>
>         Hi,
>
>         This looks potentially interesting.  How does it work in a
>         public cloud
>         deployment scenario?  I assume you would just have to disable this
>         feature, or not enable it?
>
>         Cheers,
>
>         On 06/06/16 10:17, Du, Fan wrote:
>
>             Hi, Mesos folks
>
>             I’ve been thinking about Mesos rack awareness support for a
>             while,
>
>             it’s a common interest for lots of data center applications
>             to provide
>             data locality,
>
>             fault tolerance and better task placement. Create MESOS-5545
>             to track
>             the story,
>
>             and here is the initial design doc [1] to support rack
>             awareness in Mesos.
>
>             Looking forward to hear any comments from end user and other
>             developers,
>
>             Thanks!
>
>             [1]:
>             https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>
>
>
>

Re: Rack awareness support for Mesos

Posted by Jörg Schad <jo...@mesosphere.io>.

Hi,
thanks for your idea and design doc!
Just a few thoughts:
a) The scheduling part would be implemented in a framework scheduler and
not in the Mesos core, right?
b) As mentioned by James, this needs to be very flexible (and not
necessarily based on network structure). AFAIK, people are using labels on
the agents to identify different fault domains, which can then be
interpreted by the framework scheduler. Maybe it would make sense (instead of
identifying the network structure) to come up with a common label naming
scheme that can be understood by all frameworks.
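
To illustrate point b), here is a rough sketch of how a framework
scheduler might interpret a shared label to spread replicas across fault
domains. The label name "rack" and the plain-dict shape of an offer are
assumptions made for this example; real Mesos offers are protobuf
messages, and no common naming scheme has been agreed on in this thread.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical shared label name that all frameworks would agree on.
RACK_LABEL = "rack"


def pick_offer(offers: List[Dict], replicas_per_rack: Counter) -> Optional[Dict]:
    """Pick the offer whose rack currently hosts the fewest replicas.

    Offers without the rack label are skipped. Ties are broken by
    agent ID so the choice is deterministic.
    """
    candidates = [o for o in offers if RACK_LABEL in o["attributes"]]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda o: (replicas_per_rack[o["attributes"][RACK_LABEL]], o["agent_id"]),
    )


offers = [
    {"agent_id": "a1", "attributes": {"rack": "r1"}},
    {"agent_id": "a2", "attributes": {"rack": "r1"}},
    {"agent_id": "a3", "attributes": {"rack": "r2"}},
    {"agent_id": "a4", "attributes": {}},  # unlabeled agent, ignored
]
placed = Counter()

first = pick_offer(offers, placed)
placed[first["attributes"][RACK_LABEL]] += 1
second = pick_offer(offers, placed)
```

Here the first replica lands on a1 (rack r1) and the second on a3
(rack r2), because r2 has no replicas yet.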

Looking forward to your thoughts on this!

On Mon, Jun 6, 2016 at 3:27 PM, james <ga...@verizon.net> wrote:

> Hello,
>
>
> @Stephen::I guess Stephen is bringing up the 'security' aspect of who
> get's access to the information, particularly cluster/cloud devops,
> customers or interlopers....?
>
>
> @Fan:: As a consultant, most of my customers either have  or are planning
> hybrid installations, where some codes run on a local cluster or using 'the
> cloud' for dynamic load requirements. I would think your proposed scheme
> needs to be very flexible, both in application to a campus or Metropolitan
> Area Network, if not massively distributed around the globe. What about
> different resouce types (racks of arm64, gpu centric hardware, DSPs, FPGA
> etc etc. Hardware diversity bring many
> benefits to the cluster/cloud capabilities.
>
>
> This also begs the quesion of hardware management (boot/config/online)
> of the various hardware, such as is built into coreOS. Are several
> applications going to be supported? Standards track? Just Mesos DC/OS
> centric?
>
>
> TIMING DATA:: This is the main issue I see. Once you start 'vectoring
> in resources' you need to add timing (latency) data to encourage robust
> and diversified use of of this data. For HPC, this could be very valuable
> for rDMA abusive algorithms where memory constrained workloads not only
> need the knowledge of additional nearby memory resources, but
> the approximated (based on previous data collected) latency and bandwidth
> constraints to use those additional resources.
>
>
> Great idea. I do like it very much.
>
> hth,
> James
>
>
>
> On 06/06/2016 05:06 AM, Stephen Gran wrote:
>
>> Hi,
>>
>> This looks potentially interesting.  How does it work in a public cloud
>> deployment scenario?  I assume you would just have to disable this
>> feature, or not enable it?
>>
>> Cheers,
>>
>> On 06/06/16 10:17, Du, Fan wrote:
>>
>>> Hi, Mesos folks
>>>
>>> I’ve been thinking about Mesos rack awareness support for a while,
>>>
>>> it’s a common interest for lots of data center applications to provide
>>> data locality,
>>>
>>> fault tolerance and better task placement. Create MESOS-5545 to track
>>> the story,
>>>
>>> and here is the initial design doc [1] to support rack awareness in
>>> Mesos.
>>>
>>> Looking forward to hear any comments from end user and other developers,
>>>
>>> Thanks!
>>>
>>> [1]:
>>>
>>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>>>
>>>
>>
>

Re: Rack awareness support for Mesos

Posted by Joris Van Remoortere <jo...@mesosphere.io>.

+dev.

@Fan, I responded on the JIRA with some next steps.
Thanks for bringing this up!

—
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 7, 2016 at 12:58 PM, james <ga...@verizon.net> wrote:

> On 06/07/2016 09:57 AM, Du, Fan wrote:
>
>>
>>
>> On 2016/6/6 21:27, james wrote:
>>
>>> Hello,
>>>
>>>
>>> @Stephen::I guess Stephen is bringing up the 'security' aspect of who
>>> get's access to the information, particularly cluster/cloud devops,
>>> customers or interlopers....?
>>>
>>
>> ACLs should play in this part to address security concern.
>>
>
> YES, and so much more! I know folks that their primary (in house cluster)
> usage is deep packet inspection on  the cluster....
> With a cluster (inside) there is no limit to new tools that can be
> judiciously altered to benefit from cluster codes....
>
>
>>
>>> @Fan:: As a consultant, most of my customers either have  or are
>>> planning hybrid installations, where some codes run on a local cluster
>>> or using 'the cloud' for dynamic load requirements. I would think your
>>> proposed scheme needs to be very flexible, both in application to a
>>> campus or Metropolitan Area Network, if not massively distributed around
>>> the globe. What about different resouce types (racks of arm64, gpu
>>> centric hardware, DSPs, FPGA etc etc. Hardware diversity bring many
>>> benefits to the cluster/cloud capabilities.
>>>
>>>
>>> This also begs the quesion of hardware management (boot/config/online)
>>> of the various hardware, such as is built into coreOS. Are several
>>> applications going to be supported? Standards track? Just Mesos DC/OS
>>> centric?
>>>
>>
>> It depends whether this proposal is accepted by Mesos, if you think
>> this feature is useful, let's discuss detailed requirement under
>> MESOS-5545.
>>
>
> OK. Take a look at 'Rackview' on sourceforge::
> 'http://rackview.sourceforge.net/'
>
>
> Do I have access to the jira system by default joining this list,
> or do I have to request permission somewhere? (sorry jira is new to me
> so recommendations on jira, per mesos, in a document, would be keen.)
>
>
>> btw, I have limited knowledge of CoreOS, will look into it.
>>
>
> CoreOS has some great ideas. But many of their codes are not current
> (when compared to the gentoo portage tree) and thus many are suspect
> for security/function.
>
> I thought the purpose was to get more folks involved here in discussions
> and then better formulated ideas  can migrate to the ticket (5545)  and
> repos.
>
>
>>
>>> TIMING DATA:: This is the main issue I see. Once you start 'vectoring
>>> in resources' you need to add timing (latency) data to encourage robust
>>> and diversified use of of this data. For HPC, this could be very
>>> valuable for rDMA abusive algorithms where memory constrained workloads
>>> not only need the knowledge of additional nearby memory resources, but
>>> the approximated (based on previous data collected) latency and
>>> bandwidth constraints to use those additional resources.
>>>
>>
>> Out of curiosity, which open sourced Mesos framework do you/your
>> customer run MPI?
>>
>
> Easy dude.    Most of this work in tightly help and nothing to publish
> or open up yet. It's a mess (my professional opinion) right now and
> I'm testing a variety of tools just be able to have better instrumentation
> on these codes. Still rDMA is very attractive so it does warrant much
> attention and extreme, internal, excitement.
>
>
>
>
> Mesos can support MPI framework, but AFIK, it's immature [1][2].
>>
>
> YEP.
>
> I think this part of work should be investigated in future.
>>
>> [1]: https://github.com/apache/mesos/tree/master/mpi   <- mpd ring
>> version
>> [2]: https://github.com/mesosphere/mesos-hydra         <- hydra version
>>
>
> Many codes floating around. Much excitement on new compiler features. Lots
> of hard work and testing going on. That said, the point I was try to make
> is "Vectoring in" resources, with a variety of parameters as a companion to
> your idea, is warranted for these aforementioned use cases
> and other opportunities.
>
>
>>
>>> Great idea. I do like it very much.
>>>
>>> hth,
>>> James
>>>
>>>
>>> On 06/06/2016 05:06 AM, Stephen Gran wrote:
>>>
>>>> Hi,
>>>>
>>>> This looks potentially interesting.  How does it work in a public cloud
>>>> deployment scenario?  I assume you would just have to disable this
>>>> feature, or not enable it?
>>>>
>>>> Cheers,
>>>>
>>>> On 06/06/16 10:17, Du, Fan wrote:
>>>>
>>>>> Hi, Mesos folks
>>>>>
>>>>> I’ve been thinking about Mesos rack awareness support for a while,
>>>>>
>>>>> it’s a common interest for lots of data center applications to provide
>>>>> data locality,
>>>>>
>>>>> fault tolerance and better task placement. Create MESOS-5545 to track
>>>>> the story,
>>>>>
>>>>> and here is the initial design doc [1] to support rack awareness in
>>>>> Mesos.
>>>>>
>>>>> Looking forward to hear any comments from end user and other
>>>>> developers,
>>>>>
>>>>> Thanks!
>>>>>
>>>>> [1]:
>>>>>
>>>>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>

Re: Rack awareness support for Mesos

Posted by "Du, Fan" <fa...@intel.com>.


On 2016/6/8 0:58, james wrote:
>
> Do I have access to the jira system by default joining this list,
> or do I have to request permission somewhere? (sorry jira is new to me
> so recommendations on jira, per mesos, in a document, would be keen.)

You need a JIRA account; sign up for one here:
https://issues.apache.org/jira/secure/Signup!default.jspa

Re: Rack awareness support for Mesos

Posted by james <ga...@verizon.net>.

On 06/07/2016 09:57 AM, Du, Fan wrote:
>
>
> On 2016/6/6 21:27, james wrote:
>> Hello,
>>
>>
>> @Stephen::I guess Stephen is bringing up the 'security' aspect of who
>> get's access to the information, particularly cluster/cloud devops,
>> customers or interlopers....?
>
> ACLs should play in this part to address security concern.

YES, and so much more! I know folks whose primary (in-house
cluster) usage is deep packet inspection on the cluster....
With an in-house cluster, there is no limit to the new tools that can be
judiciously altered to benefit from cluster codes....

>
>>
>> @Fan:: As a consultant, most of my customers either have  or are
>> planning hybrid installations, where some codes run on a local cluster
>> or using 'the cloud' for dynamic load requirements. I would think your
>> proposed scheme needs to be very flexible, both in application to a
>> campus or Metropolitan Area Network, if not massively distributed around
>> the globe. What about different resouce types (racks of arm64, gpu
>> centric hardware, DSPs, FPGA etc etc. Hardware diversity bring many
>> benefits to the cluster/cloud capabilities.
>>
>>
>> This also begs the quesion of hardware management (boot/config/online)
>> of the various hardware, such as is built into coreOS. Are several
>> applications going to be supported? Standards track? Just Mesos DC/OS
>> centric?
>
> It depends whether this proposal is accepted by Mesos, if you think
> this feature is useful, let's discuss detailed requirement under
> MESOS-5545.

OK. Take a look at 'Rackview' on sourceforge::
'http://rackview.sourceforge.net/'


Do I get access to the JIRA system by default when joining this list,
or do I have to request permission somewhere? (Sorry, JIRA is new to me,
so recommendations on using JIRA for Mesos, in a document, would be welcome.)

>
> btw, I have limited knowledge of CoreOS, will look into it.

CoreOS has some great ideas. But many of their codes are not current
(when compared to the gentoo portage tree) and thus many are suspect
for security/function.

I thought the purpose was to get more folks involved here in discussions
and then better formulated ideas  can migrate to the ticket (5545)  and 
repos.

>
>>
>> TIMING DATA:: This is the main issue I see. Once you start 'vectoring
>> in resources' you need to add timing (latency) data to encourage robust
>> and diversified use of this data. For HPC, this could be very
>> valuable for rDMA abusive algorithms where memory-constrained workloads
>> need not only the knowledge of additional nearby memory resources, but
>> also the approximated (based on previously collected data) latency and
>> bandwidth constraints to use those additional resources.
>
> Out of curiosity, which open source Mesos framework do you or your
> customers use to run MPI?

Easy dude. Most of this work is tightly held and there is nothing to
publish or open up yet. It's a mess (my professional opinion) right now and
I'm testing a variety of tools just to be able to have better
instrumentation on these codes. Still, rDMA is very attractive so it does
warrant much attention and extreme, internal, excitement.




> Mesos can support an MPI framework, but AFAIK, it's immature [1][2].

YEP.

> I think this part of the work should be investigated in the future.
>
> [1]: https://github.com/apache/mesos/tree/master/mpi   <- mpd ring version
> [2]: https://github.com/mesosphere/mesos-hydra         <- hydra version

Many codes floating around. Much excitement on new compiler features.
Lots of hard work and testing going on. That said, the point I was trying
to make is that "vectoring in" resources, with a variety of parameters as a
companion to your idea, is warranted for these aforementioned use cases
and other opportunities.
>
>>
>> Great idea. I do like it very much.
>>
>> hth,
>> James
>>
>>
>> On 06/06/2016 05:06 AM, Stephen Gran wrote:
>>> Hi,
>>>
>>> This looks potentially interesting.  How does it work in a public cloud
>>> deployment scenario?  I assume you would just have to disable this
>>> feature, or not enable it?
>>>
>>> Cheers,
>>>
>>> On 06/06/16 10:17, Du, Fan wrote:
>>>> Hi, Mesos folks
>>>>
>>>> I've been thinking about Mesos rack awareness support for a while,
>>>>
>>>> it's a common interest for lots of data center applications to provide
>>>> data locality,
>>>>
>>>> fault tolerance and better task placement. Create MESOS-5545 to track
>>>> the story,
>>>>
>>>> and here is the initial design doc [1] to support rack awareness in
>>>> Mesos.
>>>>
>>>> Looking forward to hear any comments from end user and other
>>>> developers,
>>>>
>>>> Thanks!
>>>>
>>>> [1]:
>>>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>>>>
>>>>
>>>>
>>>
>>
>>
>
>

Re: Rack awareness support for Mesos

Posted by "Du, Fan" <fa...@intel.com>.


On 2016/6/6 21:27, james wrote:
> Hello,
>
>
> @Stephen::I guess Stephen is bringing up the 'security' aspect of who
> gets access to the information, particularly cluster/cloud devops,
> customers or interlopers....?

ACLs should come into play here to address the security concerns.

>
> @Fan:: As a consultant, most of my customers either have or are
> planning hybrid installations, where some codes run on a local cluster
> or use 'the cloud' for dynamic load requirements. I would think your
> proposed scheme needs to be very flexible, both in application to a
> campus or Metropolitan Area Network, if not massively distributed around
> the globe. What about different resource types (racks of arm64,
> GPU-centric hardware, DSPs, FPGAs, etc.)? Hardware diversity brings many
> benefits to the cluster/cloud capabilities.
>
>
> This also begs the question of hardware management (boot/config/online)
> of the various hardware, such as is built into CoreOS. Are several
> applications going to be supported? Standards track? Just Mesos DC/OS
> centric?

It depends on whether this proposal is accepted by Mesos. If you think
this feature is useful, let's discuss the detailed requirements under MESOS-5545.

btw, I have limited knowledge of CoreOS, will look into it.

>
> TIMING DATA:: This is the main issue I see. Once you start 'vectoring
> in resources' you need to add timing (latency) data to encourage robust
> and diversified use of this data. For HPC, this could be very
> valuable for rDMA abusive algorithms where memory-constrained workloads
> need not only the knowledge of additional nearby memory resources, but
> also the approximated (based on previously collected data) latency and
> bandwidth constraints to use those additional resources.

Out of curiosity, which open source Mesos framework do you or your
customers use to run MPI?
Mesos can support an MPI framework, but AFAIK, it's immature [1][2].
I think this part of the work should be investigated in the future.

[1]: https://github.com/apache/mesos/tree/master/mpi   <- mpd ring version
[2]: https://github.com/mesosphere/mesos-hydra         <- hydra version

>
> Great idea. I do like it very much.
>
> hth,
> James
>
>
> On 06/06/2016 05:06 AM, Stephen Gran wrote:
>> Hi,
>>
>> This looks potentially interesting.  How does it work in a public cloud
>> deployment scenario?  I assume you would just have to disable this
>> feature, or not enable it?
>>
>> Cheers,
>>
>> On 06/06/16 10:17, Du, Fan wrote:
>>> Hi, Mesos folks
>>>
>>> I've been thinking about Mesos rack awareness support for a while,
>>>
>>> it's a common interest for lots of data center applications to provide
>>> data locality,
>>>
>>> fault tolerance and better task placement. Create MESOS-5545 to track
>>> the story,
>>>
>>> and here is the initial design doc [1] to support rack awareness in
>>> Mesos.
>>>
>>> Looking forward to hear any comments from end user and other developers,
>>>
>>> Thanks!
>>>
>>> [1]:
>>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>>>
>>>
>>
>
>

Re: Rack awareness support for Mesos

Posted by james <ga...@verizon.net>.

Hello,

@Stephen::I guess Stephen is bringing up the 'security' aspect of who 
gets access to the information, particularly cluster/cloud devops,
customers or interlopers....?

@Fan:: As a consultant, most of my customers either have or are
planning hybrid installations, where some codes run on a local cluster
or use 'the cloud' for dynamic load requirements. I would think your
proposed scheme needs to be very flexible, both in application to a
campus or Metropolitan Area Network, if not massively distributed around
the globe. What about different resource types (racks of arm64,
GPU-centric hardware, DSPs, FPGAs, etc.)? Hardware diversity brings many
benefits to the cluster/cloud capabilities.

This also begs the question of hardware management (boot/config/online)
of the various hardware, such as is built into CoreOS. Are several
applications going to be supported? Standards track? Just Mesos DC/OS
centric?

TIMING DATA:: This is the main issue I see. Once you start 'vectoring
in resources' you need to add timing (latency) data to encourage robust
and diversified use of this data. For HPC, this could be very
valuable for rDMA abusive algorithms where memory-constrained workloads
need not only the knowledge of additional nearby memory resources, but
also the approximated (based on previously collected data) latency and
bandwidth constraints to use those additional resources.

Great idea. I do like it very much.

hth,
James

On 06/06/2016 05:06 AM, Stephen Gran wrote:
> Hi,
>
> This looks potentially interesting.  How does it work in a public cloud
> deployment scenario?  I assume you would just have to disable this
> feature, or not enable it?
>
> Cheers,
>
> On 06/06/16 10:17, Du, Fan wrote:
>> Hi, Mesos folks
>>
>> I've been thinking about Mesos rack awareness support for a while,
>>
>> it's a common interest for lots of data center applications to provide
>> data locality,
>>
>> fault tolerance and better task placement. Create MESOS-5545 to track
>> the story,
>>
>> and here is the initial design doc [1] to support rack awareness in Mesos.
>>
>> Looking forward to hear any comments from end user and other developers,
>>
>> Thanks!
>>
>> [1]:
>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>>
>

Re: Rack awareness support for Mesos

Posted by Stephen Gran <st...@piksel.com>.

Hi,

This looks potentially interesting.  How does it work in a public cloud 
deployment scenario?  I assume you would just have to disable this 
feature, or not enable it?

Cheers,

On 06/06/16 10:17, Du, Fan wrote:
> Hi, Mesos folks
>
> I’ve been thinking about Mesos rack awareness support for a while,
>
> it’s a common interest for lots of data center applications to provide
> data locality,
>
> fault tolerance and better task placement. Create MESOS-5545 to track
> the story,
>
> and here is the initial design doc [1] to support rack awareness in Mesos.
>
> Looking forward to hear any comments from end user and other developers,
>
> Thanks!
>
> [1]:
> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com

Re: Rack awareness support for Mesos

Posted by "Du, Fan" <fa...@intel.com>.


On 2016/6/17 7:59, Joris Van Remoortere wrote:
> @Fan,
>
> In the community meeting a question was raised around which frameworks
> might be ready to use this.
> Can you provide some more context for immediate use cases on the
> framework side?

Hi Joris

Thanks for the bridging!

Frameworks capable of topology-aware replication strategies will benefit
here. For how topology-aware replication works, please refer to the section
"Hadoop Rack awareness - Why?" in [1]; the same methodology applies to
other frameworks too.
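
As a rough illustration of what topology-aware replication means (a
simplified sketch, not code from [1] or from any framework), the Python
below follows the commonly described HDFS-style default: the first replica
stays on the writer's node and the other two go to two different nodes of
one remote rack. The function name, node names and rack names are all
hypothetical, and the choice is deterministic only to keep the example easy
to follow.

```python
from collections import defaultdict


def place_replicas(writer, rack_of):
    """Return up to three nodes for the replicas of one block.

    writer  -- node that writes the block (holds the first replica)
    rack_of -- dict mapping node name -> rack name (hypothetical input)
    """
    # Group nodes by rack, in sorted order so the result is reproducible.
    nodes_by_rack = defaultdict(list)
    for node in sorted(rack_of):
        nodes_by_rack[rack_of[node]].append(node)

    placement = [writer]
    local_rack = rack_of[writer]
    remote_racks = sorted(r for r in nodes_by_rack if r != local_rack)
    if remote_racks:
        # Two more replicas on distinct nodes of the first remote rack,
        # so that losing the writer's rack still leaves copies elsewhere.
        placement.extend(nodes_by_rack[remote_racks[0]][:2])
    # With a single rack there is no remote rack to use, and only the
    # writer's own copy is returned.
    return placement


cluster = {
    "a1": "rack-a", "a2": "rack-a",
    "b1": "rack-b", "b2": "rack-b",
    "c1": "rack-c",
}
print(place_replicas("a1", cluster))
```

A scheduler can only make this kind of choice if it knows which rack each
agent belongs to, which is the information this proposal is about.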

For a POC, we can start from SPARK-6707 [2] with the new rack awareness
interface. For the topology-aware replication case, I think
dcos-cassandra-service [3] is also a good place to start implementing rack
awareness, because this repo is actively developed.
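
For context, frameworks today typically learn about racks by parsing agent
attributes. The following is a minimal sketch of that, assuming agents were
started with text attributes in the "key:value;key:value" form accepted by
the agent's --attributes flag. The key name "rack", the host names and both
helper functions are hypothetical and are not part of any Mesos API; a real
scheduler would read the attributes from the offers it receives rather than
from a string.

```python
def parse_attributes(text):
    """Parse a 'key:value;key:value' attribute string into a dict of strings."""
    attrs = {}
    for item in text.split(";"):
        if not item.strip():
            continue  # ignore empty segments such as a trailing ';'
        key, _, value = item.partition(":")
        attrs[key.strip()] = value.strip()
    return attrs


def group_by_rack(agents, key="rack"):
    """Group agent host names by the value of their rack attribute."""
    racks = {}
    for hostname, text in agents.items():
        rack = parse_attributes(text).get(key, "unknown")
        racks.setdefault(rack, []).append(hostname)
    return racks


agents = {
    "agent-1": "rack:r1;zone:east",
    "agent-2": "rack:r1;zone:east",
    "agent-3": "rack:r2;zone:east",
    "agent-4": "zone:west",  # no rack attribute set by the operator
}
print(group_by_rack(agents))
```

The "unknown" bucket shows the weakness of the convention-based approach:
nothing guarantees that every operator sets the same key, or sets it at all.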

I believe there are plenty of use cases here, so if anyone has more
use cases that show this feature is useful, feel free to share them.

Thanks!

[1]: http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/
[2]: https://issues.apache.org/jira/browse/SPARK-6707
[3]: https://github.com/mesosphere/dcos-cassandra-service
> --
> *Joris Van Remoortere*
> Mesosphere
>
> On Wed, Jun 15, 2016 at 5:04 PM, james <garftd@verizon.net> wrote:
>
>     @Joris,
>
>
>     OK. Now I understand where you are coming from. As soon as I get
>     some time, I'll join that design discussion. Thanks for the
>     clarifications.
>
>     James
>
>
>
>
>
>     On 06/15/2016 02:45 AM, Joris Van Remoortere wrote:
>
>                  Since your interest is in the determination of the values,
>                  as opposed to their propagation, I would just urge that you
>                  keep in mind that we may (as a project) not want to support
>                  this information as the current string attributes.
>
>
>              Huh? Why not? If the attributes change, why can't this
>              sub-project just change with those changing string attributes?
>              Maybe some elaboration how this might not naturally be able to
>              evolve is a warranted detail of discussion?
>
>
>         Sorry, I should clarify what I meant by support. By support I mean
>         that we may not want to promise that those values will be there
>         (support as a feature), and what schemas are mangled into the random
>         strings that we currently call attributes. I did not mean that we
>         wouldn't allow users to inject their own values if they wanted to.
>         We just wouldn't control the standard or schema as a project and
>         therefore couldn't support it.
>
>         Any random collection of strings that has previously had no reserved
>         keywords is notoriously difficult to build new schemas in.
>         This is why we may want to instead introduce a typed structure that
>         is dedicated to fault domain information. This:
>
>            * Prevents us from colliding with current users' attributes.
>            * Allows us to have more control over the types (YAY) and ranges
>              of values.
>            * Allows us to introduce explicit structure such as dependency or
>              hierarchy.
>
>         The fact that users have already encoded information in attributes
>         is not a reason for us to limit ourselves to that scope when better
>         structures may be available. This is why we shouldn't assume that
>         the project will *provide support for* (as opposed to allow users
>         to) using attributes.
>
>         As you said, it is their prerogative to join the design discussion
>         to ensure that any formalized structure or schema we introduce is
>         one that they are agreeable with.
>
>
>
>         --
>         *Joris Van Remoortere*
>         Mesosphere
>
>         On Tue, Jun 14, 2016 at 6:31 PM, james <garftd@verizon.net> wrote:
>
>              On 06/14/2016 08:14 AM, Joris Van Remoortere wrote:
>
>                      On the condition of compatibility with existing
>                      frameworks which already rely on parsing attributes
>                      for rack information.
>
>                  There is currently nothing in Mesos that specifies the format
>                  or structure for rack information in attributes.
>                  The fact that operators / frameworks have decided to add this
>                  information out of band is their problem to solve.
>                  We don't need to be backwards compatible with something we
>                  never published to begin with. This is why it's ok for us to
>                  consider adding a typed form of failure domain information
>                  that is separate from the typeless string attributes.
>
>
>              True. But you have to start somewhere, knowing that the schema
>              and codes will morph over time to maintain relevance and
>              usefulness. In that vein, if folks have established interesting
>              and useful parameters for this work, then it is most beneficial
>              that those methods and codes are considered carefully. AKA::
>              speak up now. Diversity and inclusion are keenly beneficial,
>              where practical.
>
>
>                  Since your interest is in the determination of the values,
>                  as opposed to their propagation, I would just urge that you
>                  keep in mind that we may (as a project) not want to support
>                  this information as the current string attributes.
>
>
>              Huh? Why not? If the attributes change, why can't this
>              sub-project just change with those changing string attributes?
>              Maybe some elaboration how this might not naturally be able to
>              evolve is a warranted detail of discussion?
>
>
>              I would venture that both 'determination of the values and
>              propagation (delays)' are inherently important in a cluster of
>              many things:: hardware, resources, frameworks, security codes,
>              etc. The author and others seem to be keenly aware that a tight
>              focus is not going to work, at this stage, so a broad appeal to
>              a multitude of needs is best.
>              And in fact, until some idea is proven to be useless or too
>              difficult to implement, the bigger the tent, the more useful the
>              codes that define this project/idea become. Personally, I'm very
>              excited that someone has stepped up in this area; hoping they
>              keep an open mind and flexibility geared toward multiplicative
>              usage, in the future. Most mature hardware folks who build ideas
>              into robust systems do exactly that, to motivate a
>              multiplicative usage for organizing hardware, performance and
>              state metrics, and timing signals, gregariously. All of this is
>              routine semantics from a hardware perspective.
>
>              At some point, folks will realize that kernel configuration,
>              testing and tweaks are critical to cluster performance,
>              regardless of the codes running on top of the cluster. So this
>              project could easily use cgroups and such to achieve robustness
>              in many areas of need.
>
>
>              Like it or not, large amounts of hardware need to have schema,
>              planning and architectural robustness to keep large amounts of
>              hardware pristinely available for software efficiency to be
>              anywhere near optimal deployment. This really becomes critical
>              when the mix of different CPU types, GPUs and RAM is to be
>              considered in future deployments, regardless of whether you
>              outsource or run your own cluster. Hardware vendors are going to
>              want to sell their products to as wide a customer base as
>              possible and customers are going to demand seamless management
>              for expansion of resources. Furthermore, as a consultant my
>              experience is that much of the future market is going to demand
>              outsourced, hybrid and in-house options as a fundamental tenet
>              of cluster resource adoption.
>
>              hth,
>              James
>
>
>                  *Joris Van Remoortere*
>                  Mesosphere
>
>                  On Tue, Jun 14, 2016 at 3:02 PM, Du, Fan <fan.du@intel.com> wrote:
>
>
>
>                       On 2016/6/14 20:32, Joris Van Remoortere wrote:
>
>                                #1. Stick with attributes for rack awareness
>
>                           I don't think this is the right approach; however,
>                           there seem to be 2 components to this discussion:
>
>                           1. How the values are presented (Attributes vs. a
>                              new type-aware structure)
>                           2. How the values are determined (scripts vs.
>                              automation vs. modules)
>
>                           It seems you are more interested in working on #2.
>                           If that's the case, please make sure that you don't
>                           assume anything about #1, as not everyone agrees
>                           that we will use the existing attributes in the
>                           future.
>
>
>                       On the condition of compatibility with existing
>                       frameworks which already rely on parsing attributes
>                       for rack information.
>
>                       Quotes from my original statements:
>                       > For compatibility with existing framework, I tend to
>                       > be ok with using attributes to convey the rack
>                       > information
>
>                       By all means, no matter what internal structures are
>                       used, current behavior should be honored. btw, I'm also
>                       thinking about #1, but it's too early to bring up the
>                       details before the ticket gets ACCEPTED.
>
>                       Anyway, I'm always open to all kinds of discussion.
>                       Thanks for your comments, Joris!
>
>                           For #2, you should focus on an API (module or script
>                           results) that will support all the different methods
>                           the community wants to use to generate this data.
>
>                           As you mentioned, updating the values for a running
>                           agent is not straightforward. A lot of design work
>                           will need to go into how these values are propagated
>                           to frameworks that have made assumptions about them,
>                           and which values are allowed to change vs. not.
>
>                           --
>                           *Joris Van Remoortere*
>                           Mesosphere
>
>                           On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <acarey@ilm.com> wrote:
>
>                                #3 would be very helpful for us. Also related:
>                                https://issues.apache.org/jira/browse/MESOS-3059
>
>                                --
>
>                                Aaron Carey
>                                Production Engineer - Cloud Pipeline
>                                Industrial Light & Magic
>                                London
>                                020 3751 9150
>
>                                ________________________________________
>                                From: Du, Fan [fan.du@intel.com]
>                                Sent: 14 June 2016 07:24
>                                To: user@mesos.apache.org; dev@mesos.apache.org
>                                Cc: Joris Van Remoortere; vinodkone@apache.org
>
>
>                                Subject: Re: Rack awareness support for Mesos
>
>                                Hi everyone
>
>                                Let me summarize the discussion about rack
>                                awareness in the community so far. First,
>                                thanks for all the comments, advice and
>                                challenges! :)
>
>                                #1. Stick with attributes for rack awareness
>
>                                For compatibility with existing frameworks, I
>                                tend to be ok with using attributes to convey
>                                the rack information, but with the goal of
>                                doing it automatically, in a way that is easy
>                                to maintain and with a good attributes schema.
>                                This brings up the question below, where the
>                                controversy starts.
>
>                                #2. Scripts vs programmatic way
>
>                                Both can be used to set attributes. I've made
>                                my arguments in the Jira and the design doc, so
>                                I'm not going to argue more here. But please
>                                take a look at the earlier discussion in
>                                MESOS-3366, which allows resources/attributes
>                                discovery.
>
>                                A module implementing the
>                                *slaveAttributesDecorator* hook will work like
>                                a charm here in a static way. Updating
>                                attributes still needs to be justified.
>
>                                #3. Allow updating attributes
>                                Several cases need to be covered here:
>
>                                a). Mesos runs inside VMs or containers, where
>                                live migration happens, so rack information
>                                needs to be updated.
>
>                                b). LLDP packets are broadcast at an interval
>                                of 10s~30s (vendor-specific), and rack
>                                information is usually stored in the LLDP
>                                daemon to be queried. In the worst cases (a
>                                freshly rebooted node, or a daemon restart),
>                                the Mesos slave has to wait 10s~30s for valid
>                                rack information before registering with the
>                                master. Allowing attribute updates will
>                                mitigate this problem.
>
>                                c). Framework affinity
>
>                                Framework X prefers to run on the same nodes as
>                                another framework Y. For example, it's
>                                desirable for Shark or Spark-SQL to reside on
>                                the *worker* nodes where Alluxio (formerly
>                                Tachyon) runs to gain a performance boost, as
>                                in the SPARK-6707 ticket message
>                                {tachyon=true;us-east-1=false}
>
>                                If a framework could advertise agent attributes
>                                in the ResourcesOffer process, awesome!
>
>
>                                #4. Rearrange agents in a more scalable manner,
>                                like on a per-rack basis
>
>                                Randomly offering agents' resources to
>                                frameworks does not improve data locality:
>                                imagine the likelihood of a framework getting
>                                resources underneath the same rack at the scale
>                                of 30000+ nodes. Moreover, the time to randomly
>                                shuffle the agents also grows.
>
>                                How about rearranging the agents on a per-rack
>                                basis? A minor change to the way resources are
>                                allocated will fix this.
>
>
>                                I might not see the whole picture here, so
>                                comments are welcome!
>
>
>                                On 2016/6/6 17:17, Du, Fan wrote:
>                                 > Hi, Mesos folks
>                                 >
>                                 > I've been thinking about Mesos rack awareness
>                                 > support for a while,
>                                 >
>                                 > it's a common interest for lots of data center
>                                 > applications to provide data locality,
>                                 >
>                                 > fault tolerance and better task placement.
>                                 > Create MESOS-5545 to track the story,
>                                 >
>                                 > and here is the initial design doc [1] to
>                                 > support rack awareness in Mesos.
>                                 >
>                                 > Looking forward to hear any comments from end
>                                 > user and other developers,
>                                 >
>                                 > Thanks!
>                                 >
>                                 > [1]:
>                                 > https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>                                 >
>
>
>
>
>
>
>

Re: Rack awareness support for Mesos

Posted by Joris Van Remoortere <jo...@mesosphere.io>.

@Fan,

In the community meeting a question was raised around which frameworks
might be ready to use this.
Can you provide some more context for immediate use cases on the framework
side?

—
*Joris Van Remoortere*
Mesosphere

On Wed, Jun 15, 2016 at 5:04 PM, james <ga...@verizon.net> wrote:

> @Joris,
>
>
> OK. Now I understand where you are coming from. As soon as I get some
> time, I'll join that design discussion. Thanks for the clarifications.
>
> James
>
>
>
>
>
> On 06/15/2016 02:45 AM, Joris Van Remoortere wrote:
>
>>         Since your interest is in the determination of the values, as
>>         opposed to
>>         their propagation, I would just urge that you keep in mind that
>>         we may
>>         (as a project) not want to support this information as the current
>>         string attributes.
>>
>>
>>     Huh? Why not? If the attributes change, why can't this sub-project
>>     just change with those changing string attributes? Maybe some
>>     elaboration how this might not naturally be able to evolve is a
>>     warranted detail of discussion?
>>
>>
>> Sorry, I should clarify what I meant by support. By support I mean that
>> we may not want to promise that those values will be there (support as a
>> feature), and what schemas are mangled into the random strings that we
>> currently call attributes. I did not mean that we wouldn't allow users
>> to inject their own values if they wanted to. We just wouldn't control
>> the standard or schema as a project and therefore couldn't support it.
>>
>> Any random collection of strings that has previously had no reserved
>> keywords is notoriously difficult to build new schemas in.
>> This is why we may want to instead introduce a typed structure that is
>> dedicated to fault domain information. This:
>>
>>   * Prevents us from colliding with current users' attributes.
>>   * Allows us to have more control over the types (YAY) and ranges of
>>     values.
>>   * Allows us to introduce explicit structure such as dependency or
>>     hierarchy.
>>
>> The fact that users have already encoded information in attributes is
>> not a reason for us to limit ourselves to that scope when better
>> structures may be available. This is why we shouldn't assume that the
>> project will *provide support for* (as opposed to allow users to) using
>> attributes.
>>
>> As you said, it is their prerogative to join the design discussion to
>> ensure that any formalized structure or schema we introduce is one that
>> they are agreeable with.
>>
>>
>>
>> —
>> *Joris Van Remoortere*
>> Mesosphere
>>
>> On Tue, Jun 14, 2016 at 6:31 PM, james <garftd@verizon.net
>> <ma...@verizon.net>> wrote:
>>
>>     On 06/14/2016 08:14 AM, Joris Van Remoortere wrote:
>>
>>             On the condition of compatible with existing framework which
>>             already rely on parsing attributes for rack information.
>>
>>         There is currently nothing in Mesos that specifies the format or
>>         structure for rack information in attributes.
>>         The fact that operators / frameworks have decided to add this
>>         information out of band is their problem to solve.
>>         We don't need to be backwards compatible with something we never
>>         published to begin with. This is why it's ok for us to consider
>>         adding a
>>         typed form of failure domain information that is separate from the
>>         typeless string attributes.
>>
>>
>>     True. But you have to start somewhere, know that the schema and
>>     codes will morph over time to maintain relevance  and usefulness. In
>>     that vein, if folks have established interesting and useful
>>     parameters for this work, then it is most beneficial that those
>>     methods and codes are considered carefully.  AKA:: speak up now.
>>     Diversity and inclusion are keenly beneficial, where practical.
>>
>>
>>         Since your interest is in the determination of the values, as
>>         opposed to
>>         their propagation, I would just urge that you keep in mind that
>>         we may
>>         (as a project) not want to support this information as the current
>>         string attributes.
>>
>>
>>     Huh? Why not? If the attributes change, why can't this sub-project
>>     just change with those changing string attributes? Maybe some
>>     elaboration how this might not naturally be able to evolve is a
>>     warranted detail of discussion?
>>
>>
>>     I would venture that both 'determination of the values and
>>     propagation (delays)' are inherently important in a cluster of many
>>     things:: hardware, resources, frameworks, security codes, etc etc.
>>     The author
>>     and others seem to be keenly aware that a tight focus is not going
>>     to work, at this stage, so a broad appeal to a multitude of needs is
>>     best.
>>     And in fact, until some idea is proven to be useless or too difficult
>>     to implement, the bigger the tent, the more useful the codes that
>>     define this project/idea become.  Personally, I'm very excited that
>>     someone has stepped up in this area; hoping they keep an open mind
>>     and flexibility geared toward multiplicative usage, in the future.
>>     Most mature hardware folks who build ideas into robust systems do
>>     exactly that, to motivate a multiplicative usage for organizing
>>     hardware, performance and state metrics, and timing signals,
>>     gregariously. All of this is routine semantics from a hardware
>>     perspective.
>>
>>     At some point, folks will realize that kernel configuration, testing
>>     and tweaks are critical to cluster performance, regardless of the
>>     codes running on top of the cluster. So this project could easily use
>>     cgroups and such to achieve robustness in many areas of need.
>>
>>
>>     Like it or not, large amounts of hardware need schema, planning and
>>     architectural robustness to stay pristinely available, so that
>>     software efficiency can be anywhere near optimal in deployment. This
>>     really becomes critical when a mix of different CPU types, GPUs and
>>     RAM is to be considered in future deployments, regardless of whether
>>     you outsource or run your own cluster. Hardware vendors are going to
>>     want to sell their products to as wide a customer base as possible,
>>     and customers are going to demand seamless management for expansion
>>     of resources. Furthermore, as a consultant, my experience is that
>>     much of the future market is going to demand outsourced, hybrid and
>>     in-house options as a fundamental tenet of cluster resource adoption.
>>
>>     hth,
>>     James
>>
>>
>>         *Joris Van Remoortere*
>>         Mesosphere
>>
>>         On Tue, Jun 14, 2016 at 3:02 PM, Du, Fan <fan.du@intel.com
>>         <ma...@intel.com>
>>         <mailto:fan.du@intel.com <ma...@intel.com>>> wrote:
>>
>>
>>
>>              On 2016/6/14 20:32, Joris Van Remoortere wrote:
>>
>>                       #1. Stick with attributes for rack awareness
>>
>>                  I don't think this is the right approach; however,
>>         there seem to
>>                  be 2
>>                  components to this discussion:
>>
>>                  1. How the values are presented (Attributes vs. a new
>>         type-aware
>>                  structure)
>>                  2. How the values are determined (scripts vs.
>>         automation vs.
>>                  modules)
>>
>>                  It seems you are more interested in working on #2. If
>>         that's the
>>                  case,
>>                  please make sure that you don't assume anything about
>>         #1, as we not
>>                  everyone agrees that we will use the existing
>>         attributes in the
>>                  future.
>>
>>
>>              On the condition of compatible with existing framework
>>         which already
>>              rely on parsing attributes for rack information.
>>
>>              Quotes from my original statements:
>>              > For compatibility with existing framework, I tend to be
>>         ok with using
>>              > attributes to convey the rack information
>>
>>              By all means, no matter what internal structures to use,
>>         current
>>              behavior should be honored. btw, I'm also thinking about
>>         #1, it's
>>              too earlier to bring up the details so far before the
>>         ticket got
>>              ACCEPTED.
>>
>>              Any way, I'm always open to all kind of discussion, thanks
>>         for your
>>              comments! Joris.
>>
>>                  For #2, you should focus on an API (module or script
>>         results)
>>                  that will
>>                  support all the different methods the community wants
>>         to use to
>>                  generate
>>                  this data.
>>
>>                  As you mentioned, updating the values for a running
>>         agent is not
>>                  straightforward. A lot of design work will need to go
>>         into how these
>>                  values are propagated to frameworks that have made
>>         assumptions about
>>                  them, and which values are allowed to change vs. not.
>>
>>                  —
>>                  *Joris Van Remoortere*
>>                  Mesosphere
>>
>>                  On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey
>>         <acarey@ilm.com <ma...@ilm.com>
>>                  <mailto:acarey@ilm.com <ma...@ilm.com>>
>>                  <mailto:acarey@ilm.com <ma...@ilm.com>
>>         <mailto:acarey@ilm.com <ma...@ilm.com>>>> wrote:
>>
>>                       #3 would be very helpful for us. Also related:
>>
>>         https://issues.apache.org/jira/browse/MESOS-3059
>>
>>                       --
>>
>>                       Aaron Carey
>>                       Production Engineer - Cloud Pipeline
>>                       Industrial Light & Magic
>>                       London
>>                       020 3751 9150
>>
>>                       ________________________________________
>>                       From: Du, Fan [fan.du@intel.com
>>         <ma...@intel.com> <mailto:fan.du@intel.com
>>         <ma...@intel.com>>
>>                  <mailto:fan.du@intel.com <ma...@intel.com>
>>         <mailto:fan.du@intel.com <ma...@intel.com>>>]
>>                       Sent: 14 June 2016 07:24
>>                       To: user@mesos.apache.org
>>         <ma...@mesos.apache.org> <mailto:user@mesos.apache.org
>>         <ma...@mesos.apache.org>>
>>                  <mailto:user@mesos.apache.org
>>         <ma...@mesos.apache.org> <mailto:user@mesos.apache.org
>>         <ma...@mesos.apache.org>>>;
>>         dev@mesos.apache.org <ma...@mesos.apache.org>
>>         <mailto:dev@mesos.apache.org <ma...@mesos.apache.org>>
>>                  <mailto:dev@mesos.apache.org
>>         <ma...@mesos.apache.org> <mailto:dev@mesos.apache.org
>>         <ma...@mesos.apache.org>>>
>>                       Cc: Joris Van Remoortere; vinodkone@apache.org
>>         <ma...@apache.org>
>>                  <mailto:vinodkone@apache.org <mailto:
>> vinodkone@apache.org>>
>>                       <mailto:vinodkone@apache.org
>>         <ma...@apache.org> <mailto:vinodkone@apache.org
>>         <ma...@apache.org>>>
>>
>>
>>                       Subject: Re: Rack awareness support for Mesos
>>
>>                       Hi everyone
>>
>>                       Let me summarize the discussion about Rack
>>         awareness in the
>>                  community so
>>                       far. First thanks for all the comments, advices or
>>                  challenges! :)
>>
>>                       #1. Stick with attributes for rack awareness
>>
>>                       For compatibility with existing framework, I tend
>>         to be ok
>>                  with using
>>                       attributes to convey the rack information, but
>>         with the
>>                  goal to do it
>>                       automatically, easy to maintain and with good
>>         attributes
>>                  schema. This
>>                       will bring up below question where the controversy
>>         starts.
>>
>>                       #2. Scripts vs programmatic way
>>
>>                       Both can be used to set attributes, I've made my
>>         arguments
>>                  in the Jira
>>                       and the Design doc, I'm not going to argue more
>>         here. But
>>                  please take a
>>                       look discussion at MESOS-3366 before, which allow
>>                  resources/attributes
>>                       discovery.
>>
>>                       A module to implement *slaveAttributesDecorator*
>>         hook will
>>                  works like
>>                       a charm here in a static way. And need to justify
>>                  attributes updating.
>>
>>                       #3. Allow updating attributes
>>                       Several cases need to be covered here:
>>
>>                       a). Mesos runs inside VMs or container, where live
>>                  migration happens, so
>>                       rack information need to be updated.
>>
>>                       b). LLDP packets are broadcasted by the interval
>>         10s~30s, a
>>                  vendor
>>                       specific implementation, and rack information are
>>         usually
>>                  stored in LLDP
>>                       daemon to be queried. Worst cases(nodes fresh
>>         reboot, or
>>                  daemon restart)
>>                       would be: Mesos slave have to wait 10s~30s for a
>>         valid rack
>>                  information
>>                       before register to master. Allow updating
>>         attributes will
>>                  mitigate this
>>                       problem.
>>
>>                       c). Framework affinity
>>
>>                       Framework X prefers to run on the same nodes with
>>         another
>>                  framework Y.
>>                       For example, it's desirable for Shark or Spark-SQL
>> to
>>                  reside on the
>>                       *worker* node where Alluxio(former Tachyon) to
>>         gain more
>>                  performance
>>                       boosting as SPARK-6707 ticket message
>>                  {tachyon=true;us-east-1=false}
>>
>>                       If framework could advertise agent attributes in the
>>                  ResourcesOffer
>>                       process, awesome!
>>
>>
>>                       #4. Rearrange agents in a more scalable manner,
>>         like per
>>                  rack basis
>>
>>                       Randomly offering agents resource to framework
>>         does not
>>                  improve data
>>                       locality, imagine the likelihood of a framework
>>         getting
>>                  resources
>>                       underneath the same rack, at the scale of +30000
>>         nodes.
>>                  Moreover time to
>>                       randomly shuffle the agents also grows.
>>
>>                       How about rearranging the agent in a per rack
>>         basis, and a
>>                  minor change
>>                       to the way how resources are allocated will fix
>> this.
>>
>>
>>                       I might not see the whole picture here, so
>>         comments are
>>                  welcomed!
>>
>>
>>                       On 2016/6/6 17:17, Du, Fan wrote:
>>                        > Hi, Mesos folks
>>                        >
>>                        > I’ve been thinking about Mesos rack awareness
>>         support
>>                  for a while,
>>                        >
>>                        > it’s a common interest for lots of data center
>>                  applications to
>>                       provide
>>                        > data locality,
>>                        >
>>                        > fault tolerance and better task placement. Create
>>                  MESOS-5545 to track
>>                        > the story,
>>                        >
>>                        > and here is the initial design doc [1] to
>>         support rack
>>                  awareness
>>                       in Mesos.
>>                        >
>>                        > Looking forward to hear any comments from end
>>         user and other
>>                       developers,
>>                        >
>>                        > Thanks!
>>                        >
>>                        > [1]:
>>                        >
>>
>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>>                        >
>>
>>
>>
>>
>>
>>
>
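
Item #4 in the quoted summary above proposes arranging agents on a per-rack basis instead of shuffling them randomly. The following is only a rough sketch of that idea, not the Mesos allocator: the function names, the (agent_id, rack) input shape, and the "largest rack first" ordering are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_rack(agents: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (agent_id, rack) pairs into rack -> [agent_id, ...].

    Hypothetical helper: the rack label is assumed to be known already,
    for example from an agent attribute.
    """
    racks: Dict[str, List[str]] = defaultdict(list)
    for agent_id, rack in agents:
        racks[rack].append(agent_id)
    return dict(racks)


def rack_local_order(agents: Iterable[Tuple[str, str]]) -> List[str]:
    """Return agent ids so that agents in the same rack are adjacent.

    Racks with more agents come first; ties are broken by rack name so
    the result is deterministic.
    """
    grouped = group_by_rack(agents)
    ordered = sorted(grouped.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [agent_id for _, members in ordered for agent_id in members]
```

With agents ordered this way, a framework that receives consecutive offers is more likely to get resources under one rack than with a cluster-wide shuffle; whether that trade-off is acceptable for fairness is exactly the kind of question the thread leaves open.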

Re: Rack awareness support for Mesos

Posted by james <ga...@verizon.net>.

@Joris,


OK. Now I understand where you are coming from. As soon as I get some 
time, I'll join that design discussion. Thanks for the clarifications.

James





On 06/15/2016 02:45 AM, Joris Van Remoortere wrote:
>         Since your interest is in the determination of the values, as
>         opposed to
>         their propagation, I would just urge that you keep in mind that
>         we may
>         (as a project) not want to support this information as the current
>         string attributes.
>
>
>     Huh? Why not? If the attributes change, why can't this sub-project
>     just change with those changing string attributes? Maybe some
>     elaboration how this might not naturally be able to evolve is a
>     warranted detail of discussion?
>
>
> Sorry, I should clarify what I meant by support. By support I mean that
> we may not want to promise that those values will be there (support as a
> feature), and what schemas are mangled into the random strings that we
> currently call attributes. I did not mean that we wouldn't allow users
> to inject their own values if they wanted to. We just wouldn't control
> the standard or schema as a project and therefore couldn't support it.
>
> Any random collection of strings that has previously had no reserved
> keywords is notoriously difficult to build new schemas in.
> This is why we may want to instead introduce a typed structure that is
> dedicated to fault domain information. This:
>
>   * Prevents us from colliding with current users' attributes.
>   * Allows us to have more control over the types (YAY) and ranges of
>     values.
>   * Allows us to introduce explicit structure such as dependency or
>     hierarchy.
>
> The fact that users have already encoded information in attributes is
> not a reason for us to limit ourselves to that scope when better
> structures may be available. This is why we shouldn't assume that the
> project will *provide support for* (as opposed to allow users to) using
> attributes.
>
> As you said, it is their prerogative to join the design discussion to
> ensure that any formalized structure or schema we introduce is one that
> they are agreeable with.
>
>
>
> 
> *Joris Van Remoortere*
> Mesosphere
>
> On Tue, Jun 14, 2016 at 6:31 PM, james <garftd@verizon.net
> <ma...@verizon.net>> wrote:
>
>     On 06/14/2016 08:14 AM, Joris Van Remoortere wrote:
>
>             On the condition of compatible with existing framework which
>             already rely on parsing attributes for rack information.
>
>         There is currently nothing in Mesos that specifies the format or
>         structure for rack information in attributes.
>         The fact that operators / frameworks have decided to add this
>         information out of band is their problem to solve.
>         We don't need to be backwards compatible with something we never
>         published to begin with. This is why it's ok for us to consider
>         adding a
>         typed form of failure domain information that is separate from the
>         typeless string attributes.
>
>
>     True. But you have to start somewhere, know that the schema and
>     codes will morph over time to maintain relevance  and usefulness. In
>     that vein, if folks have established interesting and useful
>     parameters for this work, then it is most beneficial that those
>     methods and codes are considered carefully.  AKA:: speak up now.
>     Diversity and inclusion are keenly beneficial, where practical.
>
>
>         Since your interest is in the determination of the values, as
>         opposed to
>         their propagation, I would just urge that you keep in mind that
>         we may
>         (as a project) not want to support this information as the current
>         string attributes.
>
>
>     Huh? Why not? If the attributes change, why can't this sub-project
>     just change with those changing string attributes? Maybe some
>     elaboration how this might not naturally be able to evolve is a
>     warranted detail of discussion?
>
>
>     I would venture that both 'determination of the values and
>     propagation (delays)' are inherently important in a cluster of many
>     things:: hardware, resources, frameworks, security codes, etc etc.
>     The author
>     and others seem to be keenly aware that a tight focus is not going
>     to work, at this stage, so a broad appeal to a multitude of needs is
>     best.
>     And in fact, until some idea is proven to be useless or too difficult to
>     implement, the bigger the tent, the more useful the codes that
>     define this project/idea become.  Personally, I'm very excited that
>     someone has stepped up in this area; hoping they keep an open mind
>     and flexibility geared toward multiplicative usage, in the future.
>     Most mature hardware folks who build ideas into robust systems do
>     exactly that, to motivate a multiplicative usage for organizing
>     hardware, performance and state metrics, and timing signals,
>     gregariously. All of this is routine semantics from a hardware
>     perspective.
>
>     At some point, folks will realize that kernel configuration, testing
>     and tweaks are critical to cluster performance, regardless of the codes
>     running on top of the cluster. So this project could easily use cgroups
>     and such to achieve robustness in many areas of need.
>
>
>     Like it or not, large amounts of hardware need schema, planning and
>     architectural robustness to stay pristinely available, so that
>     software efficiency can be anywhere near optimal in deployment. This
>     really becomes critical when a mix of different CPU types, GPUs and
>     RAM is to be considered in future deployments, regardless of whether
>     you outsource or run your own cluster. Hardware vendors are going to
>     want to sell their products to as wide a customer base as possible,
>     and customers are going to demand seamless management for expansion
>     of resources. Furthermore, as a consultant, my experience is that
>     much of the future market is going to demand outsourced, hybrid and
>     in-house options as a fundamental tenet of cluster resource adoption.
>
>     hth,
>     James
>
>
>         *Joris Van Remoortere*
>         Mesosphere
>
>         On Tue, Jun 14, 2016 at 3:02 PM, Du, Fan <fan.du@intel.com
>         <ma...@intel.com>
>         <mailto:fan.du@intel.com <ma...@intel.com>>> wrote:
>
>
>
>              On 2016/6/14 20:32, Joris Van Remoortere wrote:
>
>                       #1. Stick with attributes for rack awareness
>
>                  I don't think this is the right approach; however,
>         there seem to
>                  be 2
>                  components to this discussion:
>
>                  1. How the values are presented (Attributes vs. a new
>         type-aware
>                  structure)
>                  2. How the values are determined (scripts vs.
>         automation vs.
>                  modules)
>
>                  It seems you are more interested in working on #2. If
>         that's the
>                  case,
>                  please make sure that you don't assume anything about
>         #1, as we not
>                  everyone agrees that we will use the existing
>         attributes in the
>                  future.
>
>
>              On the condition of compatible with existing framework
>         which already
>              rely on parsing attributes for rack information.
>
>              Quotes from my original statements:
>              > For compatibility with existing framework, I tend to be
>         ok with using
>              > attributes to convey the rack information
>
>              By all means, no matter what internal structures to use,
>         current
>              behavior should be honored. btw, I'm also thinking about
>         #1, it's
>              too earlier to bring up the details so far before the
>         ticket got
>              ACCEPTED.
>
>              Any way, I'm always open to all kind of discussion, thanks
>         for your
>              comments! Joris.
>
>                  For #2, you should focus on an API (module or script
>         results)
>                  that will
>                  support all the different methods the community wants
>         to use to
>                  generate
>                  this data.
>
>                  As you mentioned, updating the values for a running
>         agent is not
>                  straightforward. A lot of design work will need to go
>         into how these
>                  values are propagated to frameworks that have made
>         assumptions about
>                  them, and which values are allowed to change vs. not.
>
>                  
>                  *Joris Van Remoortere*
>                  Mesosphere
>
>                  On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey
>         <acarey@ilm.com <ma...@ilm.com>
>                  <mailto:acarey@ilm.com <ma...@ilm.com>>
>                  <mailto:acarey@ilm.com <ma...@ilm.com>
>         <mailto:acarey@ilm.com <ma...@ilm.com>>>> wrote:
>
>                       #3 would be very helpful for us. Also related:
>
>         https://issues.apache.org/jira/browse/MESOS-3059
>
>                       --
>
>                       Aaron Carey
>                       Production Engineer - Cloud Pipeline
>                       Industrial Light & Magic
>                       London
>                       020 3751 9150
>
>                       ________________________________________
>                       From: Du, Fan [fan.du@intel.com
>         <ma...@intel.com> <mailto:fan.du@intel.com
>         <ma...@intel.com>>
>                  <mailto:fan.du@intel.com <ma...@intel.com>
>         <mailto:fan.du@intel.com <ma...@intel.com>>>]
>                       Sent: 14 June 2016 07:24
>                       To: user@mesos.apache.org
>         <ma...@mesos.apache.org> <mailto:user@mesos.apache.org
>         <ma...@mesos.apache.org>>
>                  <mailto:user@mesos.apache.org
>         <ma...@mesos.apache.org> <mailto:user@mesos.apache.org
>         <ma...@mesos.apache.org>>>;
>         dev@mesos.apache.org <ma...@mesos.apache.org>
>         <mailto:dev@mesos.apache.org <ma...@mesos.apache.org>>
>                  <mailto:dev@mesos.apache.org
>         <ma...@mesos.apache.org> <mailto:dev@mesos.apache.org
>         <ma...@mesos.apache.org>>>
>                       Cc: Joris Van Remoortere; vinodkone@apache.org
>         <ma...@apache.org>
>                  <mailto:vinodkone@apache.org <ma...@apache.org>>
>                       <mailto:vinodkone@apache.org
>         <ma...@apache.org> <mailto:vinodkone@apache.org
>         <ma...@apache.org>>>
>
>
>                       Subject: Re: Rack awareness support for Mesos
>
>                       Hi everyone
>
>                       Let me summarize the discussion about Rack
>         awareness in the
>                  community so
>                       far. First thanks for all the comments, advices or
>                  challenges! :)
>
>                       #1. Stick with attributes for rack awareness
>
>                       For compatibility with existing framework, I tend
>         to be ok
>                  with using
>                       attributes to convey the rack information, but
>         with the
>                  goal to do it
>                       automatically, easy to maintain and with good
>         attributes
>                  schema. This
>                       will bring up below question where the controversy
>         starts.
>
>                       #2. Scripts vs programmatic way
>
>                       Both can be used to set attributes, I've made my
>         arguments
>                  in the Jira
>                       and the Design doc, I'm not going to argue more
>         here. But
>                  please take a
>                       look discussion at MESOS-3366 before, which allow
>                  resources/attributes
>                       discovery.
>
>                       A module to implement *slaveAttributesDecorator*
>         hook will
>                  works like
>                       a charm here in a static way. And need to justify
>                  attributes updating.
>
>                       #3. Allow updating attributes
>                       Several cases need to be covered here:
>
>                       a). Mesos runs inside VMs or container, where live
>                  migration happens, so
>                       rack information need to be updated.
>
>                       b). LLDP packets are broadcasted by the interval
>         10s~30s, a
>                  vendor
>                       specific implementation, and rack information are
>         usually
>                  stored in LLDP
>                       daemon to be queried. Worst cases(nodes fresh
>         reboot, or
>                  daemon restart)
>                       would be: Mesos slave have to wait 10s~30s for a
>         valid rack
>                  information
>                       before register to master. Allow updating
>         attributes will
>                  mitigate this
>                       problem.
>
>                       c). Framework affinity
>
>                       Framework X prefers to run on the same nodes with
>         another
>                  framework Y.
>                       For example, it's desirable for Shark or Spark-SQL to
>                  reside on the
>                       *worker* node where Alluxio(former Tachyon) to
>         gain more
>                  performance
>                       boosting as SPARK-6707 ticket message
>                  {tachyon=true;us-east-1=false}
>
>                       If framework could advertise agent attributes in the
>                  ResourcesOffer
>                       process, awesome!
>
>
>                       #4. Rearrange agents in a more scalable manner,
>         like per
>                  rack basis
>
>                       Randomly offering agents resource to framework
>         does not
>                  improve data
>                       locality, imagine the likelihood of a framework
>         getting
>                  resources
>                       underneath the same rack, at the scale of +30000
>         nodes.
>                  Moreover time to
>                       randomly shuffle the agents also grows.
>
>                       How about rearranging the agent in a per rack
>         basis, and a
>                  minor change
>                       to the way how resources are allocated will fix this.
>
>
>                       I might not see the whole picture here, so
>         comments are
>                  welcomed!
>
>
>                       On 2016/6/6 17:17, Du, Fan wrote:
>                        > Hi, Mesos folks
>                        >
>                        > I've been thinking about Mesos rack awareness
>         support
>                  for a while,
>                        >
>                        > it's a common interest for lots of data center
>                  applications to
>                       provide
>                        > data locality,
>                        >
>                        > fault tolerance and better task placement. Create
>                  MESOS-5545 to track
>                        > the story,
>                        >
>                        > and here is the initial design doc [1] to
>         support rack
>                  awareness
>                       in Mesos.
>                        >
>                        > Looking forward to hear any comments from end
>         user and other
>                       developers,
>                        >
>                        > Thanks!
>                        >
>                        > [1]:
>                        >
>         https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>                        >
>
>
>
>
>

Re: Rack awareness support for Mesos

Posted by Joris Van Remoortere <jo...@mesosphere.io>.

>> Since your interest is in the determination of the values, as opposed to
>> their propagation, I would just urge that you keep in mind that we may
>> (as a project) not want to support this information as the current
>> string attributes.
>
> Huh? Why not? If the attributes change, why can't this sub-project just
> change with those changing string attributes? Maybe some elaboration how
> this might not naturally be able to evolve is a warranted detail of
> discussion?


Sorry, I should clarify what I meant by support. By support I mean that we
may not want to promise that those values will be there (support as a
feature), or to standardize whatever schemas are mangled into the random
strings that we currently call attributes. I did not mean that we wouldn't
allow users to inject their own values if they wanted to. We just wouldn't
control the standard or schema as a project and therefore couldn't support it.

Any random collection of strings that has previously had no reserved
keywords is notoriously difficult to build new schemas in.
This is why we may want to instead introduce a typed structure that is
dedicated to fault domain information. This:

   - Prevents us from colliding with current users' attributes.
   - Allows us to have more control over the types (YAY) and ranges of
   values.
   - Allows us to introduce explicit structure such as dependency or
   hierarchy.
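
To make the contrast concrete, here is a minimal sketch of the two approaches.
It is not Mesos code: the `FaultDomain` class, its field names, and the
`parse_attributes` helper are hypothetical and only illustrate the idea. The
`name:value;name:value` string follows the form accepted by the agent's
`--attributes` flag, but the key names in it are whatever an operator chose.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def parse_attributes(text: str) -> Dict[str, str]:
    """Parse a 'name:value;name:value' string into a dict of plain strings."""
    result = {}
    for item in text.split(";"):
        if ":" in item:
            key, value = item.split(":", 1)
            result[key.strip()] = value.strip()
    return result


@dataclass(frozen=True)
class FaultDomain:
    """Hypothetical typed record; the region/zone/rack hierarchy is explicit."""
    region: str
    zone: str
    rack: Optional[str] = None

    def shares_rack_with(self, other: "FaultDomain") -> bool:
        # Two agents share a rack only if the whole path down to the rack matches.
        return (self.rack is not None
                and (self.region, self.zone, self.rack)
                == (other.region, other.zone, other.rack))


# Untyped: every framework has to guess which keys the operator picked.
attrs = parse_attributes("rack:r12;zone:z1;region:east")

# Typed: the structure itself says what a rack is and where it sits.
a = FaultDomain(region="east", zone="z1", rack="r12")
b = FaultDomain(region="east", zone="z1", rack="r13")
```

With the typed form, a question like "are these two agents on the same rack?"
no longer depends on every operator agreeing on a key name such as `rack`.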

The fact that users have already encoded information in attributes is not a
reason for us to limit ourselves to that scope when better structures may
be available. This is why we shouldn't assume that the project will
*provide support for* (as opposed to merely allowing) the use of attributes
for this purpose.

As you said, it is their prerogative to join the design discussion to
ensure that any formalized structure or schema we introduce is one that
they are agreeable with.



—
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 14, 2016 at 6:31 PM, james <ga...@verizon.net> wrote:

> On 06/14/2016 08:14 AM, Joris Van Remoortere wrote:
>
>> On the condition of compatible with existing framework which already rely
>>> on parsing attributes for rack information.
>>>
>> There is currently nothing in Mesos that specifies the format or
>> structure for rack information in attributes.
>> The fact that operators / frameworks have decided to add this
>> information out of band is their problem to solve.
>> We don't need to be backwards compatible with something we never
>> published to begin with. This is why it's ok for us to consider adding a
>> typed form of failure domain information that is separate from the
>> typeless string attributes.
>>
>
> True. But you have to start somewhere, know that the schema and codes will
> morph over time to maintain relevance  and usefulness. In that vein, if
> folks have established interesting and useful parameters for this work,
> then it is most beneficial that those methods and codes are considered
> carefully.  AKA:: speak up now. Diversity and inclusion are keenly
> beneficial, where practical.
>
>
> Since your interest is in the determination of the values, as opposed to
>> their propagation, I would just urge that you keep in mind that we may
>> (as a project) not want to support this information as the current
>> string attributes.
>>
>
> Huh? Why not? If the attributes change, why can't this sub-project just
> change with those changing string attributes? Maybe some elaboration how
> this might not naturally be able to evolve is a warranted detail of
> discussion?
>
>
> I would venture that both 'determination of the values and propagation
> (delays)' are inherently important in a cluster of many things:: hardware,
> resources, frameworks, security codes, etc etc. The author
> and others seem to be keenly aware that a tight focus is not going to
> work, at this stage, so a broad appeal to a multitude of needs is best.
> And in fact, until some idea is proven to be useless or too difficult to
> implement, the bigger the tent, the more useful the codes that define this
> project/idea become.  Personally, I'm very excited that someone has stepped
> up in this area; hoping they keep an open mind and flexibility geared
> toward multiplicative usage, in the future. Most mature hardware folks who
> build ideas into robust systems do exactly that, to motivate a
> multiplicative usage for organizing hardware, performance and state
> metrics, and timing signals, gregariously. All of this is routine semantics
> from a hardware perspective.
>
> At some point, folks will realize that kernel configuration, testing and
> tweaks are critical to cluster performance, regardless of the codes
> running on top of the cluster. So this project could easily use cgroups
> and such for achieve robustness in many areas of need.
>
>
> Like it or not large amounts of hardware, need to have schema, planning
> and architectural robustness to keep large amounts of hardware, pristinely
> available for software efficiency to be any where near optimal deployment.
> This really becomes critical when the mix of different CPU types, GPUs and
> ram are to be considered in future deployments, regardless if you outsource
> or run your own cluster. Hardware vendors are going to want to sell their
> products to as wide of a customer base a possible and customers are going
> to demand seamless management for expansion of resources. Furthermore, as a
> consultant my experiences are that much of the future market is going to
> demand outsourced, hybrid and in-house options as a fundamental tenant of
> cluster resource adoption.
>
> hth,
> James
>
>
> *Joris Van Remoortere*
>> Mesosphere
>>
>> On Tue, Jun 14, 2016 at 3:02 PM, Du, Fan <fan.du@intel.com
>> <ma...@intel.com>> wrote:
>>
>>
>>
>>     On 2016/6/14 20:32, Joris Van Remoortere wrote:
>>
>>              #1. Stick with attributes for rack awareness
>>
>>         I don't think this is the right approach; however, there seem to
>>         be 2
>>         components to this discussion:
>>
>>         1. How the values are presented (Attributes vs. a new type-aware
>>         structure)
>>         2. How the values are determined (scripts vs. automation vs.
>>         modules)
>>
>>         It seems you are more interested in working on #2. If that's the
>>         case,
>>         please make sure that you don't assume anything about #1, as we
>> not
>>         everyone agrees that we will use the existing attributes in the
>>         future.
>>
>>
>>     On the condition of compatible with existing framework which already
>>     rely on parsing attributes for rack information.
>>
>>     Quotes from my original statements:
>>     > For compatibility with existing framework, I tend to be ok with
>> using
>>     > attributes to convey the rack information
>>
>>     By all means, no matter what internal structures to use, current
>>     behavior should be honored. btw, I'm also thinking about #1, it's
>>     too earlier to bring up the details so far before the ticket got
>>     ACCEPTED.
>>
>>     Any way, I'm always open to all kind of discussion, thanks for your
>>     comments! Joris.
>>
>>         For #2, you should focus on an API (module or script results)
>>         that will
>>         support all the different methods the community wants to use to
>>         generate
>>         this data.
>>
>>         As you mentioned, updating the values for a running agent is not
>>         straightforward. A lot of design work will need to go into how
>> these
>>         values are propagated to frameworks that have made assumptions
>> about
>>         them, and which values are allowed to change vs. not.
>>
>>         —
>>         *Joris Van Remoortere*
>>         Mesosphere
>>
>>         On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <acarey@ilm.com
>>         <ma...@ilm.com>
>>         <mailto:acarey@ilm.com <ma...@ilm.com>>> wrote:
>>
>>              #3 would be very helpful for us. Also related:
>>
>>         https://issues.apache.org/jira/browse/MESOS-3059
>>
>>              --
>>
>>              Aaron Carey
>>              Production Engineer - Cloud Pipeline
>>              Industrial Light & Magic
>>              London
>>              020 3751 9150
>>
>>              ________________________________________
>>              From: Du, Fan [fan.du@intel.com <ma...@intel.com>
>>         <mailto:fan.du@intel.com <ma...@intel.com>>]
>>              Sent: 14 June 2016 07:24
>>              To: user@mesos.apache.org <ma...@mesos.apache.org>
>>         <mailto:user@mesos.apache.org <ma...@mesos.apache.org>>;
>>         dev@mesos.apache.org <ma...@mesos.apache.org>
>>         <mailto:dev@mesos.apache.org <ma...@mesos.apache.org>>
>>              Cc: Joris Van Remoortere; vinodkone@apache.org
>>         <ma...@apache.org>
>>              <mailto:vinodkone@apache.org <ma...@apache.org>>
>>
>>
>>              Subject: Re: Rack awareness support for Mesos
>>
>>              Hi everyone
>>
>>              Let me summarize the discussion about Rack awareness in the
>>         community so
>>              far. First thanks for all the comments, advices or
>>         challenges! :)
>>
>>              #1. Stick with attributes for rack awareness
>>
>>              For compatibility with existing framework, I tend to be ok
>>         with using
>>              attributes to convey the rack information, but with the
>>         goal to do it
>>              automatically, easy to maintain and with good attributes
>>         schema. This
>>              will bring up below question where the controversy starts.
>>
>>              #2. Scripts vs programmatic way
>>
>>              Both can be used to set attributes, I've made my arguments
>>         in the Jira
>>              and the Design doc, I'm not gonna to argue more here. But
>>         please take a
>>              look discussion at MESOS-3366 before, which allow
>>         resources/attributes
>>              discovery.
>>
>>              A module to implement *slaveAttributesDecorator* hook will
>>         works like
>>              a charm here in a static way. And need to justify
>>         attributes updating.
>>
>>              #3. Allow updating attributes
>>              Several cases need to be covered here:
>>
>>              a). Mesos runs inside VMs or container, where live
>>         migration happens, so
>>              rack information need to be updated.
>>
>>              b). LLDP packets are broadcasted by the interval 10s~30s, a
>>         vendor
>>              specific implementation, and rack information are usually
>>         stored in LLDP
>>              daemon to be queried. Worst cases(nodes fresh reboot, or
>>         daemon restart)
>>              would be: Mesos slave have to wait 10s~30s for a valid rack
>>         information
>>              before register to master. Allow updating attributes will
>>         mitigate this
>>              problem.
>>
>>              c). Framework affinity
>>
>>              Framework X prefers to run on the same nodes with another
>>         framwork Y.
>>              For example, it's desirable for Shark or Spark-SQL to
>>         reside on the
>>              *worker* node where Alluxio(former Tachyon) to gain more
>>         performance
>>              boosting as SPARK-6707 ticket message
>>         {tachyon=true;us-east-1=false}
>>
>>              If framework could advertise agent attributes in the
>>         ResourcesOffer
>>              process, awesome!
>>
>>
>>              #4. Rearrange agents in a more scalable manner, like per
>>         rack basis
>>
>>              Randomly offering agents resource to framework does not
>>         improve data
>>              locality, imagine the likelihood of a framework getting
>>         resources
>>              underneath the same rack, at the scale of +30000 nodes.
>>         Moreover time to
>>              randomly shuffle the agents also grows.
>>
>>              How about rearranging the agent in a per rack basis, and a
>>         minor change
>>              to the way how resources are allocated will fix this.
>>
>>
>>              I might not see the whole picture here, so comments are
>>         welcomed!
>>
>>
>>              On 2016/6/6 17:17, Du, Fan wrote:
>>               > Hi, Mesos folks
>>               >
>>               > I’ve been thinking about Mesos rack awareness support
>>         for a while,
>>               >
>>               > it’s a common interest for lots of data center
>>         applications to
>>              provide
>>               > data locality,
>>               >
>>               > fault tolerance and better task placement. Create
>>         MESOS-5545 to track
>>               > the story,
>>               >
>>               > and here is the initial design doc [1] to support rack
>>         awareness
>>              in Mesos.
>>               >
>>               > Looking forward to hear any comments from end user and
>> other
>>              developers,
>>               >
>>               > Thanks!
>>               >
>>               > [1]:
>>               >
>>
>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>>               >
>>
>>
>>
>>
>

Re: Rack awareness support for Mesos

Posted by james <ga...@verizon.net>.

On 06/14/2016 08:14 AM, Joris Van Remoortere wrote:
>> On the condition of compatible with existing framework which already rely on parsing attributes for rack information.
> There is currently nothing in Mesos that specifies the format or
> structure for rack information in attributes.
> The fact that operators / frameworks have decided to add this
> information out of band is their problem to solve.
> We don't need to be backwards compatible with something we never
> published to begin with. This is why it's ok for us to consider adding a
> typed form of failure domain information that is separate from the
> typeless string attributes.

True. But you have to start somewhere, knowing that the schema and codes
will morph over time to maintain relevance and usefulness. In that
vein, if folks have established interesting and useful parameters for
this work, then it is most beneficial that those methods and codes are
considered carefully. AKA: speak up now. Diversity and inclusion are
keenly beneficial, where practical.


> Since your interest is in the determination of the values, as opposed to
> their propagation, I would just urge that you keep in mind that we may
> (as a project) not want to support this information as the current
> string attributes.

Huh? Why not? If the attributes change, why can't this sub-project just
change with those changing string attributes? Maybe some elaboration on how
this might not naturally be able to evolve is a warranted detail of
discussion?


I would venture that both 'determination of the values and propagation 
(delays)' are inherently important in a cluster of many things:: 
hardware, resources, frameworks, security codes, etc etc. The author
and others seem to be keenly aware that a tight focus is not going to 
work, at this stage, so a broad appeal to a multitude of needs is best.
And in fact, until some idea is proven to be useless or too difficult to
implement, the bigger the tent, the more useful the codes that define 
this project/idea become.  Personally, I'm very excited that someone has 
stepped up in this area; hoping they keep an open mind and flexibility 
geared toward multiplicative usage, in the future. Most mature hardware 
folks who build ideas into robust systems do exactly that, to motivate a 
multiplicative usage for organizing hardware, performance and state 
metrics, and timing signals, gregariously. All of this is routine 
semantics from a hardware perspective.

At some point, folks will realize that kernel configuration, testing and
tweaks are critical to cluster performance, regardless of the codes
running on top of the cluster. So this project could easily use cgroups
and such to achieve robustness in many areas of need.


Like it or not, large amounts of hardware need schema, planning and
architectural robustness to stay pristinely available, if software
efficiency is to be anywhere near optimal in deployment. This really
becomes critical when a mix of different CPU types, GPUs and RAM is to be
considered in future deployments, regardless of whether you outsource or
run your own cluster. Hardware vendors are going to want to sell their
products to as wide a customer base as possible, and customers are going to
demand seamless management for expansion of resources. Furthermore, as a
consultant, my experience is that much of the future market is going to
demand outsourced, hybrid and in-house options as a fundamental tenet of
cluster resource adoption.

hth,
James


> *Joris Van Remoortere*
> Mesosphere
>
> On Tue, Jun 14, 2016 at 3:02 PM, Du, Fan <fan.du@intel.com
> <ma...@intel.com>> wrote:
>
>
>
>     On 2016/6/14 20:32, Joris Van Remoortere wrote:
>
>              #1. Stick with attributes for rack awareness
>
>         I don't think this is the right approach; however, there seem to
>         be 2
>         components to this discussion:
>
>         1. How the values are presented (Attributes vs. a new type-aware
>         structure)
>         2. How the values are determined (scripts vs. automation vs.
>         modules)
>
>         It seems you are more interested in working on #2. If that's the
>         case,
>         please make sure that you don't assume anything about #1, as we not
>         everyone agrees that we will use the existing attributes in the
>         future.
>
>
>     On the condition of compatible with existing framework which already
>     rely on parsing attributes for rack information.
>
>     Quotes from my original statements:
>     > For compatibility with existing framework, I tend to be ok with using
>     > attributes to convey the rack information
>
>     By all means, no matter what internal structures to use, current
>     behavior should be honored. btw, I'm also thinking about #1, it's
>     too earlier to bring up the details so far before the ticket got
>     ACCEPTED.
>
>     Any way, I'm always open to all kind of discussion, thanks for your
>     comments! Joris.
>
>         For #2, you should focus on an API (module or script results)
>         that will
>         support all the different methods the community wants to use to
>         generate
>         this data.
>
>         As you mentioned, updating the values for a running agent is not
>         straightforward. A lot of design work will need to go into how these
>         values are propagated to frameworks that have made assumptions about
>         them, and which values are allowed to change vs. not.
>
>         
>         *Joris Van Remoortere*
>         Mesosphere
>
>         On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <acarey@ilm.com
>         <ma...@ilm.com>
>         <mailto:acarey@ilm.com <ma...@ilm.com>>> wrote:
>
>              #3 would be very helpful for us. Also related:
>
>         https://issues.apache.org/jira/browse/MESOS-3059
>
>              --
>
>              Aaron Carey
>              Production Engineer - Cloud Pipeline
>              Industrial Light & Magic
>              London
>              020 3751 9150
>
>              ________________________________________
>              From: Du, Fan [fan.du@intel.com <ma...@intel.com>
>         <mailto:fan.du@intel.com <ma...@intel.com>>]
>              Sent: 14 June 2016 07:24
>              To: user@mesos.apache.org <ma...@mesos.apache.org>
>         <mailto:user@mesos.apache.org <ma...@mesos.apache.org>>;
>         dev@mesos.apache.org <ma...@mesos.apache.org>
>         <mailto:dev@mesos.apache.org <ma...@mesos.apache.org>>
>              Cc: Joris Van Remoortere; vinodkone@apache.org
>         <ma...@apache.org>
>              <mailto:vinodkone@apache.org <ma...@apache.org>>
>
>              Subject: Re: Rack awareness support for Mesos
>
>              Hi everyone
>
>              Let me summarize the discussion about Rack awareness in the
>         community so
>              far. First thanks for all the comments, advices or
>         challenges! :)
>
>              #1. Stick with attributes for rack awareness
>
>              For compatibility with existing framework, I tend to be ok
>         with using
>              attributes to convey the rack information, but with the
>         goal to do it
>              automatically, easy to maintain and with good attributes
>         schema. This
>              will bring up below question where the controversy starts.
>
>              #2. Scripts vs programmatic way
>
>              Both can be used to set attributes, I've made my arguments
>         in the Jira
>              and the Design doc, I'm not gonna to argue more here. But
>         please take a
>              look discussion at MESOS-3366 before, which allow
>         resources/attributes
>              discovery.
>
>              A module to implement *slaveAttributesDecorator* hook will
>         works like
>              a charm here in a static way. And need to justify
>         attributes updating.
>
>              #3. Allow updating attributes
>              Several cases need to be covered here:
>
>              a). Mesos runs inside VMs or container, where live
>         migration happens, so
>              rack information need to be updated.
>
>              b). LLDP packets are broadcasted by the interval 10s~30s, a
>         vendor
>              specific implementation, and rack information are usually
>         stored in LLDP
>              daemon to be queried. Worst cases(nodes fresh reboot, or
>         daemon restart)
>              would be: Mesos slave have to wait 10s~30s for a valid rack
>         information
>              before register to master. Allow updating attributes will
>         mitigate this
>              problem.
>
>              c). Framework affinity
>
>              Framework X prefers to run on the same nodes with another
>         framwork Y.
>              For example, it's desirable for Shark or Spark-SQL to
>         reside on the
>              *worker* node where Alluxio(former Tachyon) to gain more
>         performance
>              boosting as SPARK-6707 ticket message
>         {tachyon=true;us-east-1=false}
>
>              If framework could advertise agent attributes in the
>         ResourcesOffer
>              process, awesome!
>
>
>              #4. Rearrange agents in a more scalable manner, like per
>         rack basis
>
>              Randomly offering agents resource to framework does not
>         improve data
>              locality, imagine the likelihood of a framework getting
>         resources
>              underneath the same rack, at the scale of +30000 nodes.
>         Moreover time to
>              randomly shuffle the agents also grows.
>
>              How about rearranging the agent in a per rack basis, and a
>         minor change
>              to the way how resources are allocated will fix this.
>
>
>              I might not see the whole picture here, so comments are
>         welcomed!
>
>
>              On 2016/6/6 17:17, Du, Fan wrote:
>               > Hi, Mesos folks
>               >
>               > I've been thinking about Mesos rack awareness support
>         for a while,
>               >
>               > it's a common interest for lots of data center
>         applications to
>              provide
>               > data locality,
>               >
>               > fault tolerance and better task placement. Create
>         MESOS-5545 to track
>               > the story,
>               >
>               > and here is the initial design doc [1] to support rack
>         awareness
>              in Mesos.
>               >
>               > Looking forward to hear any comments from end user and other
>              developers,
>               >
>               > Thanks!
>               >
>               > [1]:
>               >
>         https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>               >
>
>
>

Re: Rack awareness support for Mesos

Posted by "Du, Fan" <fa...@intel.com>.


On 2016/6/14 21:14, Joris Van Remoortere wrote:
>> On the condition of compatible with existing framework which already rely
> on parsing attributes for rack information.
> There is currently nothing in Mesos that specifies the format or structure
> for rack information in attributes.
> The fact that operators / frameworks have decided to add this information
> out of band is their problem to solve.
> We don't need to be backwards compatible with something we never published
> to begin with. This is why it's ok for us to consider adding a typed form
> of failure domain information that is separate from the typeless string
> attributes.

hmm, sounds promising, then we can travel light!

> Since your interest is in the determination of the values, as opposed to

You are presuming my work scope; that has not been true from the very beginning.

> their propagation, I would just urge that you keep in mind that we may (as
> a project) not want to support this information as the current string
> attributes.

Well understood, thanks for the explanation!
Any comments about #3. and #4?
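
To make #4 a bit more concrete, below is a rough sketch of what I mean by
arranging agents on a per rack basis. This is only an illustration and not how
the Mesos allocator is implemented; the function names and the
(agent id, rack) input pairs are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_rack(agents: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Bucket agent ids by the rack they report (hypothetical input format)."""
    racks = defaultdict(list)
    for agent_id, rack in agents:
        racks[rack].append(agent_id)
    return dict(racks)


def offer_order(racks: Dict[str, List[str]]) -> List[str]:
    """Walk the racks in name order so agents on one rack stay adjacent."""
    return [agent for rack in sorted(racks) for agent in racks[rack]]


agents = [("a1", "r2"), ("a2", "r1"), ("a3", "r2"), ("a4", "r1")]
racks = group_by_rack(agents)
order = offer_order(racks)
```

Offering in this order keeps agents from the same rack next to each other,
instead of relying on a random shuffle across all agents to produce
same-rack offers by chance.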

>
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Tue, Jun 14, 2016 at 3:02 PM, Du, Fan <fa...@intel.com> wrote:
>
>>
>>
>> On 2016/6/14 20:32, Joris Van Remoortere wrote:
>>
>>>      #1. Stick with attributes for rack awareness
>>>
>>> I don't think this is the right approach; however, there seem to be 2
>>> components to this discussion:
>>>
>>> 1. How the values are presented (Attributes vs. a new type-aware
>>> structure)
>>> 2. How the values are determined (scripts vs. automation vs. modules)
>>>
>>> It seems you are more interested in working on #2. If that's the case,
>>> please make sure that you don't assume anything about #1, as we not
>>> everyone agrees that we will use the existing attributes in the future.
>>>
>>
>> On the condition of compatible with existing framework which already rely
>> on parsing attributes for rack information.
>>
>> Quotes from my original statements:
>>> For compatibility with existing framework, I tend to be ok with using
>>> attributes to convey the rack information
>>
>> By all means, no matter what internal structures to use, current behavior
>> should be honored. btw, I'm also thinking about #1, it's too earlier to
>> bring up the details so far before the ticket got ACCEPTED.
>>
>> Any way, I'm always open to all kind of discussion, thanks for your
>> comments! Joris.
>>
>> For #2, you should focus on an API (module or script results) that will
>>> support all the different methods the community wants to use to generate
>>> this data.
>>>
>>> As you mentioned, updating the values for a running agent is not
>>> straightforward. A lot of design work will need to go into how these
>>> values are propagated to frameworks that have made assumptions about
>>> them, and which values are allowed to change vs. not.
>>>
>>> —
>>> *Joris Van Remoortere*
>>> Mesosphere
>>>
>>> On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <acarey@ilm.com
>>> <ma...@ilm.com>> wrote:
>>>
>>>      #3 would be very helpful for us. Also related:
>>>
>>>      https://issues.apache.org/jira/browse/MESOS-3059
>>>
>>>      --
>>>
>>>      Aaron Carey
>>>      Production Engineer - Cloud Pipeline
>>>      Industrial Light & Magic
>>>      London
>>>      020 3751 9150
>>>
>>>      ________________________________________
>>>      From: Du, Fan [fan.du@intel.com <ma...@intel.com>]
>>>      Sent: 14 June 2016 07:24
>>>      To: user@mesos.apache.org <ma...@mesos.apache.org>;
>>>      dev@mesos.apache.org <ma...@mesos.apache.org>
>>>      Cc: Joris Van Remoortere; vinodkone@apache.org
>>>      <ma...@apache.org>
>>>
>>>      Subject: Re: Rack awareness support for Mesos
>>>
>>>      Hi everyone
>>>
>>>      Let me summarize the discussion about Rack awareness in the community
>>> so
>>>      far. First thanks for all the comments, advices or challenges! :)
>>>
>>>      #1. Stick with attributes for rack awareness
>>>
>>>      For compatibility with existing framework, I tend to be ok with using
>>>      attributes to convey the rack information, but with the goal to do it
>>>      automatically, easy to maintain and with good attributes schema. This
>>>      will bring up below question where the controversy starts.
>>>
>>>      #2. Scripts vs programmatic way
>>>
>>>      Both can be used to set attributes, I've made my arguments in the Jira
>>>      and the Design doc, I'm not gonna to argue more here. But please take
>>> a
>>>      look discussion at MESOS-3366 before, which allow resources/attributes
>>>      discovery.
>>>
>>>      A module to implement *slaveAttributesDecorator* hook will works like
>>>      a charm here in a static way. And need to justify attributes updating.
>>>
>>>      #3. Allow updating attributes
>>>      Several cases need to be covered here:
>>>
>>>      a). Mesos runs inside VMs or container, where live migration happens,
>>> so
>>>      rack information need to be updated.
>>>
>>>      b). LLDP packets are broadcasted by the interval 10s~30s, a vendor
>>>      specific implementation, and rack information are usually stored in
>>> LLDP
>>>      daemon to be queried. Worst cases(nodes fresh reboot, or daemon
>>> restart)
>>>      would be: Mesos slave have to wait 10s~30s for a valid rack
>>> information
>>>      before register to master. Allow updating attributes will mitigate
>>> this
>>>      problem.
>>>
>>>      c). Framework affinity
>>>
>>>      Framework X prefers to run on the same nodes with another framwork Y.
>>>      For example, it's desirable for Shark or Spark-SQL to reside on the
>>>      *worker* node where Alluxio(former Tachyon) to gain more performance
>>>      boosting as SPARK-6707 ticket message {tachyon=true;us-east-1=false}
>>>
>>>      If framework could advertise agent attributes in the ResourcesOffer
>>>      process, awesome!
>>>
>>>
>>>      #4. Rearrange agents in a more scalable manner, like per rack basis
>>>
>>>      Randomly offering agents resource to framework does not improve data
>>>      locality, imagine the likelihood of a framework getting resources
>>>      underneath the same rack, at the scale of +30000 nodes. Moreover time
>>> to
>>>      randomly shuffle the agents also grows.
>>>
>>>      How about rearranging the agent in a per rack basis, and a minor change
>>>      to the way how resources are allocated will fix this.
>>>
>>>
>>>      I might not see the whole picture here, so comments are welcomed!
>>>
>>>
>>>      On 2016/6/6 17:17, Du, Fan wrote:
>>>       > Hi, Mesos folks
>>>       >
>>>       > I’ve been thinking about Mesos rack awareness support for a while,
>>>       >
>>>       > it’s a common interest for lots of data center applications to provide
>>>       > data locality,
>>>       >
>>>       > fault tolerance and better task placement. Create MESOS-5545 to track
>>>       > the story,
>>>       >
>>>       > and here is the initial design doc [1] to support rack awareness in Mesos.
>>>       >
>>>       > Looking forward to hear any comments from end user and other developers,
>>>       >
>>>       > Thanks!
>>>       >
>>>       > [1]:
>>>       >
>>>
>>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>>>       >
>>>
>>>
>>>
>

Re: Rack awareness support for Mesos

Posted by "Du, Fan" <fa...@intel.com>.


On 2016/6/14 21:14, Joris Van Remoortere wrote:
>> On the condition of compatible with existing framework which already rely
>> on parsing attributes for rack information.
> There is currently nothing in Mesos that specifies the format or structure
> for rack information in attributes.
> The fact that operators / frameworks have decided to add this information
> out of band is their problem to solve.
> We don't need to be backwards compatible with something we never published
> to begin with. This is why it's ok for us to consider adding a typed form
> of failure domain information that is separate from the typeless string
> attributes.

Hmm, sounds promising. Then we can travel light!

> Since your interest is in the determination of the values, as opposed to

You are presuming the scope of my work; that has not been true from the very beginning.

> their propagation, I would just urge that you keep in mind that we may (as
> a project) not want to support this information as the current string
> attributes.

Well understood, thanks for the explanation!
Any comments on #3 and #4?
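
For reference, the per-rack arrangement proposed in #4 could be sketched
roughly as below. This is only an illustration under stated assumptions, not
the Mesos allocator: the names group_by_rack and pick_same_rack and the
(agent_id, rack) input pairs are hypothetical, and the rack value is assumed
to be known for every agent already.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_rack(agents: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Index agent ids by rack. `agents` yields (agent_id, rack) pairs."""
    racks = defaultdict(list)
    for agent_id, rack in agents:
        racks[rack].append(agent_id)
    return dict(racks)


def pick_same_rack(racks: Dict[str, List[str]], count: int) -> List[str]:
    """Return `count` agents from the first rack that has enough, else []."""
    # Racks are visited in insertion order (hypothetical policy).
    for members in racks.values():
        if len(members) >= count:
            return members[:count]
    return []


agents = [("a1", "r1"), ("a2", "r2"), ("a3", "r2"), ("a4", "r1"), ("a5", "r2")]
racks = group_by_rack(agents)
same_rack = pick_same_rack(racks, 3)
```

With the index built once, finding co-located agents is a lookup per rack
rather than a matter of chance over a shuffled list of all agents.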

>
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Tue, Jun 14, 2016 at 3:02 PM, Du, Fan <fa...@intel.com> wrote:
>
>>
>>
>> On 2016/6/14 20:32, Joris Van Remoortere wrote:
>>
>>>      #1. Stick with attributes for rack awareness
>>>
>>> I don't think this is the right approach; however, there seem to be 2
>>> components to this discussion:
>>>
>>> 1. How the values are presented (Attributes vs. a new type-aware
>>> structure)
>>> 2. How the values are determined (scripts vs. automation vs. modules)
>>>
>>> It seems you are more interested in working on #2. If that's the case,
>>> please make sure that you don't assume anything about #1, as we not
>>> everyone agrees that we will use the existing attributes in the future.
>>>
>>
>> On the condition of compatible with existing framework which already rely
>> on parsing attributes for rack information.
>>
>> Quotes from my original statements:
>>> For compatibility with existing framework, I tend to be ok with using
>>> attributes to convey the rack information
>>
>> By all means, no matter what internal structures to use, current behavior
>> should be honored. btw, I'm also thinking about #1, it's too earlier to
>> bring up the details so far before the ticket got ACCEPTED.
>>
>> Any way, I'm always open to all kind of discussion, thanks for your
>> comments! Joris.
>>
>>> For #2, you should focus on an API (module or script results) that will
>>> support all the different methods the community wants to use to generate
>>> this data.
>>>
>>> As you mentioned, updating the values for a running agent is not
>>> straightforward. A lot of design work will need to go into how these
>>> values are propagated to frameworks that have made assumptions about
>>> them, and which values are allowed to change vs. not.
>>>
>>> —
>>> *Joris Van Remoortere*
>>> Mesosphere
>>>
>>> On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <acarey@ilm.com
>>> <ma...@ilm.com>> wrote:
>>>
>>>      #3 would be very helpful for us. Also related:
>>>
>>>      https://issues.apache.org/jira/browse/MESOS-3059
>>>
>>>      --
>>>
>>>      Aaron Carey
>>>      Production Engineer - Cloud Pipeline
>>>      Industrial Light & Magic
>>>      London
>>>      020 3751 9150
>>>
>>>      ________________________________________
>>>      From: Du, Fan [fan.du@intel.com <ma...@intel.com>]
>>>      Sent: 14 June 2016 07:24
>>>      To: user@mesos.apache.org <ma...@mesos.apache.org>;
>>>      dev@mesos.apache.org <ma...@mesos.apache.org>
>>>      Cc: Joris Van Remoortere; vinodkone@apache.org
>>>      <ma...@apache.org>
>>>
>>>      Subject: Re: Rack awareness support for Mesos
>>>
>>>      Hi everyone
>>>
>>>      Let me summarize the discussion about Rack awareness in the community so
>>>      far. First thanks for all the comments, advices or challenges! :)
>>>
>>>      #1. Stick with attributes for rack awareness
>>>
>>>      For compatibility with existing framework, I tend to be ok with using
>>>      attributes to convey the rack information, but with the goal to do it
>>>      automatically, easy to maintain and with good attributes schema. This
>>>      will bring up below question where the controversy starts.
>>>
>>>      #2. Scripts vs programmatic way
>>>
>>>      Both can be used to set attributes, I've made my arguments in the Jira
>>>      and the Design doc, I'm not gonna to argue more here. But please take a
>>>      look discussion at MESOS-3366 before, which allow resources/attributes
>>>      discovery.
>>>
>>>      A module to implement *slaveAttributesDecorator* hook will works like
>>>      a charm here in a static way. And need to justify attributes updating.
>>>
>>>      #3. Allow updating attributes
>>>      Several cases need to be covered here:
>>>
>>>      a). Mesos runs inside VMs or container, where live migration happens, so
>>>      rack information need to be updated.
>>>
>>>      b). LLDP packets are broadcasted by the interval 10s~30s, a vendor
>>>      specific implementation, and rack information are usually stored in LLDP
>>>      daemon to be queried. Worst cases(nodes fresh reboot, or daemon restart)
>>>      would be: Mesos slave have to wait 10s~30s for a valid rack information
>>>      before register to master. Allow updating attributes will mitigate this
>>>      problem.
>>>
>>>      c). Framework affinity
>>>
>>>      Framework X prefers to run on the same nodes with another framwork Y.
>>>      For example, it's desirable for Shark or Spark-SQL to reside on the
>>>      *worker* node where Alluxio(former Tachyon) to gain more performance
>>>      boosting as SPARK-6707 ticket message {tachyon=true;us-east-1=false}
>>>
>>>      If framework could advertise agent attributes in the ResourcesOffer
>>>      process, awesome!
>>>
>>>
>>>      #4. Rearrange agents in a more scalable manner, like per rack basis
>>>
>>>      Randomly offering agents resource to framework does not improve data
>>>      locality, imagine the likelihood of a framework getting resources
>>>      underneath the same rack, at the scale of +30000 nodes. Moreover time to
>>>      randomly shuffle the agents also grows.
>>>
>>>      How about rearranging the agent in a per rack basis, and a minor change
>>>      to the way how resources are allocated will fix this.
>>>
>>>
>>>      I might not see the whole picture here, so comments are welcomed!
>>>
>>>
>>>      On 2016/6/6 17:17, Du, Fan wrote:
>>>       > Hi, Mesos folks
>>>       >
>>>       > I’ve been thinking about Mesos rack awareness support for a while,
>>>       >
>>>       > it’s a common interest for lots of data center applications to provide
>>>       > data locality,
>>>       >
>>>       > fault tolerance and better task placement. Create MESOS-5545 to track
>>>       > the story,
>>>       >
>>>       > and here is the initial design doc [1] to support rack awareness in Mesos.
>>>       >
>>>       > Looking forward to hear any comments from end user and other developers,
>>>       >
>>>       > Thanks!
>>>       >
>>>       > [1]:
>>>       >
>>>
>>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>>>       >
>>>
>>>
>>>
>

Re: Rack awareness support for Mesos

Posted by Joris Van Remoortere <jo...@mesosphere.io>.

> On the condition of compatible with existing framework which already rely
> on parsing attributes for rack information.
There is currently nothing in Mesos that specifies the format or structure
for rack information in attributes.
The fact that operators / frameworks have decided to add this information
out of band is their problem to solve.
We don't need to be backwards compatible with something we never published
to begin with. This is why it's ok for us to consider adding a typed form
of failure domain information that is separate from the typeless string
attributes.

Since your interest is in the determination of the values, as opposed to
their propagation, I would just urge that you keep in mind that we may (as
a project) not want to support this information as the current string
attributes.
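
To make the distinction concrete, here is a minimal sketch in Python. It
assumes the agent attributes arrive as the "name:value;name:value" text form
accepted by the agent's --attributes flag and handles text values only. The
FaultDomain class and the "rack" key are hypothetical illustrations, not
existing Mesos types or published conventions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def parse_text_attributes(raw: str) -> Dict[str, str]:
    """Parse 'name:value;name:value' into a dict (text values only)."""
    attrs: Dict[str, str] = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        name, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"malformed attribute: {pair!r}")
        attrs[name.strip()] = value.strip()
    return attrs


def rack_from_attributes(raw: str, key: str = "rack") -> Optional[str]:
    """Untyped route: the framework has to guess which key the operator chose."""
    return parse_text_attributes(raw).get(key)


@dataclass(frozen=True)
class FaultDomain:
    """Hypothetical typed structure; the field names are fixed and published."""
    rack: str
    zone: Optional[str] = None


guessed = rack_from_attributes("rack:r12;zone:east")
missed = rack_from_attributes("rack_id:r12")  # same data, different key
typed = FaultDomain(rack="r12", zone="east")
```

The string route silently returns nothing when an operator picks a different
key, while the typed route has one agreed field that every framework can read.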



—
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 14, 2016 at 3:02 PM, Du, Fan <fa...@intel.com> wrote:

>
>
> On 2016/6/14 20:32, Joris Van Remoortere wrote:
>
>>     #1. Stick with attributes for rack awareness
>>
>> I don't think this is the right approach; however, there seem to be 2
>> components to this discussion:
>>
>> 1. How the values are presented (Attributes vs. a new type-aware
>> structure)
>> 2. How the values are determined (scripts vs. automation vs. modules)
>>
>> It seems you are more interested in working on #2. If that's the case,
>> please make sure that you don't assume anything about #1, as we not
>> everyone agrees that we will use the existing attributes in the future.
>>
>
> On the condition of compatible with existing framework which already rely
> on parsing attributes for rack information.
>
> Quotes from my original statements:
> > For compatibility with existing framework, I tend to be ok with using
> > attributes to convey the rack information
>
> By all means, no matter what internal structures to use, current behavior
> should be honored. btw, I'm also thinking about #1, it's too earlier to
> bring up the details so far before the ticket got ACCEPTED.
>
> Any way, I'm always open to all kind of discussion, thanks for your
> comments! Joris.
>
>> For #2, you should focus on an API (module or script results) that will
>> support all the different methods the community wants to use to generate
>> this data.
>>
>> As you mentioned, updating the values for a running agent is not
>> straightforward. A lot of design work will need to go into how these
>> values are propagated to frameworks that have made assumptions about
>> them, and which values are allowed to change vs. not.
>>
>> —
>> *Joris Van Remoortere*
>> Mesosphere
>>
>> On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <acarey@ilm.com
>> <ma...@ilm.com>> wrote:
>>
>>     #3 would be very helpful for us. Also related:
>>
>>     https://issues.apache.org/jira/browse/MESOS-3059
>>
>>     --
>>
>>     Aaron Carey
>>     Production Engineer - Cloud Pipeline
>>     Industrial Light & Magic
>>     London
>>     020 3751 9150
>>
>>     ________________________________________
>>     From: Du, Fan [fan.du@intel.com <ma...@intel.com>]
>>     Sent: 14 June 2016 07:24
>>     To: user@mesos.apache.org <ma...@mesos.apache.org>;
>>     dev@mesos.apache.org <ma...@mesos.apache.org>
>>     Cc: Joris Van Remoortere; vinodkone@apache.org
>>     <ma...@apache.org>
>>
>>     Subject: Re: Rack awareness support for Mesos
>>
>>     Hi everyone
>>
>>     Let me summarize the discussion about Rack awareness in the community so
>>     far. First thanks for all the comments, advices or challenges! :)
>>
>>     #1. Stick with attributes for rack awareness
>>
>>     For compatibility with existing framework, I tend to be ok with using
>>     attributes to convey the rack information, but with the goal to do it
>>     automatically, easy to maintain and with good attributes schema. This
>>     will bring up below question where the controversy starts.
>>
>>     #2. Scripts vs programmatic way
>>
>>     Both can be used to set attributes, I've made my arguments in the Jira
>>     and the Design doc, I'm not gonna to argue more here. But please take a
>>     look discussion at MESOS-3366 before, which allow resources/attributes
>>     discovery.
>>
>>     A module to implement *slaveAttributesDecorator* hook will works like
>>     a charm here in a static way. And need to justify attributes updating.
>>
>>     #3. Allow updating attributes
>>     Several cases need to be covered here:
>>
>>     a). Mesos runs inside VMs or container, where live migration happens, so
>>     rack information need to be updated.
>>
>>     b). LLDP packets are broadcasted by the interval 10s~30s, a vendor
>>     specific implementation, and rack information are usually stored in LLDP
>>     daemon to be queried. Worst cases(nodes fresh reboot, or daemon restart)
>>     would be: Mesos slave have to wait 10s~30s for a valid rack information
>>     before register to master. Allow updating attributes will mitigate this
>>     problem.
>>
>>     c). Framework affinity
>>
>>     Framework X prefers to run on the same nodes with another framwork Y.
>>     For example, it's desirable for Shark or Spark-SQL to reside on the
>>     *worker* node where Alluxio(former Tachyon) to gain more performance
>>     boosting as SPARK-6707 ticket message {tachyon=true;us-east-1=false}
>>
>>     If framework could advertise agent attributes in the ResourcesOffer
>>     process, awesome!
>>
>>
>>     #4. Rearrange agents in a more scalable manner, like per rack basis
>>
>>     Randomly offering agents resource to framework does not improve data
>>     locality, imagine the likelihood of a framework getting resources
>>     underneath the same rack, at the scale of +30000 nodes. Moreover time to
>>     randomly shuffle the agents also grows.
>>
>>     How about rearranging the agent in a per rack basis, and a minor change
>>     to the way how resources are allocated will fix this.
>>
>>
>>     I might not see the whole picture here, so comments are welcomed!
>>
>>
>>     On 2016/6/6 17:17, Du, Fan wrote:
>>      > Hi, Mesos folks
>>      >
>>      > I’ve been thinking about Mesos rack awareness support for a while,
>>      >
>>      > it’s a common interest for lots of data center applications to provide
>>      > data locality,
>>      >
>>      > fault tolerance and better task placement. Create MESOS-5545 to track
>>      > the story,
>>      >
>>      > and here is the initial design doc [1] to support rack awareness in Mesos.
>>      >
>>      > Looking forward to hear any comments from end user and other developers,
>>      >
>>      > Thanks!
>>      >
>>      > [1]:
>>      >
>>
>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>>      >
>>
>>
>>

Re: Rack awareness support for Mesos

Posted by "Du, Fan" <fa...@intel.com>.


On 2016/6/14 20:32, Joris Van Remoortere wrote:
>     #1. Stick with attributes for rack awareness
>
> I don't think this is the right approach; however, there seem to be 2
> components to this discussion:
>
> 1. How the values are presented (Attributes vs. a new type-aware structure)
> 2. How the values are determined (scripts vs. automation vs. modules)
>
> It seems you are more interested in working on #2. If that's the case,
> please make sure that you don't assume anything about #1, as we not
> everyone agrees that we will use the existing attributes in the future.

That was on the condition of staying compatible with existing frameworks
that already rely on parsing attributes for rack information.

Quoting my original statement:
 > For compatibility with existing framework, I tend to be ok with using
 > attributes to convey the rack information

By all means, no matter which internal structures are used, current
behavior should be honored. BTW, I'm also thinking about #1, but it's too
early to bring up the details before the ticket is ACCEPTED.

Anyway, I'm always open to all kinds of discussion. Thanks for your
comments, Joris!
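
One way to read "current behavior should be honored" is sketched below in
Python, as an assumption-laden illustration rather than a proposal for the
actual implementation: whatever typed structure ends up holding the rack, it
can still be mirrored into the plain text attributes that existing frameworks
parse. RackInfo, with_legacy_rack_attribute, render_attributes and the "rack"
key are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RackInfo:
    """Hypothetical typed rack description; not a Mesos type."""
    rack: str


def with_legacy_rack_attribute(
    attrs: Dict[str, str], info: RackInfo, key: str = "rack"
) -> Dict[str, str]:
    """Return a copy of attrs that also carries the rack as a text attribute."""
    merged = dict(attrs)
    # An attribute the operator already set by hand is left untouched.
    merged.setdefault(key, info.rack)
    return merged


def render_attributes(attrs: Dict[str, str]) -> str:
    """Render a dict back into the 'name:value;name:value' text form."""
    return ";".join(f"{name}:{value}" for name, value in attrs.items())


mirrored = render_attributes(
    with_legacy_rack_attribute({"os": "centos7"}, RackInfo("r12"))
)
kept = render_attributes(
    with_legacy_rack_attribute({"rack": "manual"}, RackInfo("r12"))
)
```

Frameworks that parse attributes keep working, while newer ones could read
the typed value directly.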

> For #2, you should focus on an API (module or script results) that will
> support all the different methods the community wants to use to generate
> this data.
>
> As you mentioned, updating the values for a running agent is not
> straightforward. A lot of design work will need to go into how these
> values are propagated to frameworks that have made assumptions about
> them, and which values are allowed to change vs. not.
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <acarey@ilm.com
> <ma...@ilm.com>> wrote:
>
>     #3 would be very helpful for us. Also related:
>
>     https://issues.apache.org/jira/browse/MESOS-3059
>
>     --
>
>     Aaron Carey
>     Production Engineer - Cloud Pipeline
>     Industrial Light & Magic
>     London
>     020 3751 9150
>
>     ________________________________________
>     From: Du, Fan [fan.du@intel.com <ma...@intel.com>]
>     Sent: 14 June 2016 07:24
>     To: user@mesos.apache.org <ma...@mesos.apache.org>;
>     dev@mesos.apache.org <ma...@mesos.apache.org>
>     Cc: Joris Van Remoortere; vinodkone@apache.org
>     <ma...@apache.org>
>     Subject: Re: Rack awareness support for Mesos
>
>     Hi everyone
>
>     Let me summarize the discussion about Rack awareness in the community so
>     far. First thanks for all the comments, advices or challenges! :)
>
>     #1. Stick with attributes for rack awareness
>
>     For compatibility with existing framework, I tend to be ok with using
>     attributes to convey the rack information, but with the goal to do it
>     automatically, easy to maintain and with good attributes schema. This
>     will bring up below question where the controversy starts.
>
>     #2. Scripts vs programmatic way
>
>     Both can be used to set attributes, I've made my arguments in the Jira
>     and the Design doc, I'm not gonna to argue more here. But please take a
>     look discussion at MESOS-3366 before, which allow resources/attributes
>     discovery.
>
>     A module to implement *slaveAttributesDecorator* hook will works like
>     a charm here in a static way. And need to justify attributes updating.
>
>     #3. Allow updating attributes
>     Several cases need to be covered here:
>
>     a). Mesos runs inside VMs or container, where live migration happens, so
>     rack information need to be updated.
>
>     b). LLDP packets are broadcasted by the interval 10s~30s, a vendor
>     specific implementation, and rack information are usually stored in LLDP
>     daemon to be queried. Worst cases(nodes fresh reboot, or daemon restart)
>     would be: Mesos slave have to wait 10s~30s for a valid rack information
>     before register to master. Allow updating attributes will mitigate this
>     problem.
>
>     c). Framework affinity
>
>     Framework X prefers to run on the same nodes with another framwork Y.
>     For example, it's desirable for Shark or Spark-SQL to reside on the
>     *worker* node where Alluxio(former Tachyon) to gain more performance
>     boosting as SPARK-6707 ticket message {tachyon=true;us-east-1=false}
>
>     If framework could advertise agent attributes in the ResourcesOffer
>     process, awesome!
>
>
>     #4. Rearrange agents in a more scalable manner, like per rack basis
>
>     Randomly offering agents resource to framework does not improve data
>     locality, imagine the likelihood of a framework getting resources
>     underneath the same rack, at the scale of +30000 nodes. Moreover time to
>     randomly shuffle the agents also grows.
>
>     How about rearranging the agent in a per rack basis, and a minor change
>     to the way how resources are allocated will fix this.
>
>
>     I might not see the whole picture here, so comments are welcomed!
>
>
>     On 2016/6/6 17:17, Du, Fan wrote:
>      > Hi, Mesos folks
>      >
>      > I’ve been thinking about Mesos rack awareness support for a while,
>      >
>      > it’s a common interest for lots of data center applications to provide
>      > data locality,
>      >
>      > fault tolerance and better task placement. Create MESOS-5545 to track
>      > the story,
>      >
>      > and here is the initial design doc [1] to support rack awareness in Mesos.
>      >
>      > Looking forward to hear any comments from end user and other developers,
>      >
>      > Thanks!
>      >
>      > [1]:
>      >
>     https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>      >
>
>

Re: Rack awareness support for Mesos

Posted by "Du, Fan" <fa...@intel.com>.


On 2016/6/14 20:32, Joris Van Remoortere wrote:
>     #1. Stick with attributes for rack awareness
>
> I don't think this is the right approach; however, there seem to be 2
> components to this discussion:
>
> 1. How the values are presented (Attributes vs. a new type-aware structure)
> 2. How the values are determined (scripts vs. automation vs. modules)
>
> It seems you are more interested in working on #2. If that's the case,
> please make sure that you don't assume anything about #1, as we not
> everyone agrees that we will use the existing attributes in the future.

On the condition of compatible with existing framework which already 
rely on parsing attributes for rack information.

Quotes from my original statements:
 > For compatibility with existing framework, I tend to be ok with using
 > attributes to convey the rack information

By all means, no matter what internal structures to use, current 
behavior should be honored. btw, I'm also thinking about #1, it's too 
earlier to bring up the details so far before the ticket got ACCEPTED.

Any way, I'm always open to all kind of discussion, thanks for your 
comments! Joris.

> For #2, you should focus on an API (module or script results) that will
> support all the different methods the community wants to use to generate
> this data.
>
> As you mentioned, updating the values for a running agent is not
> straightforward. A lot of design work will need to go into how these
> values are propagated to frameworks that have made assumptions about
> them, and which values are allowed to change vs. not.
>
> \u2014
> *Joris Van Remoortere*
> Mesosphere
>
> On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <acarey@ilm.com
> <ma...@ilm.com>> wrote:
>
>     #3 would be very helpful for us. Also related:
>
>     https://issues.apache.org/jira/browse/MESOS-3059
>
>     --
>
>     Aaron Carey
>     Production Engineer - Cloud Pipeline
>     Industrial Light & Magic
>     London
>     020 3751 9150
>
>     ________________________________________
>     From: Du, Fan [fan.du@intel.com <ma...@intel.com>]
>     Sent: 14 June 2016 07:24
>     To: user@mesos.apache.org <ma...@mesos.apache.org>;
>     dev@mesos.apache.org <ma...@mesos.apache.org>
>     Cc: Joris Van Remoortere; vinodkone@apache.org
>     <ma...@apache.org>
>     Subject: Re: Rack awareness support for Mesos
>
>     Hi everyone
>
>     Let me summarize the discussion about Rack awareness in the community so
>     far. First thanks for all the comments, advices or challenges! :)
>
>     #1. Stick with attributes for rack awareness
>
>     For compatibility with existing framework, I tend to be ok with using
>     attributes to convey the rack information, but with the goal to do it
>     automatically, easy to maintain and with good attributes schema. This
>     will bring up below question where the controversy starts.
>
>     #2. Scripts vs programmatic way
>
>     Both can be used to set attributes, I've made my arguments in the Jira
>     and the Design doc, I'm not gonna to argue more here. But please take a
>     look discussion at MESOS-3366 before, which allow resources/attributes
>     discovery.
>
>     A module to implement *slaveAttributesDecorator* hook will works like
>     a charm here in a static way. And need to justify attributes updating.
>
>     #3. Allow updating attributes
>     Several cases need to be covered here:
>
>     a). Mesos runs inside VMs or container, where live migration happens, so
>     rack information need to be updated.
>
>     b). LLDP packets are broadcasted by the interval 10s~30s, a vendor
>     specific implementation, and rack information are usually stored in LLDP
>     daemon to be queried. Worst cases(nodes fresh reboot, or daemon restart)
>     would be: Mesos slave have to wait 10s~30s for a valid rack information
>     before register to master. Allow updating attributes will mitigate this
>     problem.
>
>     c). Framework affinity
>
>     Framework X prefers to run on the same nodes with another framwork Y.
>     For example, it's desirable for Shark or Spark-SQL to reside on the
>     *worker* node where Alluxio(former Tachyon) to gain more performance
>     boosting as SPARK-6707 ticket message {tachyon=true;us-east-1=false}
>
>     If framework could advertise agent attributes in the ResourcesOffer
>     process, awesome!
>
>
>     #4. Rearrange agents in a more scalable manner, like per rack basis
>
>     Randomly offering agents resource to framework does not improve data
>     locality, imagine the likelihood of a framework getting resources
>     underneath the same rack, at the scale of +30000 nodes. Moreover time to
>     randomly shuffle the agents also grows.
>
>     How about rearranging the agent in a per rack basis, and a minor change
>     to the way how resources are allocated will fix this.
>
>
>     I might not see the whole picture here, so comments are welcomed!
>
>
>     On 2016/6/6 17:17, Du, Fan wrote:
>      > Hi, Mesos folks
>      >
>      > I’ve been thinking about Mesos rack awareness support for a while,
>      >
>      > it’s a common interest for lots of data center applications to
>      > provide data locality,
>      >
>      > fault tolerance and better task placement. Create MESOS-5545 to track
>      > the story,
>      >
>      > and here is the initial design doc [1] to support rack awareness
>      > in Mesos.
>      >
>      > Looking forward to hear any comments from end user and other
>      > developers,
>      >
>      > Thanks!
>      >
>      > [1]:
>      > https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>      >
>
>

Re: Rack awareness support for Mesos

Posted by Joris Van Remoortere <jo...@mesosphere.io>.

>
> #1. Stick with attributes for rack awareness

I don't think this is the right approach; however, there seem to be 2
components to this discussion:

1. How the values are presented (Attributes vs. a new type-aware structure)
2. How the values are determined (scripts vs. automation vs. modules)

It seems you are more interested in working on #2. If that's the case,
please make sure that you don't assume anything about #1, as not
everyone agrees that we will use the existing attributes in the future.

For #2, you should focus on an API (module or script results) that will
support all the different methods the community wants to use to generate
this data.

As you mentioned, updating the values for a running agent is not
straightforward. A lot of design work will need to go into how these values
are propagated to frameworks that have made assumptions about them, and
which values are allowed to change vs. not.

—
*Joris Van Remoortere*
Mesosphere

On Tue, Jun 14, 2016 at 10:04 AM, Aaron Carey <ac...@ilm.com> wrote:

> #3 would be very helpful for us. Also related:
>
> https://issues.apache.org/jira/browse/MESOS-3059
>
> --
>
> Aaron Carey
> Production Engineer - Cloud Pipeline
> Industrial Light & Magic
> London
> 020 3751 9150
>
> ________________________________________
> From: Du, Fan [fan.du@intel.com]
> Sent: 14 June 2016 07:24
> To: user@mesos.apache.org; dev@mesos.apache.org
> Cc: Joris Van Remoortere; vinodkone@apache.org
> Subject: Re: Rack awareness support for Mesos
>
> Hi everyone
>
> Let me summarize the discussion about Rack awareness in the community so
> far. First thanks for all the comments, advices or challenges! :)
>
> #1. Stick with attributes for rack awareness
>
> For compatibility with existing framework, I tend to be ok with using
> attributes to convey the rack information, but with the goal to do it
> automatically, easy to maintain and with good attributes schema. This
> will bring up below question where the controversy starts.
>
> #2. Scripts vs programmatic way
>
> Both can be used to set attributes, I've made my arguments in the Jira
> and the Design doc, I'm not gonna to argue more here. But please take a
> look discussion at MESOS-3366 before, which allow resources/attributes
> discovery.
>
> A module to implement *slaveAttributesDecorator* hook will works like
> a charm here in a static way. And need to justify attributes updating.
>
> #3. Allow updating attributes
> Several cases need to be covered here:
>
> a). Mesos runs inside VMs or container, where live migration happens, so
> rack information need to be updated.
>
> b). LLDP packets are broadcasted by the interval 10s~30s, a vendor
> specific implementation, and rack information are usually stored in LLDP
> daemon to be queried. Worst cases(nodes fresh reboot, or daemon restart)
> would be: Mesos slave have to wait 10s~30s for a valid rack information
> before register to master. Allow updating attributes will mitigate this
> problem.
>
> c). Framework affinity
>
> Framework X prefers to run on the same nodes with another framwork Y.
> For example, it's desirable for Shark or Spark-SQL to reside on the
> *worker* node where Alluxio(former Tachyon) to gain more performance
> boosting as SPARK-6707 ticket message {tachyon=true;us-east-1=false}
>
> If framework could advertise agent attributes in the ResourcesOffer
> process, awesome!
>
>
> #4. Rearrange agents in a more scalable manner, like per rack basis
>
> Randomly offering agents resource to framework does not improve data
> locality, imagine the likelihood of a framework getting resources
> underneath the same rack, at the scale of +30000 nodes. Moreover time to
> randomly shuffle the agents also grows.
>
> How about rearranging the agent in a per rack basis, and a minor change
> to the way how resources are allocated will fix this.
>
>
> I might not see the whole picture here, so comments are welcomed!
>
>
> On 2016/6/6 17:17, Du, Fan wrote:
> > Hi, Mesos folks
> >
> > I’ve been thinking about Mesos rack awareness support for a while,
> >
> > it’s a common interest for lots of data center applications to provide
> > data locality,
> >
> > fault tolerance and better task placement. Create MESOS-5545 to track
> > the story,
> >
> > and here is the initial design doc [1] to support rack awareness in
> Mesos.
> >
> > Looking forward to hear any comments from end user and other developers,
> >
> > Thanks!
> >
> > [1]:
> >
> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
> >
>

RE: Rack awareness support for Mesos

Posted by Aaron Carey <ac...@ilm.com>.

#3 would be very helpful for us. Also related:

https://issues.apache.org/jira/browse/MESOS-3059

--

Aaron Carey
Production Engineer - Cloud Pipeline
Industrial Light & Magic
London
020 3751 9150

________________________________________
From: Du, Fan [fan.du@intel.com]
Sent: 14 June 2016 07:24
To: user@mesos.apache.org; dev@mesos.apache.org
Cc: Joris Van Remoortere; vinodkone@apache.org
Subject: Re: Rack awareness support for Mesos

Hi everyone

Let me summarize the discussion about Rack awareness in the community so
far. First thanks for all the comments, advices or challenges! :)

#1. Stick with attributes for rack awareness

For compatibility with existing framework, I tend to be ok with using
attributes to convey the rack information, but with the goal to do it
automatically, easy to maintain and with good attributes schema. This
will bring up below question where the controversy starts.

#2. Scripts vs programmatic way

Both can be used to set attributes, I've made my arguments in the Jira
and the Design doc, I'm not gonna to argue more here. But please take a
look discussion at MESOS-3366 before, which allow resources/attributes
discovery.

A module to implement *slaveAttributesDecorator* hook will works like
a charm here in a static way. And need to justify attributes updating.

#3. Allow updating attributes
Several cases need to be covered here:

a). Mesos runs inside VMs or container, where live migration happens, so
rack information need to be updated.

b). LLDP packets are broadcasted by the interval 10s~30s, a vendor
specific implementation, and rack information are usually stored in LLDP
daemon to be queried. Worst cases(nodes fresh reboot, or daemon restart)
would be: Mesos slave have to wait 10s~30s for a valid rack information
before register to master. Allow updating attributes will mitigate this
problem.

c). Framework affinity

Framework X prefers to run on the same nodes with another framwork Y.
For example, it's desirable for Shark or Spark-SQL to reside on the
*worker* node where Alluxio(former Tachyon) to gain more performance
boosting as SPARK-6707 ticket message {tachyon=true;us-east-1=false}

If framework could advertise agent attributes in the ResourcesOffer
process, awesome!

#4. Rearrange agents in a more scalable manner, like per rack basis

Randomly offering agents resource to framework does not improve data
locality, imagine the likelihood of a framework getting resources
underneath the same rack, at the scale of +30000 nodes. Moreover time to
randomly shuffle the agents also grows.

How about rearranging the agent in a per rack basis, and a minor change
to the way how resources are allocated will fix this.

I might not see the whole picture here, so comments are welcomed!

On 2016/6/6 17:17, Du, Fan wrote:
> Hi, Mesos folks
>
> I’ve been thinking about Mesos rack awareness support for a while,
>
> it’s a common interest for lots of data center applications to provide
> data locality,
>
> fault tolerance and better task placement. Create MESOS-5545 to track
> the story,
>
> and here is the initial design doc [1] to support rack awareness in Mesos.
>
> Looking forward to hear any comments from end user and other developers,
>
> Thanks!
>
> [1]:
> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>

Re: Rack awareness support for Mesos

Posted by "Du, Fan" <fa...@intel.com>.

Hi everyone

Let me summarize the discussion about rack awareness in the community so
far. First, thanks for all the comments, advice, and challenges! :)

#1. Stick with attributes for rack awareness

For compatibility with existing frameworks, I tend to be OK with using
attributes to convey the rack information, but with the goal of doing it
automatically, keeping it easy to maintain, and using a good attribute
schema. This brings up the question below, where the controversy starts.

#2. Scripts vs programmatic way

Both can be used to set attributes. I've made my arguments in the Jira
and the design doc, so I'm not going to argue more here. But please take a
look at the earlier discussion in MESOS-3366, which allows
resources/attributes discovery.

A module implementing the *slaveAttributesDecorator* hook will work like
a charm here in a static way. We still need to justify updating attributes.
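
To illustrate the static case, here is a minimal sketch of the script
route, assuming rack information has already been discovered somehow. The
helper name format_attributes and the keys "rack" and "row" are
hypothetical; the only Mesos detail relied on is the agent's --attributes
flag, which takes values of the form rack:2;u:1.

```python
def format_attributes(attrs):
    """Render a dict as 'key:value;key:value' for the agent --attributes flag.

    Keys are sorted so the output is stable across runs. Keys and values
    containing ';' or ':' are rejected, which is a simplifying assumption of
    this sketch rather than a rule taken from Mesos.
    """
    for key, value in attrs.items():
        text = f"{key}{value}"
        if ";" in text or ":" in text:
            raise ValueError(f"unsupported character in attribute {key!r}")
    return ";".join(f"{key}:{value}" for key, value in sorted(attrs.items()))


# Hypothetical discovery result; a real script might query an LLDP daemon.
discovered = {"rack": "r42", "row": "3"}
flag = "--attributes=" + format_attributes(discovered)
print(flag)
```

A script like this only helps at agent start-up, which is why updating
attributes is treated separately in #3.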

#3. Allow updating attributes
Several cases need to be covered here:

a). Mesos runs inside VMs or containers, where live migration happens, so
rack information needs to be updated.

b). LLDP packets are broadcast at an interval of 10s~30s (this is vendor
specific), and rack information is usually stored in the LLDP daemon
to be queried. In the worst case (a freshly rebooted node, or a daemon
restart), the Mesos slave has to wait 10s~30s for valid rack information
before registering with the master. Allowing attributes to be updated will
mitigate this problem.

c). Framework affinity

Framework X prefers to run on the same nodes as another framework Y.
For example, it's desirable for Shark or Spark-SQL to reside on the
*worker* node where Alluxio (formerly Tachyon) runs, to gain a performance
boost, as in the SPARK-6707 ticket message {tachyon=true;us-east-1=false}.

If a framework could advertise agent attributes in the ResourcesOffer
process, awesome!
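
As a sketch of the affinity case in c), a framework scheduler could filter
offers by agent attributes. Everything here is an assumption made for
illustration: offers are modelled as plain dicts with an "attributes"
mapping, and parse_attributes and matching_offers are hypothetical helpers,
not part of any Mesos API. Only the {key=value;key=value} notation comes
from the ticket message quoted above.

```python
def parse_attributes(text):
    """Parse '{k=v;k=v}' into a dict of strings."""
    items = [item for item in text.strip("{}").split(";") if item]
    pairs = (item.split("=", 1) for item in items)
    return {key.strip(): value.strip() for key, value in pairs}


def matching_offers(offers, wanted):
    """Keep offers whose attributes contain every wanted key/value pair."""
    return [
        offer
        for offer in offers
        if all(offer["attributes"].get(k) == v for k, v in wanted.items())
    ]


# Hypothetical offers; real offers carry much more than this.
offers = [
    {"agent": "a1", "attributes": {"tachyon": "true", "us-east-1": "false"}},
    {"agent": "a2", "attributes": {"tachyon": "false", "us-east-1": "false"}},
]
wanted = parse_attributes("{tachyon=true;us-east-1=false}")
print([offer["agent"] for offer in matching_offers(offers, wanted)])
```

If the attributes could change while the agent is running, a filter like
this would have to be re-evaluated on every offer rather than cached.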

#4. Rearrange agents in a more scalable manner, e.g. on a per-rack basis

Randomly offering agents' resources to frameworks does not improve data
locality. Imagine the likelihood of a framework getting resources
under the same rack at the scale of 30000+ nodes. Moreover, the time to
randomly shuffle the agents also grows.

How about arranging the agents on a per-rack basis? A minor change
to the way resources are allocated would fix this.
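
To put a rough number on that likelihood, here is a small sketch. It
assumes equally sized racks and that agents are picked uniformly at random;
the rack size of 40 is an assumed figure, not one from the discussion. The
function and variable names are hypothetical, and group_by_rack only shows
the kind of per-rack index meant above, not how the Mesos allocator works.

```python
from collections import defaultdict


def same_rack_probability(total_nodes, nodes_per_rack):
    """Chance that a second randomly picked agent shares the first one's rack."""
    return (nodes_per_rack - 1) / (total_nodes - 1)


def group_by_rack(agents):
    """Index (agent_id, rack) pairs by rack, keeping the input order per rack."""
    racks = defaultdict(list)
    for agent_id, rack in agents:
        racks[rack].append(agent_id)
    return dict(racks)


# 30000 nodes with an assumed 40 nodes per rack: 39 / 29999, about 0.13%.
print(same_rack_probability(30000, 40))

# With a per-rack index, same-rack agents can be looked up directly.
print(group_by_rack([("a1", "r1"), ("a2", "r2"), ("a3", "r1")]))
```

Under these assumptions a framework would rarely see two offers from the
same rack by chance, which is the motivation for grouping agents by rack
before offering.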

I might not see the whole picture here, so comments are welcome!

On 2016/6/6 17:17, Du, Fan wrote:
> Hi, Mesos folks
>
> I’ve been thinking about Mesos rack awareness support for a while,
>
> it’s a common interest for lots of data center applications to provide
> data locality,
>
> fault tolerance and better task placement. Create MESOS-5545 to track
> the story,
>
> and here is the initial design doc [1] to support rack awareness in Mesos.
>
> Looking forward to hear any comments from end user and other developers,
>
> Thanks!
>
> [1]:
> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>
