Posted to dev@aurora.apache.org by Maxim Khutornenko <ma...@apache.org> on 2016/01/20 03:22:13 UTC

Non-exclusive dedicated constraint

Has anyone explored the idea of having a non-exclusive (wrt job role)
dedicated constraint in Aurora before?

We do have a dedicated constraint now but it assumes a 1:1
relationship between a job role and a slave attribute [1]. For
example: a 'www-data/prod/hello' job with a dedicated constraint of
'dedicated': 'www-data/hello' may only be pinned to a particular set
of slaves if all of them have 'www-data/hello' attribute set. No other
role tasks will be able to land on those slaves unless their
'role/name' pair is added into the slave attribute set.
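
For reference, a minimal config sketch of how a job opts in today (the
cluster name and task definition below are placeholders, not from the
original message):

    # hello.aurora -- sketch only; assumes hello_task is defined earlier
    # and that the slaves were started with
    #   --attributes="dedicated:www-data/hello"
    jobs = [
      Service(
        cluster='example',   # placeholder cluster name
        role='www-data',
        environment='prod',
        name='hello',
        constraints={'dedicated': 'www-data/hello'},
        task=hello_task,
      )
    ]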

The above is very limiting as it prevents carving out subsets of a
shared pool cluster to be used by multiple roles at the same time.
Would it make sense to have a free-form dedicated constraint not bound
to a particular role? Multiple jobs could then use this type of
constraint dynamically without modifying the slave command line (and
requiring slave restart).

This could be quite useful for experimentation (e.g. a different
host OS) or to target a different hardware offering (e.g. GPUs). In
other words, only those jobs that explicitly opt in to participate in
an experiment or hw offering would land on that slave set.
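
To make the opt-in concrete, a sketch of what such a job might declare
(the '*/gpu' value is hypothetical syntax here; it roughly matches the
wildcard form Steve's patch, discussed later in this thread, allows):

    # Any role may opt in to the shared GPU pool by naming the
    # non-exclusive dedicated attribute; jobs without it are kept off.
    constraints={'dedicated': '*/gpu'},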

Thanks,
Maxim

[1]- https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276

Re: Non-exclusive dedicated constraint

Posted by Maxim Khutornenko <ma...@apache.org>.
Here is RB: https://reviews.apache.org/r/44602/

On Wed, Mar 9, 2016 at 2:41 PM, Bill Farner <wf...@apache.org> wrote:
> Ah, so it only practically makes sense when the dedicated attribute is
> */something, but * would not make much sense.  Seems reasonable to me.
>
> On Wed, Mar 9, 2016 at 2:32 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>
>> It's an *easy* way to get a virtual cluster with specific
>> requirements. One example: have a set of machines in a shared pool
>> with a different OS. This would let any existing or new customers try
>> their services for compliance. The alternative would be spinning off a
>> completely new physical cluster, which is a huge overhead on both
>> supply and demand sides.
>>
>> On Wed, Mar 9, 2016 at 2:26 PM, Bill Farner <wf...@apache.org> wrote:
>>> What does it mean to have a 'dedicated' host that's free-for-all like
>>> that?
>>>
>>> On Wed, Mar 9, 2016 at 2:16 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>>>
>>>> Reactivating this thread. I like Bill's suggestion of a scheduler-based
>>>> dedicated constraint management system. It will, however, require a
>>>> substantial effort to get done properly. Would anyone oppose adopting
>>>> Steve's patch in the meantime? The ROI is so high it would be a crime
>>>> NOT to take it :)
>>>>
>>>> On Wed, Jan 20, 2016 at 10:25 AM, Maxim Khutornenko <ma...@apache.org> wrote:
>>>>> I should have looked more closely, you are right! This indeed
>>>>> addresses both cases: a job with a named dedicated role is still
>>>>> allowed to get through if its role matches the constraint, and
>>>>> everything else (non-exclusive dedicated pool) is addressed with "*".
>>>>>
>>>>> What it does not solve, though, is the variety of non-exclusive
>>>>> dedicated pools (e.g. GPU, OS, high network bandwidth, etc.). For
>>>>> that we would need something similar to what Bill suggested.
>>>>>
>>>>> On Wed, Jan 20, 2016 at 10:03 AM, Steve Niemitz <sn...@apache.org> wrote:
>>>>>> An arbitrary job can't target a fully dedicated role with this
>>>>>> patch; it will still get a "constraint not satisfied: dedicated"
>>>>>> error.  The code in the scheduler that matches the constraints does
>>>>>> a simple string match, so "*/test" will not match "role1/test" when
>>>>>> trying to place the task; it will only match "*/test".
>>>>>>
>>>>>> On Wed, Jan 20, 2016 at 12:24 PM, Maxim Khutornenko <maxim@apache.org> wrote:
>>>>>>
>>>>>>> Thanks for the info, Steve! Yes, it would accomplish the same goal
>>>>>>> but at the price of removing the exclusive dedicated constraint
>>>>>>> enforcement. With this patch any job could target a fully dedicated
>>>>>>> exclusive pool, which may be undesirable for dedicated pool owners.
>>>>>>>
>>>>>>> On Wed, Jan 20, 2016 at 7:13 AM, Steve Niemitz <sniemitz@apache.org> wrote:
>>>>>>>> We've been running a trivial patch [1] that does what I believe
>>>>>>>> you're talking about for a while now.  It allows a * for the role
>>>>>>>> name, basically allowing any role to match the constraint, so our
>>>>>>>> constraints look like "*/secure".
>>>>>>>>
>>>>>>>> Our use case is a "secure" cluster of machines that multiple roles
>>>>>>>> run on, which is constrained (via an external audit process) in
>>>>>>>> what can run on it.
>>>>>>>>
>>>>>>>> I believe I had talked to Bill about this a few months ago, but I
>>>>>>>> don't remember where it ended up.
>>>>>>>>
>>>>>>>> [1] https://github.com/tellapart/aurora/commit/76f978c76cc1377e19e602f7e0d050f7ce353562
>>>>>>>>
>>>>>>>> On Tue, Jan 19, 2016 at 11:48 PM, Maxim Khutornenko <maxim@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Oh, I didn't mean the memory GC pressure in the pure sense, rather
>>>>>>>>> the logical garbage of orphaned hosts that never leave the
>>>>>>>>> scheduler. It's not something to be concerned about from the
>>>>>>>>> performance standpoint. It is, however, something operators need
>>>>>>>>> to be aware of when a host from a dedicated pool gets dropped or
>>>>>>>>> replaced.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 19, 2016 at 8:39 PM, Bill Farner <wfarner@apache.org> wrote:
>>>>>>>>>> What do you mean by GC burden?  What I'm proposing is effectively
>>>>>>>>>> a Map<String, String>.  Even with an extremely forgetful operator
>>>>>>>>>> (even more than Joe!), it would require a huge oversight to put a
>>>>>>>>>> dent in heap usage.  I'm sure there are ways we could even expose
>>>>>>>>>> a useful stat to flag such an oversight.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 19, 2016 at 8:31 PM, Maxim Khutornenko <maxim@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Right, that's what I thought. Yes, it sounds interesting. My
>>>>>>>>>>> only concern is the GC burden of getting rid of hostnames that
>>>>>>>>>>> are obsolete and no longer exist. Relying on offers to update
>>>>>>>>>>> hostname 'relevance' may not work, as dedicated hosts may be
>>>>>>>>>>> fully packed and not release any resources for a very long
>>>>>>>>>>> time. Let me explore this idea a bit to see what it would take
>>>>>>>>>>> to implement.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 19, 2016 at 8:22 PM, Bill Farner <wfarner@apache.org> wrote:
>>>>>>>>>>>> Not a host->attribute mapping (attribute in the mesos sense,
>>>>>>>>>>>> anyway).  Rather, an out-of-band API for marking machines as
>>>>>>>>>>>> reserved.  For task->offer mapping it's just a matter of
>>>>>>>>>>>> another data source.  Does that make sense?
>>>>>>>>>>>>
>>>>>>>>>>>> On Tuesday, January 19, 2016, Maxim Khutornenko <maxim@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> > Can't this just be any old Constraint (not named
>>>>>>>>>>>>> > "dedicated")?  In other words, doesn't this code already
>>>>>>>>>>>>> > deal with non-dedicated constraints?:
>>>>>>>>>>>>> > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>>>>>>>>>>>>>
>>>>>>>>>>>>> Not really. There is a subtle difference here. A regular
>>>>>>>>>>>>> (non-dedicated) constraint does not prevent other tasks from
>>>>>>>>>>>>> landing on a given machine set, whereas dedicated keeps other
>>>>>>>>>>>>> tasks away by only allowing those matching the dedicated
>>>>>>>>>>>>> attribute. What this proposal targets is allowing an exclusive
>>>>>>>>>>>>> machine pool to match any job that has this new constraint
>>>>>>>>>>>>> while keeping away all other tasks that don't have that
>>>>>>>>>>>>> attribute.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Following an example from my original post, imagine a GPU
>>>>>>>>>>>>> machine pool. Any job (from any role) requiring GPU resources
>>>>>>>>>>>>> would be allowed, while all other jobs that don't have that
>>>>>>>>>>>>> constraint would be vetoed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> > Also, regarding dedicated constraints necessitating a slave
>>>>>>>>>>>>> > restart - I've pondered moving dedicated machine management
>>>>>>>>>>>>> > to the scheduler for similar purposes.  There's not really
>>>>>>>>>>>>> > much forcing that behavior to be managed with a slave
>>>>>>>>>>>>> > attribute.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Would you mind giving a few more hints on the mechanics behind
>>>>>>>>>>>>> this? How would the scheduler know about dedicated hw without
>>>>>>>>>>>>> the slave attributes set? Are you proposing storing a
>>>>>>>>>>>>> hostname->attribute mapping in the scheduler store?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner <wfarner@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Joe - if you want to pursue this, I suggest you start another
>>>>>>>>>>>>>> thread to keep this thread's discussion intact.  I will not
>>>>>>>>>>>>>> be able to lead this change, but can certainly shepherd!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tuesday, January 19, 2016, Joe Smith <yasumoto7@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As an operator, that'd be a relatively simple change in
>>>>>>>>>>>>>>> tooling, and the benefits of not forcing a slave restart
>>>>>>>>>>>>>>> would be _huge_.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Keeping the dedicated semantics (but adding non-exclusive)
>>>>>>>>>>>>>>> would be ideal if possible.
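
To summarize the matching behavior Steve describes above, a small
executable model (Python for brevity; the real logic is Java scheduler
code, so treat this as an approximation of the linked patch, not the
patch itself):

    def validates(job_role, constraint_value):
        # Config-validation side, as relaxed by the patch: the role
        # component of the dedicated value must be the job's own role
        # or the '*' wildcard.
        role_part = constraint_value.split('/', 1)[0]
        return role_part in ('*', job_role)

    def places(constraint_value, host_attribute):
        # Placement side stays a plain string match, which is why
        # '*/test' does not match a host tagged 'role1/test'.
        return constraint_value == host_attribute

    assert validates('www-data', '*/secure')        # any role may opt in
    assert validates('www-data', 'www-data/hello')  # classic dedicated still works
    assert not validates('www-data', 'role1/test')  # other roles stay excluded
    assert places('*/secure', '*/secure')           # host tagged '*/secure'
    assert not places('*/test', 'role1/test')       # exclusivity preserved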

Re: Non-exclusive dedicated constraint

Posted by Bill Farner <wf...@apache.org>.
Ah, so it only practically makes sense when the dedicated attribute is
*/something, but * would not make much sense.  Seems reasonable to me.

On Wed, Mar 9, 2016 at 2:32 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> It's an *easy* way to get a virtual cluster with specific
> requirements. One example: have a set of machines in a shared pool
> with a different OS. This would let any existing or new customers try
> their services for compliance. The alternative would be spinning off a
> completely new physical cluster, which is a huge overhead on both
> supply and demand sides.

Re: Non-exclusive dedicated constraint

Posted by Maxim Khutornenko <ma...@apache.org>.
It's an *easy* way to get a virtual cluster with specific
requirements. One example: have a set of machines in a shared pool
with a different OS. This would let any existing or new customers try
their services for compliance. The alternative would be spinning off a
completely new physical cluster, which is a huge overhead on both
supply and demand sides.

On Wed, Mar 9, 2016 at 2:26 PM, Bill Farner <wf...@apache.org> wrote:
> What does it mean to have a 'dedicated' host that's free-for-all like that?

Re: Non-exclusive dedicated constraint

Posted by Bill Farner <wf...@apache.org>.
What does it mean to have a 'dedicated' host that's free-for-all like that?

On Wed, Mar 9, 2016 at 2:16 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> Reactivating this thread. I like Bill's suggestion of a scheduler-based
> dedicated constraint management system. It will, however, require a
> substantial effort to get done properly. Would anyone oppose adopting
> Steve's patch in the meantime? The ROI is so high it would be a crime
> NOT to take it :)

Re: Non-exclusive dedicated constraint

Posted by Maxim Khutornenko <ma...@apache.org>.
Reactivating this thread. I like Bill's suggestion of a scheduler-based
dedicated constraint management system. It will, however, require a
substantial effort to get done properly. Would anyone oppose adopting
Steve's patch in the meantime? The ROI is so high it would be a crime
NOT to take it :)

On Wed, Jan 20, 2016 at 10:25 AM, Maxim Khutornenko <ma...@apache.org> wrote:
> I should have looked more closely, you are right! This indeed addresses
> both cases: a job with a named dedicated role is still allowed to get
> through if its role matches the constraint, and everything else
> (non-exclusive dedicated pool) is addressed with "*".
>
> What it does not solve, though, is the variety of non-exclusive
> dedicated pools (e.g. GPU, OS, high network bandwidth, etc.). For
> that we would need something similar to what Bill suggested.
>>> >> >> >> machine pool matching any job that has this new constraint while
>>> >> keeping
>>> >> >> >> all other tasks that don't have that attribute away.
>>> >> >> >>
>>> >> >> >> Following an example from my original post, imagine a GPU machine
>>> >> pool.
>>> >> >> Any
>>> >> >> >> job (from any role) requiring GPU resource would be allowed while
>>> all
>>> >> >> other
>>> >> >> >> jobs that don't have that constraint would be vetoed.
>>> >> >> >>
>>> >> >> >> Also, regarding dedicated constraints necessitating a slave
>>> restart -
>>> >> >> i've
>>> >> >> >> > pondered moving dedicated machine management to the scheduler
>>> for
>>> >> >> similar
>>> >> >> >> > purposes.  There's not really much forcing that behavior to be
>>> >> managed
>>> >> >> >> with
>>> >> >> >> > a slave attribute.
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> Would you mind giving a few more hints on the mechanics behind
>>> this?
>>> >> How
>>> >> >> >> would scheduler know about dedicated hw without the slave
>>> attributes
>>> >> >> set?
>>> >> >> >> Are you proposing storing hostname->attribute mapping in the
>>> >> scheduler
>>> >> >> >> store?
>>> >> >> >>
>>> >> >> >> On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner <wfarner@apache.org
>>> >> >> >> <javascript:;>> wrote:
>>> >> >> >>
>>> >> >> >> > Joe - if you want to pursue this, I suggest you start another
>>> >> thread
>>> >> >> to
>>> >> >> >> > keep this thread's discussion in tact.  I will not be able to
>>> lead
>>> >> >> this
>>> >> >> >> > change, but can certainly shepherd!
>>> >> >> >> >
>>> >> >> >> > On Tuesday, January 19, 2016, Joe Smith <yasumoto7@gmail.com
>>> >> >> >> <javascript:;>> wrote:
>>> >> >> >> >
>>> >> >> >> > > As an operator, that'd be a relatively simple change in
>>> tooling,
>>> >> and
>>> >> >> >> the
>>> >> >> >> > > benefits of not forcing a slave restart would be _huge_.
>>> >> >> >> > >
>>> >> >> >> > > Keeping the dedicated semantics (but adding non-exclusive)
>>> would
>>> >> be
>>> >> >> >> ideal
>>> >> >> >> > > if possible.
>>> >> >> >> > >
>>> >> >> >> > > > On Jan 19, 2016, at 19:09, Bill Farner <wfarner@apache.org
>>> >> >> >> <javascript:;>
>>> >> >> >> > > <javascript:;>> wrote:
>>> >> >> >> > > >
>>> >> >> >> > > > Also, regarding dedicated constraints necessitating a slave
>>> >> >> restart -
>>> >> >> >> > > i've
>>> >> >> >> > > > pondered moving dedicated machine management to the
>>> scheduler
>>> >> for
>>> >> >> >> > similar
>>> >> >> >> > > > purposes.  There's not really much forcing that behavior to
>>> be
>>> >> >> >> managed
>>> >> >> >> > > with
>>> >> >> >> > > > a slave attribute.
>>> >> >> >> > > >
>>> >> >> >> > > > On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <
>>> >> john@conductant.com
>>> >> >> >> <javascript:;>
>>> >> >> >> > > <javascript:;>> wrote:
>>> >> >> >> > > >
>>> >> >> >> > > >> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <
>>> >> >> >> maxim@apache.org <javascript:;>
>>> >> >> >> > > <javascript:;>>
>>> >> >> >> > > >> wrote:
>>> >> >> >> > > >>
>>> >> >> >> > > >>> Has anyone explored an idea of having a non-exclusive (wrt
>>> >> job
>>> >> >> >> role)
>>> >> >> >> > > >>> dedicated constraint in Aurora before?
>>> >> >> >> > > >>
>>> >> >> >> > > >>
>>> >> >> >> > > >>> We do have a dedicated constraint now but it assumes a 1:1
>>> >> >> >> > > >>> relationship between a job role and a slave attribute [1].
>>> >> For
>>> >> >> >> > > >>> example: a 'www-data/prod/hello' job with a dedicated
>>> >> >> constraint of
>>> >> >> >> > > >>> 'dedicated': 'www-data/hello' may only be pinned to a
>>> >> particular
>>> >> >> >> set
>>> >> >> >> > > >>> of slaves if all of them have 'www-data/hello' attribute
>>> >> set. No
>>> >> >> >> > other
>>> >> >> >> > > >>> role tasks will be able to land on those slaves unless
>>> their
>>> >> >> >> > > >>> 'role/name' pair is added into the slave attribute set.
>>> >> >> >> > > >>>
>>> >> >> >> > > >>> The above is very limiting as it prevents carving out
>>> subsets
>>> >> >> of a
>>> >> >> >> > > >>> shared pool cluster to be used by multiple roles at the
>>> same
>>> >> >> time.
>>> >> >> >> > > >>> Would it make sense to have a free-form dedicated
>>> constraint
>>> >> not
>>> >> >> >> > bound
>>> >> >> >> > > >>> to a particular role? Multiple jobs could then use this
>>> type
>>> >> of
>>> >> >> >> > > >>> constraint dynamically without modifying the slave command
>>> >> line
>>> >> >> >> (and
>>> >> >> >> > > >>> requiring slave restart).
>>> >> >> >> > > >>
>>> >> >> >> > > >> Can't this just be any old Constraint (not named
>>> "dedicated").
>>> >> >> In
>>> >> >> >> > other
>>> >> >> >> > > >> words, doesn't this code already deal with non-dedicated
>>> >> >> >> constraints?:
>>> >> >> >> > > >>
>>> >> >> >> > > >>
>>> >> >> >> > >
>>> >> >> >> >
>>> >> >> >>
>>> >> >>
>>> >>
>>> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>>> >> >> >> > > >>
>>> >> >> >> > > >>
>>> >> >> >> > > >>> This could be quite useful for experimenting purposes
>>> (e.g.
>>> >> >> >> different
>>> >> >> >> > > >>> host OS) or to target a different hardware offering (e.g.
>>> >> >> GPUs). In
>>> >> >> >> > > >>> other words, only those jobs that explicitly opt-in to
>>> >> >> participate
>>> >> >> >> in
>>> >> >> >> > > >>> an experiment or hw offering would be landing on that
>>> slave
>>> >> set.
>>> >> >> >> > > >>>
>>> >> >> >> > > >>> Thanks,
>>> >> >> >> > > >>> Maxim
>>> >> >> >> > > >>>
>>> >> >> >> > > >>> [1]-
>>> >> >> >> > > >>
>>> >> >> >> > >
>>> >> >> >> >
>>> >> >> >>
>>> >> >>
>>> >>
>>> https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
>>> >> >> >> > > >>
>>> >> >> >> > > >>
>>> >> >> >> > > >>
>>> >> >> >> > > >> --
>>> >> >> >> > > >> John Sirois
>>> >> >> >> > > >> 303-512-3301
>>> >> >> >> > > >>
>>> >> >> >> > >
>>> >> >> >> >
>>> >> >> >>
>>> >> >>
>>> >>
>>>

Re: Non-exclusive dedicated constraint

Posted by Maxim Khutornenko <ma...@apache.org>.
I should have looked more closely, you are right! This indeed
addresses both cases: a job with a named dedicated role is still
allowed to get through if its role matches the constraint, and
everything else (a non-exclusive dedicated pool) is addressed with
"*".

What it does not solve, though, is the variety of non-exclusive
dedicated pools (e.g. GPU, OS, high network bandwidth, etc.). For
that we would need something similar to what Bill suggested.
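
To spell out the two cases (hypothetical attribute values; this
assumes the patch's semantics as Steve describes them below in the
thread):

    // host A attribute: dedicated = "www-data/hello"  (pinned to one role)
    // host B attribute: dedicated = "*/gpu"           (opt-in, any role)
    //
    // job www-data/prod/hello with 'dedicated': 'www-data/hello' -> may land on A
    // job analytics/prod/train with 'dedicated': '*/gpu'         -> may land on B
    // job with no dedicated constraint                           -> vetoed on both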

Re: Non-exclusive dedicated constraint

Posted by Steve Niemitz <sn...@apache.org>.
An arbitrary job can't target a fully dedicated role with this patch; it
will still get a "constraint not satisfied: dedicated" error.  The code in
the scheduler that matches the constraints does a simple string match, so
"*/test" will not match "role1/test" when trying to place the task; it will
only match "*/test".

Re: Non-exclusive dedicated constraint

Posted by Maxim Khutornenko <ma...@apache.org>.
Thanks for the info, Steve! Yes, it would accomplish the same goal but
at the price of removing the exclusive dedicated constraint
enforcement. With this patch any job could target a fully dedicated
exclusive pool, which may be undesirable for dedicated pool owners.

Re: Non-exclusive dedicated constraint

Posted by Steve Niemitz <sn...@apache.org>.
We've been running a trivial patch [1] that does what I believe you're
talking about for a while now.  It allows a * for the role name, basically
allowing any role to match the constraint, so our constraints look like
"*/secure".

Our use case: we have a "secure" cluster of machines, constrained (via an
external audit process) in what can run on it, that multiple roles run on.

I believe I had talked to Bill about this a few months ago, but I don't
remember where it ended up.

[1]
https://github.com/tellapart/aurora/commit/76f978c76cc1377e19e602f7e0d050f7ce353562
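
For readers who don't want to click through, the gist of [1] as a
minimal sketch (illustrative names and exception text; not the patch
itself):

    final class DedicatedValidationSketch {
      // Config-validation rule: a dedicated value "role/name" normally
      // has to be prefixed with the job's own role; the patch also
      // accepts the "*" wildcard, so values like "*/secure" validate
      // for any role.
      static void assertValidDedicated(String jobRole, String dedicatedValue) {
        String rolePrefix = dedicatedValue.split("/", 2)[0];
        if (!rolePrefix.equals("*") && !rolePrefix.equals(jobRole)) {
          throw new IllegalArgumentException(
              "Dedicated constraint must start with the job's role or '*'.");
        }
      }
    }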

Re: Non-exclusive dedicated constraint

Posted by Maxim Khutornenko <ma...@apache.org>.
Oh, I didn't mean the memory GC pressure in the pure sense, rather the
logical garbage of orphaned hosts that never leave the scheduler. It's
not something to be concerned about from the performance standpoint.
It's, however, something operators need to be aware of when a host
from a dedicated pool gets dropped or replaced.

Re: Non-exclusive dedicated constraint

Posted by Bill Farner <wf...@apache.org>.
What do you mean by GC burden?  What I'm proposing is effectively a
Map<String, String>.  Even with an extremely forgetful operator (even more
than Joe!), it would require a huge oversight to put a dent in heap usage.
I'm sure there are ways we could even expose a useful stat to flag such an
oversight.
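
To make that concrete, a minimal sketch (all names and storage details
here are assumptions, not a worked-out design):

    import java.util.Map;
    import java.util.Optional;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    final class HostReservationStoreSketch {
      // Out-of-band reservations keyed by hostname -- effectively the
      // Map<String, String> above. Operators add and remove entries via
      // an API instead of restarting slaves with new attributes.
      private final Map<String, String> reservations = new ConcurrentHashMap<>();

      void reserve(String hostname, String dedicatedValue) {
        reservations.put(hostname, dedicatedValue);
      }

      void release(String hostname) {
        reservations.remove(hostname);
      }

      // Consulted during task->offer matching as just another data source.
      Optional<String> reservationFor(String hostname) {
        return Optional.ofNullable(reservations.get(hostname));
      }

      // The kind of stat that could flag forgotten entries: reservations
      // for hosts the cluster no longer knows about.
      long orphanedReservationCount(Set<String> knownHosts) {
        return reservations.keySet().stream()
            .filter(host -> !knownHosts.contains(host))
            .count();
      }
    }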

On Tue, Jan 19, 2016 at 8:31 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> Right, that's what I thought. Yes, it sounds interesting. My only
> concern is the GC burden of getting rid of hostnames that are obsolete
> and no longer exist. Relying on offers to update hostname 'relevance'
> may not work as dedicated hosts may be fully packed and not release
> any resources for a very long time. Let me explore this idea a bit to
> see what it would take to implement.
>
> On Tue, Jan 19, 2016 at 8:22 PM, Bill Farner <wf...@apache.org> wrote:
> > Not a host->attribute mapping (attribute in the mesos sense, anyway).
> Rather
> > an out-of-band API for marking machines as reserved.  For task->offer
> > mapping it's just a matter of another data source.  Does that make sense?
> >
> > On Tuesday, January 19, 2016, Maxim Khutornenko <ma...@apache.org>
> wrote:
> >
> >> >
> >> > Can't this just be any old Constraint (not named "dedicated").  In
> other
> >> > words, doesn't this code already deal with non-dedicated constraints?:
> >> >
> >> >
> >>
> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
> >>
> >>
> >> Not really. There is a subtle difference here. A regular (non-dedicated)
> >> constraint does not prevent other tasks from landing on a given machine
> set
> >> whereas dedicated keeps other tasks away by only allowing those matching
> >> the dedicated attribute. What this proposal targets is allowing
> exclusive
> >> machine pool matching any job that has this new constraint while keeping
> >> all other tasks that don't have that attribute away.
> >>
> >> Following an example from my original post, imagine a GPU machine pool.
> Any
> >> job (from any role) requiring GPU resource would be allowed while all
> other
> >> jobs that don't have that constraint would be vetoed.
> >>
> >> Also, regarding dedicated constraints necessitating a slave restart -
> i've
> >> > pondered moving dedicated machine management to the scheduler for
> similar
> >> > purposes.  There's not really much forcing that behavior to be managed
> >> with
> >> > a slave attribute.
> >>
> >>
> >> Would you mind giving a few more hints on the mechanics behind this? How
> >> would the scheduler know about dedicated hardware without the slave attributes
> set?
> >> Are you proposing storing hostname->attribute mapping in the scheduler
> >> store?
> >>
> >> On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner <wfarner@apache.org> wrote:
> >>
> >> > Joe - if you want to pursue this, I suggest you start another thread
> to
> >> > keep this thread's discussion intact.  I will not be able to lead
> this
> >> > change, but can certainly shepherd!
> >> >
> >> > On Tuesday, January 19, 2016, Joe Smith <yasumoto7@gmail.com> wrote:
> >> >
> >> > > As an operator, that'd be a relatively simple change in tooling, and
> >> the
> >> > > benefits of not forcing a slave restart would be _huge_.
> >> > >
> >> > > Keeping the dedicated semantics (but adding non-exclusive) would be
> >> ideal
> >> > > if possible.
> >> > >
> >> > > > On Jan 19, 2016, at 19:09, Bill Farner <wfarner@apache.org> wrote:
> >> > > >
> >> > > > Also, regarding dedicated constraints necessitating a slave
> restart -
> >> > > I've
> >> > > > pondered moving dedicated machine management to the scheduler for
> >> > similar
> >> > > > purposes.  There's not really much forcing that behavior to be
> >> managed
> >> > > with
> >> > > > a slave attribute.
> >> > > >
> >> > > > On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <john@conductant.com> wrote:
> >> > > >
> >> > > >> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <maxim@apache.org> wrote:
> >> > > >>
> >> > > >>> Has anyone explored an idea of having a non-exclusive (wrt job
> >> role)
> >> > > >>> dedicated constraint in Aurora before?
> >> > > >>
> >> > > >>
> >> > > >>> We do have a dedicated constraint now but it assumes a 1:1
> >> > > >>> relationship between a job role and a slave attribute [1]. For
> >> > > >>> example: a 'www-data/prod/hello' job with a dedicated
> constraint of
> >> > > >>> 'dedicated': 'www-data/hello' may only be pinned to a particular
> >> set
> >> > > >>> of slaves if all of them have 'www-data/hello' attribute set. No
> >> > other
> >> > > >>> role tasks will be able to land on those slaves unless their
> >> > > >>> 'role/name' pair is added into the slave attribute set.
> >> > > >>>
> >> > > >>> The above is very limiting as it prevents carving out subsets
> of a
> >> > > >>> shared pool cluster to be used by multiple roles at the same
> time.
> >> > > >>> Would it make sense to have a free-form dedicated constraint not
> >> > bound
> >> > > >>> to a particular role? Multiple jobs could then use this type of
> >> > > >>> constraint dynamically without modifying the slave command line
> >> (and
> >> > > >>> requiring slave restart).
> >> > > >>
> >> > > >> Can't this just be any old Constraint (not named "dedicated")?
> In
> >> > other
> >> > > >> words, doesn't this code already deal with non-dedicated
> >> constraints?:
> >> > > >>
> >> > > >>
> >> > >
> >> >
> >>
> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
> >> > > >>
> >> > > >>
> >> > > >>> This could be quite useful for experimenting purposes (e.g.
> >> different
> >> > > >>> host OS) or to target a different hardware offering (e.g.
> GPUs). In
> >> > > >>> other words, only those jobs that explicitly opt-in to
> participate
> >> in
> >> > > >>> an experiment or hw offering would be landing on that slave set.
> >> > > >>>
> >> > > >>> Thanks,
> >> > > >>> Maxim
> >> > > >>>
> >> > > >>> [1]-
> >> > > >>
> >> > >
> >> >
> >>
> https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
> >> > > >>
> >> > > >>
> >> > > >>
> >> > > >> --
> >> > > >> John Sirois
> >> > > >> 303-512-3301
> >> > > >>
> >> > >
> >> >
> >>
>

Re: Non-exclusive dedicated constraint

Posted by Maxim Khutornenko <ma...@apache.org>.
Right, that's what I thought. Yes, it sounds interesting. My only
concern is the GC burden of getting rid of hostnames that are obsolete
and no longer exist. Relying on offers to update hostname 'relevance'
may not work as dedicated hosts may be fully packed and not release
any resources for a very long time. Let me explore this idea a bit to
see what it would take to implement.
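
For the record, one shape that cleanup could take if it keys off the
scheduler's own view of attached hosts rather than offers. This is a
sketch only; ReservationPruner and the knownHosts supplier are
hypothetical names, not existing Aurora code:

    import java.util.Map;
    import java.util.Set;
    import java.util.function.Supplier;

    // Hypothetical sketch: periodically drop reservations for hosts the
    // scheduler no longer tracks, instead of waiting for offers from
    // machines that may stay fully packed for a long time.
    final class ReservationPruner implements Runnable {
      private final Map<String, String> reservations;  // host -> reservation tag
      private final Supplier<Set<String>> knownHosts;

      ReservationPruner(
          Map<String, String> reservations, Supplier<Set<String>> knownHosts) {
        this.reservations = reservations;
        this.knownHosts = knownHosts;
      }

      @Override
      public void run() {
        // retainAll on the key set removes every mapping whose host is gone.
        reservations.keySet().retainAll(knownHosts.get());
      }
    }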

On Tue, Jan 19, 2016 at 8:22 PM, Bill Farner <wf...@apache.org> wrote:
> Not a host->attribute mapping (attribute in the Mesos sense, anyway).  Rather
> an out-of-band API for marking machines as reserved.  For task->offer
> mapping it's just a matter of another data source.  Does that make sense?
>
> On Tuesday, January 19, 2016, Maxim Khutornenko <ma...@apache.org> wrote:
>
>> >
>> > Can't this just be any old Constraint (not named "dedicated")?  In other
>> > words, doesn't this code already deal with non-dedicated constraints?:
>> >
>> >
>> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>>
>>
>> Not really. There is a subtle difference here. A regular (non-dedicated)
>> constraint does not prevent other tasks from landing on a given machine set
>> whereas dedicated keeps other tasks away by only allowing those matching
>> the dedicated attribute. What this proposal targets is allowing exclusive
>> machine pool matching any job that has this new constraint while keeping
>> all other tasks that don't have that attribute away.
>>
>> Following an example from my original post, imagine a GPU machine pool. Any
>> job (from any role) requiring GPU resource would be allowed while all other
>> jobs that don't have that constraint would be vetoed.
>>
>> Also, regarding dedicated constraints necessitating a slave restart - I've
>> > pondered moving dedicated machine management to the scheduler for similar
>> > purposes.  There's not really much forcing that behavior to be managed
>> with
>> > a slave attribute.
>>
>>
>> Would you mind giving a few more hints on the mechanics behind this? How
>> would the scheduler know about dedicated hardware without the slave attributes set?
>> Are you proposing storing hostname->attribute mapping in the scheduler
>> store?
>>
>> On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner <wfarner@apache.org> wrote:
>>
>> > Joe - if you want to pursue this, I suggest you start another thread to
>> > keep this thread's discussion intact.  I will not be able to lead this
>> > change, but can certainly shepherd!
>> >
>> > On Tuesday, January 19, 2016, Joe Smith <yasumoto7@gmail.com> wrote:
>> >
>> > > As an operator, that'd be a relatively simple change in tooling, and
>> the
>> > > benefits of not forcing a slave restart would be _huge_.
>> > >
>> > > Keeping the dedicated semantics (but adding non-exclusive) would be
>> ideal
>> > > if possible.
>> > >
>> > > > On Jan 19, 2016, at 19:09, Bill Farner <wfarner@apache.org> wrote:
>> > > >
>> > > > Also, regarding dedicated constraints necessitating a slave restart -
>> > > I've
>> > > > pondered moving dedicated machine management to the scheduler for
>> > similar
>> > > > purposes.  There's not really much forcing that behavior to be
>> managed
>> > > with
>> > > > a slave attribute.
>> > > >
>> > > > On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <john@conductant.com> wrote:
>> > > >
>> > > >> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <maxim@apache.org> wrote:
>> > > >>
>> > > >>> Has anyone explored an idea of having a non-exclusive (wrt job
>> role)
>> > > >>> dedicated constraint in Aurora before?
>> > > >>
>> > > >>
>> > > >>> We do have a dedicated constraint now but it assumes a 1:1
>> > > >>> relationship between a job role and a slave attribute [1]. For
>> > > >>> example: a 'www-data/prod/hello' job with a dedicated constraint of
>> > > >>> 'dedicated': 'www-data/hello' may only be pinned to a particular
>> set
>> > > >>> of slaves if all of them have 'www-data/hello' attribute set. No
>> > other
>> > > >>> role tasks will be able to land on those slaves unless their
>> > > >>> 'role/name' pair is added into the slave attribute set.
>> > > >>>
>> > > >>> The above is very limiting as it prevents carving out subsets of a
>> > > >>> shared pool cluster to be used by multiple roles at the same time.
>> > > >>> Would it make sense to have a free-form dedicated constraint not
>> > bound
>> > > >>> to a particular role? Multiple jobs could then use this type of
>> > > >>> constraint dynamically without modifying the slave command line
>> (and
>> > > >>> requiring slave restart).
>> > > >>
>> > > >> Can't this just be any old Constraint (not named "dedicated")?  In
>> > other
>> > > >> words, doesn't this code already deal with non-dedicated
>> constraints?:
>> > > >>
>> > > >>
>> > >
>> >
>> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>> > > >>
>> > > >>
>> > > >>> This could be quite useful for experimenting purposes (e.g.
>> different
>> > > >>> host OS) or to target a different hardware offering (e.g. GPUs). In
>> > > >>> other words, only those jobs that explicitly opt-in to participate
>> in
>> > > >>> an experiment or hw offering would be landing on that slave set.
>> > > >>>
>> > > >>> Thanks,
>> > > >>> Maxim
>> > > >>>
>> > > >>> [1]-
>> > > >>
>> > >
>> >
>> https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
>> > > >>
>> > > >>
>> > > >>
>> > > >> --
>> > > >> John Sirois
>> > > >> 303-512-3301
>> > > >>
>> > >
>> >
>>

Re: Non-exclusive dedicated constraint

Posted by Bill Farner <wf...@apache.org>.
Not a host->attribute mapping (attribute in the Mesos sense, anyway).  Rather
an out-of-band API for marking machines as reserved.  For task->offer
mapping it's just a matter of another data source.  Does that make sense?
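
To sketch the "another data source" part (illustrative names only, not
the scheduler's actual filter API):

    import java.util.Map;
    import java.util.Optional;

    // Hypothetical sketch: an extra check consulted during task->offer
    // matching, alongside the existing attribute-based constraint checks.
    final class ReservationCheck {
      private final Map<String, String> reservations;  // host -> reservation tag

      ReservationCheck(Map<String, String> reservations) {
        this.reservations = reservations;
      }

      // Empty result means no objection; a present value is a veto reason.
      Optional<String> veto(String host, Optional<String> taskTag) {
        String required = reservations.get(host);
        if (required == null) {
          return Optional.empty();  // unreserved host: any task may land here
        }
        boolean optedIn = taskTag.filter(required::equals).isPresent();
        return optedIn ? Optional.empty() : Optional.of("host reserved for " + required);
      }
    }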

On Tuesday, January 19, 2016, Maxim Khutornenko <ma...@apache.org> wrote:

> >
> > Can't this just be any old Constraint (not named "dedicated")?  In other
> > words, doesn't this code already deal with non-dedicated constraints?:
> >
> >
> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>
>
> Not really. There is a subtle difference here. A regular (non-dedicated)
> constraint does not prevent other tasks from landing on a given machine set
> whereas dedicated keeps other tasks away by only allowing those matching
> the dedicated attribute. What this proposal targets is allowing exclusive
> machine pool matching any job that has this new constraint while keeping
> all other tasks that don't have that attribute away.
>
> Following an example from my original post, imagine a GPU machine pool. Any
> job (from any role) requiring GPU resource would be allowed while all other
> jobs that don't have that constraint would be vetoed.
>
> Also, regarding dedicated constraints necessitating a slave restart - I've
> > pondered moving dedicated machine management to the scheduler for similar
> > purposes.  There's not really much forcing that behavior to be managed
> with
> > a slave attribute.
>
>
> Would you mind giving a few more hints on the mechanics behind this? How
> would the scheduler know about dedicated hardware without the slave attributes set?
> Are you proposing storing hostname->attribute mapping in the scheduler
> store?
>
> On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner <wfarner@apache.org> wrote:
>
> > Joe - if you want to pursue this, I suggest you start another thread to
> > keep this thread's discussion intact.  I will not be able to lead this
> > change, but can certainly shepherd!
> >
> > On Tuesday, January 19, 2016, Joe Smith <yasumoto7@gmail.com> wrote:
> >
> > > As an operator, that'd be a relatively simple change in tooling, and
> the
> > > benefits of not forcing a slave restart would be _huge_.
> > >
> > > Keeping the dedicated semantics (but adding non-exclusive) would be
> ideal
> > > if possible.
> > >
> > > > On Jan 19, 2016, at 19:09, Bill Farner <wfarner@apache.org> wrote:
> > > >
> > > > Also, regarding dedicated constraints necessitating a slave restart -
> > > I've
> > > > pondered moving dedicated machine management to the scheduler for
> > similar
> > > > purposes.  There's not really much forcing that behavior to be
> managed
> > > with
> > > > a slave attribute.
> > > >
> > > > On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <john@conductant.com> wrote:
> > > >
> > > >> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <maxim@apache.org> wrote:
> > > >>
> > > >>> Has anyone explored an idea of having a non-exclusive (wrt job
> role)
> > > >>> dedicated constraint in Aurora before?
> > > >>
> > > >>
> > > >>> We do have a dedicated constraint now but it assumes a 1:1
> > > >>> relationship between a job role and a slave attribute [1]. For
> > > >>> example: a 'www-data/prod/hello' job with a dedicated constraint of
> > > >>> 'dedicated': 'www-data/hello' may only be pinned to a particular
> set
> > > >>> of slaves if all of them have 'www-data/hello' attribute set. No
> > other
> > > >>> role tasks will be able to land on those slaves unless their
> > > >>> 'role/name' pair is added into the slave attribute set.
> > > >>>
> > > >>> The above is very limiting as it prevents carving out subsets of a
> > > >>> shared pool cluster to be used by multiple roles at the same time.
> > > >>> Would it make sense to have a free-form dedicated constraint not
> > bound
> > > >>> to a particular role? Multiple jobs could then use this type of
> > > >>> constraint dynamically without modifying the slave command line
> (and
> > > >>> requiring slave restart).
> > > >>
> > > >> Can't this just be any old Constraint (not named "dedicated")?  In
> > other
> > > >> words, doesn't this code already deal with non-dedicated
> constraints?:
> > > >>
> > > >>
> > >
> >
> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
> > > >>
> > > >>
> > > >>> This could be quite useful for experimenting purposes (e.g.
> different
> > > >>> host OS) or to target a different hardware offering (e.g. GPUs). In
> > > >>> other words, only those jobs that explicitly opt-in to participate
> in
> > > >>> an experiment or hw offering would be landing on that slave set.
> > > >>>
> > > >>> Thanks,
> > > >>> Maxim
> > > >>>
> > > >>> [1]-
> > > >>
> > >
> >
> https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> John Sirois
> > > >> 303-512-3301
> > > >>
> > >
> >
>

Re: Non-exclusive dedicated constraint

Posted by Maxim Khutornenko <ma...@apache.org>.
>
> Can't this just be any old Constraint (not named "dedicated")?  In other
> words, doesn't this code already deal with non-dedicated constraints?:
>
> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197


Not really. There is a subtle difference here. A regular (non-dedicated)
constraint does not prevent other tasks from landing on a given machine set
whereas dedicated keeps other tasks away by only allowing those matching
the dedicated attribute. What this proposal targets is allowing an exclusive
machine pool to match any job that has this new constraint while keeping
all other tasks that don't have that attribute away.

Following an example from my original post, imagine a GPU machine pool. Any
job (from any role) requiring a GPU resource would be allowed while all other
jobs that don't have that constraint would be vetoed.
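
To compress that difference into code, an illustrative sketch (assuming
a host carries at most one dedicated tag; none of these names are
Aurora's):

    import java.util.Optional;
    import java.util.Set;

    // Hypothetical sketch of the two semantics described above.
    final class ConstraintSemantics {
      // Regular constraint: restricts only the job that declares it; jobs
      // without the constraint may still land on the same machines.
      static boolean regularMatch(Optional<String> wanted, Set<String> hostValues) {
        return wanted.map(hostValues::contains).orElse(true);
      }

      // Non-exclusive dedicated: a tagged host (e.g. "gpu") admits any
      // role's job that opted in with a matching tag, and vetoes every
      // job that did not opt in.
      static boolean dedicatedMatch(Optional<String> wanted, Optional<String> hostTag) {
        if (!hostTag.isPresent()) {
          // Untagged host: only jobs without a dedicated constraint land here.
          return !wanted.isPresent();
        }
        return wanted.filter(hostTag.get()::equals).isPresent();
      }
    }

The GPU example is then just dedicatedMatch(Optional.of("gpu"), hostTag)
evaluated against each offer.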

Also, regarding dedicated constraints necessitating a slave restart - I've
> pondered moving dedicated machine management to the scheduler for similar
> purposes.  There's not really much forcing that behavior to be managed with
> a slave attribute.


Would you mind giving a few more hints on the mechanics behind this? How
would the scheduler know about dedicated hardware without the slave attributes set?
Are you proposing storing hostname->attribute mapping in the scheduler
store?

On Tue, Jan 19, 2016 at 7:53 PM, Bill Farner <wf...@apache.org> wrote:

> Joe - if you want to pursue this, I suggest you start another thread to
> keep this thread's discussion intact.  I will not be able to lead this
> change, but can certainly shepherd!
>
> On Tuesday, January 19, 2016, Joe Smith <ya...@gmail.com> wrote:
>
> > As an operator, that'd be a relatively simple change in tooling, and the
> > benefits of not forcing a slave restart would be _huge_.
> >
> > Keeping the dedicated semantics (but adding non-exclusive) would be ideal
> > if possible.
> >
> > > On Jan 19, 2016, at 19:09, Bill Farner <wfarner@apache.org> wrote:
> > >
> > > Also, regarding dedicated constraints necessitating a slave restart -
> > I've
> > > pondered moving dedicated machine management to the scheduler for
> similar
> > > purposes.  There's not really much forcing that behavior to be managed
> > with
> > > a slave attribute.
> > >
> > > On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <john@conductant.com> wrote:
> > >
> > >> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <maxim@apache.org> wrote:
> > >>
> > >>> Has anyone explored an idea of having a non-exclusive (wrt job role)
> > >>> dedicated constraint in Aurora before?
> > >>
> > >>
> > >>> We do have a dedicated constraint now but it assumes a 1:1
> > >>> relationship between a job role and a slave attribute [1]. For
> > >>> example: a 'www-data/prod/hello' job with a dedicated constraint of
> > >>> 'dedicated': 'www-data/hello' may only be pinned to a particular set
> > >>> of slaves if all of them have 'www-data/hello' attribute set. No
> other
> > >>> role tasks will be able to land on those slaves unless their
> > >>> 'role/name' pair is added into the slave attribute set.
> > >>>
> > >>> The above is very limiting as it prevents carving out subsets of a
> > >>> shared pool cluster to be used by multiple roles at the same time.
> > >>> Would it make sense to have a free-form dedicated constraint not
> bound
> > >>> to a particular role? Multiple jobs could then use this type of
> > >>> constraint dynamically without modifying the slave command line (and
> > >>> requiring slave restart).
> > >>
> > >> Can't this just be any old Constraint (not named "dedicated")?  In
> other
> > >> words, doesn't this code already deal with non-dedicated constraints?:
> > >>
> > >>
> >
> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
> > >>
> > >>
> > >>> This could be quite useful for experimenting purposes (e.g. different
> > >>> host OS) or to target a different hardware offering (e.g. GPUs). In
> > >>> other words, only those jobs that explicitly opt-in to participate in
> > >>> an experiment or hw offering would be landing on that slave set.
> > >>>
> > >>> Thanks,
> > >>> Maxim
> > >>>
> > >>> [1]-
> > >>
> >
> https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
> > >>
> > >>
> > >>
> > >> --
> > >> John Sirois
> > >> 303-512-3301
> > >>
> >
>

Re: Non-exclusive dedicated constraint

Posted by Bill Farner <wf...@apache.org>.
Joe - if you want to pursue this, I suggest you start another thread to
keep this thread's discussion intact.  I will not be able to lead this
change, but can certainly shepherd!

On Tuesday, January 19, 2016, Joe Smith <ya...@gmail.com> wrote:

> As an operator, that'd be a relatively simple change in tooling, and the
> benefits of not forcing a slave restart would be _huge_.
>
> Keeping the dedicated semantics (but adding non-exclusive) would be ideal
> if possible.
>
> > On Jan 19, 2016, at 19:09, Bill Farner <wfarner@apache.org> wrote:
> >
> > Also, regarding dedicated constraints necessitating a slave restart -
> I've
> > pondered moving dedicated machine management to the scheduler for similar
> > purposes.  There's not really much forcing that behavior to be managed
> with
> > a slave attribute.
> >
> > On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <john@conductant.com> wrote:
> >
> >> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <maxim@apache.org> wrote:
> >>
> >>> Has anyone explored an idea of having a non-exclusive (wrt job role)
> >>> dedicated constraint in Aurora before?
> >>
> >>
> >>> We do have a dedicated constraint now but it assumes a 1:1
> >>> relationship between a job role and a slave attribute [1]. For
> >>> example: a 'www-data/prod/hello' job with a dedicated constraint of
> >>> 'dedicated': 'www-data/hello' may only be pinned to a particular set
> >>> of slaves if all of them have 'www-data/hello' attribute set. No other
> >>> role tasks will be able to land on those slaves unless their
> >>> 'role/name' pair is added into the slave attribute set.
> >>>
> >>> The above is very limiting as it prevents carving out subsets of a
> >>> shared pool cluster to be used by multiple roles at the same time.
> >>> Would it make sense to have a free-form dedicated constraint not bound
> >>> to a particular role? Multiple jobs could then use this type of
> >>> constraint dynamically without modifying the slave command line (and
> >>> requiring slave restart).
> >>
> >> Can't this just be any old Constraint (not named "dedicated")?  In other
> >> words, doesn't this code already deal with non-dedicated constraints?:
> >>
> >>
> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
> >>
> >>
> >>> This could be quite useful for experimenting purposes (e.g. different
> >>> host OS) or to target a different hardware offering (e.g. GPUs). In
> >>> other words, only those jobs that explicitly opt-in to participate in
> >>> an experiment or hw offering would be landing on that slave set.
> >>>
> >>> Thanks,
> >>> Maxim
> >>>
> >>> [1]-
> >>
> https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
> >>
> >>
> >>
> >> --
> >> John Sirois
> >> 303-512-3301
> >>
>

Re: Non-exclusive dedicated constraint

Posted by Joe Smith <ya...@gmail.com>.
As an operator, that'd be a relatively simple change in tooling, and the benefits of not forcing a slave restart would be _huge_.

Keeping the dedicated semantics (but adding non-exclusive) would be ideal if possible.

> On Jan 19, 2016, at 19:09, Bill Farner <wf...@apache.org> wrote:
> 
> Also, regarding dedicated constraints necessitating a slave restart - I've
> pondered moving dedicated machine management to the scheduler for similar
> purposes.  There's not really much forcing that behavior to be managed with
> a slave attribute.
> 
> On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <jo...@conductant.com> wrote:
> 
>> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <ma...@apache.org>
>> wrote:
>> 
>>> Has anyone explored an idea of having a non-exclusive (wrt job role)
>>> dedicated constraint in Aurora before?
>> 
>> 
>>> We do have a dedicated constraint now but it assumes a 1:1
>>> relationship between a job role and a slave attribute [1]. For
>>> example: a 'www-data/prod/hello' job with a dedicated constraint of
>>> 'dedicated': 'www-data/hello' may only be pinned to a particular set
>>> of slaves if all of them have 'www-data/hello' attribute set. No other
>>> role tasks will be able to land on those slaves unless their
>>> 'role/name' pair is added into the slave attribute set.
>>> 
>>> The above is very limiting as it prevents carving out subsets of a
>>> shared pool cluster to be used by multiple roles at the same time.
>>> Would it make sense to have a free-form dedicated constraint not bound
>>> to a particular role? Multiple jobs could then use this type of
>>> constraint dynamically without modifying the slave command line (and
>>> requiring slave restart).
>> 
>> Can't this just be any old Constraint (not named "dedicated")?  In other
>> words, doesn't this code already deal with non-dedicated constraints?:
>> 
>> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>> 
>> 
>>> This could be quite useful for experimenting purposes (e.g. different
>>> host OS) or to target a different hardware offering (e.g. GPUs). In
>>> other words, only those jobs that explicitly opt-in to participate in
>>> an experiment or hw offering would be landing on that slave set.
>>> 
>>> Thanks,
>>> Maxim
>>> 
>>> [1]-
>> https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
>> 
>> 
>> 
>> --
>> John Sirois
>> 303-512-3301
>> 

Re: Non-exclusive dedicated constraint

Posted by Bill Farner <wf...@apache.org>.
Also, regarding dedicated constraints necessitating a slave restart - I've
pondered moving dedicated machine management to the scheduler for similar
purposes.  There's not really much forcing that behavior to be managed with
a slave attribute.
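
If the scheduler owned that state, the operator-facing surface could be
as small as this hypothetical interface (nothing like it exists today):

    // Hypothetical sketch: a scheduler-owned replacement for managing
    // dedicated machines via slave attributes. Marking a machine would
    // take effect on the next scheduling round, with no slave command
    // line change or restart.
    interface MachineReservationAdmin {
      void markReserved(String host, String tag);
      void clearReservation(String host);
    }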

On Tue, Jan 19, 2016 at 7:05 PM, John Sirois <jo...@conductant.com> wrote:

> On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <ma...@apache.org>
> wrote:
>
> > Has anyone explored an idea of having a non-exclusive (wrt job role)
> > dedicated constraint in Aurora before?
>
>
> > We do have a dedicated constraint now but it assumes a 1:1
> > relationship between a job role and a slave attribute [1]. For
> > example: a 'www-data/prod/hello' job with a dedicated constraint of
> > 'dedicated': 'www-data/hello' may only be pinned to a particular set
> > of slaves if all of them have 'www-data/hello' attribute set. No other
> > role tasks will be able to land on those slaves unless their
> > 'role/name' pair is added into the slave attribute set.
> >
> > The above is very limiting as it prevents carving out subsets of a
> > shared pool cluster to be used by multiple roles at the same time.
> > Would it make sense to have a free-form dedicated constraint not bound
> > to a particular role? Multiple jobs could then use this type of
> > constraint dynamically without modifying the slave command line (and
> > requiring slave restart).
> >
>
> Can't this just be any old Constraint (not named "dedicated")?  In other
> words, doesn't this code already deal with non-dedicated constraints?:
>
> https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
>
>
> > This could be quite useful for experimenting purposes (e.g. different
> > host OS) or to target a different hardware offering (e.g. GPUs). In
> > other words, only those jobs that explicitly opt-in to participate in
> > an experiment or hw offering would be landing on that slave set.
> >
> > Thanks,
> > Maxim
> >
> > [1]-
> >
> https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
> >
>
>
>
> --
> John Sirois
> 303-512-3301
>

Re: Non-exclusive dedicated constraint

Posted by John Sirois <jo...@conductant.com>.
On Tue, Jan 19, 2016 at 7:22 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> Has anyone explored an idea of having a non-exclusive (wrt job role)
> dedicated constraint in Aurora before?


> We do have a dedicated constraint now but it assumes a 1:1
> relationship between a job role and a slave attribute [1]. For
> example: a 'www-data/prod/hello' job with a dedicated constraint of
> 'dedicated': 'www-data/hello' may only be pinned to a particular set
> of slaves if all of them have 'www-data/hello' attribute set. No other
> role tasks will be able to land on those slaves unless their
> 'role/name' pair is added into the slave attribute set.
>
> The above is very limiting as it prevents carving out subsets of a
> shared pool cluster to be used by multiple roles at the same time.
> Would it make sense to have a free-form dedicated constraint not bound
> to a particular role? Multiple jobs could then use this type of
> constraint dynamically without modifying the slave command line (and
> requiring slave restart).
>

Can't this just be any old Constraint (not named "dedicated")?  In other
words, doesn't this code already deal with non-dedicated constraints?:
https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java#L193-L197
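
For flavor, the general shape of such a value-constraint check is
roughly the following. This is a paraphrase in code, not the actual
implementation behind that link:

    import java.util.Collections;
    import java.util.Map;
    import java.util.Set;

    // Rough sketch: every constraint a task declares must be satisfied
    // by the host's attribute values; hosts are never excluded merely
    // for having attributes a task did not ask about.
    final class ValueConstraintCheck {
      static boolean satisfied(
          Map<String, String> taskConstraints,
          Map<String, Set<String>> hostAttributes) {
        return taskConstraints.entrySet().stream().allMatch(c ->
            hostAttributes.getOrDefault(c.getKey(), Collections.emptySet())
                .contains(c.getValue()));
      }
    }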


> This could be quite useful for experimenting purposes (e.g. different
> host OS) or to target a different hardware offering (e.g. GPUs). In
> other words, only those jobs that explicitly opt-in to participate in
> an experiment or hw offering would be landing on that slave set.
>
> Thanks,
> Maxim
>
> [1]-
> https://github.com/apache/aurora/blob/eec985d948f02f46637d87cd4d212eb2a70ef8d0/src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java#L272-L276
>



-- 
John Sirois
303-512-3301