Posted to dev@aurora.apache.org by Mauricio Garavaglia <ma...@gmail.com> on 2017/03/06 19:44:41 UTC

schedule task instances spreading them based on a host attribute.

Hello!

I have a job with many instances (>100) that I'd like to spread across
the hosts in a cluster. Using a constraint such as "limit=host:1"
doesn't work well, as I have more instances than nodes.

As a workaround I increased the limit value to something like
ceil(instances/nodes). But a problem appears if a bunch of nodes go
down (think a whole rack dies): the instances will not run until those
nodes are back, even though we may have spare capacity on the rest of
the hosts that we'd like to use. In that scenario, job availability may
be affected because the job is running with fewer instances than
expected. On a smaller scale, the same approach would also apply if you
want to spread tasks across racks or availability zones. I'd like to
have one instance of a job per rack (failure domain), but if that rack
goes down, the instance can be spawned on a different rack.
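
To make the failure mode concrete, here is a small arithmetic sketch
(the numbers are made up and the code is only illustrative, not
anything in Aurora):

    // Hypothetical numbers for a job using the fixed-limit workaround.
    int instances = 120;
    int nodes = 40;
    // Limit value used in the constraint, i.e. "limit=host:3".
    int limit = (int) Math.ceil((double) instances / nodes);  // 3
    // If 10 hosts die, the 30 survivors can legally hold only
    // 30 * limit = 90 instances, so 30 instances stay pending even
    // though the surviving hosts may have spare capacity.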

I thought we could have a scheduling constraint to "spread" instances
across a particular host attribute: instead of vetoing an offer right
away, we check where the other instances of the job are running,
looking at a particular attribute of each host, and try to maximize the
number of distinct values of that attribute (rack, hostname, etc.)
across the task instance assignments.
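
As a rough sketch of that check (the names are hypothetical, nothing
like this exists in Aurora today), the scheduler could count the
running instances of the job per attribute value and prefer the offer
whose value currently hosts the fewest:

    import java.util.Map;

    // Hypothetical "spread" penalty: lower is better.
    final class SpreadScorer {
      // countsByAttributeValue maps e.g. rack name -> number of
      // instances of this job already running in that rack;
      // offerValue is the rack of the offered host.
      static int penalty(Map<String, Integer> countsByAttributeValue,
                         String offerValue) {
        return countsByAttributeValue.getOrDefault(offerValue, 0);
      }
    }

Picking the offer with the lowest penalty maximizes the number of
distinct attribute values in use, but still allows placement somewhere
when a whole rack is gone.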

What do you think? Did something like this come up in the past? Is it
feasible?


Mauricio

Re: schedule task instances spreading them based on a host attribute.

Posted by Meghdoot bhattacharya <me...@yahoo.com.INVALID>.
Fenzo integration should be considered.

Old thread
http://markmail.org/message/bjifjyvhvs2en3ts

If there are no volunteers by summer, I will take it up.

Thx

> On Mar 23, 2017, at 5:14 PM, Zameer Manji <zm...@apache.org> wrote:

Re: schedule task instances spreading them based on a host attribute.

Posted by Zameer Manji <zm...@apache.org>.
Hey,

Sorry for the late reply.

It is possible to make this configurable. For example, we could
implement multiple algorithms and switch between them using different
flags. If the flag value is just a class on the classpath that
implements an interface, it can be 100% pluggable.
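
A minimal sketch of that kind of flag-driven pluggability (the
interface name and flag are made up, not existing Aurora code):

    // Hypothetical extension point for the placement logic.
    interface OfferSelector {
      // A real interface would expose something like
      // Optional<Offer> select(Iterable<Offer> offers, Task task).
    }

    final class OfferSelectorLoader {
      // Instantiates whatever class a flag names, e.g.
      // -offer_selector_impl=com.example.SpreadOfferSelector
      static OfferSelector load(String className)
          throws ReflectiveOperationException {
        return Class.forName(className)
            .asSubclass(OfferSelector.class)
            .getDeclaredConstructor()
            .newInstance();
      }
    }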

The primary parts of the scheduling code are `TaskScheduler` and
`TaskAssigner`.

`TaskScheduler` receives requests to schedule tasks and does some
validation and preparation. `TaskAssigner` implements the first-fit
algorithm.

However, I feel the best move for the project would be to move away
from first fit and support soft constraints. I think it is a very valid
feature request and I believe it can be done without degrading
performance. Ideally, we should just use an existing Java library that
implements a well-known algorithm. For example, Netflix's Fenzo
<https://github.com/Netflix/Fenzo> could be used here.
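
To show what the change amounts to, here is a purely illustrative
comparison of first fit versus picking the best-scoring feasible offer
(this is neither Aurora's nor Fenzo's API):

    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;
    import java.util.function.Predicate;
    import java.util.function.ToIntFunction;

    // "Offer" is a placeholder type parameter for whatever the
    // scheduler's resource offer type really is.
    final class Placement<Offer> {
      // First fit: take the first offer that passes the hard constraints.
      Optional<Offer> firstFit(List<Offer> offers, Predicate<Offer> feasible) {
        return offers.stream().filter(feasible).findFirst();
      }

      // Soft constraints: among feasible offers, take the one with the
      // lowest penalty (e.g. fewest instances already on that rack).
      Optional<Offer> bestFit(List<Offer> offers, Predicate<Offer> feasible,
                              ToIntFunction<Offer> penalty) {
        return offers.stream().filter(feasible)
            .min(Comparator.comparingInt(penalty));
      }
    }

The cost is that each pending task now has to look at more than one
offer, which is where the latency concern raised earlier in the thread
comes in.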

On Wed, Mar 15, 2017 at 11:10 AM, Mauricio Garavaglia <mauriciogaravaglia@gmail.com> wrote:

Re: schedule task instances spreading them based on a host attribute.

Posted by Mauricio Garavaglia <ma...@gmail.com>.
Hi,

Rather than changing the scheduling algorithm, I think we should be
open to supporting multiple algorithms. First-fit is certainly a great
solution for humongous clusters with homogeneous workloads, but for
smaller clusters we could have more optimized scheduling without
sacrificing scheduling performance.

How difficult do you think it would be to start exploring that option?
I haven't looked into the scheduling side of the code :)

On Mon, Mar 6, 2017 at 2:57 PM, Zameer Manji <zm...@apache.org> wrote:

Re: schedule task instances spreading them based on a host attribute.

Posted by Zameer Manji <zm...@apache.org>.
Something similar was proposed on a Dynamic Reservations review and there
is a ticket for it here <https://issues.apache.org/jira/browse/AURORA-173>.

I think it is feasible, but it is important to note that this is a
large change because we are going to move Aurora from first fit to some
other algorithm.

If we do this, we need to ensure it scales to very large clusters and
keeps latency reasonably low when assigning tasks to offers.

I support the idea of "spread", but it would need to be after a change to
the scheduling algorithm.

On Mon, Mar 6, 2017 at 11:44 AM, Mauricio Garavaglia <mauriciogaravaglia@gmail.com> wrote: