Posted to dev@mesos.apache.org by Zhitao Li <zh...@gmail.com> on 2018/03/22 17:06:57 UTC

Support deadline for tasks

In our environment, we run a lot of batch jobs, some of which have tight
timelines. If any task in a job runs longer than x hours, it no longer
makes sense to run it.

For instance, a team might submit a job that builds a weekly index and
repeats every Monday. If the job does not finish before the next Monday for
whatever reason, there is no point in keeping any of its tasks running.
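The weekly-index case calls for an absolute deadline rather than a duration. Below is a minimal sketch of computing one. It assumes "before the next Monday" means midnight UTC at the start of the first Monday after submission, and that the value is carried as integer nanoseconds since the epoch, which is the unit of Mesos' `TimeInfo`. Both function names are illustrative and are not part of any Mesos API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def next_monday_deadline(submitted_at: datetime) -> datetime:
    """Midnight at the start of the first Monday strictly after submitted_at.

    A job submitted on a Monday gets the *following* Monday as its deadline.
    submitted_at must be timezone-aware.
    """
    # weekday(): Monday == 0 ... Sunday == 6, so this is 7 on a Monday, 1 on a Sunday.
    days_ahead = 7 - submitted_at.weekday()
    midnight = submitted_at.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=days_ahead)


def to_epoch_nanos(dt: datetime) -> int:
    """Convert an aware datetime to integer nanoseconds since the Unix epoch."""
    # Integer timedelta division avoids the float rounding of dt.timestamp().
    return (dt - EPOCH) // timedelta(microseconds=1) * 1000


if __name__ == "__main__":
    submitted = datetime(2018, 3, 26, 9, 0, tzinfo=timezone.utc)  # a Monday
    deadline = next_monday_deadline(submitted)
    print(deadline.isoformat())  # 2018-04-02T00:00:00+00:00
    print(to_epoch_nanos(deadline))
```

For example, a run submitted on Thursday 2018-03-22 would be given 2018-03-26T00:00:00+00:00 as its deadline.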

We believe that tracking deadlines in a distributed way across the cluster,
rather than centrally, makes more sense: it makes the system more scalable
and also keeps our centralized state machine simpler.

One idea I have right now is to add an *optional* *TimeInfo deadline* field
to TaskInfo; all default executors in Mesos could then simply terminate the
task once the deadline passes and send a proper *StatusUpdate*.
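Below is a minimal sketch of what the proposed executor-side behaviour might look like. It assumes the deadline arrives as an optional absolute timestamp in nanoseconds since the epoch. `Task`, `enforce_deadline`, `watch`, and the status-update dict are hypothetical stand-ins, not the Mesos executor API. Reporting `TASK_KILLED` is also an assumption, because the thread does not say which terminal state or reason is "proper".

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Task:
    """Hypothetical stand-in for a launched task (not the Mesos TaskInfo message)."""
    task_id: str
    deadline_ns: Optional[int] = None  # proposed optional deadline, ns since the epoch
    running: bool = True


StatusSink = Callable[[Dict[str, str]], None]


def enforce_deadline(task: Task, now_ns: int,
                     kill: Callable[[Task], None],
                     send_status_update: StatusSink) -> bool:
    """Kill the task and report it if its deadline has passed; True if it was killed."""
    if task.deadline_ns is None or not task.running or now_ns < task.deadline_ns:
        return False
    kill(task)
    task.running = False
    # Assumed terminal state and message; the thread leaves the exact update open.
    send_status_update({"task_id": task.task_id,
                        "state": "TASK_KILLED",
                        "message": "task deadline exceeded"})
    return True


def watch(task: Task,
          kill: Callable[[Task], None],
          send_status_update: StatusSink,
          clock: Callable[[], int] = time.time_ns,
          sleep: Callable[[float], None] = time.sleep,
          poll_s: float = 1.0) -> bool:
    """Executor-side loop: wait for the deadline, then enforce it; True if killed."""
    if task.deadline_ns is None:
        return False  # no deadline set: the task runs exactly as it does today
    while task.running:
        if enforce_deadline(task, clock(), kill, send_status_update):
            return True
        remaining_s = (task.deadline_ns - clock()) / 1e9
        # Wake at least every poll_s so a task that exits on its own
        # (running set to False elsewhere) is noticed promptly.
        sleep(max(0.0, min(poll_s, remaining_s)))
    return False
```

Each executor watches only its own tasks, so the check scales with the cluster instead of funnelling through one scheduler. Injecting `clock` and `sleep` keeps the loop testable without real waiting.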

I summarized the above idea in MESOS-8725
<https://issues.apache.org/jira/browse/MESOS-8725>.

Please let me know what you think. Thanks!

-- 
Cheers,

Zhitao Li

Re: Support deadline for tasks

Posted by David Morrison <dr...@yelp.com>.
Hi, Benjamin,

Usually for us, if tasks run longer than a certain period of time, it means
that something has gone wrong and we should just abort and try again.

David (also at Yelp)

On Fri, Mar 23, 2018 at 7:14 PM, Benjamin Mahler <bm...@apache.org> wrote:

> Ah, I was more curious about why they need to be killed after a timeout.
> E.g. After a particular deadline the work is useless (in Zhitao's case).

Re: Support deadline for tasks

Posted by Benjamin Mahler <bm...@apache.org>.
Ah, I was more curious about why they need to be killed after a timeout.
E.g., after a particular deadline the work is useless (as in Zhitao's case).

On Fri, Mar 23, 2018 at 6:22 PM Sagar Sadashiv Patwardhan <sa...@yelp.com>
wrote:

> Hi Benjamin,
> We have a few tasks that should be killed after
> some timeout. We currently have some logic in our scheduler to kill these
> tasks. Would be nice to delegate this to the executor.
>
> - Sagar

Re: Support deadline for tasks

Posted by Sagar Sadashiv Patwardhan <sa...@yelp.com>.
Hi Benjamin,
We have a few tasks that should be killed after
some timeout. We currently have some logic in our scheduler to kill these
tasks. It would be nice to delegate this to the executor.

- Sagar
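For comparison, the scheduler-side approach described here amounts roughly to a periodic sweep over every task the framework tracks. The sketch below is minimal. The `(launched_at_ns, timeout_ns)` layout, `sweep_expired`, and the `kill_task` callback are hypothetical placeholders, not a real scheduler API.

```python
from typing import Callable, Dict, List, Tuple


def sweep_expired(running: Dict[str, Tuple[int, int]], now_ns: int,
                  kill_task: Callable[[str], None]) -> List[str]:
    """Scheduler-side timeout check over every running task.

    running maps task_id -> (launched_at_ns, timeout_ns). Returns the IDs
    that were asked to be killed, in sorted order.
    """
    expired = sorted(task_id
                     for task_id, (launched_at_ns, timeout_ns) in running.items()
                     if now_ns - launched_at_ns >= timeout_ns)
    for task_id in expired:
        # Hypothetical hook: a real scheduler would send a kill request to the
        # master and keep retrying until it sees a terminal status update.
        kill_task(task_id)
    return expired
```

Each framework that needs timeouts has to reimplement a loop like this and run it periodically, and the loop only fires while that scheduler is up and connected.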

On Fri, Mar 23, 2018 at 3:29 PM, Benjamin Mahler <bm...@apache.org> wrote:

> Sagar, could you share your use case? Or is it exactly the same as
> Zhitao's?

Re: Support deadline for tasks

Posted by Benjamin Mahler <bm...@apache.org>.
Sagar, could you share your use case? Or is it exactly the same as Zhitao's?

On Fri, Mar 23, 2018 at 3:15 PM, Sagar Sadashiv Patwardhan <sa...@yelp.com>
wrote:

> +1
>
> This will be useful for us(Yelp) as well.

Re: Support deadline for tasks

Posted by Sagar Sadashiv Patwardhan <sa...@yelp.com>.
+1

This will be useful for us (Yelp) as well.

On Fri, Mar 23, 2018 at 1:31 PM, Benjamin Mahler <bm...@apache.org> wrote:

> Also, it's advantageous for mesos to be aware of a hard deadline when it
> comes to resource allocation. We know that some resources will free up and
> can make better decisions when it comes to pre-emption, for example.
> Currently, mesos doesn't know if a task will run forever or will run to
> completion.

Re: Support deadline for tasks

Posted by Benjamin Mahler <bm...@apache.org>.
Also, it's advantageous for Mesos to be aware of a hard deadline when it
comes to resource allocation. Mesos would know that some resources will free
up by a known time and could make better decisions when it comes to
preemption, for example. Currently, Mesos doesn't know whether a task will
run forever or will run to completion.
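Below is a minimal sketch of the allocation argument, counting only CPUs for brevity. The `(cpus, deadline_ns)` task model, both function names, and the wait-or-preempt policy are illustrative assumptions. They do not describe how the Mesos allocator actually works.

```python
from typing import Iterable, Optional, Tuple

# Each running task is modelled as (cpus, deadline_ns); deadline_ns is None
# when the task has no deadline and may run forever.
TaskUsage = Tuple[float, Optional[int]]


def cpus_released_by(tasks: Iterable[TaskUsage], t_ns: int) -> float:
    """CPUs that must be released by t_ns because their task's deadline has passed."""
    return sum(cpus for cpus, deadline_ns in tasks
               if deadline_ns is not None and deadline_ns <= t_ns)


def should_preempt(free_cpus: float, requested_cpus: float,
                   tasks: Iterable[TaskUsage], now_ns: int, max_wait_ns: int) -> bool:
    """Hypothetical policy: preempt only if waiting up to max_wait_ns cannot satisfy the request."""
    return free_cpus + cpus_released_by(tasks, now_ns + max_wait_ns) < requested_cpus
```

Tasks without a deadline contribute nothing, which is the situation Mesos is in today. A deadline is only an upper bound, because a task may finish earlier. Killing a task can also take a grace period, so a real allocator would need some slack.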

On Fri, Mar 23, 2018 at 10:07 AM, James Peach <jp...@apache.org> wrote:

> There's some discussion around this on
> https://issues.apache.org/jira/browse/MESOS-8725.
>
> My take is that delegating too much to the scheduler makes schedulers
> harder to write and exacerbates the complexity of the system. If 4
> different schedulers implement this feature, operators are likely to need
> to understand 4 different ways of doing the same thing, which would be
> unfortunate.
>
> J

Re: Support deadline for tasks

Posted by James Peach <jp...@apache.org>.

> On Mar 23, 2018, at 9:57 AM, Renan DelValle <re...@gmail.com> wrote:
> 
> Hi Zhitao,
> 
> Since this is something that could potentially be handled by the executor and/or framework, I was wondering if you could speak to the advantages of making this a TaskInfo primitive vs having the executor (or even the framework) handle it.

There's some discussion around this on https://issues.apache.org/jira/browse/MESOS-8725.

My take is that delegating too much to the scheduler makes schedulers harder to write and exacerbates the complexity of the system. If 4 different schedulers implement this feature, operators are likely to need to understand 4 different ways of doing the same thing, which would be unfortunate. 

J

Re: Support deadline for tasks

Posted by Renan DelValle <re...@gmail.com>.
Hi Zhitao,

Since this is something that could potentially be handled by the executor
and/or framework, I was wondering if you could speak to the advantages of
making this a TaskInfo primitive vs having the executor (or even the
framework) handle it.

-Renan


On Fri, Mar 23, 2018 at 9:19 AM, Zhitao Li <zh...@gmail.com> wrote:

> Thanks James. I'll update the JIRA with our names and start with some
> prototype.

Re: Support deadline for tasks

Posted by Zhitao Li <zh...@gmail.com>.
Thanks James. I'll update the JIRA with our names and start on a
prototype.

On Thu, Mar 22, 2018 at 9:07 PM, James Peach <jp...@apache.org> wrote:

> This sounds both useful and simple to implement. I’m happy to shepherd if
> you’d like
>
> J




-- 
Cheers,

Zhitao Li

Re: Support deadline for tasks

Posted by James Peach <jp...@apache.org>.

> On Mar 22, 2018, at 10:06 AM, Zhitao Li <zh...@gmail.com> wrote:
> 
> In our environment, we run a lot of batch jobs, some of which have tight timeline. If any tasks in the job runs longer than x hours, it does not make sense to run it anymore. 
>  
> For instance, a team would submit a job which builds a weekly index and repeats every Monday. If the job does not finish before next Monday for whatever reason, there is no point to keep any task running.
>  
> We believe that implementing deadline tracking distributed across our cluster makes more sense as it makes the system more scalable and also makes our centralized state machine simpler.
>  
> One idea I have right now is to add an  optional TimeInfo deadline to TaskInfo field, and all default executors in Mesos can simply terminate the task and send a proper StatusUpdate.
> 
> I summarized above idea in MESOS-8725.
> 
> Please let me know what you think. Thanks! 

This sounds both useful and simple to implement. I’m happy to shepherd if you’d like.

J
