Posted to dev@aurora.apache.org by Yuan <yu...@gmail.com> on 2015/02/25 04:11:56 UTC

time constraint on aurora jobs?

Hello,

    In Apache Aurora, resources such as CPU, memory and disk space can be
isolated and sized in the job configuration file. Is there a similar way
to put a constraint on a job's running time, for example killing a job
once it has been running for more than a certain amount of time?

Thanks,
Yuan

Re: time constraint on aurora jobs?

Posted by Brian Wickman <wi...@apache.org>.
Another option is to add a process to your task that looks something
like Process(name = 'timeout', cmdline = 'sleep 3600; false').  If the task
runs for 3600 seconds, that process exits with a failure, causing the
whole task to fail.  The only issue I can think of is that it will also
keep your tasks running for 3600 seconds even if the main process
succeeds.  You may get around this by setting ephemeral=True on that
process (though I'm not sure whether an ephemeral process failure will
cause the whole task to fail; this is something I can double-check when
I'm in front of a computer with Thermos installed).
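Brian's timeout-process idea could look roughly like the job configuration below. It is an untested sketch: the cluster, role, environment and job names, the resource sizes and the ./run_batch.sh command are placeholders, and whether a failing ephemeral process still fails the task is exactly the open question Brian raised. No ordering constraint is declared between the two processes, so Thermos should start them side by side.

```python
# Untested sketch of an Aurora job config; every name, size and command
# except the 'timeout' process is a hypothetical placeholder.

# The real work.
main = Process(name = 'main', cmdline = './run_batch.sh')

# Watchdog from Brian's suggestion: exits non-zero after one hour, which
# should fail the task. ephemeral = True is meant to stop it from holding
# the task open after 'main' succeeds; whether its failure still counts
# against the task is unverified.
timeout = Process(name = 'timeout', cmdline = 'sleep 3600; false', ephemeral = True)

task = Task(
  name = 'batch_with_timeout',
  processes = [main, timeout],  # no order() constraint, so both run concurrently
  resources = Resources(cpu = 1.0, ram = 128*MB, disk = 128*MB))

jobs = [Job(
  cluster = 'devcluster',
  role = 'www-data',
  environment = 'devel',
  name = 'batch_with_timeout',
  task = task)]
```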

(In reply to Joseph Smith <ya...@gmail.com>, Tue, Feb 24, 2015 at 11:45 PM; his message appears in full below.)

Re: time constraint on aurora jobs?

Posted by Joseph Smith <ya...@gmail.com>.
Very good question. To my knowledge there is not a ‘time’ constraint.

However, you could implement this in a few ways. One of my first thoughts is to set up a custom StatusChecker <https://github.com/apache/incubator-aurora/blob/e6e7e53d92b52d78960824022bef8a0546002180/src/main/python/apache/aurora/executor/common/status_checker.py#L68> that checks how long a task has been running. StatusCheckers can return an ExitState <https://github.com/apache/incubator-aurora/blob/e6e7e53d92b52d78960824022bef8a0546002180/src/main/python/apache/aurora/executor/common/status_checker.py#L27>, which can end a task. FAILED will allow a Service() to be restarted, but KILLED should (if I’m following right) prevent it from being rescheduled unless a user manually reschedules it, which may or may not be what you’re looking for.
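As a rough illustration of the StatusChecker suggestion, here is a self-contained Python sketch of a checker that ends a task once it has run longer than a limit. It only mirrors the general shape of the linked StatusChecker (a `status` property that returns None while all is well, or a result carrying a reason and an exit state). `ExitState`, `StatusResult` and `RuntimeLimitChecker` below are simplified, hypothetical stand-ins rather than Aurora's actual classes, and the injectable clock is an assumption added so the logic can be exercised without waiting an hour. Defaulting to KILLED follows the reading above that KILLED, unlike FAILED, should keep a Service() from being restarted.

```python
import time
from typing import Callable, Optional


class ExitState:
    """Hypothetical stand-in for the linked ExitState. The real class maps
    these names to Mesos task states; plain strings keep this runnable."""
    FAILED = "FAILED"
    KILLED = "KILLED"


class StatusResult:
    """Minimal stand-in pairing a human-readable reason with an exit state."""

    def __init__(self, reason: str, status: str) -> None:
        self.reason = reason
        self.status = status


class RuntimeLimitChecker:
    """Hypothetical checker that proposes ending a task once it runs too long."""

    def __init__(self, max_runtime_secs: float,
                 exit_state: str = ExitState.KILLED,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_runtime_secs = max_runtime_secs
        self._exit_state = exit_state
        self._clock = clock
        self._started_at: Optional[float] = None

    def start(self) -> None:
        # Record when the task started running.
        self._started_at = self._clock()

    @property
    def status(self) -> Optional[StatusResult]:
        # None means "no objection": the task keeps running.
        if self._started_at is None:
            return None
        elapsed = self._clock() - self._started_at
        if elapsed > self._max_runtime_secs:
            return StatusResult(
                "Task exceeded runtime limit of %.0fs (ran %.0fs)"
                % (self._max_runtime_secs, elapsed),
                self._exit_state)
        return None


if __name__ == "__main__":
    # Drive the checker with a fake clock instead of waiting an hour.
    fake_now = [0.0]
    checker = RuntimeLimitChecker(3600, clock=lambda: fake_now[0])
    checker.start()
    fake_now[0] = 3601.0
    print(checker.status.reason)
```

Taking the clock as a parameter keeps the time arithmetic testable; in a real executor the checker would be polled periodically and the returned state acted on.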

An example of this is the HealthChecker <https://github.com/apache/incubator-aurora/blob/467bc56049cc775eaf61520a464b363d44023024/src/main/python/apache/aurora/executor/common/health_checker.py>, which causes a task to go into ‘FAILED’ if it does not pass a specified health check.
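To make the HealthChecker comparison concrete, the sketch below models only the part that decides when to report FAILED: it tolerates up to `max_consecutive_failures` failed checks in a row (the name is borrowed from Aurora's health-check settings), reports FAILED on the next one, and resets the count on any passing check. It is a simplified, hypothetical model, not Aurora's HealthChecker: the real one runs its checks periodically in the background against the task itself, none of which is modeled here, and the `check` callable and `tick()` method are invented for illustration.

```python
from typing import Callable, Optional, Tuple


class ConsecutiveFailureChecker:
    """Hypothetical model of a health checker: reports FAILED once more than
    `max_consecutive_failures` checks have failed in a row."""

    def __init__(self, check: Callable[[], bool], max_consecutive_failures: int) -> None:
        self._check = check  # returns True when the task looks healthy
        self._max_consecutive_failures = max_consecutive_failures
        self._consecutive_failures = 0
        self._reason: Optional[str] = None

    def tick(self) -> None:
        """Run one health check; a real checker would do this on a timer."""
        if self._reason is not None:
            return  # already unhealthy; the verdict is sticky
        if self._check():
            self._consecutive_failures = 0  # any success resets the streak
            return
        self._consecutive_failures += 1
        if self._consecutive_failures > self._max_consecutive_failures:
            self._reason = "Failed %d consecutive health checks" % self._consecutive_failures

    @property
    def status(self) -> Optional[Tuple[str, str]]:
        """None while healthy, otherwise a ('FAILED', reason) pair."""
        if self._reason is None:
            return None
        return ("FAILED", self._reason)


if __name__ == "__main__":
    # Simulated check results: fail, pass, then three failures in a row.
    results = iter([False, True, False, False, False])
    checker = ConsecutiveFailureChecker(lambda: next(results), max_consecutive_failures=2)
    for _ in range(5):
        checker.tick()
    print(checker.status)
```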

Please let me know if that makes sense!
Joe
