Posted to dev@aurora.apache.org by David Siegel <ds...@knewton.com> on 2014/03/27 16:53:39 UTC
Job History
Hello Aurorans,
Please enlighten me.
I think job history is a critical feature for Aurora.
A. Do you agree?
B. Is this feature secretly already in Aurora?
C. If not, is this on your roadmap?
D. Would you be interested in a patch or patches that add job history to
Aurora?
Below I discuss why I think this is an important feature and some thoughts
on an implementation.
Job history has a number of uses:
1. Debugging production issues after the job has been updated. I may need
to know the exact configuration of a system at a previous point in time in
order to debug an issue.
2. Rolling back to a previous job configuration after a bad release.
How I think Aurora works:
As far as I can tell from the Aurora source, job history is discarded. The
MemJobStore replaces Job entries when a job is updated, so you lose the old
Job configuration. The log is truncated every time a Snapshot is taken and
the snapshots do not contain job history.
This seems like a sound decision given that the job history will grow
forever, but it means there's no history we can really audit.
How job history might work:
Instead of building job history into the scheduler, one might write an
independent process that consumed the logs generated by the scheduler and
built up a database of job history information. It would then provide a
REST interface for querying the job history. This would keep the scheduler
free from dealing with job history.
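
A minimal sketch of the external history service described above, in Python. Everything here is an assumption for illustration: the JobUpdateEvent record, its fields, and the JobHistoryStore class are hypothetical and do not correspond to Aurora's log format or to any existing Aurora API. It only shows how a consumer could keep every configuration version and answer the two questions raised earlier (what was the configuration at time T, and what should a rollback restore).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class JobUpdateEvent:
    """Hypothetical record emitted each time a job's configuration changes."""
    job_key: str        # assumed identifier, e.g. "role/env/name"
    timestamp_ms: int   # when the update took effect
    config: dict        # the full job configuration at that point


class JobHistoryStore:
    """Append-only, in-memory history of job configurations (illustrative)."""

    def __init__(self) -> None:
        self._history: Dict[str, List[JobUpdateEvent]] = defaultdict(list)

    def consume(self, event: JobUpdateEvent) -> None:
        # Keep each job's versions ordered by time, even if events
        # arrive out of order.
        versions = self._history[event.job_key]
        versions.append(event)
        versions.sort(key=lambda e: e.timestamp_ms)

    def config_at(self, job_key: str, timestamp_ms: int) -> Optional[dict]:
        """Return the configuration in effect at timestamp_ms, if any."""
        result = None
        for event in self._history.get(job_key, []):
            if event.timestamp_ms > timestamp_ms:
                break
            result = event.config
        return result

    def previous_config(self, job_key: str) -> Optional[dict]:
        """Return the configuration before the latest one (rollback target)."""
        versions = self._history.get(job_key, [])
        return versions[-2].config if len(versions) >= 2 else None
```

A REST layer would sit on top of config_at and previous_config. A real service would also need durable storage and a defined delivery guarantee from the scheduler's log, neither of which is covered here.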
Any feedback is appreciated. Thanks.
-David Siegel
Re: Job History
Posted by Bill Farner <wf...@apache.org>.
Great question, David!
Aurora does indeed preserve some history, though the means is non-obvious.
The management of history is mostly done in HistoryPruner [1], with
command line knobs defined in AsyncModule [2]. This feature might meet
some, but maybe not all of your requirements.
The class naming sent you to the obvious place: MemJobStore. As it turns
out, though, that's actually only storing cron jobs (this relates to an
abstraction that never really panned out). Regular jobs are translated
from JobConfiguration [3] objects into independent ScheduledTasks [4]
representing the instances. These tasks, in turn, are stored in
MemTaskStore [5], which is agnostic to states of tasks (aside from query
matching). Note: we do have interest in making the data structure
arrangement more natural in AURORA-106 [6].
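
To illustrate the arrangement described above, here is a simplified Python sketch. It is not Aurora's code: the Task class, expand_job, prune_history, the TERMINAL set, and the max_per_job parameter are all hypothetical names chosen for this example. The assumption it illustrates is that a job is expanded into one task per instance, that terminated tasks stay in the store alongside active ones, and that a pruner bounds how many terminated tasks are retained per job.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative set of terminal states; an assumption for this sketch.
TERMINAL = {"FINISHED", "FAILED", "KILLED", "LOST"}


@dataclass
class Task:
    """Hypothetical stand-in for a per-instance scheduled task."""
    job_key: str
    instance_id: int
    config: dict
    status: str = "PENDING"
    terminated_at_ms: int = 0


def expand_job(job_key: str, instance_count: int, config: dict) -> List[Task]:
    """Translate one job configuration into independent per-instance tasks."""
    return [Task(job_key, i, dict(config)) for i in range(instance_count)]


def prune_history(tasks: List[Task], max_per_job: int) -> List[Task]:
    """Keep all active tasks plus the newest max_per_job terminated per job."""
    kept = [t for t in tasks if t.status not in TERMINAL]
    terminated_by_job: Dict[str, List[Task]] = {}
    for task in tasks:
        if task.status in TERMINAL:
            terminated_by_job.setdefault(task.job_key, []).append(task)
    for job_tasks in terminated_by_job.values():
        # Newest terminations first, then drop everything past the limit.
        job_tasks.sort(key=lambda t: t.terminated_at_ms, reverse=True)
        kept.extend(job_tasks[:max_per_job])
    return kept
```

Under these assumptions, the old configuration of an updated job remains visible on its terminated tasks until the pruner removes them, which is the sense in which some history is preserved.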
That said, we have kicked around the idea of exposing state mutations to an
external log/queue, but our use cases so far have required stronger
consistency than we felt we could achieve with that. I wouldn't turn down
a discussion about if/how we approach that.
I hope that answers your questions. Feel free to ask follow-ups! Cheers!
-=Bill
[1]
https://github.com/apache/incubator-aurora/blob/master/src/main/java/org/apache/aurora/scheduler/async/HistoryPruner.java
[2]
https://github.com/apache/incubator-aurora/blob/master/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L100
[3]
https://github.com/apache/incubator-aurora/blob/master/src/main/thrift/org/apache/aurora/gen/api.thrift#L191-210
[4]
https://github.com/apache/incubator-aurora/blob/master/src/main/thrift/org/apache/aurora/gen/api.thrift#L355-365
[5]
https://github.com/apache/incubator-aurora/blob/master/src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
[6] https://issues.apache.org/jira/browse/AURORA-106