Posted to dev@aurora.apache.org by David Siegel <ds...@knewton.com> on 2014/03/27 16:53:39 UTC

Job History

Hello Aurorans,

Please enlighten me.

I think job history is a critical feature for Aurora.

A. Do you agree?

B. Is this feature secretly already in Aurora?

C. If not, is this on your roadmap?

D. Would you be interested in a patch or patches that add job history to
Aurora?

Below I discuss why I think this is an important feature and some thoughts
on an implementation.

Job history has a number of uses:

1. Debugging production issues after the job has been updated. I may need
to know the exact configuration of a system at a previous point in time in
order to debug an issue.

2. Rolling back to a previous job configuration after a bad release.

How I think Aurora works:

As far as I can tell from the Aurora source, job history is discarded. The
MemJobStore replaces Job entries when a job is updated, so you lose the old
Job configuration. The log is truncated every time a Snapshot is taken and
the snapshots do not contain job history.

This seems like a sound decision, given that job history would otherwise
grow without bound, but it means there is no history we can audit.

How job history might work:

Instead of building job history into the scheduler, one might write an
independent process that consumes the logs generated by the scheduler and
builds up a database of job history information. It would then provide a
REST interface for querying the job history. This would keep the scheduler
free from dealing with job history.
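
To make the idea a little more concrete, here is a minimal sketch of the index such a process might keep. Everything in it is an assumption for illustration: the JobUpdate event shape, the JobHistory class and its method names are hypothetical, and it does not parse Aurora's actual log format or expose a REST interface. It only shows the two lookups from the use cases above (configuration at a point in time, and the previous configuration for a rollback).

```python
import bisect
from dataclasses import dataclass, field


@dataclass(frozen=True)
class JobUpdate:
    """Hypothetical event: one job configuration change seen in the log."""
    job_key: str      # e.g. "role/env/name"
    timestamp: int    # seconds since the epoch
    config: dict      # full job configuration at that moment


@dataclass
class JobHistory:
    """Append-only index of configurations per job, ordered by time."""
    _versions: dict = field(default_factory=dict)

    def record(self, update: JobUpdate) -> None:
        versions = self._versions.setdefault(update.job_key, [])
        # Keep each job's list sorted by timestamp, even if events arrive late.
        times = [v.timestamp for v in versions]
        versions.insert(bisect.bisect_right(times, update.timestamp), update)

    def config_at(self, job_key: str, timestamp: int):
        """Return the configuration in effect at `timestamp`, or None."""
        versions = self._versions.get(job_key, [])
        times = [v.timestamp for v in versions]
        i = bisect.bisect_right(times, timestamp)
        return versions[i - 1].config if i else None

    def previous_config(self, job_key: str):
        """Return the configuration before the latest one, or None."""
        versions = self._versions.get(job_key, [])
        return versions[-2].config if len(versions) >= 2 else None
```

A consumer would call record() for every update it reads from the log, and a query layer (REST or otherwise) would sit on top of config_at() and previous_config(). A real version would need durable storage rather than an in-memory dict.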

Any feedback is appreciated. Thanks.

-David Siegel

Re: Job History

Posted by Bill Farner <wf...@apache.org>.
Great question, David!

Aurora does indeed preserve some history, though the means is non-obvious.
The management of history is mostly done in HistoryPruner [1], with
command line knobs defined in AsyncModule [2].  This feature might meet
some, but maybe not all, of your requirements.

The class naming sent you to the obvious place: MemJobStore.  As it turns
out, though, that's actually only storing cron jobs (this relates to an
abstraction that never really panned out).  Regular jobs are translated
from JobConfiguration [3] objects into independent ScheduledTasks [4]
representing the instances.  These tasks, in turn, are stored in
MemTaskStore [5], which is agnostic to the states of tasks (aside from query
matching).  Note: we do have interest in making the data structure
arrangement more natural in AURORA-106 [6].
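
As a rough illustration of that arrangement (a toy model only, not Aurora's code: the Task and TaskStore classes, their methods, and the status strings used here are made up for this example), a job expands into one task per instance, each task carries its own copy of the configuration, and replacing a job leaves the old tasks in the store in a terminal state:

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for a scheduled task: one instance of a job."""
    job_key: str
    instance_id: int
    config: dict          # copy of the configuration this instance ran with
    status: str = "PENDING"


class TaskStore:
    """Flat list of tasks; status only matters when matching a query."""

    def __init__(self):
        self.tasks = []

    def add_job(self, job_key, config, instance_count):
        # Expand the job into one independent task per instance.
        for i in range(instance_count):
            self.tasks.append(Task(job_key, i, dict(config)))

    def query(self, job_key, statuses=None):
        return [t for t in self.tasks
                if t.job_key == job_key
                and (statuses is None or t.status in statuses)]

    def replace_job(self, job_key, config, instance_count):
        # Old tasks are marked terminal, not deleted, so their configs remain.
        for t in self.query(job_key, {"PENDING", "RUNNING"}):
            t.status = "KILLED"
        self.add_job(job_key, config, instance_count)
```

In this toy model the old configuration stays readable through the terminated tasks until something prunes them, which is the sense in which history is kept without a separate job-history structure.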

That said, we have kicked around the idea of exposing state mutations to an
external log/queue, but our use cases so far have required stronger
consistency than we felt we could achieve with that.  I wouldn't turn down
a discussion about if/how we approach that.

I hope that answers your questions; feel free to ask follow-ups!  Cheers!


-=Bill


[1] https://github.com/apache/incubator-aurora/blob/master/src/main/java/org/apache/aurora/scheduler/async/HistoryPruner.java
[2] https://github.com/apache/incubator-aurora/blob/master/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L100
[3] https://github.com/apache/incubator-aurora/blob/master/src/main/thrift/org/apache/aurora/gen/api.thrift#L191-210
[4] https://github.com/apache/incubator-aurora/blob/master/src/main/thrift/org/apache/aurora/gen/api.thrift#L355-365
[5] https://github.com/apache/incubator-aurora/blob/master/src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
[6] https://issues.apache.org/jira/browse/AURORA-106
