
Posted to dev@aurora.apache.org by Maxim Khutornenko <ma...@apache.org> on 2016/06/29 20:43:49 UTC

[PROPOSAL] Job as a first-class citizen

TL;DR - I am proposing we store and maintain job-level data
(JobConfiguration [1]) instead of relying on storing everything in a
TaskConfig [2].


Aurora storage currently does not have a concept of a "job" when it
comes to services and ad hoc jobs. Instead, it relies on a collection
of TaskConfigs that represent a view of what the job state is. This is
in stark contrast to cron jobs, which are already represented by the
JobConfiguration struct.

This lack of representation limits our ability to deliver richer
features and may result in suboptimal design and storage utilization.
Specifically, the following is currently impossible:

- storing normalized job-level data without repeating it in every task
(e.g. contactEmail, isService);

- maintaining job-level data that may be different for every instance
(SLA requirements, topology specs for stateful services, etc.);

- knowing what the job instance count is without pulling all ACTIVE
tasks and iterating over them.

To address the above, I propose we start treating an Aurora job as a
tangible entity in storage and specifically use JobConfiguration
wherever applicable. As a welcome side effect, this will let us:

- allow instantaneous job updates when job-level fields are updated
(e.g. those that don't require instance restarts);
- finally get rid of the deprecated Identity struct [3];
- reduce or completely eliminate DB garbage collection of abandoned job keys [4]
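
The contrast between per-task and per-job storage described above can be
sketched roughly as follows. This is only an illustration, not Aurora code:
JobKey, TaskRecord, JobRecord and the helper functions are hypothetical,
simplified stand-ins for the thrift structs, and ACTIVE_STATES is an
abbreviated subset of the real status list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobKey:
    # Hypothetical, simplified job identifier (hashable so it can key a dict).
    role: str
    environment: str
    name: str


@dataclass
class TaskRecord:
    # Task-centric model: every task repeats the job-level fields.
    job_key: JobKey
    instance_id: int
    status: str
    contact_email: str
    is_service: bool


@dataclass
class JobRecord:
    # Job-centric model: job-level fields are stored once per job.
    key: JobKey
    instance_count: int
    contact_email: str
    is_service: bool


# Assumption: an abbreviated set of active statuses, for illustration only.
ACTIVE_STATES = {"PENDING", "ASSIGNED", "STARTING", "RUNNING"}


def instance_count_from_tasks(tasks, key):
    """Task-centric lookup: scan the tasks and count the active ones."""
    return sum(1 for t in tasks if t.job_key == key and t.status in ACTIVE_STATES)


def instance_count_from_job(jobs, key):
    """Job-centric lookup: one keyed read, no task iteration."""
    return jobs[key].instance_count


def update_contact_email(jobs, key, email):
    """Job-level field update: a single write that touches no task."""
    jobs[key].contact_email = email
```

With a job record in place, changing contact_email is one write to the job,
whereas the task-centric model would need every task copy to be rewritten.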

Any thoughts, suggestions, objections?

Thanks,
Maxim


[1] - https://github.com/apache/aurora/blob/4e28b9c8b29b66f2f10b0a6cafdec1f8e2c1bd7b/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L316-L338

[2] - https://github.com/apache/aurora/blob/4e28b9c8b29b66f2f10b0a6cafdec1f8e2c1bd7b/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L240-L284

[3] - https://issues.apache.org/jira/browse/AURORA-84

[4] - RowGarbageCollector:
https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/storage/db/RowGarbageCollector.java

Re: [PROPOSAL] Job as a first-class citizen

Posted by Maxim Khutornenko <ma...@apache.org>.

I have updated the summary
<https://docs.google.com/document/d/1myYX3yuofGr8JIzud98xXd5mqgpZ8q_RqKBpSff4-WE>
with a minor but important change. Instead of relying on TaskHistoryPruner
to remove JobConfigurations from the storage, the cleanup is now going to
happen inside a TaskStateChange event listener when all job instances reach
terminal status. As before, feedback is highly appreciated!
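
A rough sketch of the cleanup described above, assuming a listener that is
notified on every task state change. The class, method and store names are
hypothetical and do not mirror the scheduler's actual interfaces; the
in-memory dicts stand in for real storage.

```python
from collections import defaultdict

# Assumption: simplified set of terminal statuses, for illustration only.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED", "LOST"}


class JobCleanupListener:
    """Removes a job's stored configuration once all its instances are terminal."""

    def __init__(self, job_store):
        # job_store: dict mapping a job key to its stored configuration.
        self.job_store = job_store
        # Last known status per instance, grouped by job key. Only instances
        # that have reported at least one state change are tracked here.
        self.statuses = defaultdict(dict)

    def on_task_state_change(self, job_key, instance_id, new_status):
        self.statuses[job_key][instance_id] = new_status
        if all(s in TERMINAL_STATES for s in self.statuses[job_key].values()):
            # Every tracked instance is terminal: drop the job configuration.
            self.job_store.pop(job_key, None)
            del self.statuses[job_key]
```

The point of the design is that cleanup is driven by the state change itself
rather than by a separate periodic pruning pass.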

On Tue, Jul 26, 2016 at 4:55 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> I felt this change is large enough to warrant a brief design summary.
> Please take a look at this document
> <https://docs.google.com/document/d/1myYX3yuofGr8JIzud98xXd5mqgpZ8q_RqKBpSff4-WE> and
> leave your feedback as applicable.
>
> On Fri, Jul 1, 2016 at 9:15 AM, Maxim Khutornenko <ma...@apache.org>
> wrote:
>
>> Thanks for the feedback! I will follow up with an itemized epic to
>> track this refactoring work.
>>
>> On Wed, Jun 29, 2016 at 2:29 PM, Jake Farrell <jf...@apache.org>
>> wrote:
>> > huge +1, socket activation is our exact use case for this type of action
>> > also
>> >
>> > -Jake
>> >
>> > On Wed, Jun 29, 2016 at 5:18 PM, Erb, Stephan <Stephan.Erb@blue-yonder.com>
>> > wrote:
>> >
>> >> I recently thought about the same idea. Use case for us would be to scale
>> >> a job to 0 instances. While this sounds useless at first, it can be quite
>> >> powerful when trying to implement a feature like socket activation.

Re: [PROPOSAL] Job as a first-class citizen

Posted by Maxim Khutornenko <ma...@apache.org>.

I felt this change is large enough to warrant a brief design summary.
Please take a look at this document
<https://docs.google.com/document/d/1myYX3yuofGr8JIzud98xXd5mqgpZ8q_RqKBpSff4-WE> and
leave your feedback as applicable.

Re: [PROPOSAL] Job as a first-class citizen

Posted by Maxim Khutornenko <ma...@apache.org>.

Thanks for the feedback! I will follow up with an itemized epic to
track this refactoring work.

Re: [PROPOSAL] Job as a first-class citizen

Posted by Jake Farrell <jf...@apache.org>.

huge +1, socket activation is our exact use case for this type of action
also

-Jake

Re: [PROPOSAL] Job as a first-class citizen

Posted by "Erb, Stephan" <St...@blue-yonder.com>.

I recently thought about the same idea. Use case for us would be to scale a job to 0 instances. While this sounds useless at first, it can be quite powerful when trying to implement a feature like socket activation.
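
To illustrate why scaling to 0 instances needs a stored job, here is a
minimal sketch. The function names and the dict-based job store are
hypothetical and only meant to show the idea: a job derived from its tasks
disappears when the last task goes away, while a stored job record can
persist with an instance count of 0.

```python
def jobs_visible_from_tasks(active_tasks):
    """Task-centric view: a job exists only while it has at least one task.

    active_tasks is an iterable of (job_key, instance_id) pairs.
    """
    return {job_key for job_key, _instance_id in active_tasks}


def scale(job_store, job_key, instance_count):
    """Job-centric view: record the desired instance count, including 0."""
    if instance_count < 0:
        raise ValueError("instance_count must be non-negative")
    job_store[job_key] = instance_count
```

A socket-activation style feature could then look up the job while nothing is
running and scale it back up when the first connection arrives.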
