You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@aurora.apache.org by 黄凯 <te...@hotmail.com> on 2016/09/02 01:40:41 UTC

Discussion on review request 51536

Hi Folks,

I'm currently working on a feature on aurora scheduler and executor. The implementation strategy became controversial on the review board, so I was wondering if I should broadcast it to more audience and initiate a discussion. Please feel free to let me know your thoughts, your help is greatly appreciated!

The high level goal of this feature is to improve reliability and performance of the Aurora scheduler job updater, by relying on health check status rather than watch_secs timeout when deciding an individual instance update state.

Please see the original review request https://reviews.apache.org/r/51536/
aurora JIRA ticket https://issues.apache.org/jira/browse/AURORA-894
design doc https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
for more details and background.

Note: The design doc becomes a little bit outdated on the "scheduler change summary" part (this is what the review request trying to address). As a result, I've left some comment to clarify the latest proposed implementation plan for scheduler change.

There are two questions I'm trying to address here:
1. How does the scheduler infer the executor version and be backward compatible?
2. Where do we determine if health check is enabled?

In short, there are 3 different solutions proposed on the review board.

In the first two approaches, the scheduler will rely on a string to determine the executor version. We determine whether health check is enabled merely on executor side. There will be communication between the executor and the scheduler.
Solution 1:
vCurrent executor sends a message in its health check thread during RUNNING state transition, and the vCurrent updater will infer the executor version from the presence of this message, and skip the watch_secs if necessary.

Solution 2:
Instead of relying on the presence of an arbitrary string in the message, rely on the presence of a string like: "capabilities:CAPABILITY_1,CAPABILITY-2" where CAPABILITY_1 and CAPABILITY_2 (etc.) are constants defined in api.thrift. Basically just formalizing the mechanism and making it a bit more future proof.

In the third solution, the scheduler infers the executor version from the JobUpdateSettings on scheduler side.
Solution 3:
Adding a bit to JobUpdateSettings which is ‘executorDrivenUpdates', if that is set, the scheduler assumes that the transition from STARTING -> RUNNING makes the executor healthy and concurrently, we release thermos and change HealthCheckConfig to say that it should only go to running after healthy.

Pros and Cons:
The main benefit of Solution 1 is:
1. By using the message in task status update, we don't have to make any schema change, which makes the design simple.
2. The feature is fully backward-compatible. When we roll out the vCurrent schedulers and executors, we do not have to instruct the users to provide additional field in the Job or Update configs, which could confuses customers when the vPrev and vCurrent executor coexist in the cluster.

Concerns:
Relying on the presence of a message makes things brittle. Also we do not want to expose this message to users.

The benefit of Solution 2 is making the feature more future proof. However, if we do not envision a new executor feature in the short term, it's not too much different from Solution 1.

The benefits of Solution 3 include:
1. We support more than just thermos now (and others rely on custom executors).
2. A lot of things in Aurora treat the executor as opaque. The status update message sent by executor should not be visible to users only if it's an error message.

Concerns:
1. In addition to the ‘executorDrivenUpdates' bit that identifies the executor version, we still need to notify the scheduler if health check is enabled on vCurrent executor, if not, the scheduler must be able to fall back to use watch_secs.
2. The users have to provide an additional field in their .aurora config files. The feature wouldn't be available unless new clients are rolled out as well.

Please let me know if I understand your suggestions correctly and hopefully everyone is on the same page!

Thanks,

Kai

答复: Discussion on review request 51536

Posted by 黄凯 <te...@hotmail.com>.

Thanks for the clarification, Maxim. The corrections make perfect sense to me. Does that address your concern, Zameer?

It appears that the remaining disagreement in the discussion is whether we should make our implementation future proof and ensure it can be use elsewhere.

Kai



________________________________
发件人: Maxim Khutornenko <ma...@apache.org>
发送时间: 2016年9月3日 6:24
收件人: dev@aurora.apache.org
抄送: 黄 凯; Joshua Cohen; serb@apache.org; caldima@gmail.com; rdelval1@binghamton.edu
主题: Re: Discussion on review request 51536

Need to correct a few previous statements:

> Also we do not want to expose this message to users.
This is incorrect. The original design proposal suggested to show this
message in the UI as: "Task is healthy"

> The Mesos API isn't designed for packing arbitrary data
> in the status update message.
Don't think I agree, this is exactly what this field is for [1] and we
already use it for other states [2].

> I would be open to just saying that scheduler version
> 0.16 (or 0.17) just assumes the executor transitions to
> RUNNING once a task is healthy and dropping
> `watch_secs`entirely.
We can't drop 'watch_secs' entirely as we still have to babysit job
updates that don't have health checks enabled.

As for my take on the above, I favor #1 as the simplest answer to an
already simple question: "Should we use watch_secs for this instance
or not?". That's pretty much it. Scheduler does not need any schema
changes, know what health checks are or if a job has them enabled. At
least not until we attempt to move to centralized health checks
(AURORA-279) but that will be an entirely different design discussion.

[1] - https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L1605.
[2] - https://github.com/apache/aurora/blob/5cad046fc0f0c4bb79a4563cfcff0442b7bf8383/src/main/python/apache/aurora/executor/aurora_executor.py#L97

On Fri, Sep 2, 2016 at 2:26 PM, Zameer Manji <zm...@apache.org> wrote:
> *cc: Renan*
>
> I think there is some disagreement/discussion on the review because we have
> not achieved consensus on the design. Since the design doc was written,
> Aurora adopted multiple executor support as well non HTTP based
> healthchecking. This invalidates some parts of the original design. I think
> all of the solutions here are possible amendments to the design doc.
>
> I am not in favor of Solution 2 at all because status updates between
> executor <-> agent <-> master <-> scheduler are designed to update the
> framework of updates to the task and not really designed to send arbitrary
> information. Just because the Mesos API provides us with a string field
> doesn't mean we should try to pack in arbitrary data. Also, it isn't clear
> what other capabilities we might add in the future so I'm unconvinced that
> capabilities needs to exist at all. My fear is that we will create the
> infrastructure for capabilities just to serve this need and nothing else.
>
> I object to Solution 1 along the same lines. The Mesos API isn't designed
> for packing arbitrary data in the status update message and I don't think
> we should abuse that and rely on that. Also our current infrastructure just
> plumbs the message to the UI and I think displaying capabilities is not
> something we should do.
>
> I am in favor of Solution 3 which is as close as possible to the original
> design in the design doc. The design doc says the following:
>
> Scheduler updater will skip the minWaitInInstanceMs (aka watch_secs
>> <https://github.com/apache/aurora/blob/4b43305b33cd8bebdd80225a3987b7cc7a8389a2/docs/configuration-reference.md#updateconfig-objects>)
>> grace period any time it detects a named port ‘health’ in task
>> configuration. A RUNNING instance status will signify the end of instance
>> update.
>
>
> Instead of detecting the 'health' port in the task configuration, we make
> enabling this feature explicitly by enabling a bit in the task
> configuration with the `executorDrivenUpdates` bit.
>
> I understand this option makes this feature more complex because it
> requires a schema change and requires operators to deploy the executor to
> all agents before upgrading the client. However, I think that's a one time
> operational cost as a opposed to long lived design choices that will affect
> the code.
>
> Further Solution 3 is the most amenable to custom executors and continues
> our tradition of treating executors as opaque black boxes. I think there is
> a lot of value in treating executors as black boxes as it leaves the door
> open to switching our executor to something else and doesn't impose a
> burden to others that want to write their own.
>
> Alternatively, if amending the schema is too much work, I would be open to
> just saying that scheduler version 0.16 (or 0.17) just assumes the executor
> transitions to RUNNING once a task is healthy and dropping `watch_secs`
> entirely. We can put it in the release notes that operators must deploy the
> executor to 100% before deploying the scheduler.
>
>
> On Thu, Sep 1, 2016 at 6:40 PM, 黄 凯 <te...@hotmail.com> wrote:
>
>> Hi Folks,
>>
>> I'm currently working on a feature on aurora scheduler and executor. The
>> implementation strategy became controversial on the review board, so I was
>> wondering if I should broadcast it to more audience and initiate a
>> discussion. Please feel free to let me know your thoughts, your help is
>> greatly appreciated!
>>
>> The high level goal of this feature is to improve reliability and
>> performance of the Aurora scheduler job updater, by relying on health check
>> status rather than watch_secs timeout when deciding an individual instance
>> update state.
>>
>> Please see the original review request *https://reviews.apache.org/r/51536/
>> <https://reviews.apache.org/r/51536/> *
>> aurora JIRA ticket *https://issues.apache.org/jira/browse/AURORA-894
>> <https://issues.apache.org/jira/browse/AURORA-894>*
>> design doc *https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>> <https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#>*
>> for more details and background.
>>
>> Note: The design doc becomes a little bit outdated on the "scheduler
>> change summary" part (this is what the review request trying to address).
>> As a result, I've left some comment to clarify the latest proposed
>> implementation plan for scheduler change.
>>
>> There are two questions I'm trying to address here:
>> *1. How does the scheduler infer the executor version and be backward
>> compatible?*
>> *2. Where do we determine if health check is enabled?*
>>
>> In short, there are 3 different solutions proposed on the review board.
>>
>> In the first two approaches, the scheduler will rely on a string to
>> determine the executor version. We determine whether health check is
>> enabled merely on executor side. There will be communication between the
>> executor and the scheduler.
>> *Solution 1: *
>> *vCurrent executor sends a message in its health check thread during
>> RUNNING state transition, and the vCurrent updater will infer the executor
>> version from the presence of this message, and skip the watch_secs if
>> necessary.*
>>
>> *Solution 2:*
>> *Instead of relying on the presence of an arbitrary string in the message,
>> rely on the presence of a string like:
>> "capabilities:CAPABILITY_1,CAPABILITY-2" where CAPABILITY_1 and
>> CAPABILITY_2 (etc.) are constants defined in api.thrift. Basically just
>> formalizing the mechanism and making it a bit more future proof.*
>>
>> In the third solution, the scheduler infers the executor version from the
>> JobUpdateSettings on scheduler side.
>> *Solution 3:*
>> *Adding a bit to JobUpdateSettings which is ‘executorDrivenUpdates', if
>> that is set, the scheduler assumes that the transition from STARTING ->
>> RUNNING makes the executor healthy and concurrently, we release thermos and
>> change HealthCheckConfig to say that it should only go to running after
>> healthy*.
>>
>> *Pros and Cons:*
>> The main benefit of Solution 1 is:
>> 1. By using the message in task status update, we don't have to make any
>> schema change, which makes the design simple.
>> 2. The feature is fully backward-compatible. When we roll out the vCurrent
>> schedulers and executors, we do not have to instruct the users to provide
>> additional field in the Job or Update configs, which could confuses
>> customers when the vPrev and vCurrent executor coexist in the cluster.
>>
>> Concerns:
>> Relying on the presence of a message makes things brittle. Also we do not
>> want to expose this message to users.
>>
>> The benefit of Solution 2 is making the feature more future proof.
>> However, if we do not envision a new executor feature in the short term,
>> it's not too much different from Solution 1.
>>
>> The benefits of Solution 3 include:
>> 1. We support more than just thermos now (and others rely on custom
>> executors).
>> 2. A lot of things in Aurora treat the executor as opaque. The status
>> update message sent by executor should not be visible to users only if it's
>> an error message.
>>
>> Concerns:
>> 1. In addition to the ‘executorDrivenUpdates' bit that identifies the
>> executor version, we still need to notify the scheduler if health check is
>> enabled on vCurrent executor, if not, the scheduler must be able to fall
>> back to use watch_secs.
>> 2. The users have to provide an additional field in their .aurora config
>> files. The feature wouldn't be available unless new clients are rolled out
>> as well.
>>
>> Please let me know if I understand your suggestions correctly and
>> hopefully everyone is on the same page!
>>
>> Thanks,
>>
>> Kai
>>

Re: 答复: Discussion on review request 51536

Posted by Joshua Cohen <jc...@apache.org>.

I'm +1 to this as well. Thanks for driving Zameer!

On Sat, Sep 3, 2016 at 1:31 PM, Renan DelValle <rd...@binghamton.edu>
wrote:

> +1 On Zameer's solution as that keeps the scheduler executor agnostic. In
> my opinion, it is a reasonable requirement to expect an executor to send
> the RUNNING status update only when it is confident that the task has been
> launched successfully. This also provides great flexibility for those
> creating custom executors because a task entering a healthy running state
> can mean different things depending on the executor. For example,
> docker-compose-executor needs to launch multiple containers. This can take
> a while and should only be considered to be healthy if all containers are
> running properly.
>
> Thanks for the proposal Zameer, and thanks for the work on this Kai.
>
> On Sat, Sep 3, 2016 at 1:39 AM, 黄 凯 <te...@hotmail.com> wrote:
>
> > +1 to Zameer's idea. Now we have three persons on board.
> >
> >
> > ------------------------------
> > *From:* Maxim Khutornenko <ma...@apache.org>
> > *Sent:* Friday, September 2, 2016 18:36
> > *To:* dev@aurora.apache.org
> > *Cc:* 黄 凯; Joshua Cohen; serb@apache.org; caldima@gmail.com;
> > rdelval1@binghamton.edu
> > *Subject:* Re: 答复: Discussion on review request 51536
> >
> > Just to summarize Zameer's proposal buried deep in the quotation
> > stream: we keep watch_secs but let users set it to 0 iff they have
> > health checks enabled. No scheduler (updater) changes needed at all.
> > Users will need to opt-in to use the new feature by changing
> > watch_secs to 0 in their configs (or skipping it completely, which
> > will set the default to 0 automatically).
> >
> > I am +1 to this idea. Thanks Zameer!
> >
> > On Fri, Sep 2, 2016 at 6:28 PM, Zameer Manji <zm...@apache.org> wrote:
> > > Resending my emails from a domain that has mailing list friendly DMARC
> > > settings.
> > >
> > > It seems that we have achieved some consensus on the design, others
> feel
> > > free to weigh in.
> > >
> > > On Fri, Sep 2, 2016 at 5:29 PM, Zameer Manji <zm...@uber.com> wrote:
> > >
> > >> Kai,
> > >>
> > >> We have had coupled deploys before, I don't think it's too terrible.
> > It's
> > >> something to note in the release notes and some operational pain for
> > large
> > >> users.
> > >>
> > >> On Fri, Sep 2, 2016 at 4:42 PM, 黄 凯 <te...@hotmail.com> wrote:
> > >>
> > >> > Another concern is that once we rolled out the new executor, we
> should
> > >> > rolled out a new client in order to use the health-check feature.
> > Hence
> > >> the
> > >> > executor and client rolling out process seem to be coupled.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > ------------------------------
> > >> > *发件人:* 黄 凯 <te...@hotmail.com>
> > >> > *发送时间:* 2016年9月3日 7:23
> > >> > *收件人:* Zameer Manji; dev@aurora.apache.org
> > >> > *抄送:* Joshua Cohen; serb@apache.org; caldima@gmail.com;
> > >> > rdelval1@binghamton.edu
> > >> > *主题:* 答复: Discussion on review request 51536
> > >> >
> > >> >
> > >> > Thanks for the new proposal, Zameer. It sounds good to me. The
> > benefit is
> > >> > that it does not alter the current infrastructure too much.
> > >> >
> > >> >
> > >> > However, there is one thing to keep in mind:
> > >> >
> > >> > we currently do a check to ensure watch_sec is longer than
> > >> > initial_interval_secs. We will have to remove the alert message if
> we
> > >> > choose to skip watch_sec by setting it as zero.
> > >> >
> > >> >
> > >> > So the new configuration will not support executor-driven health
> check
> > >> > unless the executors are rolled out 100%.
> > >> >
> > >> >
> > >> > Does this tradeoff seems OK for us, Maxim?
> > >> >
> > >> >
> > >> > Kai
> > >> >
> > >> >
> > >> > ------------------------------
> > >> > *发件人:* Zameer Manji <zm...@uber.com>
> > >> > *发送时间:* 2016年9月3日 6:53
> > >> > *收件人:* dev@aurora.apache.org
> > >> > *抄送:* 黄 凯; Joshua Cohen; serb@apache.org; caldima@gmail.com;
> > >> > rdelval1@binghamton.edu
> > >> > *主题:* Re: Discussion on review request 51536
> > >> >
> > >> >
> > >> >
> > >> > On Fri, Sep 2, 2016 at 3:24 PM, Maxim Khutornenko <maxim@apache.org
> >
> > >> > wrote:
> > >> >
> > >> >> Need to correct a few previous statements:
> > >> >>
> > >> >> > Also we do not want to expose this message to users.
> > >> >> This is incorrect. The original design proposal suggested to show
> > this
> > >> >> message in the UI as: "Task is healthy"
> > >> >>
> > >> >
> > >> > Does this mean the message in the status update is going to be
> > exactly,
> > >> > "Task is healthy" and the scheduler is going to check for this
> string
> > in
> > >> > the `TASK_RUNNING` status update? This means we are going to
> > establish a
> > >> > communication
> > >> > mechanism between the executor and scheduler that's not defined by a
> > >> > schema. I feel that's worse than putting JSON in there and having
> the
> > >> > scheduler parse it.
> > >> >
> > >> >
> > >> >> > The Mesos API isn't designed for packing arbitrary data
> > >> >> > in the status update message.
> > >> >> Don't think I agree, this is exactly what this field is for [1] and
> > we
> > >> >> already use it for other states [2].
> > >> >>
> > >> >
> > >> > I guess I should have said 'structured arbitrary data'. The
> > >> informational,
> > >> > messages are fine and we plumb them blindly into our logging and UI.
> > I'm
> > >> > not convinced we should start putting JSON or something more
> > structured
> > >> in
> > >> > there. That's yet another schema we have and yet another versioning
> > story
> > >> > we have to go though. This also complicates matters for custom
> > executor
> > >> > authors.
> > >> >
> > >> >
> > >> >>
> > >> >> > I would be open to just saying that scheduler version
> > >> >> > 0.16 (or 0.17) just assumes the executor transitions to
> > >> >> > RUNNING once a task is healthy and dropping
> > >> >> > `watch_secs`entirely.
> > >> >> We can't drop 'watch_secs' entirely as we still have to babysit job
> > >> >> updates that don't have health checks enabled.
> > >> >>
> > >> >
> > >> > Understood. I guess we can keep it but I'm now frustrated that we
> > have a
> > >> > parameter that is ignored if we set some json in
> ExecutorConfig.data.
> > >> > Ideally, we don't accept `watch_secs` if we want health check driven
> > >> > updates. As mentioned before I don't like this implicit tightening
> of
> > the
> > >> > executor and the scheduler.
> > >> >
> > >> >
> > >> >>
> > >> >> As for my take on the above, I favor #1 as the simplest answer to
> an
> > >> >> already simple question: "Should we use watch_secs for this
> instance
> > >> >> or not?". That's pretty much it. Scheduler does not need any schema
> > >> >> changes, know what health checks are or if a job has them enabled.
> At
> > >> >> least not until we attempt to move to centralized health checks
> > >> >> (AURORA-279) but that will be an entirely different design
> > discussion.
> > >> >>
> > >> >> [1] - https://github.com/apache/mesos/blob/master/include/mesos/
> > <https://github.com/apache/mesos/blob/master/include/mesos/>
> > apache/mesos <https://github.com/apache/mesos/blob/master/include/mesos/
> >
> > github.com
> > mesos - Mirror of Apache Mesos
> >
> >
> > >> >> mesos.proto#L1605.
> > >> >> [2] - https://github.com/apache/aurora/blob/5cad046fc0f0c4bb79a456
> > >> >> 3cfcff0442b7bf8383/src/main/python/apache/aurora/executor/
> > >> >> aurora_executor.py#L97
> > >> >
> > >> >
> > >> >
> > >> > With all of this in mind, I have another proposal. Why can't we have
> > the
> > >> > executor changes (wait until the task is healthy for RUNNING) *and*
> > read
> > >> > `watch_secs` if it is set? Why not have both of these features and
> if
> > >> users
> > >> > want purely health checking driven updates they can set this value
> to
> > 0
> > >> and
> > >> > enable health checks. If they want to have both health checking and
> > time
> > >> > driven updates they can set this to value to the time that they care
> > >> about.
> > >> > If they just want time driven updates they can disable health
> checking
> > >> and
> > >> > set this value.
> > >> >
> > >> > Then there is no coupling between the executor and the scheduler
> > except
> > >> > for status updates and there is no dependency on the `message` field
> > of
> > >> the
> > >> > status update.
> > >> >
> > >> > We could even treat `watch_secs` as minimum time in STARTING +
> RUNNING
> > >> > instead of RUNNING with this change and it becomes the lower bound
> in
> > the
> > >> > update transition speed. This can ensure that users don't deploy
> "too
> > >> fast"
> > >> > and end up overwhelming other services if they are deployed too
> > quickly.
> > >> >
> > >> >
> > >> >
> > >> >>
> > >> >>
> > >> >> On Fri, Sep 2, 2016 at 2:26 PM, Zameer Manji <zm...@apache.org>
> > wrote:
> > >> >> > *cc: Renan*
> > >> >> >
> > >> >> > I think there is some disagreement/discussion on the review
> > because we
> > >> >> have
> > >> >> > not achieved consensus on the design. Since the design doc was
> > >> written,
> > >> >> > Aurora adopted multiple executor support as well non HTTP based
> > >> >> > healthchecking. This invalidates some parts of the original
> > design. I
> > >> >> think
> > >> >> > all of the solutions here are possible amendments to the design
> > doc.
> > >> >> >
> > >> >> > I am not in favor of Solution 2 at all because status updates
> > between
> > >> >> > executor <-> agent <-> master <-> scheduler are designed to
> update
> > the
> > >> >> > framework of updates to the task and not really designed to send
> > >> >> arbitrary
> > >> >> > information. Just because the Mesos API provides us with a string
> > >> field
> > >> >> > doesn't mean we should try to pack in arbitrary data. Also, it
> > isn't
> > >> >> clear
> > >> >> > what other capabilities we might add in the future so I'm
> > unconvinced
> > >> >> that
> > >> >> > capabilities needs to exist at all. My fear is that we will
> create
> > the
> > >> >> > infrastructure for capabilities just to serve this need and
> nothing
> > >> >> else.
> > >> >> >
> > >> >> > I object to Solution 1 along the same lines. The Mesos API isn't
> > >> >> designed
> > >> >> > for packing arbitrary data in the status update message and I
> don't
> > >> >> think
> > >> >> > we should abuse that and rely on that. Also our current
> > infrastructure
> > >> >> just
> > >> >> > plumbs the message to the UI and I think displaying capabilities
> is
> > >> not
> > >> >> > something we should do.
> > >> >> >
> > >> >> > I am in favor of Solution 3 which is as close as possible to the
> > >> >> original
> > >> >> > design in the design doc. The design doc says the following:
> > >> >> >
> > >> >> > Scheduler updater will skip the minWaitInInstanceMs (aka
> watch_secs
> > >> >> >> <https://github.com/apache/aurora/blob/4b43305b33cd8bebdd802
> > >> >> 25a3987b7cc7a8389a2/docs/configuration-reference.md#
> > >> updateconfig-objects
> > >> >> >)
> > >> >> >> grace period any time it detects a named port ‘health’ in task
> > >> >> >> configuration. A RUNNING instance status will signify the end of
> > >> >> instance
> > >> >> >> update.
> > >> >> >
> > >> >> >
> > >> >> > Instead of detecting the 'health' port in the task configuration,
> > we
> > >> >> make
> > >> >> > enabling this feature explicitly by enabling a bit in the task
> > >> >> > configuration with the `executorDrivenUpdates` bit.
> > >> >> >
> > >> >> > I understand this option makes this feature more complex because
> it
> > >> >> > requires a schema change and requires operators to deploy the
> > executor
> > >> >> to
> > >> >> > all agents before upgrading the client. However, I think that's a
> > one
> > >> >> time
> > >> >> > operational cost as a opposed to long lived design choices that
> > will
> > >> >> affect
> > >> >> > the code.
> > >> >> >
> > >> >> > Further Solution 3 is the most amenable to custom executors and
> > >> >> continues
> > >> >> > our tradition of treating executors as opaque black boxes. I
> think
> > >> >> there is
> > >> >> > a lot of value in treating executors as black boxes as it leaves
> > the
> > >> >> door
> > >> >> > open to switching our executor to something else and doesn't
> > impose a
> > >> >> > burden to others that want to write their own.
> > >> >> >
> > >> >> > Alternatively, if amending the schema is too much work, I would
> be
> > >> open
> > >> >> to
> > >> >> > just saying that scheduler version 0.16 (or 0.17) just assumes
> the
> > >> >> executor
> > >> >> > transitions to RUNNING once a task is healthy and dropping
> > >> `watch_secs`
> > >> >> > entirely. We can put it in the release notes that operators must
> > >> deploy
> > >> >> the
> > >> >> > executor to 100% before deploying the scheduler.
> > >> >> >
> > >> >> >
> > >> >> > On Thu, Sep 1, 2016 at 6:40 PM, 黄 凯 <te...@hotmail.com>
> > wrote:
> > >> >> >
> > >> >> >> Hi Folks,
> > >> >> >>
> > >> >> >> I'm currently working on a feature on aurora scheduler and
> > executor.
> > >> >> The
> > >> >> >> implementation strategy became controversial on the review
> board,
> > so
> > >> I
> > >> >> was
> > >> >> >> wondering if I should broadcast it to more audience and
> initiate a
> > >> >> >> discussion. Please feel free to let me know your thoughts, your
> > help
> > >> is
> > >> >> >> greatly appreciated!
> > >> >> >>
> > >> >> >> The high level goal of this feature is to improve reliability
> and
> > >> >> >> performance of the Aurora scheduler job updater, by relying on
> > health
> > >> >> check
> > >> >> >> status rather than watch_secs timeout when deciding an
> individual
> > >> >> instance
> > >> >> >> update state.
> > >> >> >>
> > >> >> >> Please see the original review request *
> > >> https://reviews.apache.org/r/
> > >> >> 51536/
> > >> >> >> <https://reviews.apache.org/r/51536/> *
> > >> >> >> aurora JIRA ticket *https://issues.apache.org/
> > jira/browse/AURORA-894
> > >> >> >> <https://issues.apache.org/jira/browse/AURORA-894>*
> > >> >> >> design doc *https://docs.google.com/docum
> > >> >> ent/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
> > >> >> >> <https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm
> > >> >> 10NXSxEWR0a-21FP5d94/edit#>*
> > >> >> >> for more details and background.
> > >> >> >>
> > >> >> >> Note: The design doc becomes a little bit outdated on the
> > "scheduler
> > >> >> >> change summary" part (this is what the review request trying to
> > >> >> address).
> > >> >> >> As a result, I've left some comment to clarify the latest
> proposed
> > >> >> >> implementation plan for scheduler change.
> > >> >> >>
> > >> >> >> There are two questions I'm trying to address here:
> > >> >> >> *1. How does the scheduler infer the executor version and be
> > backward
> > >> >> >> compatible?*
> > >> >> >> *2. Where do we determine if health check is enabled?*
> > >> >> >>
> > >> >> >> In short, there are 3 different solutions proposed on the review
> > >> board.
> > >> >> >>
> > >> >> >> In the first two approaches, the scheduler will rely on a string
> > to
> > >> >> >> determine the executor version. We determine whether health
> check
> > is
> > >> >> >> enabled merely on executor side. There will be communication
> > between
> > >> >> the
> > >> >> >> executor and the scheduler.
> > >> >> >> *Solution 1: *
> > >> >> >> *vCurrent executor sends a message in its health check thread
> > during
> > >> >> >> RUNNING state transition, and the vCurrent updater will infer
> the
> > >> >> executor
> > >> >> >> version from the presence of this message, and skip the
> > watch_secs if
> > >> >> >> necessary.*
> > >> >> >>
> > >> >> >> *Solution 2:*
> > >> >> >> *Instead of relying on the presence of an arbitrary string in
> the
> > >> >> message,
> > >> >> >> rely on the presence of a string like:
> > >> >> >> "capabilities:CAPABILITY_1,CAPABILITY-2" where CAPABILITY_1 and
> > >> >> >> CAPABILITY_2 (etc.) are constants defined in api.thrift.
> Basically
> > >> just
> > >> >> >> formalizing the mechanism and making it a bit more future
> proof.*
> > >> >> >>
> > >> >> >> In the third solution, the scheduler infers the executor version
> > from
> > >> >> the
> > >> >> >> JobUpdateSettings on scheduler side.
> > >> >> >> *Solution 3:*
> > >> >> >> *Adding a bit to JobUpdateSettings which is
> > ‘executorDrivenUpdates',
> > >> if
> > >> >> >> that is set, the scheduler assumes that the transition from
> > STARTING
> > >> ->
> > >> >> >> RUNNING makes the executor healthy and concurrently, we release
> > >> >> thermos and
> > >> >> >> change HealthCheckConfig to say that it should only go to
> running
> > >> after
> > >> >> >> healthy*.
> > >> >> >>
> > >> >> >> *Pros and Cons:*
> > >> >> >> The main benefit of Solution 1 is:
> > >> >> >> 1. By using the message in task status update, we don't have to
> > make
> > >> >> any
> > >> >> >> schema change, which makes the design simple.
> > >> >> >> 2. The feature is fully backward-compatible. When we roll out
> the
> > >> >> vCurrent
> > >> >> >> schedulers and executors, we do not have to instruct the users
> to
> > >> >> provide
> > >> >> >> additional field in the Job or Update configs, which could
> > confuses
> > >> >> >> customers when the vPrev and vCurrent executor coexist in the
> > >> cluster.
> > >> >> >>
> > >> >> >> Concerns:
> > >> >> >> Relying on the presence of a message makes things brittle. Also
> > we do
> > >> >> not
> > >> >> >> want to expose this message to users.
> > >> >> >>
> > >> >> >> The benefit of Solution 2 is making the feature more future
> proof.
> > >> >> >> However, if we do not envision a new executor feature in the
> short
> > >> >> term,
> > >> >> >> it's not too much different from Solution 1.
> > >> >> >>
> > >> >> >> The benefits of Solution 3 include:
> > >> >> >> 1. We support more than just thermos now (and others rely on
> > custom
> > >> >> >> executors).
> > >> >> >> 2. A lot of things in Aurora treat the executor as opaque. The
> > status
> > >> >> >> update message sent by executor should not be visible to users
> > only
> > >> if
> > >> >> it's
> > >> >> >> an error message.
> > >> >> >>
> > >> >> >> Concerns:
> > >> >> >> 1. In addition to the ‘executorDrivenUpdates' bit that
> identifies
> > the
> > >> >> >> executor version, we still need to notify the scheduler if
> health
> > >> >> check is
> > >> >> >> enabled on vCurrent executor, if not, the scheduler must be able
> > to
> > >> >> fall
> > >> >> >> back to use watch_secs.
> > >> >> >> 2. The users have to provide an additional field in their
> .aurora
> > >> >> config
> > >> >> >> files. The feature wouldn't be available unless new clients are
> > >> rolled
> > >> >> out
> > >> >> >> as well.
> > >> >> >>
> > >> >> >> Please let me know if I understand your suggestions correctly
> and
> > >> >> >> hopefully everyone is on the same page!
> > >> >> >>
> > >> >> >> Thanks,
> > >> >> >>
> > >> >> >> Kai
> > >> >> >>
> > >> >>
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Zameer Manji
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Zameer Manji
> > >>
> >
>

Re: 答复: Discussion on review request 51536

Posted by Renan DelValle <rd...@binghamton.edu>.

+1 On Zameer's solution as that keeps the scheduler executor agnostic. In
my opinion, it is a reasonable requirement to expect an executor to send
the RUNNING status update only when it is confident that the task has been
launched successfully. This also provides great flexibility for those
creating custom executors because a task entering a healthy running state
can mean different things depending on the executor. For example,
docker-compose-executor needs to launch multiple containers. This can take
a while and should only be considered to be healthy if all containers are
running properly.

Thanks for the proposal Zameer, and thanks for the work on this Kai.

On Sat, Sep 3, 2016 at 1:39 AM, 黄 凯 <te...@hotmail.com> wrote:

> +1 to Zameer's idea. Now we have three persons on board.
>
>
> ------------------------------
> *From:* Maxim Khutornenko <ma...@apache.org>
> *Sent:* Friday, September 2, 2016 18:36
> *To:* dev@aurora.apache.org
> *Cc:* 黄 凯; Joshua Cohen; serb@apache.org; caldima@gmail.com;
> rdelval1@binghamton.edu
> *Subject:* Re: 答复: Discussion on review request 51536
>
> Just to summarize Zameer's proposal buried deep in the quotation
> stream: we keep watch_secs but let users set it to 0 iff they have
> health checks enabled. No scheduler (updater) changes needed at all.
> Users will need to opt-in to use the new feature by changing
> watch_secs to 0 in their configs (or skipping it completely, which
> will set the default to 0 automatically).
>
> I am +1 to this idea. Thanks Zameer!
>
> On Fri, Sep 2, 2016 at 6:28 PM, Zameer Manji <zm...@apache.org> wrote:
> > Resending my emails from a domain that has mailing list friendly DMARC
> > settings.
> >
> > It seems that we have achieved some consensus on the design, others feel
> > free to weigh in.
> >
> > On Fri, Sep 2, 2016 at 5:29 PM, Zameer Manji <zm...@uber.com> wrote:
> >
> >> Kai,
> >>
> >> We have had coupled deploys before, I don't think it's too terrible.
> It's
> >> something to note in the release notes and some operational pain for
> large
> >> users.
> >>
> >> On Fri, Sep 2, 2016 at 4:42 PM, 黄 凯 <te...@hotmail.com> wrote:
> >>
> >> > Another concern is that once we rolled out the new executor, we should
> >> > rolled out a new client in order to use the health-check feature.
> Hence
> >> the
> >> > executor and client rolling out process seem to be coupled.
> >> >
> >> >
> >> >
> >> >
> >> > ------------------------------
> >> > *发件人:* 黄 凯 <te...@hotmail.com>
> >> > *发送时间:* 2016年9月3日 7:23
> >> > *收件人:* Zameer Manji; dev@aurora.apache.org
> >> > *抄送:* Joshua Cohen; serb@apache.org; caldima@gmail.com;
> >> > rdelval1@binghamton.edu
> >> > *主题:* 答复: Discussion on review request 51536
> >> >
> >> >
> >> > Thanks for the new proposal, Zameer. It sounds good to me. The
> benefit is
> >> > that it does not alter the current infrastructure too much.
> >> >
> >> >
> >> > However, there is one thing to keep in mind:
> >> >
> >> > we currently do a check to ensure watch_sec is longer than
> >> > initial_interval_secs. We will have to remove the alert message if we
> >> > choose to skip watch_sec by setting it as zero.
> >> >
> >> >
> >> > So the new configuration will not support executor-driven health check
> >> > unless the executors are rolled out 100%.
> >> >
> >> >
> >> > Does this tradeoff seems OK for us, Maxim?
> >> >
> >> >
> >> > Kai
> >> >
> >> >
> >> > ------------------------------
> >> > *发件人:* Zameer Manji <zm...@uber.com>
> >> > *发送时间:* 2016年9月3日 6:53
> >> > *收件人:* dev@aurora.apache.org
> >> > *抄送:* 黄 凯; Joshua Cohen; serb@apache.org; caldima@gmail.com;
> >> > rdelval1@binghamton.edu
> >> > *主题:* Re: Discussion on review request 51536
> >> >
> >> >
> >> >
> >> > On Fri, Sep 2, 2016 at 3:24 PM, Maxim Khutornenko <ma...@apache.org>
> >> > wrote:
> >> >
> >> >> Need to correct a few previous statements:
> >> >>
> >> >> > Also we do not want to expose this message to users.
> >> >> This is incorrect. The original design proposal suggested to show
> this
> >> >> message in the UI as: "Task is healthy"
> >> >>
> >> >
> >> > Does this mean the message in the status update is going to be
> exactly,
> >> > "Task is healthy" and the scheduler is going to check for this string
> in
> >> > the `TASK_RUNNING` status update? This means we are going to
> establish a
> >> > communication
> >> > mechanism between the executor and scheduler that's not defined by a
> >> > schema. I feel that's worse than putting JSON in there and having the
> >> > scheduler parse it.
> >> >
> >> >
> >> >> > The Mesos API isn't designed for packing arbitrary data
> >> >> > in the status update message.
> >> >> Don't think I agree, this is exactly what this field is for [1] and
> we
> >> >> already use it for other states [2].
> >> >>
> >> >
> >> > I guess I should have said 'structured arbitrary data'. The
> >> informational,
> >> > messages are fine and we plumb them blindly into our logging and UI.
> I'm
> >> > not convinced we should start putting JSON or something more
> structured
> >> in
> >> > there. That's yet another schema we have and yet another versioning
> story
> >> > we have to go though. This also complicates matters for custom
> executor
> >> > authors.
> >> >
> >> >
> >> >>
> >> >> > I would be open to just saying that scheduler version
> >> >> > 0.16 (or 0.17) just assumes the executor transitions to
> >> >> > RUNNING once a task is healthy and dropping
> >> >> > `watch_secs`entirely.
> >> >> We can't drop 'watch_secs' entirely as we still have to babysit job
> >> >> updates that don't have health checks enabled.
> >> >>
> >> >
> >> > Understood. I guess we can keep it but I'm now frustrated that we
> have a
> >> > parameter that is ignored if we set some json in ExecutorConfig.data.
> >> > Ideally, we don't accept `watch_secs` if we want health check driven
> >> > updates. As mentioned before I don't like this implicit tightening of
> the
> >> > executor and the scheduler.
> >> >
> >> >
> >> >>
> >> >> As for my take on the above, I favor #1 as the simplest answer to an
> >> >> already simple question: "Should we use watch_secs for this instance
> >> >> or not?". That's pretty much it. Scheduler does not need any schema
> >> >> changes, know what health checks are or if a job has them enabled. At
> >> >> least not until we attempt to move to centralized health checks
> >> >> (AURORA-279) but that will be an entirely different design
> discussion.
> >> >>
> >> >> [1] - https://github.com/apache/mesos/blob/master/include/mesos/
> <https://github.com/apache/mesos/blob/master/include/mesos/>
> apache/mesos <https://github.com/apache/mesos/blob/master/include/mesos/>
> github.com
> mesos - Mirror of Apache Mesos
>
>
> >> >> mesos.proto#L1605.
> >> >> [2] - https://github.com/apache/aurora/blob/5cad046fc0f0c4bb79a456
> >> >> 3cfcff0442b7bf8383/src/main/python/apache/aurora/executor/
> >> >> aurora_executor.py#L97
> >> >
> >> >
> >> >
> >> > With all of this in mind, I have another proposal. Why can't we have
> the
> >> > executor changes (wait until the task is healthy for RUNNING) *and*
> read
> >> > `watch_secs` if it is set? Why not have both of these features and if
> >> users
> >> > want purely health checking driven updates they can set this value to
> 0
> >> and
> >> > enable health checks. If they want to have both health checking and
> time
> >> > driven updates they can set this to value to the time that they care
> >> about.
> >> > If they just want time driven updates they can disable health checking
> >> and
> >> > set this value.
> >> >
> >> > Then there is no coupling between the executor and the scheduler
> except
> >> > for status updates and there is no dependency on the `message` field
> of
> >> the
> >> > status update.
> >> >
> >> > We could even treat `watch_secs` as minimum time in STARTING + RUNNING
> >> > instead of RUNNING with this change and it becomes the lower bound in
> the
> >> > update transition speed. This can ensure that users don't deploy "too
> >> fast"
> >> > and end up overwhelming other services if they are deployed too
> quickly.
> >> >
> >> >
> >> >
> >> >>
> >> >>
> >> >> On Fri, Sep 2, 2016 at 2:26 PM, Zameer Manji <zm...@apache.org>
> wrote:
> >> >> > *cc: Renan*
> >> >> >
> >> >> > I think there is some disagreement/discussion on the review
> because we
> >> >> have
> >> >> > not achieved consensus on the design. Since the design doc was
> >> written,
> >> >> > Aurora adopted multiple executor support as well non HTTP based
> >> >> > healthchecking. This invalidates some parts of the original
> design. I
> >> >> think
> >> >> > all of the solutions here are possible amendments to the design
> doc.
> >> >> >
> >> >> > I am not in favor of Solution 2 at all because status updates
> between
> >> >> > executor <-> agent <-> master <-> scheduler are designed to update
> the
> >> >> > framework of updates to the task and not really designed to send
> >> >> arbitrary
> >> >> > information. Just because the Mesos API provides us with a string
> >> field
> >> >> > doesn't mean we should try to pack in arbitrary data. Also, it
> isn't
> >> >> clear
> >> >> > what other capabilities we might add in the future so I'm
> unconvinced
> >> >> that
> >> >> > capabilities needs to exist at all. My fear is that we will create
> the
> >> >> > infrastructure for capabilities just to serve this need and nothing
> >> >> else.
> >> >> >
> >> >> > I object to Solution 1 along the same lines. The Mesos API isn't
> >> >> designed
> >> >> > for packing arbitrary data in the status update message and I don't
> >> >> think
> >> >> > we should abuse that and rely on that. Also our current
> infrastructure
> >> >> just
> >> >> > plumbs the message to the UI and I think displaying capabilities is
> >> not
> >> >> > something we should do.
> >> >> >
> >> >> > I am in favor of Solution 3 which is as close as possible to the
> >> >> original
> >> >> > design in the design doc. The design doc says the following:
> >> >> >
> >> >> > Scheduler updater will skip the minWaitInInstanceMs (aka watch_secs
> >> >> >> <https://github.com/apache/aurora/blob/4b43305b33cd8bebdd802
> >> >> 25a3987b7cc7a8389a2/docs/configuration-reference.md#
> >> updateconfig-objects
> >> >> >)
> >> >> >> grace period any time it detects a named port ‘health’ in task
> >> >> >> configuration. A RUNNING instance status will signify the end of
> >> >> instance
> >> >> >> update.
> >> >> >
> >> >> >
> >> >> > Instead of detecting the 'health' port in the task configuration,
> we
> >> >> make
> >> >> > enabling this feature explicitly by enabling a bit in the task
> >> >> > configuration with the `executorDrivenUpdates` bit.
> >> >> >
> >> >> > I understand this option makes this feature more complex because it
> >> >> > requires a schema change and requires operators to deploy the
> executor
> >> >> to
> >> >> > all agents before upgrading the client. However, I think that's a
> one
> >> >> time
> >> >> > operational cost as a opposed to long lived design choices that
> will
> >> >> affect
> >> >> > the code.
> >> >> >
> >> >> > Further Solution 3 is the most amenable to custom executors and
> >> >> continues
> >> >> > our tradition of treating executors as opaque black boxes. I think
> >> >> there is
> >> >> > a lot of value in treating executors as black boxes as it leaves
> the
> >> >> door
> >> >> > open to switching our executor to something else and doesn't
> impose a
> >> >> > burden to others that want to write their own.
> >> >> >
> >> >> > Alternatively, if amending the schema is too much work, I would be
> >> open
> >> >> to
> >> >> > just saying that scheduler version 0.16 (or 0.17) just assumes the
> >> >> executor
> >> >> > transitions to RUNNING once a task is healthy and dropping
> >> `watch_secs`
> >> >> > entirely. We can put it in the release notes that operators must
> >> deploy
> >> >> the
> >> >> > executor to 100% before deploying the scheduler.
> >> >> >
> >> >> >
> >> >> > On Thu, Sep 1, 2016 at 6:40 PM, 黄 凯 <te...@hotmail.com>
> wrote:
> >> >> >
> >> >> >> Hi Folks,
> >> >> >>
> >> >> >> I'm currently working on a feature on aurora scheduler and
> executor.
> >> >> The
> >> >> >> implementation strategy became controversial on the review board,
> so
> >> I
> >> >> was
> >> >> >> wondering if I should broadcast it to more audience and initiate a
> >> >> >> discussion. Please feel free to let me know your thoughts, your
> help
> >> is
> >> >> >> greatly appreciated!
> >> >> >>
> >> >> >> The high level goal of this feature is to improve reliability and
> >> >> >> performance of the Aurora scheduler job updater, by relying on
> health
> >> >> check
> >> >> >> status rather than watch_secs timeout when deciding an individual
> >> >> instance
> >> >> >> update state.
> >> >> >>
> >> >> >> Please see the original review request *
> >> https://reviews.apache.org/r/
> >> >> 51536/
> >> >> >> <https://reviews.apache.org/r/51536/> *
> >> >> >> aurora JIRA ticket *https://issues.apache.org/
> jira/browse/AURORA-894
> >> >> >> <https://issues.apache.org/jira/browse/AURORA-894>*
> >> >> >> design doc *https://docs.google.com/docum
> >> >> ent/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
> >> >> >> <https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm
> >> >> 10NXSxEWR0a-21FP5d94/edit#>*
> >> >> >> for more details and background.
> >> >> >>
> >> >> >> Note: The design doc becomes a little bit outdated on the
> "scheduler
> >> >> >> change summary" part (this is what the review request trying to
> >> >> address).
> >> >> >> As a result, I've left some comment to clarify the latest proposed
> >> >> >> implementation plan for scheduler change.
> >> >> >>
> >> >> >> There are two questions I'm trying to address here:
> >> >> >> *1. How does the scheduler infer the executor version and be
> backward
> >> >> >> compatible?*
> >> >> >> *2. Where do we determine if health check is enabled?*
> >> >> >>
> >> >> >> In short, there are 3 different solutions proposed on the review
> >> board.
> >> >> >>
> >> >> >> In the first two approaches, the scheduler will rely on a string
> to
> >> >> >> determine the executor version. We determine whether health check
> is
> >> >> >> enabled merely on executor side. There will be communication
> between
> >> >> the
> >> >> >> executor and the scheduler.
> >> >> >> *Solution 1: *
> >> >> >> *vCurrent executor sends a message in its health check thread
> during
> >> >> >> RUNNING state transition, and the vCurrent updater will infer the
> >> >> executor
> >> >> >> version from the presence of this message, and skip the
> watch_secs if
> >> >> >> necessary.*
> >> >> >>
> >> >> >> *Solution 2:*
> >> >> >> *Instead of relying on the presence of an arbitrary string in the
> >> >> message,
> >> >> >> rely on the presence of a string like:
> >> >> >> "capabilities:CAPABILITY_1,CAPABILITY-2" where CAPABILITY_1 and
> >> >> >> CAPABILITY_2 (etc.) are constants defined in api.thrift. Basically
> >> just
> >> >> >> formalizing the mechanism and making it a bit more future proof.*
> >> >> >>
> >> >> >> In the third solution, the scheduler infers the executor version
> from
> >> >> the
> >> >> >> JobUpdateSettings on scheduler side.
> >> >> >> *Solution 3:*
> >> >> >> *Adding a bit to JobUpdateSettings which is
> ‘executorDrivenUpdates',
> >> if
> >> >> >> that is set, the scheduler assumes that the transition from
> STARTING
> >> ->
> >> >> >> RUNNING makes the executor healthy and concurrently, we release
> >> >> thermos and
> >> >> >> change HealthCheckConfig to say that it should only go to running
> >> after
> >> >> >> healthy*.
> >> >> >>
> >> >> >> *Pros and Cons:*
> >> >> >> The main benefit of Solution 1 is:
> >> >> >> 1. By using the message in task status update, we don't have to
> make
> >> >> any
> >> >> >> schema change, which makes the design simple.
> >> >> >> 2. The feature is fully backward-compatible. When we roll out the
> >> >> vCurrent
> >> >> >> schedulers and executors, we do not have to instruct the users to
> >> >> provide
> >> >> >> additional field in the Job or Update configs, which could
> confuses
> >> >> >> customers when the vPrev and vCurrent executor coexist in the
> >> cluster.
> >> >> >>
> >> >> >> Concerns:
> >> >> >> Relying on the presence of a message makes things brittle. Also
> we do
> >> >> not
> >> >> >> want to expose this message to users.
> >> >> >>
> >> >> >> The benefit of Solution 2 is making the feature more future proof.
> >> >> >> However, if we do not envision a new executor feature in the short
> >> >> term,
> >> >> >> it's not too much different from Solution 1.
> >> >> >>
> >> >> >> The benefits of Solution 3 include:
> >> >> >> 1. We support more than just thermos now (and others rely on
> custom
> >> >> >> executors).
> >> >> >> 2. A lot of things in Aurora treat the executor as opaque. The
> status
> >> >> >> update message sent by executor should not be visible to users
> only
> >> if
> >> >> it's
> >> >> >> an error message.
> >> >> >>
> >> >> >> Concerns:
> >> >> >> 1. In addition to the ‘executorDrivenUpdates' bit that identifies
> the
> >> >> >> executor version, we still need to notify the scheduler if health
> >> >> check is
> >> >> >> enabled on vCurrent executor, if not, the scheduler must be able
> to
> >> >> fall
> >> >> >> back to use watch_secs.
> >> >> >> 2. The users have to provide an additional field in their .aurora
> >> >> config
> >> >> >> files. The feature wouldn't be available unless new clients are
> >> rolled
> >> >> out
> >> >> >> as well.
> >> >> >>
> >> >> >> Please let me know if I understand your suggestions correctly and
> >> >> >> hopefully everyone is on the same page!
> >> >> >>
> >> >> >> Thanks,
> >> >> >>
> >> >> >> Kai
> >> >> >>
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Zameer Manji
> >> >
> >>
> >>
> >>
> >> --
> >> Zameer Manji
> >>
>

Re: 答复: Discussion on review request 51536

Posted by Maxim Khutornenko <ma...@apache.org>.

Just to summarize Zameer's proposal buried deep in the quotation
stream: we keep watch_secs but let users set it to 0 iff they have
health checks enabled. No scheduler (updater) changes needed at all.
Users will need to opt-in to use the new feature by changing
watch_secs to 0 in their configs (or skipping it completely, which
will set the default to 0 automatically).

I am +1 to this idea. Thanks Zameer!

On Fri, Sep 2, 2016 at 6:28 PM, Zameer Manji <zm...@apache.org> wrote:
> Resending my emails from a domain that has mailing list friendly DMARC
> settings.
>
> It seems that we have achieved some consensus on the design, others feel
> free to weigh in.
>
> On Fri, Sep 2, 2016 at 5:29 PM, Zameer Manji <zm...@uber.com> wrote:
>
>> Kai,
>>
>> We have had coupled deploys before, I don't think it's too terrible. It's
>> something to note in the release notes and some operational pain for large
>> users.
>>
>> On Fri, Sep 2, 2016 at 4:42 PM, 黄 凯 <te...@hotmail.com> wrote:
>>
>> > Another concern is that once we rolled out the new executor, we should
>> > rolled out a new client in order to use the health-check feature. Hence
>> the
>> > executor and client rolling out process seem to be coupled.
>> >
>> >
>> >
>> >
>> > ------------------------------
>> > *发件人:* 黄 凯 <te...@hotmail.com>
>> > *发送时间:* 2016年9月3日 7:23
>> > *收件人:* Zameer Manji; dev@aurora.apache.org
>> > *抄送:* Joshua Cohen; serb@apache.org; caldima@gmail.com;
>> > rdelval1@binghamton.edu
>> > *主题:* 答复: Discussion on review request 51536
>> >
>> >
>> > Thanks for the new proposal, Zameer. It sounds good to me. The benefit is
>> > that it does not alter the current infrastructure too much.
>> >
>> >
>> > However, there is one thing to keep in mind:
>> >
>> > we currently do a check to ensure watch_sec is longer than
>> > initial_interval_secs. We will have to remove the alert message if we
>> > choose to skip watch_sec by setting it as zero.
>> >
>> >
>> > So the new configuration will not support executor-driven health check
>> > unless the executors are rolled out 100%.
>> >
>> >
>> > Does this tradeoff seems OK for us, Maxim?
>> >
>> >
>> > Kai
>> >
>> >
>> > ------------------------------
>> > *发件人:* Zameer Manji <zm...@uber.com>
>> > *发送时间:* 2016年9月3日 6:53
>> > *收件人:* dev@aurora.apache.org
>> > *抄送:* 黄 凯; Joshua Cohen; serb@apache.org; caldima@gmail.com;
>> > rdelval1@binghamton.edu
>> > *主题:* Re: Discussion on review request 51536
>> >
>> >
>> >
>> > On Fri, Sep 2, 2016 at 3:24 PM, Maxim Khutornenko <ma...@apache.org>
>> > wrote:
>> >
>> >> Need to correct a few previous statements:
>> >>
>> >> > Also we do not want to expose this message to users.
>> >> This is incorrect. The original design proposal suggested to show this
>> >> message in the UI as: "Task is healthy"
>> >>
>> >
>> > Does this mean the message in the status update is going to be exactly,
>> > "Task is healthy" and the scheduler is going to check for this string in
>> > the `TASK_RUNNING` status update? This means we are going to establish a
>> > communication
>> > mechanism between the executor and scheduler that's not defined by a
>> > schema. I feel that's worse than putting JSON in there and having the
>> > scheduler parse it.
>> >
>> >
>> >> > The Mesos API isn't designed for packing arbitrary data
>> >> > in the status update message.
>> >> Don't think I agree, this is exactly what this field is for [1] and we
>> >> already use it for other states [2].
>> >>
>> >
>> > I guess I should have said 'structured arbitrary data'. The
>> informational,
>> > messages are fine and we plumb them blindly into our logging and UI. I'm
>> > not convinced we should start putting JSON or something more structured
>> in
>> > there. That's yet another schema we have and yet another versioning story
>> > we have to go though. This also complicates matters for custom executor
>> > authors.
>> >
>> >
>> >>
>> >> > I would be open to just saying that scheduler version
>> >> > 0.16 (or 0.17) just assumes the executor transitions to
>> >> > RUNNING once a task is healthy and dropping
>> >> > `watch_secs`entirely.
>> >> We can't drop 'watch_secs' entirely as we still have to babysit job
>> >> updates that don't have health checks enabled.
>> >>
>> >
>> > Understood. I guess we can keep it but I'm now frustrated that we have a
>> > parameter that is ignored if we set some json in ExecutorConfig.data.
>> > Ideally, we don't accept `watch_secs` if we want health check driven
>> > updates. As mentioned before I don't like this implicit tightening of the
>> > executor and the scheduler.
>> >
>> >
>> >>
>> >> As for my take on the above, I favor #1 as the simplest answer to an
>> >> already simple question: "Should we use watch_secs for this instance
>> >> or not?". That's pretty much it. Scheduler does not need any schema
>> >> changes, know what health checks are or if a job has them enabled. At
>> >> least not until we attempt to move to centralized health checks
>> >> (AURORA-279) but that will be an entirely different design discussion.
>> >>
>> >> [1] - https://github.com/apache/mesos/blob/master/include/mesos/
>> >> mesos.proto#L1605.
>> >> [2] - https://github.com/apache/aurora/blob/5cad046fc0f0c4bb79a456
>> >> 3cfcff0442b7bf8383/src/main/python/apache/aurora/executor/
>> >> aurora_executor.py#L97
>> >
>> >
>> >
>> > With all of this in mind, I have another proposal. Why can't we have the
>> > executor changes (wait until the task is healthy for RUNNING) *and* read
>> > `watch_secs` if it is set? Why not have both of these features and if
>> users
>> > want purely health checking driven updates they can set this value to 0
>> and
>> > enable health checks. If they want to have both health checking and time
>> > driven updates they can set this to value to the time that they care
>> about.
>> > If they just want time driven updates they can disable health checking
>> and
>> > set this value.
>> >
>> > Then there is no coupling between the executor and the scheduler except
>> > for status updates and there is no dependency on the `message` field of
>> the
>> > status update.
>> >
>> > We could even treat `watch_secs` as minimum time in STARTING + RUNNING
>> > instead of RUNNING with this change and it becomes the lower bound in the
>> > update transition speed. This can ensure that users don't deploy "too
>> fast"
>> > and end up overwhelming other services if they are deployed too quickly.
>> >
>> >
>> >
>> >>
>> >>
>> >> On Fri, Sep 2, 2016 at 2:26 PM, Zameer Manji <zm...@apache.org> wrote:
>> >> > *cc: Renan*
>> >> >
>> >> > I think there is some disagreement/discussion on the review because we
>> >> have
>> >> > not achieved consensus on the design. Since the design doc was
>> written,
>> >> > Aurora adopted multiple executor support as well non HTTP based
>> >> > healthchecking. This invalidates some parts of the original design. I
>> >> think
>> >> > all of the solutions here are possible amendments to the design doc.
>> >> >
>> >> > I am not in favor of Solution 2 at all because status updates between
>> >> > executor <-> agent <-> master <-> scheduler are designed to update the
>> >> > framework of updates to the task and not really designed to send
>> >> arbitrary
>> >> > information. Just because the Mesos API provides us with a string
>> field
>> >> > doesn't mean we should try to pack in arbitrary data. Also, it isn't
>> >> clear
>> >> > what other capabilities we might add in the future so I'm unconvinced
>> >> that
>> >> > capabilities needs to exist at all. My fear is that we will create the
>> >> > infrastructure for capabilities just to serve this need and nothing
>> >> else.
>> >> >
>> >> > I object to Solution 1 along the same lines. The Mesos API isn't
>> >> designed
>> >> > for packing arbitrary data in the status update message and I don't
>> >> think
>> >> > we should abuse that and rely on that. Also our current infrastructure
>> >> just
>> >> > plumbs the message to the UI and I think displaying capabilities is
>> not
>> >> > something we should do.
>> >> >
>> >> > I am in favor of Solution 3 which is as close as possible to the
>> >> original
>> >> > design in the design doc. The design doc says the following:
>> >> >
>> >> > Scheduler updater will skip the minWaitInInstanceMs (aka watch_secs
>> >> >> <https://github.com/apache/aurora/blob/4b43305b33cd8bebdd802
>> >> 25a3987b7cc7a8389a2/docs/configuration-reference.md#
>> updateconfig-objects
>> >> >)
>> >> >> grace period any time it detects a named port ‘health’ in task
>> >> >> configuration. A RUNNING instance status will signify the end of
>> >> instance
>> >> >> update.
>> >> >
>> >> >
>> >> > Instead of detecting the 'health' port in the task configuration, we
>> >> make
>> >> > enabling this feature explicitly by enabling a bit in the task
>> >> > configuration with the `executorDrivenUpdates` bit.
>> >> >
>> >> > I understand this option makes this feature more complex because it
>> >> > requires a schema change and requires operators to deploy the executor
>> >> to
>> >> > all agents before upgrading the client. However, I think that's a one
>> >> time
>> >> > operational cost as a opposed to long lived design choices that will
>> >> affect
>> >> > the code.
>> >> >
>> >> > Further Solution 3 is the most amenable to custom executors and
>> >> continues
>> >> > our tradition of treating executors as opaque black boxes. I think
>> >> there is
>> >> > a lot of value in treating executors as black boxes as it leaves the
>> >> door
>> >> > open to switching our executor to something else and doesn't impose a
>> >> > burden to others that want to write their own.
>> >> >
>> >> > Alternatively, if amending the schema is too much work, I would be
>> open
>> >> to
>> >> > just saying that scheduler version 0.16 (or 0.17) just assumes the
>> >> executor
>> >> > transitions to RUNNING once a task is healthy and dropping
>> `watch_secs`
>> >> > entirely. We can put it in the release notes that operators must
>> deploy
>> >> the
>> >> > executor to 100% before deploying the scheduler.
>> >> >
>> >> >
>> >> > On Thu, Sep 1, 2016 at 6:40 PM, 黄 凯 <te...@hotmail.com> wrote:
>> >> >
>> >> >> Hi Folks,
>> >> >>
>> >> >> I'm currently working on a feature on aurora scheduler and executor.
>> >> The
>> >> >> implementation strategy became controversial on the review board, so
>> I
>> >> was
>> >> >> wondering if I should broadcast it to more audience and initiate a
>> >> >> discussion. Please feel free to let me know your thoughts, your help
>> is
>> >> >> greatly appreciated!
>> >> >>
>> >> >> The high level goal of this feature is to improve reliability and
>> >> >> performance of the Aurora scheduler job updater, by relying on health
>> >> check
>> >> >> status rather than watch_secs timeout when deciding an individual
>> >> instance
>> >> >> update state.
>> >> >>
>> >> >> Please see the original review request *
>> https://reviews.apache.org/r/
>> >> 51536/
>> >> >> <https://reviews.apache.org/r/51536/> *
>> >> >> aurora JIRA ticket *https://issues.apache.org/jira/browse/AURORA-894
>> >> >> <https://issues.apache.org/jira/browse/AURORA-894>*
>> >> >> design doc *https://docs.google.com/docum
>> >> ent/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>> >> >> <https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm
>> >> 10NXSxEWR0a-21FP5d94/edit#>*
>> >> >> for more details and background.
>> >> >>
>> >> >> Note: The design doc becomes a little bit outdated on the "scheduler
>> >> >> change summary" part (this is what the review request trying to
>> >> address).
>> >> >> As a result, I've left some comment to clarify the latest proposed
>> >> >> implementation plan for scheduler change.
>> >> >>
>> >> >> There are two questions I'm trying to address here:
>> >> >> *1. How does the scheduler infer the executor version and be backward
>> >> >> compatible?*
>> >> >> *2. Where do we determine if health check is enabled?*
>> >> >>
>> >> >> In short, there are 3 different solutions proposed on the review
>> board.
>> >> >>
>> >> >> In the first two approaches, the scheduler will rely on a string to
>> >> >> determine the executor version. We determine whether health check is
>> >> >> enabled merely on executor side. There will be communication between
>> >> the
>> >> >> executor and the scheduler.
>> >> >> *Solution 1: *
>> >> >> *vCurrent executor sends a message in its health check thread during
>> >> >> RUNNING state transition, and the vCurrent updater will infer the
>> >> executor
>> >> >> version from the presence of this message, and skip the watch_secs if
>> >> >> necessary.*
>> >> >>
>> >> >> *Solution 2:*
>> >> >> *Instead of relying on the presence of an arbitrary string in the
>> >> message,
>> >> >> rely on the presence of a string like:
>> >> >> "capabilities:CAPABILITY_1,CAPABILITY-2" where CAPABILITY_1 and
>> >> >> CAPABILITY_2 (etc.) are constants defined in api.thrift. Basically
>> just
>> >> >> formalizing the mechanism and making it a bit more future proof.*
>> >> >>
>> >> >> In the third solution, the scheduler infers the executor version from
>> >> the
>> >> >> JobUpdateSettings on scheduler side.
>> >> >> *Solution 3:*
>> >> >> *Adding a bit to JobUpdateSettings which is ‘executorDrivenUpdates',
>> if
>> >> >> that is set, the scheduler assumes that the transition from STARTING
>> ->
>> >> >> RUNNING makes the executor healthy and concurrently, we release
>> >> thermos and
>> >> >> change HealthCheckConfig to say that it should only go to running
>> after
>> >> >> healthy*.
>> >> >>
>> >> >> *Pros and Cons:*
>> >> >> The main benefit of Solution 1 is:
>> >> >> 1. By using the message in task status update, we don't have to make
>> >> any
>> >> >> schema change, which makes the design simple.
>> >> >> 2. The feature is fully backward-compatible. When we roll out the
>> >> vCurrent
>> >> >> schedulers and executors, we do not have to instruct the users to
>> >> provide
>> >> >> additional field in the Job or Update configs, which could confuses
>> >> >> customers when the vPrev and vCurrent executor coexist in the
>> cluster.
>> >> >>
>> >> >> Concerns:
>> >> >> Relying on the presence of a message makes things brittle. Also we do
>> >> not
>> >> >> want to expose this message to users.
>> >> >>
>> >> >> The benefit of Solution 2 is making the feature more future proof.
>> >> >> However, if we do not envision a new executor feature in the short
>> >> term,
>> >> >> it's not too much different from Solution 1.
>> >> >>
>> >> >> The benefits of Solution 3 include:
>> >> >> 1. We support more than just thermos now (and others rely on custom
>> >> >> executors).
>> >> >> 2. A lot of things in Aurora treat the executor as opaque. The status
>> >> >> update message sent by executor should not be visible to users only
>> if
>> >> it's
>> >> >> an error message.
>> >> >>
>> >> >> Concerns:
>> >> >> 1. In addition to the ‘executorDrivenUpdates' bit that identifies the
>> >> >> executor version, we still need to notify the scheduler if health
>> >> check is
>> >> >> enabled on vCurrent executor, if not, the scheduler must be able to
>> >> fall
>> >> >> back to use watch_secs.
>> >> >> 2. The users have to provide an additional field in their .aurora
>> >> config
>> >> >> files. The feature wouldn't be available unless new clients are
>> rolled
>> >> out
>> >> >> as well.
>> >> >>
>> >> >> Please let me know if I understand your suggestions correctly and
>> >> >> hopefully everyone is on the same page!
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Kai
>> >> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Zameer Manji
>> >
>>
>>
>>
>> --
>> Zameer Manji
>>

Re: 答复: Discussion on review request 51536

Posted by Zameer Manji <zm...@apache.org>.

Resending my emails from a domain that has mailing list friendly DMARC
settings.

It seems that we have achieved some consensus on the design, others feel
free to weigh in.

On Fri, Sep 2, 2016 at 5:29 PM, Zameer Manji <zm...@uber.com> wrote:

> Kai,
>
> We have had coupled deploys before, I don't think it's too terrible. It's
> something to note in the release notes and some operational pain for large
> users.
>
> On Fri, Sep 2, 2016 at 4:42 PM, 黄 凯 <te...@hotmail.com> wrote:
>
> > Another concern is that once we rolled out the new executor, we should
> > rolled out a new client in order to use the health-check feature. Hence
> the
> > executor and client rolling out process seem to be coupled.
> >
> >
> >
> >
> > ------------------------------
> > *发件人:* 黄 凯 <te...@hotmail.com>
> > *发送时间:* 2016年9月3日 7:23
> > *收件人:* Zameer Manji; dev@aurora.apache.org
> > *抄送:* Joshua Cohen; serb@apache.org; caldima@gmail.com;
> > rdelval1@binghamton.edu
> > *主题:* 答复: Discussion on review request 51536
> >
> >
> > Thanks for the new proposal, Zameer. It sounds good to me. The benefit is
> > that it does not alter the current infrastructure too much.
> >
> >
> > However, there is one thing to keep in mind:
> >
> > we currently do a check to ensure watch_sec is longer than
> > initial_interval_secs. We will have to remove the alert message if we
> > choose to skip watch_sec by setting it as zero.
> >
> >
> > So the new configuration will not support executor-driven health check
> > unless the executors are rolled out 100%.
> >
> >
> > Does this tradeoff seems OK for us, Maxim?
> >
> >
> > Kai
> >
> >
> > ------------------------------
> > *发件人:* Zameer Manji <zm...@uber.com>
> > *发送时间:* 2016年9月3日 6:53
> > *收件人:* dev@aurora.apache.org
> > *抄送:* 黄 凯; Joshua Cohen; serb@apache.org; caldima@gmail.com;
> > rdelval1@binghamton.edu
> > *主题:* Re: Discussion on review request 51536
> >
> >
> >
> > On Fri, Sep 2, 2016 at 3:24 PM, Maxim Khutornenko <ma...@apache.org>
> > wrote:
> >
> >> Need to correct a few previous statements:
> >>
> >> > Also we do not want to expose this message to users.
> >> This is incorrect. The original design proposal suggested to show this
> >> message in the UI as: "Task is healthy"
> >>
> >
> > Does this mean the message in the status update is going to be exactly,
> > "Task is healthy" and the scheduler is going to check for this string in
> > the `TASK_RUNNING` status update? This means we are going to establish a
> > communication
> > mechanism between the executor and scheduler that's not defined by a
> > schema. I feel that's worse than putting JSON in there and having the
> > scheduler parse it.
> >
> >
> >> > The Mesos API isn't designed for packing arbitrary data
> >> > in the status update message.
> >> Don't think I agree, this is exactly what this field is for [1] and we
> >> already use it for other states [2].
> >>
> >
> > I guess I should have said 'structured arbitrary data'. The
> informational,
> > messages are fine and we plumb them blindly into our logging and UI. I'm
> > not convinced we should start putting JSON or something more structured
> in
> > there. That's yet another schema we have and yet another versioning story
> > we have to go though. This also complicates matters for custom executor
> > authors.
> >
> >
> >>
> >> > I would be open to just saying that scheduler version
> >> > 0.16 (or 0.17) just assumes the executor transitions to
> >> > RUNNING once a task is healthy and dropping
> >> > `watch_secs`entirely.
> >> We can't drop 'watch_secs' entirely as we still have to babysit job
> >> updates that don't have health checks enabled.
> >>
> >
> > Understood. I guess we can keep it but I'm now frustrated that we have a
> > parameter that is ignored if we set some json in ExecutorConfig.data.
> > Ideally, we don't accept `watch_secs` if we want health check driven
> > updates. As mentioned before I don't like this implicit tightening of the
> > executor and the scheduler.
> >
> >
> >>
> >> As for my take on the above, I favor #1 as the simplest answer to an
> >> already simple question: "Should we use watch_secs for this instance
> >> or not?". That's pretty much it. Scheduler does not need any schema
> >> changes, know what health checks are or if a job has them enabled. At
> >> least not until we attempt to move to centralized health checks
> >> (AURORA-279) but that will be an entirely different design discussion.
> >>
> >> [1] - https://github.com/apache/mesos/blob/master/include/mesos/
> >> mesos.proto#L1605.
> >> [2] - https://github.com/apache/aurora/blob/5cad046fc0f0c4bb79a456
> >> 3cfcff0442b7bf8383/src/main/python/apache/aurora/executor/
> >> aurora_executor.py#L97
> >
> >
> >
> > With all of this in mind, I have another proposal. Why can't we have the
> > executor changes (wait until the task is healthy for RUNNING) *and* read
> > `watch_secs` if it is set? Why not have both of these features and if
> users
> > want purely health checking driven updates they can set this value to 0
> and
> > enable health checks. If they want to have both health checking and time
> > driven updates they can set this to value to the time that they care
> about.
> > If they just want time driven updates they can disable health checking
> and
> > set this value.
> >
> > Then there is no coupling between the executor and the scheduler except
> > for status updates and there is no dependency on the `message` field of
> the
> > status update.
> >
> > We could even treat `watch_secs` as minimum time in STARTING + RUNNING
> > instead of RUNNING with this change and it becomes the lower bound in the
> > update transition speed. This can ensure that users don't deploy "too
> fast"
> > and end up overwhelming other services if they are deployed too quickly.
> >
> >
> >
> >>
> >>
> >> On Fri, Sep 2, 2016 at 2:26 PM, Zameer Manji <zm...@apache.org> wrote:
> >> > *cc: Renan*
> >> >
> >> > I think there is some disagreement/discussion on the review because we
> >> have
> >> > not achieved consensus on the design. Since the design doc was
> written,
> >> > Aurora adopted multiple executor support as well non HTTP based
> >> > healthchecking. This invalidates some parts of the original design. I
> >> think
> >> > all of the solutions here are possible amendments to the design doc.
> >> >
> >> > I am not in favor of Solution 2 at all because status updates between
> >> > executor <-> agent <-> master <-> scheduler are designed to update the
> >> > framework of updates to the task and not really designed to send
> >> arbitrary
> >> > information. Just because the Mesos API provides us with a string
> field
> >> > doesn't mean we should try to pack in arbitrary data. Also, it isn't
> >> clear
> >> > what other capabilities we might add in the future so I'm unconvinced
> >> that
> >> > capabilities needs to exist at all. My fear is that we will create the
> >> > infrastructure for capabilities just to serve this need and nothing
> >> else.
> >> >
> >> > I object to Solution 1 along the same lines. The Mesos API isn't
> >> designed
> >> > for packing arbitrary data in the status update message and I don't
> >> think
> >> > we should abuse that and rely on that. Also our current infrastructure
> >> just
> >> > plumbs the message to the UI and I think displaying capabilities is
> not
> >> > something we should do.
> >> >
> >> > I am in favor of Solution 3 which is as close as possible to the
> >> original
> >> > design in the design doc. The design doc says the following:
> >> >
> >> > Scheduler updater will skip the minWaitInInstanceMs (aka watch_secs
> >> >> <https://github.com/apache/aurora/blob/4b43305b33cd8bebdd802
> >> 25a3987b7cc7a8389a2/docs/configuration-reference.md#
> updateconfig-objects
> >> >)
> >> >> grace period any time it detects a named port ‘health’ in task
> >> >> configuration. A RUNNING instance status will signify the end of
> >> instance
> >> >> update.
> >> >
> >> >
> >> > Instead of detecting the 'health' port in the task configuration, we
> >> make
> >> > enabling this feature explicitly by enabling a bit in the task
> >> > configuration with the `executorDrivenUpdates` bit.
> >> >
> >> > I understand this option makes this feature more complex because it
> >> > requires a schema change and requires operators to deploy the executor
> >> to
> >> > all agents before upgrading the client. However, I think that's a one
> >> time
> >> > operational cost as a opposed to long lived design choices that will
> >> affect
> >> > the code.
> >> >
> >> > Further Solution 3 is the most amenable to custom executors and
> >> continues
> >> > our tradition of treating executors as opaque black boxes. I think
> >> there is
> >> > a lot of value in treating executors as black boxes as it leaves the
> >> door
> >> > open to switching our executor to something else and doesn't impose a
> >> > burden to others that want to write their own.
> >> >
> >> > Alternatively, if amending the schema is too much work, I would be
> open
> >> to
> >> > just saying that scheduler version 0.16 (or 0.17) just assumes the
> >> executor
> >> > transitions to RUNNING once a task is healthy and dropping
> `watch_secs`
> >> > entirely. We can put it in the release notes that operators must
> deploy
> >> the
> >> > executor to 100% before deploying the scheduler.
> >> >
> >> >
> >> > On Thu, Sep 1, 2016 at 6:40 PM, 黄 凯 <te...@hotmail.com> wrote:
> >> >
> >> >> Hi Folks,
> >> >>
> >> >> I'm currently working on a feature on aurora scheduler and executor.
> >> The
> >> >> implementation strategy became controversial on the review board, so
> I
> >> was
> >> >> wondering if I should broadcast it to more audience and initiate a
> >> >> discussion. Please feel free to let me know your thoughts, your help
> is
> >> >> greatly appreciated!
> >> >>
> >> >> The high level goal of this feature is to improve reliability and
> >> >> performance of the Aurora scheduler job updater, by relying on health
> >> check
> >> >> status rather than watch_secs timeout when deciding an individual
> >> instance
> >> >> update state.
> >> >>
> >> >> Please see the original review request *
> https://reviews.apache.org/r/
> >> 51536/
> >> >> <https://reviews.apache.org/r/51536/> *
> >> >> aurora JIRA ticket *https://issues.apache.org/jira/browse/AURORA-894
> >> >> <https://issues.apache.org/jira/browse/AURORA-894>*
> >> >> design doc *https://docs.google.com/docum
> >> ent/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
> >> >> <https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm
> >> 10NXSxEWR0a-21FP5d94/edit#>*
> >> >> for more details and background.
> >> >>
> >> >> Note: The design doc becomes a little bit outdated on the "scheduler
> >> >> change summary" part (this is what the review request trying to
> >> address).
> >> >> As a result, I've left some comment to clarify the latest proposed
> >> >> implementation plan for scheduler change.
> >> >>
> >> >> There are two questions I'm trying to address here:
> >> >> *1. How does the scheduler infer the executor version and be backward
> >> >> compatible?*
> >> >> *2. Where do we determine if health check is enabled?*
> >> >>
> >> >> In short, there are 3 different solutions proposed on the review
> board.
> >> >>
> >> >> In the first two approaches, the scheduler will rely on a string to
> >> >> determine the executor version. We determine whether health check is
> >> >> enabled merely on executor side. There will be communication between
> >> the
> >> >> executor and the scheduler.
> >> >> *Solution 1: *
> >> >> *vCurrent executor sends a message in its health check thread during
> >> >> RUNNING state transition, and the vCurrent updater will infer the
> >> executor
> >> >> version from the presence of this message, and skip the watch_secs if
> >> >> necessary.*
> >> >>
> >> >> *Solution 2:*
> >> >> *Instead of relying on the presence of an arbitrary string in the
> >> message,
> >> >> rely on the presence of a string like:
> >> >> "capabilities:CAPABILITY_1,CAPABILITY-2" where CAPABILITY_1 and
> >> >> CAPABILITY_2 (etc.) are constants defined in api.thrift. Basically
> just
> >> >> formalizing the mechanism and making it a bit more future proof.*
> >> >>
> >> >> In the third solution, the scheduler infers the executor version from
> >> the
> >> >> JobUpdateSettings on scheduler side.
> >> >> *Solution 3:*
> >> >> *Adding a bit to JobUpdateSettings which is ‘executorDrivenUpdates',
> if
> >> >> that is set, the scheduler assumes that the transition from STARTING
> ->
> >> >> RUNNING makes the executor healthy and concurrently, we release
> >> thermos and
> >> >> change HealthCheckConfig to say that it should only go to running
> after
> >> >> healthy*.
> >> >>
> >> >> *Pros and Cons:*
> >> >> The main benefit of Solution 1 is:
> >> >> 1. By using the message in task status update, we don't have to make
> >> any
> >> >> schema change, which makes the design simple.
> >> >> 2. The feature is fully backward-compatible. When we roll out the
> >> vCurrent
> >> >> schedulers and executors, we do not have to instruct the users to
> >> provide
> >> >> additional field in the Job or Update configs, which could confuses
> >> >> customers when the vPrev and vCurrent executor coexist in the
> cluster.
> >> >>
> >> >> Concerns:
> >> >> Relying on the presence of a message makes things brittle. Also we do
> >> not
> >> >> want to expose this message to users.
> >> >>
> >> >> The benefit of Solution 2 is making the feature more future proof.
> >> >> However, if we do not envision a new executor feature in the short
> >> term,
> >> >> it's not too much different from Solution 1.
> >> >>
> >> >> The benefits of Solution 3 include:
> >> >> 1. We support more than just thermos now (and others rely on custom
> >> >> executors).
> >> >> 2. A lot of things in Aurora treat the executor as opaque. The status
> >> >> update message sent by executor should not be visible to users only
> if
> >> it's
> >> >> an error message.
> >> >>
> >> >> Concerns:
> >> >> 1. In addition to the ‘executorDrivenUpdates' bit that identifies the
> >> >> executor version, we still need to notify the scheduler if health
> >> check is
> >> >> enabled on vCurrent executor, if not, the scheduler must be able to
> >> fall
> >> >> back to use watch_secs.
> >> >> 2. The users have to provide an additional field in their .aurora
> >> config
> >> >> files. The feature wouldn't be available unless new clients are
> rolled
> >> out
> >> >> as well.
> >> >>
> >> >> Please let me know if I understand your suggestions correctly and
> >> >> hopefully everyone is on the same page!
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Kai
> >> >>
> >>
> >
> >
> >
> > --
> > Zameer Manji
> >
>
>
>
> --
> Zameer Manji
>

Re: 答复: Discussion on review request 51536

Posted by Zameer Manji <zm...@uber.com>.

Kai,

We have had coupled deploys before, I don't think it's too terrible. It's
something to note in the release notes and some operational pain for large
users.

On Fri, Sep 2, 2016 at 4:42 PM, 黄 凯 <te...@hotmail.com> wrote:

> Another concern is that once we rolled out the new executor, we should
> rolled out a new client in order to use the health-check feature. Hence the
> executor and client rolling out process seem to be coupled.
>
>
>
>
> ------------------------------
> *发件人:* 黄 凯 <te...@hotmail.com>
> *发送时间:* 2016年9月3日 7:23
> *收件人:* Zameer Manji; dev@aurora.apache.org
> *抄送:* Joshua Cohen; serb@apache.org; caldima@gmail.com;
> rdelval1@binghamton.edu
> *主题:* 答复: Discussion on review request 51536
>
>
> Thanks for the new proposal, Zameer. It sounds good to me. The benefit is
> that it does not alter the current infrastructure too much.
>
>
> However, there is one thing to keep in mind:
>
> we currently do a check to ensure watch_sec is longer than
> initial_interval_secs. We will have to remove the alert message if we
> choose to skip watch_sec by setting it as zero.
>
>
> So the new configuration will not support executor-driven health check
> unless the executors are rolled out 100%.
>
>
> Does this tradeoff seems OK for us, Maxim?
>
>
> Kai
>
>
> ------------------------------
> *发件人:* Zameer Manji <zm...@uber.com>
> *发送时间:* 2016年9月3日 6:53
> *收件人:* dev@aurora.apache.org
> *抄送:* 黄 凯; Joshua Cohen; serb@apache.org; caldima@gmail.com;
> rdelval1@binghamton.edu
> *主题:* Re: Discussion on review request 51536
>
>
>
> On Fri, Sep 2, 2016 at 3:24 PM, Maxim Khutornenko <ma...@apache.org>
> wrote:
>
>> Need to correct a few previous statements:
>>
>> > Also we do not want to expose this message to users.
>> This is incorrect. The original design proposal suggested to show this
>> message in the UI as: "Task is healthy"
>>
>
> Does this mean the message in the status update is going to be exactly,
> "Task is healthy" and the scheduler is going to check for this string in
> the `TASK_RUNNING` status update? This means we are going to establish a
> communication
> mechanism between the executor and scheduler that's not defined by a
> schema. I feel that's worse than putting JSON in there and having the
> scheduler parse it.
>
>
>> > The Mesos API isn't designed for packing arbitrary data
>> > in the status update message.
>> Don't think I agree, this is exactly what this field is for [1] and we
>> already use it for other states [2].
>>
>
> I guess I should have said 'structured arbitrary data'. The informational,
> messages are fine and we plumb them blindly into our logging and UI. I'm
> not convinced we should start putting JSON or something more structured in
> there. That's yet another schema we have and yet another versioning story
> we have to go though. This also complicates matters for custom executor
> authors.
>
>
>>
>> > I would be open to just saying that scheduler version
>> > 0.16 (or 0.17) just assumes the executor transitions to
>> > RUNNING once a task is healthy and dropping
>> > `watch_secs`entirely.
>> We can't drop 'watch_secs' entirely as we still have to babysit job
>> updates that don't have health checks enabled.
>>
>
> Understood. I guess we can keep it but I'm now frustrated that we have a
> parameter that is ignored if we set some json in ExecutorConfig.data.
> Ideally, we don't accept `watch_secs` if we want health check driven
> updates. As mentioned before I don't like this implicit tightening of the
> executor and the scheduler.
>
>
>>
>> As for my take on the above, I favor #1 as the simplest answer to an
>> already simple question: "Should we use watch_secs for this instance
>> or not?". That's pretty much it. Scheduler does not need any schema
>> changes, know what health checks are or if a job has them enabled. At
>> least not until we attempt to move to centralized health checks
>> (AURORA-279) but that will be an entirely different design discussion.
>>
>> [1] - https://github.com/apache/mesos/blob/master/include/mesos/
>> mesos.proto#L1605.
>> [2] - https://github.com/apache/aurora/blob/5cad046fc0f0c4bb79a456
>> 3cfcff0442b7bf8383/src/main/python/apache/aurora/executor/
>> aurora_executor.py#L97
>
>
>
> With all of this in mind, I have another proposal. Why can't we have the
> executor changes (wait until the task is healthy for RUNNING) *and* read
> `watch_secs` if it is set? Why not have both of these features and if users
> want purely health checking driven updates they can set this value to 0 and
> enable health checks. If they want to have both health checking and time
> driven updates they can set this to value to the time that they care about.
> If they just want time driven updates they can disable health checking and
> set this value.
>
> Then there is no coupling between the executor and the scheduler except
> for status updates and there is no dependency on the `message` field of the
> status update.
>
> We could even treat `watch_secs` as minimum time in STARTING + RUNNING
> instead of RUNNING with this change and it becomes the lower bound in the
> update transition speed. This can ensure that users don't deploy "too fast"
> and end up overwhelming other services if they are deployed too quickly.
>
>
>
>>
>>
>> On Fri, Sep 2, 2016 at 2:26 PM, Zameer Manji <zm...@apache.org> wrote:
>> > *cc: Renan*
>> >
>> > I think there is some disagreement/discussion on the review because we
>> have
>> > not achieved consensus on the design. Since the design doc was written,
>> > Aurora adopted multiple executor support as well non HTTP based
>> > healthchecking. This invalidates some parts of the original design. I
>> think
>> > all of the solutions here are possible amendments to the design doc.
>> >
>> > I am not in favor of Solution 2 at all because status updates between
>> > executor <-> agent <-> master <-> scheduler are designed to update the
>> > framework of updates to the task and not really designed to send
>> arbitrary
>> > information. Just because the Mesos API provides us with a string field
>> > doesn't mean we should try to pack in arbitrary data. Also, it isn't
>> clear
>> > what other capabilities we might add in the future so I'm unconvinced
>> that
>> > capabilities needs to exist at all. My fear is that we will create the
>> > infrastructure for capabilities just to serve this need and nothing
>> else.
>> >
>> > I object to Solution 1 along the same lines. The Mesos API isn't
>> designed
>> > for packing arbitrary data in the status update message and I don't
>> think
>> > we should abuse that and rely on that. Also our current infrastructure
>> just
>> > plumbs the message to the UI and I think displaying capabilities is not
>> > something we should do.
>> >
>> > I am in favor of Solution 3 which is as close as possible to the
>> original
>> > design in the design doc. The design doc says the following:
>> >
>> > Scheduler updater will skip the minWaitInInstanceMs (aka watch_secs
>> >> <https://github.com/apache/aurora/blob/4b43305b33cd8bebdd802
>> 25a3987b7cc7a8389a2/docs/configuration-reference.md#updateconfig-objects
>> >)
>> >> grace period any time it detects a named port ‘health’ in task
>> >> configuration. A RUNNING instance status will signify the end of
>> instance
>> >> update.
>> >
>> >
>> > Instead of detecting the 'health' port in the task configuration, we
>> make
>> > enabling this feature explicitly by enabling a bit in the task
>> > configuration with the `executorDrivenUpdates` bit.
>> >
>> > I understand this option makes this feature more complex because it
>> > requires a schema change and requires operators to deploy the executor
>> to
>> > all agents before upgrading the client. However, I think that's a one
>> time
>> > operational cost as a opposed to long lived design choices that will
>> affect
>> > the code.
>> >
>> > Further Solution 3 is the most amenable to custom executors and
>> continues
>> > our tradition of treating executors as opaque black boxes. I think
>> there is
>> > a lot of value in treating executors as black boxes as it leaves the
>> door
>> > open to switching our executor to something else and doesn't impose a
>> > burden to others that want to write their own.
>> >
>> > Alternatively, if amending the schema is too much work, I would be open
>> to
>> > just saying that scheduler version 0.16 (or 0.17) just assumes the
>> executor
>> > transitions to RUNNING once a task is healthy and dropping `watch_secs`
>> > entirely. We can put it in the release notes that operators must deploy
>> the
>> > executor to 100% before deploying the scheduler.
>> >
>> >
>> > On Thu, Sep 1, 2016 at 6:40 PM, 黄 凯 <te...@hotmail.com> wrote:
>> >
>> >> Hi Folks,
>> >>
>> >> I'm currently working on a feature on aurora scheduler and executor.
>> The
>> >> implementation strategy became controversial on the review board, so I
>> was
>> >> wondering if I should broadcast it to more audience and initiate a
>> >> discussion. Please feel free to let me know your thoughts, your help is
>> >> greatly appreciated!
>> >>
>> >> The high level goal of this feature is to improve reliability and
>> >> performance of the Aurora scheduler job updater, by relying on health
>> check
>> >> status rather than watch_secs timeout when deciding an individual
>> instance
>> >> update state.
>> >>
>> >> Please see the original review request *https://reviews.apache.org/r/
>> 51536/
>> >> <https://reviews.apache.org/r/51536/> *
>> >> aurora JIRA ticket *https://issues.apache.org/jira/browse/AURORA-894
>> >> <https://issues.apache.org/jira/browse/AURORA-894>*
>> >> design doc *https://docs.google.com/docum
>> ent/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>> >> <https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm
>> 10NXSxEWR0a-21FP5d94/edit#>*
>> >> for more details and background.
>> >>
>> >> Note: The design doc becomes a little bit outdated on the "scheduler
>> >> change summary" part (this is what the review request trying to
>> address).
>> >> As a result, I've left some comment to clarify the latest proposed
>> >> implementation plan for scheduler change.
>> >>
>> >> There are two questions I'm trying to address here:
>> >> *1. How does the scheduler infer the executor version and be backward
>> >> compatible?*
>> >> *2. Where do we determine if health check is enabled?*
>> >>
>> >> In short, there are 3 different solutions proposed on the review board.
>> >>
>> >> In the first two approaches, the scheduler will rely on a string to
>> >> determine the executor version. We determine whether health check is
>> >> enabled merely on executor side. There will be communication between
>> the
>> >> executor and the scheduler.
>> >> *Solution 1: *
>> >> *vCurrent executor sends a message in its health check thread during
>> >> RUNNING state transition, and the vCurrent updater will infer the
>> executor
>> >> version from the presence of this message, and skip the watch_secs if
>> >> necessary.*
>> >>
>> >> *Solution 2:*
>> >> *Instead of relying on the presence of an arbitrary string in the
>> message,
>> >> rely on the presence of a string like:
>> >> "capabilities:CAPABILITY_1,CAPABILITY-2" where CAPABILITY_1 and
>> >> CAPABILITY_2 (etc.) are constants defined in api.thrift. Basically just
>> >> formalizing the mechanism and making it a bit more future proof.*
>> >>
>> >> In the third solution, the scheduler infers the executor version from
>> the
>> >> JobUpdateSettings on scheduler side.
>> >> *Solution 3:*
>> >> *Adding a bit to JobUpdateSettings which is ‘executorDrivenUpdates', if
>> >> that is set, the scheduler assumes that the transition from STARTING ->
>> >> RUNNING makes the executor healthy and concurrently, we release
>> thermos and
>> >> change HealthCheckConfig to say that it should only go to running after
>> >> healthy*.
>> >>
>> >> *Pros and Cons:*
>> >> The main benefit of Solution 1 is:
>> >> 1. By using the message in task status update, we don't have to make
>> any
>> >> schema change, which makes the design simple.
>> >> 2. The feature is fully backward-compatible. When we roll out the
>> vCurrent
>> >> schedulers and executors, we do not have to instruct the users to
>> provide
>> >> additional field in the Job or Update configs, which could confuses
>> >> customers when the vPrev and vCurrent executor coexist in the cluster.
>> >>
>> >> Concerns:
>> >> Relying on the presence of a message makes things brittle. Also we do
>> not
>> >> want to expose this message to users.
>> >>
>> >> The benefit of Solution 2 is making the feature more future proof.
>> >> However, if we do not envision a new executor feature in the short
>> term,
>> >> it's not too much different from Solution 1.
>> >>
>> >> The benefits of Solution 3 include:
>> >> 1. We support more than just thermos now (and others rely on custom
>> >> executors).
>> >> 2. A lot of things in Aurora treat the executor as opaque. The status
>> >> update message sent by executor should not be visible to users only if
>> it's
>> >> an error message.
>> >>
>> >> Concerns:
>> >> 1. In addition to the ‘executorDrivenUpdates' bit that identifies the
>> >> executor version, we still need to notify the scheduler if health
>> check is
>> >> enabled on vCurrent executor, if not, the scheduler must be able to
>> fall
>> >> back to use watch_secs.
>> >> 2. The users have to provide an additional field in their .aurora
>> config
>> >> files. The feature wouldn't be available unless new clients are rolled
>> out
>> >> as well.
>> >>
>> >> Please let me know if I understand your suggestions correctly and
>> >> hopefully everyone is on the same page!
>> >>
>> >> Thanks,
>> >>
>> >> Kai
>> >>
>>
>
>
>
> --
> Zameer Manji
>



-- 
Zameer Manji

答复: Discussion on review request 51536

Posted by 黄凯 <te...@hotmail.com>.

Another concern is that once we rolled out the new executor, we should rolled out a new client in order to use the health-check feature. Hence the executor and client rolling out process seem to be coupled.



________________________________
发件人: 黄 凯 <te...@hotmail.com>
发送时间: 2016年9月3日 7:23
收件人: Zameer Manji; dev@aurora.apache.org
抄送: Joshua Cohen; serb@apache.org; caldima@gmail.com; rdelval1@binghamton.edu
主题: 答复: Discussion on review request 51536


Thanks for the new proposal, Zameer. It sounds good to me. The benefit is that it does not alter the current infrastructure too much.


However, there is one thing to keep in mind:

we currently do a check to ensure watch_sec is longer than initial_interval_secs. We will have to remove the alert message if we choose to skip watch_sec by setting it as zero.


So the new configuration will not support executor-driven health check unless the executors are rolled out 100%.


Does this tradeoff seems OK for us, Maxim?


Kai


________________________________
发件人: Zameer Manji <zm...@uber.com>
发送时间: 2016年9月3日 6:53
收件人: dev@aurora.apache.org
抄送: 黄 凯; Joshua Cohen; serb@apache.org; caldima@gmail.com; rdelval1@binghamton.edu
主题: Re: Discussion on review request 51536



On Fri, Sep 2, 2016 at 3:24 PM, Maxim Khutornenko <ma...@apache.org>> wrote:
Need to correct a few previous statements:

> Also we do not want to expose this message to users.
This is incorrect. The original design proposal suggested to show this
message in the UI as: "Task is healthy"

Does this mean the message in the status update is going to be exactly, "Task is healthy" and the scheduler is going to check for this string in the `TASK_RUNNING` status update? This means we are going to establish a communication
mechanism between the executor and scheduler that's not defined by a schema. I feel that's worse than putting JSON in there and having the scheduler parse it.

> The Mesos API isn't designed for packing arbitrary data
> in the status update message.
Don't think I agree, this is exactly what this field is for [1] and we
already use it for other states [2].

I guess I should have said 'structured arbitrary data'. The informational, messages are fine and we plumb them blindly into our logging and UI. I'm not convinced we should start putting JSON or something more structured in there. That's yet another schema we have and yet another versioning story we have to go though. This also complicates matters for custom executor authors.


> I would be open to just saying that scheduler version
> 0.16 (or 0.17) just assumes the executor transitions to
> RUNNING once a task is healthy and dropping
> `watch_secs`entirely.
We can't drop 'watch_secs' entirely as we still have to babysit job
updates that don't have health checks enabled.

Understood. I guess we can keep it but I'm now frustrated that we have a parameter that is ignored if we set some json in ExecutorConfig.data. Ideally, we don't accept `watch_secs` if we want health check driven updates. As mentioned before I don't like this implicit tightening of the executor and the scheduler.


As for my take on the above, I favor #1 as the simplest answer to an
already simple question: "Should we use watch_secs for this instance
or not?". That's pretty much it. Scheduler does not need any schema
changes, know what health checks are or if a job has them enabled. At
least not until we attempt to move to centralized health checks
(AURORA-279) but that will be an entirely different design discussion.

[1] - https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L1605.
[2] - https://github.com/apache/aurora/blob/5cad046fc0f0c4bb79a4563cfcff0442b7bf8383/src/main/python/apache/aurora/executor/aurora_executor.py#L97


With all of this in mind, I have another proposal. Why can't we have the executor changes (wait until the task is healthy for RUNNING) *and* read `watch_secs` if it is set? Why not have both of these features and if users want purely health checking driven updates they can set this value to 0 and enable health checks. If they want to have both health checking and time driven updates they can set this to value to the time that they care about. If they just want time driven updates they can disable health checking and set this value.

Then there is no coupling between the executor and the scheduler except for status updates and there is no dependency on the `message` field of the status update.

We could even treat `watch_secs` as minimum time in STARTING + RUNNING instead of RUNNING with this change and it becomes the lower bound in the update transition speed. This can ensure that users don't deploy "too fast" and end up overwhelming other services if they are deployed too quickly.




On Fri, Sep 2, 2016 at 2:26 PM, Zameer Manji <zm...@apache.org>> wrote:
> *cc: Renan*
>
> I think there is some disagreement/discussion on the review because we have
> not achieved consensus on the design. Since the design doc was written,
> Aurora adopted multiple executor support as well non HTTP based
> healthchecking. This invalidates some parts of the original design. I think
> all of the solutions here are possible amendments to the design doc.
>
> I am not in favor of Solution 2 at all because status updates between
> executor <-> agent <-> master <-> scheduler are designed to update the
> framework of updates to the task and not really designed to send arbitrary
> information. Just because the Mesos API provides us with a string field
> doesn't mean we should try to pack in arbitrary data. Also, it isn't clear
> what other capabilities we might add in the future so I'm unconvinced that
> capabilities needs to exist at all. My fear is that we will create the
> infrastructure for capabilities just to serve this need and nothing else.
>
> I object to Solution 1 along the same lines. The Mesos API isn't designed
> for packing arbitrary data in the status update message and I don't think
> we should abuse that and rely on that. Also our current infrastructure just
> plumbs the message to the UI and I think displaying capabilities is not
> something we should do.
>
> I am in favor of Solution 3 which is as close as possible to the original
> design in the design doc. The design doc says the following:
>
> Scheduler updater will skip the minWaitInInstanceMs (aka watch_secs
>> <https://github.com/apache/aurora/blob/4b43305b33cd8bebdd80225a3987b7cc7a8389a2/docs/configuration-reference.md#updateconfig-objects>)
>> grace period any time it detects a named port ‘health’ in task
>> configuration. A RUNNING instance status will signify the end of instance
>> update.
>
>
> Instead of detecting the 'health' port in the task configuration, we make
> enabling this feature explicitly by enabling a bit in the task
> configuration with the `executorDrivenUpdates` bit.
>
> I understand this option makes this feature more complex because it
> requires a schema change and requires operators to deploy the executor to
> all agents before upgrading the client. However, I think that's a one time
> operational cost as a opposed to long lived design choices that will affect
> the code.
>
> Further Solution 3 is the most amenable to custom executors and continues
> our tradition of treating executors as opaque black boxes. I think there is
> a lot of value in treating executors as black boxes as it leaves the door
> open to switching our executor to something else and doesn't impose a
> burden to others that want to write their own.
>
> Alternatively, if amending the schema is too much work, I would be open to
> just saying that scheduler version 0.16 (or 0.17) just assumes the executor
> transitions to RUNNING once a task is healthy and dropping `watch_secs`
> entirely. We can put it in the release notes that operators must deploy the
> executor to 100% before deploying the scheduler.
>
>
> On Thu, Sep 1, 2016 at 6:40 PM, 黄 凯 <te...@hotmail.com>> wrote:
>
>> Hi Folks,
>>
>> I'm currently working on a feature on aurora scheduler and executor. The
>> implementation strategy became controversial on the review board, so I was
>> wondering if I should broadcast it to more audience and initiate a
>> discussion. Please feel free to let me know your thoughts, your help is
>> greatly appreciated!
>>
>> The high level goal of this feature is to improve reliability and
>> performance of the Aurora scheduler job updater, by relying on health check
>> status rather than watch_secs timeout when deciding an individual instance
>> update state.
>>
>> Please see the original review request *https://reviews.apache.org/r/51536/
>> <https://reviews.apache.org/r/51536/> *
>> aurora JIRA ticket *https://issues.apache.org/jira/browse/AURORA-894
>> <https://issues.apache.org/jira/browse/AURORA-894>*
>> design doc *https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>> <https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#>*
>> for more details and background.
>>
>> Note: The design doc becomes a little bit outdated on the "scheduler
>> change summary" part (this is what the review request trying to address).
>> As a result, I've left some comment to clarify the latest proposed
>> implementation plan for scheduler change.
>>
>> There are two questions I'm trying to address here:
>> *1. How does the scheduler infer the executor version and be backward
>> compatible?*
>> *2. Where do we determine if health check is enabled?*
>>
>> In short, there are 3 different solutions proposed on the review board.
>>
>> In the first two approaches, the scheduler will rely on a string to
>> determine the executor version. We determine whether health check is
>> enabled merely on executor side. There will be communication between the
>> executor and the scheduler.
>> *Solution 1: *
>> *vCurrent executor sends a message in its health check thread during
>> RUNNING state transition, and the vCurrent updater will infer the executor
>> version from the presence of this message, and skip the watch_secs if
>> necessary.*
>>
>> *Solution 2:*
>> *Instead of relying on the presence of an arbitrary string in the message,
>> rely on the presence of a string like:
>> "capabilities:CAPABILITY_1,CAPABILITY-2" where CAPABILITY_1 and
>> CAPABILITY_2 (etc.) are constants defined in api.thrift. Basically just
>> formalizing the mechanism and making it a bit more future proof.*
>>
>> In the third solution, the scheduler infers the executor version from the
>> JobUpdateSettings on scheduler side.
>> *Solution 3:*
>> *Adding a bit to JobUpdateSettings which is ‘executorDrivenUpdates', if
>> that is set, the scheduler assumes that the transition from STARTING ->
>> RUNNING makes the executor healthy and concurrently, we release thermos and
>> change HealthCheckConfig to say that it should only go to running after
>> healthy*.
>>
>> *Pros and Cons:*
>> The main benefit of Solution 1 is:
>> 1. By using the message in task status update, we don't have to make any
>> schema change, which makes the design simple.
>> 2. The feature is fully backward-compatible. When we roll out the vCurrent
>> schedulers and executors, we do not have to instruct the users to provide
>> additional field in the Job or Update configs, which could confuses
>> customers when the vPrev and vCurrent executor coexist in the cluster.
>>
>> Concerns:
>> Relying on the presence of a message makes things brittle. Also we do not
>> want to expose this message to users.
>>
>> The benefit of Solution 2 is making the feature more future proof.
>> However, if we do not envision a new executor feature in the short term,
>> it's not too much different from Solution 1.
>>
>> The benefits of Solution 3 include:
>> 1. We support more than just thermos now (and others rely on custom
>> executors).
>> 2. A lot of things in Aurora treat the executor as opaque. The status
>> update message sent by executor should not be visible to users only if it's
>> an error message.
>>
>> Concerns:
>> 1. In addition to the ‘executorDrivenUpdates' bit that identifies the
>> executor version, we still need to notify the scheduler if health check is
>> enabled on vCurrent executor, if not, the scheduler must be able to fall
>> back to use watch_secs.
>> 2. The users have to provide an additional field in their .aurora config
>> files. The feature wouldn't be available unless new clients are rolled out
>> as well.
>>
>> Please let me know if I understand your suggestions correctly and
>> hopefully everyone is on the same page!
>>
>> Thanks,
>>
>> Kai
>>



--
Zameer Manji

答复: Discussion on review request 51536

Posted by 黄凯 <te...@hotmail.com>.

Thanks for the new proposal, Zameer. It sounds good to me. The benefit is that it does not alter the current infrastructure too much.


However, there is one thing to keep in mind:

we currently do a check to ensure watch_sec is longer than initial_interval_secs. We will have to remove the alert message if we choose to skip watch_sec by setting it as zero.


So the new configuration will not support executor-driven health check unless the executors are rolled out 100%.


Does this tradeoff seems OK for us, Maxim?


Kai


________________________________
发件人: Zameer Manji <zm...@uber.com>
发送时间: 2016年9月3日 6:53
收件人: dev@aurora.apache.org
抄送: 黄 凯; Joshua Cohen; serb@apache.org; caldima@gmail.com; rdelval1@binghamton.edu
主题: Re: Discussion on review request 51536



On Fri, Sep 2, 2016 at 3:24 PM, Maxim Khutornenko <ma...@apache.org>> wrote:
Need to correct a few previous statements:

> Also we do not want to expose this message to users.
This is incorrect. The original design proposal suggested to show this
message in the UI as: "Task is healthy"

Does this mean the message in the status update is going to be exactly, "Task is healthy" and the scheduler is going to check for this string in the `TASK_RUNNING` status update? This means we are going to establish a communication
mechanism between the executor and scheduler that's not defined by a schema. I feel that's worse than putting JSON in there and having the scheduler parse it.

> The Mesos API isn't designed for packing arbitrary data
> in the status update message.
Don't think I agree, this is exactly what this field is for [1] and we
already use it for other states [2].

I guess I should have said 'structured arbitrary data'. The informational, messages are fine and we plumb them blindly into our logging and UI. I'm not convinced we should start putting JSON or something more structured in there. That's yet another schema we have and yet another versioning story we have to go though. This also complicates matters for custom executor authors.


> I would be open to just saying that scheduler version
> 0.16 (or 0.17) just assumes the executor transitions to
> RUNNING once a task is healthy and dropping
> `watch_secs`entirely.
We can't drop 'watch_secs' entirely as we still have to babysit job
updates that don't have health checks enabled.

Understood. I guess we can keep it but I'm now frustrated that we have a parameter that is ignored if we set some json in ExecutorConfig.data. Ideally, we don't accept `watch_secs` if we want health check driven updates. As mentioned before I don't like this implicit tightening of the executor and the scheduler.


As for my take on the above, I favor #1 as the simplest answer to an
already simple question: "Should we use watch_secs for this instance
or not?". That's pretty much it. Scheduler does not need any schema
changes, know what health checks are or if a job has them enabled. At
least not until we attempt to move to centralized health checks
(AURORA-279) but that will be an entirely different design discussion.

[1] - https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L1605.
[2] - https://github.com/apache/aurora/blob/5cad046fc0f0c4bb79a4563cfcff0442b7bf8383/src/main/python/apache/aurora/executor/aurora_executor.py#L97


With all of this in mind, I have another proposal. Why can't we have the executor changes (wait until the task is healthy for RUNNING) *and* read `watch_secs` if it is set? Why not have both of these features and if users want purely health checking driven updates they can set this value to 0 and enable health checks. If they want to have both health checking and time driven updates they can set this to value to the time that they care about. If they just want time driven updates they can disable health checking and set this value.

Then there is no coupling between the executor and the scheduler except for status updates and there is no dependency on the `message` field of the status update.

We could even treat `watch_secs` as minimum time in STARTING + RUNNING instead of RUNNING with this change and it becomes the lower bound in the update transition speed. This can ensure that users don't deploy "too fast" and end up overwhelming other services if they are deployed too quickly.




On Fri, Sep 2, 2016 at 2:26 PM, Zameer Manji <zm...@apache.org>> wrote:
> *cc: Renan*
>
> I think there is some disagreement/discussion on the review because we have
> not achieved consensus on the design. Since the design doc was written,
> Aurora adopted multiple executor support as well non HTTP based
> healthchecking. This invalidates some parts of the original design. I think
> all of the solutions here are possible amendments to the design doc.
>
> I am not in favor of Solution 2 at all because status updates between
> executor <-> agent <-> master <-> scheduler are designed to update the
> framework of updates to the task and not really designed to send arbitrary
> information. Just because the Mesos API provides us with a string field
> doesn't mean we should try to pack in arbitrary data. Also, it isn't clear
> what other capabilities we might add in the future so I'm unconvinced that
> capabilities needs to exist at all. My fear is that we will create the
> infrastructure for capabilities just to serve this need and nothing else.
>
> I object to Solution 1 along the same lines. The Mesos API isn't designed
> for packing arbitrary data in the status update message and I don't think
> we should abuse that and rely on that. Also our current infrastructure just
> plumbs the message to the UI and I think displaying capabilities is not
> something we should do.
>
> I am in favor of Solution 3 which is as close as possible to the original
> design in the design doc. The design doc says the following:
>
> Scheduler updater will skip the minWaitInInstanceMs (aka watch_secs
>> <https://github.com/apache/aurora/blob/4b43305b33cd8bebdd80225a3987b7cc7a8389a2/docs/configuration-reference.md#updateconfig-objects>)
>> grace period any time it detects a named port ‘health’ in task
>> configuration. A RUNNING instance status will signify the end of instance
>> update.
>
>
> Instead of detecting the 'health' port in the task configuration, we make
> enabling this feature explicitly by enabling a bit in the task
> configuration with the `executorDrivenUpdates` bit.
>
> I understand this option makes this feature more complex because it
> requires a schema change and requires operators to deploy the executor to
> all agents before upgrading the client. However, I think that's a one time
> operational cost as a opposed to long lived design choices that will affect
> the code.
>
> Further Solution 3 is the most amenable to custom executors and continues
> our tradition of treating executors as opaque black boxes. I think there is
> a lot of value in treating executors as black boxes as it leaves the door
> open to switching our executor to something else and doesn't impose a
> burden to others that want to write their own.
>
> Alternatively, if amending the schema is too much work, I would be open to
> just saying that scheduler version 0.16 (or 0.17) just assumes the executor
> transitions to RUNNING once a task is healthy and dropping `watch_secs`
> entirely. We can put it in the release notes that operators must deploy the
> executor to 100% before deploying the scheduler.
>
>
> On Thu, Sep 1, 2016 at 6:40 PM, 黄 凯 <te...@hotmail.com>> wrote:
>
>> Hi Folks,
>>
>> I'm currently working on a feature on aurora scheduler and executor. The
>> implementation strategy became controversial on the review board, so I was
>> wondering if I should broadcast it to more audience and initiate a
>> discussion. Please feel free to let me know your thoughts, your help is
>> greatly appreciated!
>>
>> The high level goal of this feature is to improve reliability and
>> performance of the Aurora scheduler job updater, by relying on health check
>> status rather than watch_secs timeout when deciding an individual instance
>> update state.
>>
>> Please see the original review request *https://reviews.apache.org/r/51536/
>> <https://reviews.apache.org/r/51536/> *
>> aurora JIRA ticket *https://issues.apache.org/jira/browse/AURORA-894
>> <https://issues.apache.org/jira/browse/AURORA-894>*
>> design doc *https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>> <https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#>*
>> for more details and background.
>>
>> Note: The design doc becomes a little bit outdated on the "scheduler
>> change summary" part (this is what the review request trying to address).
>> As a result, I've left some comment to clarify the latest proposed
>> implementation plan for scheduler change.
>>
>> There are two questions I'm trying to address here:
>> *1. How does the scheduler infer the executor version and be backward
>> compatible?*
>> *2. Where do we determine if health check is enabled?*
>>
>> In short, there are 3 different solutions proposed on the review board.
>>
>> In the first two approaches, the scheduler will rely on a string to
>> determine the executor version. We determine whether health check is
>> enabled merely on executor side. There will be communication between the
>> executor and the scheduler.
>> *Solution 1: *
>> *vCurrent executor sends a message in its health check thread during
>> RUNNING state transition, and the vCurrent updater will infer the executor
>> version from the presence of this message, and skip the watch_secs if
>> necessary.*
>>
>> *Solution 2:*
>> *Instead of relying on the presence of an arbitrary string in the message,
>> rely on the presence of a string like:
>> "capabilities:CAPABILITY_1,CAPABILITY-2" where CAPABILITY_1 and
>> CAPABILITY_2 (etc.) are constants defined in api.thrift. Basically just
>> formalizing the mechanism and making it a bit more future proof.*
>>
>> In the third solution, the scheduler infers the executor version from the
>> JobUpdateSettings on scheduler side.
>> *Solution 3:*
>> *Adding a bit to JobUpdateSettings which is ‘executorDrivenUpdates', if
>> that is set, the scheduler assumes that the transition from STARTING ->
>> RUNNING makes the executor healthy and concurrently, we release thermos and
>> change HealthCheckConfig to say that it should only go to running after
>> healthy*.
>>
>> *Pros and Cons:*
>> The main benefit of Solution 1 is:
>> 1. By using the message in task status update, we don't have to make any
>> schema change, which makes the design simple.
>> 2. The feature is fully backward-compatible. When we roll out the vCurrent
>> schedulers and executors, we do not have to instruct the users to provide
>> additional field in the Job or Update configs, which could confuses
>> customers when the vPrev and vCurrent executor coexist in the cluster.
>>
>> Concerns:
>> Relying on the presence of a message makes things brittle. Also we do not
>> want to expose this message to users.
>>
>> The benefit of Solution 2 is making the feature more future proof.
>> However, if we do not envision a new executor feature in the short term,
>> it's not too much different from Solution 1.
>>
>> The benefits of Solution 3 include:
>> 1. We support more than just thermos now (and others rely on custom
>> executors).
>> 2. A lot of things in Aurora treat the executor as opaque. The status
>> update message sent by executor should not be visible to users only if it's
>> an error message.
>>
>> Concerns:
>> 1. In addition to the ‘executorDrivenUpdates' bit that identifies the
>> executor version, we still need to notify the scheduler if health check is
>> enabled on vCurrent executor, if not, the scheduler must be able to fall
>> back to use watch_secs.
>> 2. The users have to provide an additional field in their .aurora config
>> files. The feature wouldn't be available unless new clients are rolled out
>> as well.
>>
>> Please let me know if I understand your suggestions correctly and
>> hopefully everyone is on the same page!
>>
>> Thanks,
>>
>> Kai
>>



--
Zameer Manji

Re: Discussion on review request 51536

Posted by Zameer Manji <zm...@uber.com>.

On Fri, Sep 2, 2016 at 3:24 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> Need to correct a few previous statements:
>
> > Also we do not want to expose this message to users.
> This is incorrect. The original design proposal suggested to show this
> message in the UI as: "Task is healthy"
>

Does this mean the message in the status update is going to be exactly,
"Task is healthy" and the scheduler is going to check for this string in
the `TASK_RUNNING` status update? This means we are going to establish a
communication
mechanism between the executor and scheduler that's not defined by a
schema. I feel that's worse than putting JSON in there and having the
scheduler parse it.


> > The Mesos API isn't designed for packing arbitrary data
> > in the status update message.
> Don't think I agree, this is exactly what this field is for [1] and we
> already use it for other states [2].
>

I guess I should have said 'structured arbitrary data'. The informational,
messages are fine and we plumb them blindly into our logging and UI. I'm
not convinced we should start putting JSON or something more structured in
there. That's yet another schema we have and yet another versioning story
we have to go though. This also complicates matters for custom executor
authors.


>
> > I would be open to just saying that scheduler version
> > 0.16 (or 0.17) just assumes the executor transitions to
> > RUNNING once a task is healthy and dropping
> > `watch_secs`entirely.
> We can't drop 'watch_secs' entirely as we still have to babysit job
> updates that don't have health checks enabled.
>

Understood. I guess we can keep it but I'm now frustrated that we have a
parameter that is ignored if we set some json in ExecutorConfig.data.
Ideally, we don't accept `watch_secs` if we want health check driven
updates. As mentioned before I don't like this implicit tightening of the
executor and the scheduler.


>
> As for my take on the above, I favor #1 as the simplest answer to an
> already simple question: "Should we use watch_secs for this instance
> or not?". That's pretty much it. Scheduler does not need any schema
> changes, know what health checks are or if a job has them enabled. At
> least not until we attempt to move to centralized health checks
> (AURORA-279) but that will be an entirely different design discussion.
>
> [1] - https://github.com/apache/mesos/blob/master/include/
> mesos/mesos.proto#L1605.
> [2] - https://github.com/apache/aurora/blob/5cad046fc0f0c4bb79a4563cfcff04
> 42b7bf8383/src/main/python/apache/aurora/executor/aurora_executor.py#L97



With all of this in mind, I have another proposal. Why can't we have the
executor changes (wait until the task is healthy for RUNNING) *and* read
`watch_secs` if it is set? Why not have both of these features and if users
want purely health checking driven updates they can set this value to 0 and
enable health checks. If they want to have both health checking and time
driven updates they can set this to value to the time that they care about.
If they just want time driven updates they can disable health checking and
set this value.

Then there is no coupling between the executor and the scheduler except for
status updates and there is no dependency on the `message` field of the
status update.

We could even treat `watch_secs` as minimum time in STARTING + RUNNING
instead of RUNNING with this change and it becomes the lower bound in the
update transition speed. This can ensure that users don't deploy "too fast"
and end up overwhelming other services if they are deployed too quickly.



>
>
> On Fri, Sep 2, 2016 at 2:26 PM, Zameer Manji <zm...@apache.org> wrote:
> > *cc: Renan*
> >
> > I think there is some disagreement/discussion on the review because we
> have
> > not achieved consensus on the design. Since the design doc was written,
> > Aurora adopted multiple executor support as well non HTTP based
> > healthchecking. This invalidates some parts of the original design. I
> think
> > all of the solutions here are possible amendments to the design doc.
> >
> > I am not in favor of Solution 2 at all because status updates between
> > executor <-> agent <-> master <-> scheduler are designed to update the
> > framework of updates to the task and not really designed to send
> arbitrary
> > information. Just because the Mesos API provides us with a string field
> > doesn't mean we should try to pack in arbitrary data. Also, it isn't
> clear
> > what other capabilities we might add in the future so I'm unconvinced
> that
> > capabilities needs to exist at all. My fear is that we will create the
> > infrastructure for capabilities just to serve this need and nothing else.
> >
> > I object to Solution 1 along the same lines. The Mesos API isn't designed
> > for packing arbitrary data in the status update message and I don't think
> > we should abuse that and rely on that. Also our current infrastructure
> just
> > plumbs the message to the UI and I think displaying capabilities is not
> > something we should do.
> >
> > I am in favor of Solution 3 which is as close as possible to the original
> > design in the design doc. The design doc says the following:
> >
> > Scheduler updater will skip the minWaitInInstanceMs (aka watch_secs
> >> <https://github.com/apache/aurora/blob/4b43305b33cd8bebdd80225a3987b7
> cc7a8389a2/docs/configuration-reference.md#updateconfig-objects>)
> >> grace period any time it detects a named port ‘health’ in task
> >> configuration. A RUNNING instance status will signify the end of
> instance
> >> update.
> >
> >
> > Instead of detecting the 'health' port in the task configuration, we make
> > enabling this feature explicitly by enabling a bit in the task
> > configuration with the `executorDrivenUpdates` bit.
> >
> > I understand this option makes this feature more complex because it
> > requires a schema change and requires operators to deploy the executor to
> > all agents before upgrading the client. However, I think that's a one
> time
> > operational cost as a opposed to long lived design choices that will
> affect
> > the code.
> >
> > Further Solution 3 is the most amenable to custom executors and continues
> > our tradition of treating executors as opaque black boxes. I think there
> is
> > a lot of value in treating executors as black boxes as it leaves the door
> > open to switching our executor to something else and doesn't impose a
> > burden to others that want to write their own.
> >
> > Alternatively, if amending the schema is too much work, I would be open
> to
> > just saying that scheduler version 0.16 (or 0.17) just assumes the
> executor
> > transitions to RUNNING once a task is healthy and dropping `watch_secs`
> > entirely. We can put it in the release notes that operators must deploy
> the
> > executor to 100% before deploying the scheduler.
> >
> >
> > On Thu, Sep 1, 2016 at 6:40 PM, 黄 凯 <te...@hotmail.com> wrote:
> >
> >> Hi Folks,
> >>
> >> I'm currently working on a feature on aurora scheduler and executor. The
> >> implementation strategy became controversial on the review board, so I
> was
> >> wondering if I should broadcast it to more audience and initiate a
> >> discussion. Please feel free to let me know your thoughts, your help is
> >> greatly appreciated!
> >>
> >> The high level goal of this feature is to improve reliability and
> >> performance of the Aurora scheduler job updater, by relying on health
> check
> >> status rather than watch_secs timeout when deciding an individual
> instance
> >> update state.
> >>
> >> Please see the original review request *https://reviews.apache.org/r/
> 51536/
> >> <https://reviews.apache.org/r/51536/> *
> >> aurora JIRA ticket *https://issues.apache.org/jira/browse/AURORA-894
> >> <https://issues.apache.org/jira/browse/AURORA-894>*
> >> design doc *https://docs.google.com/document/d/
> 1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
> >> <https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSx
> EWR0a-21FP5d94/edit#>*
> >> for more details and background.
> >>
> >> Note: The design doc becomes a little bit outdated on the "scheduler
> >> change summary" part (this is what the review request trying to
> address).
> >> As a result, I've left some comment to clarify the latest proposed
> >> implementation plan for scheduler change.
> >>
> >> There are two questions I'm trying to address here:
> >> *1. How does the scheduler infer the executor version and be backward
> >> compatible?*
> >> *2. Where do we determine if health check is enabled?*
> >>
> >> In short, there are 3 different solutions proposed on the review board.
> >>
> >> In the first two approaches, the scheduler will rely on a string to
> >> determine the executor version. We determine whether health check is
> >> enabled merely on executor side. There will be communication between the
> >> executor and the scheduler.
> >> *Solution 1: *
> >> *vCurrent executor sends a message in its health check thread during
> >> RUNNING state transition, and the vCurrent updater will infer the
> executor
> >> version from the presence of this message, and skip the watch_secs if
> >> necessary.*
> >>
> >> *Solution 2:*
> >> *Instead of relying on the presence of an arbitrary string in the
> message,
> >> rely on the presence of a string like:
> >> "capabilities:CAPABILITY_1,CAPABILITY-2" where CAPABILITY_1 and
> >> CAPABILITY_2 (etc.) are constants defined in api.thrift. Basically just
> >> formalizing the mechanism and making it a bit more future proof.*
> >>
> >> In the third solution, the scheduler infers the executor version from
> the
> >> JobUpdateSettings on scheduler side.
> >> *Solution 3:*
> >> *Adding a bit to JobUpdateSettings which is ‘executorDrivenUpdates', if
> >> that is set, the scheduler assumes that the transition from STARTING ->
> >> RUNNING makes the executor healthy and concurrently, we release thermos
> and
> >> change HealthCheckConfig to say that it should only go to running after
> >> healthy*.
> >>
> >> *Pros and Cons:*
> >> The main benefit of Solution 1 is:
> >> 1. By using the message in task status update, we don't have to make any
> >> schema change, which makes the design simple.
> >> 2. The feature is fully backward-compatible. When we roll out the
> vCurrent
> >> schedulers and executors, we do not have to instruct the users to
> provide
> >> additional field in the Job or Update configs, which could confuses
> >> customers when the vPrev and vCurrent executor coexist in the cluster.
> >>
> >> Concerns:
> >> Relying on the presence of a message makes things brittle. Also we do
> not
> >> want to expose this message to users.
> >>
> >> The benefit of Solution 2 is making the feature more future proof.
> >> However, if we do not envision a new executor feature in the short term,
> >> it's not too much different from Solution 1.
> >>
> >> The benefits of Solution 3 include:
> >> 1. We support more than just thermos now (and others rely on custom
> >> executors).
> >> 2. A lot of things in Aurora treat the executor as opaque. The status
> >> update message sent by executor should not be visible to users only if
> it's
> >> an error message.
> >>
> >> Concerns:
> >> 1. In addition to the ‘executorDrivenUpdates' bit that identifies the
> >> executor version, we still need to notify the scheduler if health check
> is
> >> enabled on vCurrent executor, if not, the scheduler must be able to fall
> >> back to use watch_secs.
> >> 2. The users have to provide an additional field in their .aurora config
> >> files. The feature wouldn't be available unless new clients are rolled
> out
> >> as well.
> >>
> >> Please let me know if I understand your suggestions correctly and
> >> hopefully everyone is on the same page!
> >>
> >> Thanks,
> >>
> >> Kai
> >>
>



-- 
Zameer Manji

Re: Discussion on review request 51536

Posted by Maxim Khutornenko <ma...@apache.org>.

Need to correct a few previous statements:

> Also we do not want to expose this message to users.
This is incorrect. The original design proposal suggested to show this
message in the UI as: "Task is healthy"

> The Mesos API isn't designed for packing arbitrary data
> in the status update message.
Don't think I agree, this is exactly what this field is for [1] and we
already use it for other states [2].

> I would be open to just saying that scheduler version
> 0.16 (or 0.17) just assumes the executor transitions to
> RUNNING once a task is healthy and dropping
> `watch_secs`entirely.
We can't drop 'watch_secs' entirely as we still have to babysit job
updates that don't have health checks enabled.

As for my take on the above, I favor #1 as the simplest answer to an
already simple question: "Should we use watch_secs for this instance
or not?". That's pretty much it. Scheduler does not need any schema
changes, know what health checks are or if a job has them enabled. At
least not until we attempt to move to centralized health checks
(AURORA-279) but that will be an entirely different design discussion.

[1] - https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L1605.
[2] - https://github.com/apache/aurora/blob/5cad046fc0f0c4bb79a4563cfcff0442b7bf8383/src/main/python/apache/aurora/executor/aurora_executor.py#L97

On Fri, Sep 2, 2016 at 2:26 PM, Zameer Manji <zm...@apache.org> wrote:
> *cc: Renan*
>
> I think there is some disagreement/discussion on the review because we have
> not achieved consensus on the design. Since the design doc was written,
> Aurora adopted multiple executor support as well non HTTP based
> healthchecking. This invalidates some parts of the original design. I think
> all of the solutions here are possible amendments to the design doc.
>
> I am not in favor of Solution 2 at all because status updates between
> executor <-> agent <-> master <-> scheduler are designed to update the
> framework of updates to the task and not really designed to send arbitrary
> information. Just because the Mesos API provides us with a string field
> doesn't mean we should try to pack in arbitrary data. Also, it isn't clear
> what other capabilities we might add in the future so I'm unconvinced that
> capabilities needs to exist at all. My fear is that we will create the
> infrastructure for capabilities just to serve this need and nothing else.
>
> I object to Solution 1 along the same lines. The Mesos API isn't designed
> for packing arbitrary data in the status update message and I don't think
> we should abuse that and rely on that. Also our current infrastructure just
> plumbs the message to the UI and I think displaying capabilities is not
> something we should do.
>
> I am in favor of Solution 3 which is as close as possible to the original
> design in the design doc. The design doc says the following:
>
> Scheduler updater will skip the minWaitInInstanceMs (aka watch_secs
>> <https://github.com/apache/aurora/blob/4b43305b33cd8bebdd80225a3987b7cc7a8389a2/docs/configuration-reference.md#updateconfig-objects>)
>> grace period any time it detects a named port ‘health’ in task
>> configuration. A RUNNING instance status will signify the end of instance
>> update.
>
>
> Instead of detecting the 'health' port in the task configuration, we make
> enabling this feature explicitly by enabling a bit in the task
> configuration with the `executorDrivenUpdates` bit.
>
> I understand this option makes this feature more complex because it
> requires a schema change and requires operators to deploy the executor to
> all agents before upgrading the client. However, I think that's a one time
> operational cost as a opposed to long lived design choices that will affect
> the code.
>
> Further Solution 3 is the most amenable to custom executors and continues
> our tradition of treating executors as opaque black boxes. I think there is
> a lot of value in treating executors as black boxes as it leaves the door
> open to switching our executor to something else and doesn't impose a
> burden to others that want to write their own.
>
> Alternatively, if amending the schema is too much work, I would be open to
> just saying that scheduler version 0.16 (or 0.17) just assumes the executor
> transitions to RUNNING once a task is healthy and dropping `watch_secs`
> entirely. We can put it in the release notes that operators must deploy the
> executor to 100% before deploying the scheduler.
>
>
> On Thu, Sep 1, 2016 at 6:40 PM, 黄 凯 <te...@hotmail.com> wrote:
>
>> Hi Folks,
>>
>> I'm currently working on a feature on aurora scheduler and executor. The
>> implementation strategy became controversial on the review board, so I was
>> wondering if I should broadcast it to more audience and initiate a
>> discussion. Please feel free to let me know your thoughts, your help is
>> greatly appreciated!
>>
>> The high level goal of this feature is to improve reliability and
>> performance of the Aurora scheduler job updater, by relying on health check
>> status rather than watch_secs timeout when deciding an individual instance
>> update state.
>>
>> Please see the original review request *https://reviews.apache.org/r/51536/
>> <https://reviews.apache.org/r/51536/> *
>> aurora JIRA ticket *https://issues.apache.org/jira/browse/AURORA-894
>> <https://issues.apache.org/jira/browse/AURORA-894>*
>> design doc *https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>> <https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#>*
>> for more details and background.
>>
>> Note: The design doc becomes a little bit outdated on the "scheduler
>> change summary" part (this is what the review request trying to address).
>> As a result, I've left some comment to clarify the latest proposed
>> implementation plan for scheduler change.
>>
>> There are two questions I'm trying to address here:
>> *1. How does the scheduler infer the executor version and be backward
>> compatible?*
>> *2. Where do we determine if health check is enabled?*
>>
>> In short, there are 3 different solutions proposed on the review board.
>>
>> In the first two approaches, the scheduler will rely on a string to
>> determine the executor version. We determine whether health check is
>> enabled merely on executor side. There will be communication between the
>> executor and the scheduler.
>> *Solution 1: *
>> *vCurrent executor sends a message in its health check thread during
>> RUNNING state transition, and the vCurrent updater will infer the executor
>> version from the presence of this message, and skip the watch_secs if
>> necessary.*
>>
>> *Solution 2:*
>> *Instead of relying on the presence of an arbitrary string in the message,
>> rely on the presence of a string like:
>> "capabilities:CAPABILITY_1,CAPABILITY-2" where CAPABILITY_1 and
>> CAPABILITY_2 (etc.) are constants defined in api.thrift. Basically just
>> formalizing the mechanism and making it a bit more future proof.*
>>
>> In the third solution, the scheduler infers the executor version from the
>> JobUpdateSettings on scheduler side.
>> *Solution 3:*
>> *Adding a bit to JobUpdateSettings which is ‘executorDrivenUpdates', if
>> that is set, the scheduler assumes that the transition from STARTING ->
>> RUNNING makes the executor healthy and concurrently, we release thermos and
>> change HealthCheckConfig to say that it should only go to running after
>> healthy*.
>>
>> *Pros and Cons:*
>> The main benefit of Solution 1 is:
>> 1. By using the message in task status update, we don't have to make any
>> schema change, which makes the design simple.
>> 2. The feature is fully backward-compatible. When we roll out the vCurrent
>> schedulers and executors, we do not have to instruct the users to provide
>> additional field in the Job or Update configs, which could confuses
>> customers when the vPrev and vCurrent executor coexist in the cluster.
>>
>> Concerns:
>> Relying on the presence of a message makes things brittle. Also we do not
>> want to expose this message to users.
>>
>> The benefit of Solution 2 is making the feature more future proof.
>> However, if we do not envision a new executor feature in the short term,
>> it's not too much different from Solution 1.
>>
>> The benefits of Solution 3 include:
>> 1. We support more than just thermos now (and others rely on custom
>> executors).
>> 2. A lot of things in Aurora treat the executor as opaque. The status
>> update message sent by executor should not be visible to users only if it's
>> an error message.
>>
>> Concerns:
>> 1. In addition to the ‘executorDrivenUpdates' bit that identifies the
>> executor version, we still need to notify the scheduler if health check is
>> enabled on vCurrent executor, if not, the scheduler must be able to fall
>> back to use watch_secs.
>> 2. The users have to provide an additional field in their .aurora config
>> files. The feature wouldn't be available unless new clients are rolled out
>> as well.
>>
>> Please let me know if I understand your suggestions correctly and
>> hopefully everyone is on the same page!
>>
>> Thanks,
>>
>> Kai
>>

Re: Discussion on review request 51536

Posted by Zameer Manji <zm...@apache.org>.

*cc: Renan*

I think there is some disagreement/discussion on the review because we have
not achieved consensus on the design. Since the design doc was written,
Aurora adopted multiple executor support as well non HTTP based
healthchecking. This invalidates some parts of the original design. I think
all of the solutions here are possible amendments to the design doc.

I am not in favor of Solution 2 at all because status updates between
executor <-> agent <-> master <-> scheduler are designed to update the
framework of updates to the task and not really designed to send arbitrary
information. Just because the Mesos API provides us with a string field
doesn't mean we should try to pack in arbitrary data. Also, it isn't clear
what other capabilities we might add in the future so I'm unconvinced that
capabilities needs to exist at all. My fear is that we will create the
infrastructure for capabilities just to serve this need and nothing else.

I object to Solution 1 along the same lines. The Mesos API isn't designed
for packing arbitrary data in the status update message and I don't think
we should abuse that and rely on that. Also our current infrastructure just
plumbs the message to the UI and I think displaying capabilities is not
something we should do.

I am in favor of Solution 3 which is as close as possible to the original
design in the design doc. The design doc says the following:

Scheduler updater will skip the minWaitInInstanceMs (aka watch_secs
> <https://github.com/apache/aurora/blob/4b43305b33cd8bebdd80225a3987b7cc7a8389a2/docs/configuration-reference.md#updateconfig-objects>)
> grace period any time it detects a named port ‘health’ in task
> configuration. A RUNNING instance status will signify the end of instance
> update.

Instead of detecting the 'health' port in the task configuration, we make
enabling this feature explicitly by enabling a bit in the task
configuration with the `executorDrivenUpdates` bit.

I understand this option makes this feature more complex because it
requires a schema change and requires operators to deploy the executor to
all agents before upgrading the client. However, I think that's a one time
operational cost as a opposed to long lived design choices that will affect
the code.

Further Solution 3 is the most amenable to custom executors and continues
our tradition of treating executors as opaque black boxes. I think there is
a lot of value in treating executors as black boxes as it leaves the door
open to switching our executor to something else and doesn't impose a
burden to others that want to write their own.

Alternatively, if amending the schema is too much work, I would be open to
just saying that scheduler version 0.16 (or 0.17) just assumes the executor
transitions to RUNNING once a task is healthy and dropping `watch_secs`
entirely. We can put it in the release notes that operators must deploy the
executor to 100% before deploying the scheduler.

On Thu, Sep 1, 2016 at 6:40 PM, 黄 凯 <te...@hotmail.com> wrote:

> Hi Folks,
>
> I'm currently working on a feature on aurora scheduler and executor. The
> implementation strategy became controversial on the review board, so I was
> wondering if I should broadcast it to more audience and initiate a
> discussion. Please feel free to let me know your thoughts, your help is
> greatly appreciated!
>
> The high level goal of this feature is to improve reliability and
> performance of the Aurora scheduler job updater, by relying on health check
> status rather than watch_secs timeout when deciding an individual instance
> update state.
>
> Please see the original review request *https://reviews.apache.org/r/51536/
> <https://reviews.apache.org/r/51536/> *
> aurora JIRA ticket *https://issues.apache.org/jira/browse/AURORA-894
> <https://issues.apache.org/jira/browse/AURORA-894>*
> design doc *https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
> <https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#>*
> for more details and background.
>
> Note: The design doc becomes a little bit outdated on the "scheduler
> change summary" part (this is what the review request trying to address).
> As a result, I've left some comment to clarify the latest proposed
> implementation plan for scheduler change.
>
> There are two questions I'm trying to address here:
> *1. How does the scheduler infer the executor version and be backward
> compatible?*
> *2. Where do we determine if health check is enabled?*
>
> In short, there are 3 different solutions proposed on the review board.
>
> In the first two approaches, the scheduler will rely on a string to
> determine the executor version. We determine whether health check is
> enabled merely on executor side. There will be communication between the
> executor and the scheduler.
> *Solution 1: *
> *vCurrent executor sends a message in its health check thread during
> RUNNING state transition, and the vCurrent updater will infer the executor
> version from the presence of this message, and skip the watch_secs if
> necessary.*
>
> *Solution 2:*
> *Instead of relying on the presence of an arbitrary string in the message,
> rely on the presence of a string like:
> "capabilities:CAPABILITY_1,CAPABILITY-2" where CAPABILITY_1 and
> CAPABILITY_2 (etc.) are constants defined in api.thrift. Basically just
> formalizing the mechanism and making it a bit more future proof.*
>
> In the third solution, the scheduler infers the executor version from the
> JobUpdateSettings on scheduler side.
> *Solution 3:*
> *Adding a bit to JobUpdateSettings which is ‘executorDrivenUpdates', if
> that is set, the scheduler assumes that the transition from STARTING ->
> RUNNING makes the executor healthy and concurrently, we release thermos and
> change HealthCheckConfig to say that it should only go to running after
> healthy*.
>
> *Pros and Cons:*
> The main benefit of Solution 1 is:
> 1. By using the message in task status update, we don't have to make any
> schema change, which makes the design simple.
> 2. The feature is fully backward-compatible. When we roll out the vCurrent
> schedulers and executors, we do not have to instruct the users to provide
> additional field in the Job or Update configs, which could confuses
> customers when the vPrev and vCurrent executor coexist in the cluster.
>
> Concerns:
> Relying on the presence of a message makes things brittle. Also we do not
> want to expose this message to users.
>
> The benefit of Solution 2 is making the feature more future proof.
> However, if we do not envision a new executor feature in the short term,
> it's not too much different from Solution 1.
>
> The benefits of Solution 3 include:
> 1. We support more than just thermos now (and others rely on custom
> executors).
> 2. A lot of things in Aurora treat the executor as opaque. The status
> update message sent by executor should not be visible to users only if it's
> an error message.
>
> Concerns:
> 1. In addition to the ‘executorDrivenUpdates' bit that identifies the
> executor version, we still need to notify the scheduler if health check is
> enabled on vCurrent executor, if not, the scheduler must be able to fall
> back to use watch_secs.
> 2. The users have to provide an additional field in their .aurora config
> files. The feature wouldn't be available unless new clients are rolled out
> as well.
>
> Please let me know if I understand your suggestions correctly and
> hopefully everyone is on the same page!
>
> Thanks,
>
> Kai
>