Posted to user@aurora.apache.org by Renan DelValle <re...@gmail.com> on 2018/02/01 02:15:46 UTC

Staggered deployments

Hi all,

We have a use case where we want to deploy X instances in N steps, and the
size of the batch could potentially change at every step. For example, we
might start by deploying 1 instance, followed by a batch of 50, followed by
a batch of 49, to deploy 100 instances in total.

Is there any way of achieving this behavior through existing Aurora thrift
calls/primitives?

Any insights would be greatly appreciated.

-Renan

Re: Staggered deployments

Posted by Renan DelValle <re...@gmail.com>.
Yup, I can take the lead on this. I'll work on a design doc first to get
the story straight before I dive into coding.


Re: Staggered deployments

Posted by Meghdoot bhattacharya <me...@yahoo.com>.
Thx David. Renan, can you take the lead on this?

Yeah, we do have separate orchestration for multi-Aurora-cluster geo deploys and blue-green deploys within each cluster, but we also have a requirement to support flexible, controlled rolling deploys inside each cluster, like on our old platform. And we really want to avoid targeting shard IDs for the scenario below.

Aurora's pause is there today, but it is user-initiated rather than a deterministic auto-pause; resume is already there. So I'm feeling hopeful that the below can be done relatively easily.

Thx


Re: Staggered deployments

Posted by David McLaughlin <dm...@apache.org>.
Yeah, that is definitely not possible right now in Aurora. We've built
something external to Aurora (like Spinnaker) that our users use to do this
type of thing (and also to do it across multiple Schedulers).

Adding this type of "acknowledge before continuing" functionality to the
updater is certainly possible though. I'm happy to shepherd any proposals.



Re: Staggered deployments

Posted by Meghdoot bhattacharya <me...@yahoo.com>.
David, here is the scenario that we are looking at. It's most likely not supported, but let us know if the changes can be made without much hassle. We can then research and do a PR.

The use case arises from our current non-Mesos platform.

Let's say the app has 100 instances. Here is how we would roll it out, as an example:

1. First roll 10 instances (in parallel) and then pause.

2. App metrics are monitored.

3. If all looks fine, we resume the deploy but increase the batch size, say to 40 in parallel, and then pause again.

4. Then resume and do the remaining 50.

5. Rollback by default follows 50, 40, 10, but that can be changed.

To put this in Aurora terms:

1. We want an auto-pause after every batch. We will use the wait-for-batch-completion flag (set to true) to make sure the batch is not sliding. If all is good in terms of the success threshold being met, we would want Aurora to enter the pause state. Hopefully the existing pause state can be reused.

2. Then we would want to change the batch size.

3. And then use the existing resume API to resume the update (a rough sketch of this flow follows below).
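
To make the intent concrete, here is a rough Python sketch of what the client-side flow could look like if such an auto-pause existed. Only resumeJobUpdate corresponds to an existing Aurora RPC name here; the client wrapper, set_batch_size, and is_paused_or_done are hypothetical and only illustrate the proposal.

# Hypothetical sketch of the proposed flow. Only resumeJobUpdate maps to an
# existing Aurora RPC; set_batch_size and is_paused_or_done do not exist and
# stand in for the functionality being proposed here.
import time


def staggered_update(client, update_key, batch_plan=(10, 40, 50)):
    """Roll an update in operator-acknowledged steps of varying batch size."""
    for step, batch_size in enumerate(batch_plan, start=1):
        # Hypothetical: today the batch size is fixed at startJobUpdate time
        # via JobUpdateSettings.updateGroupSize and cannot change mid-update.
        client.set_batch_size(update_key, batch_size)

        # Existing RPC name: resumeJobUpdate(key, message).
        client.resumeJobUpdate(update_key, "step %d: batch of %d" % (step, batch_size))

        # Proposed behavior: with waitForBatchCompletion=true and auto-pause
        # enabled, the scheduler would pause itself once the batch succeeds.
        # Here we simply poll for that state (hypothetical helper).
        while not client.is_paused_or_done(update_key):
            time.sleep(10)

        # The operator would check app metrics here before the next step.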

I am not sure whether changing the batch size outside of the pause window (with the batch-completion flag) is useful or not. In theory, I guess the batch size could be updated in any scenario.

A pulsed update does not provide the same behavior as the above. If we poll and see how many instances are done, we could probably withhold sending a pulse to pause the deploy and resume it with a pulse later, but that is really not deterministic. It also does not solve the dynamic batch size.
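
For reference, a minimal sketch of that pulse workaround, assuming a thin wrapper around the generated Thrift client. pulseJobUpdate is an existing RPC name and blockIfNoPulsesAfterMs is an existing JobUpdateSettings field; the wrapper and its instances_updated helper are hypothetical and would have to be built on the job update status APIs.

# Sketch of the pulse workaround: keep the update alive with pulseJobUpdate
# while instances are still rolling, then stop pulsing once the current step
# is done so the update blocks (per JobUpdateSettings.blockIfNoPulsesAfterMs).
# The client wrapper and instances_updated helper are hypothetical.
import time

PULSE_INTERVAL_SECS = 30


def pulse_until(client, update_key, target_instances):
    """Pulse the update until roughly target_instances have been updated."""
    while True:
        done = client.instances_updated(update_key)  # hypothetical helper
        if done >= target_instances:
            # Withhold further pulses; the scheduler will block the update
            # once blockIfNoPulsesAfterMs expires. Not deterministic: more
            # instances may still roll before the block kicks in.
            return
        client.pulseJobUpdate(update_key)  # existing RPC name
        time.sleep(PULSE_INTERVAL_SECS)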

Let us know your thoughts. We would like to contribute.

Thx


Re: Staggered deployments

Posted by David McLaughlin <dm...@apache.org>.
It's not currently possible with a single API call; you'd have to submit
multiple calls to startJobUpdate, changing the JobUpdateSettings each time.
When you say the size of the batch could change each step, could it change
dynamically (e.g. after you've submitted the call to the Scheduler), or is
all the information known upfront?
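
If all of the information is known upfront, a rough sketch of what that multi-call approach could look like from Python follows. The client wrapper and its build_update_request / wait_for_update helpers are hypothetical; updateGroupSize, waitForBatchCompletion, and updateOnlyTheseInstances are the JobUpdateSettings fields in api.thrift they would map to.

# Rough sketch: drive each "step" as its own startJobUpdate call, restricted
# to a slice of instance IDs and with a different batch size each time.
# startJobUpdate is the existing RPC name; the client wrapper and its helpers
# are hypothetical.
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Step:
    instances: Tuple[int, int]  # inclusive (first, last) instance IDs
    batch_size: int             # would map to JobUpdateSettings.updateGroupSize


# 1 instance, then 50, then 49, to cover 100 instances in total.
STEPS = [Step((0, 0), 1), Step((1, 50), 50), Step((51, 99), 49)]


def rolling_steps(client, job_key, task_config):
    for step in STEPS:
        request = client.build_update_request(  # hypothetical helper
            job_key,
            task_config,
            update_group_size=step.batch_size,
            wait_for_batch_completion=True,
            update_only_these_instances=[step.instances],
        )
        result = client.startJobUpdate(request, "instances %d-%d" % step.instances)
        client.wait_for_update(result.key)  # hypothetical: poll until terminal
        # Inspect app metrics here before kicking off the next step.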
