
Posted to dev@stratos.apache.org by Sajith Kariyawasam <sa...@wso2.com> on 2015/02/11 20:36:49 UTC

Member termination took 30 minutes

Hi Devs,

While testing group scaling, I noticed that when scaling down, it takes 30
minutes from the moment the scaling rule decides to terminate an instance
until that instance is actually terminated.

An active member selected by the rule first moves to a "termination pending
member map", and after a certain period (terminationPendingMemberExpiryTime)
that member moves to an "obsolete member map". The obsolete check rule then
terminates that member via the cloud controller.

This delay seems to come from the property terminationPendingMemberExpiryTime,
whose default value is 30 minutes.

Sorry if I have missed some past discussions regarding this, but could
someone explain the purpose of moving the member to an intermediary map,
"termination pending member map", rather than moving it directly to the
"obsolete member map"?

Also, is the terminationPendingMemberExpiryTime parameter configurable (it
seems not), and is there any reason it is set to 30 minutes?

Further, we should make the sleep times of PendingMemberWatcher,
ObsoletedMemberWatcher and TerminationPendingMemberWatcher configurable.
WDYT?

We need to document those configurable parameters as well, @Mari please
note.


Thanks,
Sajith

Re: Member termination took 30 minutes

Posted by Rajkumar Rajaratnam <ra...@wso2.com>.

Hi,

I made terminationPendingMemberExpiryTime configurable via
autoscaler.xml, like the other expiry timeouts.
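A sketch of how such an expiry value could be read with a 30-minute fallback, using only the JDK XML APIs. The element path `/autoscaler/member/terminationPendingMemberExpiryTime`, the `ExpiryConfig` class and the millisecond unit are hypothetical; the actual element name and layout in autoscaler.xml are not given in this thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical config reader: falls back to 30 minutes when the element is absent.
class ExpiryConfig {
    static final long DEFAULT_TERMINATION_PENDING_EXPIRY_MS = 30L * 60 * 1000; // 30 minutes

    // Hypothetical element path; not taken from the real autoscaler.xml schema.
    static final String EXPIRY_XPATH = "/autoscaler/member/terminationPendingMemberExpiryTime";

    static long terminationPendingExpiryMillis(String autoscalerXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(autoscalerXml.getBytes(StandardCharsets.UTF_8)));
            // XPath string evaluation yields "" when the element does not exist.
            String value = XPathFactory.newInstance().newXPath().evaluate(EXPIRY_XPATH, doc).trim();
            return value.isEmpty() ? DEFAULT_TERMINATION_PENDING_EXPIRY_MS : Long.parseLong(value);
        } catch (Exception e) {
            // Malformed XML or a non-numeric value is reported as a configuration error.
            throw new IllegalArgumentException("Invalid autoscaler configuration", e);
        }
    }
}
```

Keeping the 30-minute value as the fallback means existing deployments behave as before unless the element is set.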

Thanks.

On Thu, Feb 12, 2015 at 12:58 PM, Sajith Kariyawasam <sa...@wso2.com>
wrote:

> Thanks for the explanation, Raj! It's clearer now.


-- 
Rajkumar Rajaratnam
Committer & PMC Member, Apache Stratos
Software Engineer, WSO2

Mobile : +94777568639
Blog : rajkumarr.com

Re: Member termination took 30 minutes

Posted by Sajith Kariyawasam <sa...@wso2.com>.

Thanks for the explanation, Raj! It's clearer now.

On Thu, Feb 12, 2015 at 6:56 AM, Rajkumar Rajaratnam <ra...@wso2.com>
wrote:

> Hi Sajith,
>
> Please find my comments inline.

Re: Member termination took 30 minutes

Posted by Rajkumar Rajaratnam <ra...@wso2.com>.

Hi Sajith,

Please find my comments inline.

On Thu, Feb 12, 2015 at 1:06 AM, Sajith Kariyawasam <sa...@wso2.com> wrote:

> Hi Devs,
>
> While testing group scaling, I noticed that when scaling down, it takes 30
> minutes from the moment the scaling rule decides to terminate an instance
> until that instance is actually terminated.
>
> An active member selected by the rule first moves to a "termination pending
> member map", and after a certain period (terminationPendingMemberExpiryTime)
> that member moves to an "obsolete member map". The obsolete check rule then
> terminates that member via the cloud controller.
>
> This delay seems to come from the property terminationPendingMemberExpiryTime,
> whose default value is 30 minutes.
>
> Sorry if I have missed some past discussions regarding this, but could
> someone explain the purpose of moving the member to an intermediary map,
> "termination pending member map", rather than moving it directly to the
> "obsolete member map"?
>

There are two reasons: to avoid losing events and to terminate members
gracefully. Let me explain the logic.

   - When scaling down, the Autoscaler (AS) moves the member from the
   "active member list" to the "termination pending member map".
   - A Drools rule, "Cleanup Instances which are pending termination", runs
   periodically, takes all the members in the "termination pending member
   map", and publishes an instance clean up event for each of them.
   - When the Cartridge Agent (CA) receives the instance clean up event, it
   publishes an instance ready to shutdown event.
   - When the Cloud Controller (CC) receives the instance ready to shutdown
   event, it publishes a member ready to shutdown event.
   - When AS receives the member ready to shutdown event, it moves the
   member from the "termination pending member map" to the "obsolete member
   map".
   - Hence, until AS receives the member ready to shutdown event, it keeps
   publishing the instance clean up event in every cluster monitor interval
   (each time the Drools rule runs).
   - If AS does not receive the member ready to shutdown event for a member
   in the "termination pending member map" within 30 minutes (the upper
   limit), the member is moved to the obsolete list without waiting for that
   event.
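For readers who prefer code, here is a minimal Java sketch of the bookkeeping in the cycle above. It is not the actual Stratos Autoscaler code: the class `TerminationPendingTracker`, its methods, the in-memory collections and the caller-supplied clock are assumptions made for illustration, and the CA/CC round trip is reduced to a publish callback plus an `onMemberReadyToShutdown` handler.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch of the AS-side member lists/maps; names are illustrative only.
class TerminationPendingTracker {
    private final Set<String> active = new LinkedHashSet<>();
    // memberId -> time (ms) at which the member entered the termination pending map
    private final Map<String, Long> terminationPending = new HashMap<>();
    private final Set<String> obsolete = new LinkedHashSet<>();
    private final long expiryMillis; // plays the role of terminationPendingMemberExpiryTime

    TerminationPendingTracker(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    void addActive(String memberId) {
        active.add(memberId);
    }

    // Scale-down decision: active member list -> termination pending map.
    void scaleDown(String memberId, long nowMillis) {
        if (active.remove(memberId)) {
            terminationPending.put(memberId, nowMillis);
        }
    }

    // One run of the cleanup rule (once per cluster monitor interval).
    // Members past the expiry go to obsolete without waiting for the ready to
    // shutdown event; every other pending member gets the clean up event again.
    void runCleanupRule(long nowMillis, Consumer<String> publishInstanceCleanup) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : terminationPending.entrySet()) {
            if (nowMillis - e.getValue() >= expiryMillis) {
                expired.add(e.getKey());
            } else {
                publishInstanceCleanup.accept(e.getKey());
            }
        }
        for (String memberId : expired) {
            terminationPending.remove(memberId);
            obsolete.add(memberId);
        }
    }

    // Handler for the member ready to shutdown event published by CC.
    void onMemberReadyToShutdown(String memberId) {
        if (terminationPending.remove(memberId) != null) {
            obsolete.add(memberId);
        }
    }

    boolean isPending(String memberId) { return terminationPending.containsKey(memberId); }

    boolean isObsolete(String memberId) { return obsolete.contains(memberId); }
}
```

In this sketch a member whose events get through reaches the obsolete set as soon as the ready to shutdown event arrives, while a member whose events keep getting lost stays pending and is re-sent the clean up event until the expiry elapses.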

The reason for this complete cycle is graceful termination. If we put the
member directly into the "obsolete member map", it would not be terminated
gracefully.

The reason we move the member from the "active member list" to the
"termination pending member map" is to avoid losing events. We have had
situations where an event in the above cycle was lost. These events are
published only once, so if a single event in the cycle is lost, that member
is never terminated. That is why we keep the member in the map: in every
cluster monitor interval, we take all the members in the "termination
pending member map" and send the instance clean up event again. This
compensates for lost events.
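A toy Java simulation can make the difference concrete: it compares publishing the clean up event once with re-publishing it in every interval, over a channel that drops some events. The `LossyChannel` and `RepublishSimulation` classes and the drop pattern are assumptions purely for illustration; the model treats the whole clean up to ready to shutdown round trip as one delivery attempt and does not model the real message broker.

```java
import java.util.function.IntPredicate;

// Toy channel: drops the n-th event (0-based) whenever dropRule.test(n) is true.
class LossyChannel {
    private final IntPredicate dropRule;
    private int sent = 0;

    LossyChannel(IntPredicate dropRule) {
        this.dropRule = dropRule;
    }

    // Returns true if the event was delivered, false if it was lost.
    boolean send() {
        boolean dropped = dropRule.test(sent);
        sent++;
        return !dropped;
    }
}

class RepublishSimulation {
    // Returns the interval (0-based) in which the clean up round trip first
    // succeeded, or -1 if it never did within maxIntervals.
    static int firstDeliveredInterval(LossyChannel channel, boolean republish, int maxIntervals) {
        for (int interval = 0; interval < maxIntervals; interval++) {
            if (interval > 0 && !republish) {
                break; // one-shot publishing: no second attempt
            }
            if (channel.send()) {
                return interval;
            }
        }
        return -1;
    }
}
```

With a channel that loses the first two events, the one-shot variant returns -1 (the member would never be terminated), while the re-publishing variant succeeds in interval 2.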

30 minutes is the upper limit, i.e. the maximum time a member can reside in
the "termination pending member map". You hit the edge case where AS did not
receive the member ready to shutdown event, so AS took 30 minutes to move
the member to the obsolete list.

>
> Also, is the terminationPendingMemberExpiryTime parameter configurable (it
> seems not), and is there any reason it is set to 30 minutes?
>

This is not configurable yet, but AFAIR the other member list/map expiry
times are configurable.

>
> Further, we should make the sleep times of PendingMemberWatcher,
> ObsoletedMemberWatcher and TerminationPendingMemberWatcher configurable.
> WDYT?
>

Yes, we have to.
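As a sketch of what a configurable sleep time could look like, the hypothetical `ConfigurableWatcher` below takes its interval as a constructor argument (which could be read from autoscaler.xml) instead of hard-coding it, and runs its check on a `ScheduledExecutorService` with a fixed delay. The class is an assumption for illustration; it is not how PendingMemberWatcher, ObsoletedMemberWatcher or TerminationPendingMemberWatcher are actually implemented.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical watcher whose sleep time is injected rather than hard-coded.
class ConfigurableWatcher implements AutoCloseable {
    private final ScheduledExecutorService scheduler;

    ConfigurableWatcher(String name, long sleepMillis, Runnable check) {
        if (sleepMillis <= 0) {
            throw new IllegalArgumentException(name + ": sleep time must be positive");
        }
        // Daemon thread so an unclosed watcher does not keep the JVM alive.
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true);
            return t;
        });
        // Fixed delay: the next check starts sleepMillis after the previous one finishes.
        scheduler.scheduleWithFixedDelay(check, 0, sleepMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A fixed delay (rather than a fixed rate) mirrors a sleep-after-each-check loop, so a slow check never causes runs to pile up.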

>
> We need to document those configurable parameters as well, @Mari please
> note.
>
>
> Thanks,
> Sajith
>
>
>

-- 
Rajkumar Rajaratnam
Committer & PMC Member, Apache Stratos
Software Engineer, WSO2

Mobile : +94777568639
Blog : rajkumarr.com