Posted to user@helix.apache.org by Hang Qi <ha...@gmail.com> on 2015/05/17 01:48:58 UTC

Message throttling of controller behavior unexpectedly when there are multiple constraints

Hi folks,

We found very strange behavior in the controller's message throttling when
there are multiple constraints. Here is our setup (we are using
helix-0.6.4, with only one resource):

   - constraint 1: a per-node constraint; we only allow 3 state transitions
   to happen on one node concurrently.
   - constraint 2: a per-partition constraint; we define the state
   transition priorities in the state model and only allow one state
   transition to happen on a single partition concurrently.
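
For reference, a rough sketch of the two constraints as attribute maps (the
attribute names mirror Helix's message-constraint scheme, but this is an
illustration, not our literal configuration):

```python
# Illustrative rendering of the two constraints as attribute maps.
# The keys follow Helix's message-constraint attributes
# (MESSAGE_TYPE / INSTANCE / PARTITION / CONSTRAINT_VALUE); the exact
# config shown here is an assumption, not copied from our cluster.
constraint1 = {
    "MESSAGE_TYPE": "STATE_TRANSITION",
    "INSTANCE": ".*",          # applies per node
    "CONSTRAINT_VALUE": "3",   # at most 3 concurrent transitions per node
}
constraint2 = {
    "MESSAGE_TYPE": "STATE_TRANSITION",
    "PARTITION": ".*",         # applies per partition
    "CONSTRAINT_VALUE": "1",   # at most 1 concurrent transition per partition
}
```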

We are using the MasterSlave state model. Suppose we have two nodes A and
B, each with 8 partitions (p0-p7), and initially both A and B are shut
down; now we start them at the same time (say A slightly earlier than B).

The expected behavior would be:

   1. p0, p1, p2 on A start Offline -> Slave; p3, p4, p5 on B start
   Offline -> Slave

But the actual result is:

   1. p0, p1, p2 on A start Offline -> Slave; nothing happens on B
   2. only after p0, p1, p2 have all transitioned to Master do p3, p4, p5
   on A start Offline -> Slave and p0, p1, p2 on B start Offline -> Slave

As the Offline -> Slave step might take a long time, this behavior results
in a very long time to bring these two nodes up (long down time results in
long catch-up time as well), even though ideally we should not have both
nodes down at the same time.

Looking at the controller code, the stage- and pipeline-based
implementation is well designed and very easy to understand and reason about.

The logic of MessageThrottleStage#throttle is:


   1. it goes through each message selected by MessageSelectionStage;
   2. for each message, it goes through all matching selected constraints
   and decreases the quota of each constraint;
   3. if any constraint's quota drops below 0, the message is marked as
   throttled.
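
That loop can be modeled in a few lines (a Python sketch; Helix itself is
Java and the names here are illustrative), and the model reproduces exactly
what we observed in the two-node, eight-partition scenario:

```python
# Simplified model of MessageThrottleStage#throttle as described above.
# Messages are (partition, node) pairs, all Offline->Slave, in the order
# the controller generated them: all of A's first, then all of B's.
messages = [(f"p{i}", node) for node in ("A", "B") for i in range(8)]

# Remaining quota per constraint instance.
quota = {("node", n): 3 for n in ("A", "B")}            # constraint 1
quota.update({("part", f"p{i}"): 1 for i in range(8)})  # constraint 2

sent, throttled = [], []
for part, node in messages:
    over_quota = False
    for key in (("node", node), ("part", part)):
        quota[key] -= 1       # quota is charged unconditionally...
        if quota[key] < 0:    # ...even if the message ends up throttled
            over_quota = True
    (throttled if over_quota else sent).append((part, node))

print(sent)  # [('p0', 'A'), ('p1', 'A'), ('p2', 'A')] -- nothing on B
```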

I think there is something wrong here: a message consumes the quota of its
constraints even when it is not going to be sent out (i.e., when it is
throttled). That explains our case:

   - all the messages have been generated at the beginning: (p0, A,
   Offline->Slave), ... (p7, A, Offline->Slave), (p0, B, Offline->Slave), ...,
   (p7, B, Offline->Slave)
   - in MessageThrottleStage#throttle:
      - (p0, A, Offline->Slave), (p1, A, Offline->Slave), (p2, A,
      Offline->Slave) go through, and constraint 1 on A reaches 0;
      constraint 2 on p0, p1, p2 reaches 0 as well
      - (p3, A, Offline->Slave), ... (p7, A, Offline->Slave) are throttled
      by constraint 1 on A, but still consume the quota of constraint 2 on
      those partitions
      - (p0, B, Offline->Slave), ... (p7, B, Offline->Slave) are throttled
      by constraint 2
      - thus only (p0, A, Offline->Slave), (p1, A, Offline->Slave), (p2, A,
      Offline->Slave) are sent out by the controller.

Does that make sense, or is there anything else you can think of that could
result in this unexpected behavior? And is there any workaround for it? One
thing that comes to mind is updating constraint 2 so that the
one-transition-per-partition limit applies only to certain state
transitions.
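
Beyond reconfiguring constraint 2, a possible fix in the throttle stage
itself would be check-then-charge: only consume quota for a message that
passes all of its matching constraints. A sketch of that idea (hypothetical,
not an actual Helix patch) on the two-node, eight-partition scenario:

```python
# Check-then-charge throttling: a message consumes quota only if it
# passes every matching constraint. Same scenario as above: 2 nodes,
# 8 partitions, all messages Offline->Slave, A's messages first.
messages = [(f"p{i}", node) for node in ("A", "B") for i in range(8)]

quota = {("node", n): 3 for n in ("A", "B")}            # constraint 1
quota.update({("part", f"p{i}"): 1 for i in range(8)})  # constraint 2

sent, throttled = [], []
for part, node in messages:
    keys = [("node", node), ("part", part)]
    if all(quota[k] > 0 for k in keys):  # check every constraint first...
        for k in keys:
            quota[k] -= 1                # ...charge only on acceptance
        sent.append((part, node))
    else:
        throttled.append((part, node))

print(sent)  # A starts p0-p2 and B starts p3-p5 concurrently
```

With this change the throttled messages on A no longer eat the per-partition
quota, so B is free to start transitions on p3, p4, p5 in the same pass.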

Thanks very much.

Thanks
Hang Qi

Re: Message throttling of controller behavior unexpectedly when there are multiple constraints

Posted by kishore g <g....@gmail.com>.
Yes, we can definitely help in reviewing the patch.

thanks,
Kishore G


Re: Message throttling of controller behavior unexpectedly when there are multiple constraints

Posted by Hang Qi <ha...@gmail.com>.
Hi Kishore,

Thanks. Should I go ahead and create a JIRA issue, add a test case, and
propose a patch for the fix?

Thanks
Hang Qi



-- 
Qi hang

Re: Message throttling of controller behavior unexpectedly when there are multiple constraints

Posted by kishore g <g....@gmail.com>.
Got it, that should be fixed. It would be great to get a patch for it. Good
find.

Thanks
Kishore G

Re: Message throttling of controller behavior unexpectedly when there are multiple constraints

Posted by Hang Qi <ha...@gmail.com>.
Hi Kishore,

Thanks for your reply.

I am not saying I want Offline->Slave to have higher priority than
Slave->Master. I agree with you that one master is more important than two
slaves, and that comparison only applies within a single partition. What I
am saying is that while p0, p1, p2 are doing the Offline->Slave transition
on node A, I also want p3, p4, p5 to perform the Offline->Slave transition
on node B at the same time, rather than waiting until p0, p1, p2 become
Master on node A before any transitions start on node B; that wait is
wasted time.

The reason to have one transition per partition at a time is summarized in
the following thread:
http://mail-archives.apache.org/mod_mbox/helix-user/201503.mbox/%3CCAJ2%3DoXxBWF1VoCm%3DjjyhuFCWHuxw3wYPotGz8VRkEnzVhrmgwQ%40mail.gmail.com%3E

Thanks
Hang Qi



-- 
Qi hang

Re: Message throttling of controller behavior unexpectedly when there are multiple constraints

Posted by kishore g <g....@gmail.com>.
Thanks Hang for the detailed explanation.

Before the MessageSelectionStage, there is a stage that orders the messages
according to the state transition priority list. I think Slave-Master is
always higher priority than Offline-Slave, which makes sense because, in
general, having a master is probably more important than having two slaves.

Can you provide the state transition priority list in your state model
definition? If you think it is important to get node B to the Slave state
before promoting node A from Slave to Master, you can change the priority
order. Note: this can be changed dynamically and does not require
restarting the servers.
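
The ordering described above amounts to sorting pending messages by the
transition priority list; a small sketch (the priority values are
illustrative, not taken from a real state model definition):

```python
# Sketch of ordering pending messages by a state-transition priority
# list, as the stage before MessageSelectionStage does. Lower index in
# the list means higher priority; unknown transitions sort last.
priority_list = ["Slave-Master", "Offline-Slave", "Slave-Offline"]
rank = {t: i for i, t in enumerate(priority_list)}

pending = [("p0", "Offline-Slave"), ("p1", "Slave-Master"),
           ("p2", "Offline-Slave")]
ordered = sorted(pending, key=lambda m: rank.get(m[1], len(priority_list)))

print(ordered)  # ('p1', 'Slave-Master') is scheduled first
```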

Another question: what is the reason for constraint #2, i.e. only one
transition per partition at a time?

thanks,
Kishore G


