Posted to commits@helix.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/05/21 08:19:00 UTC

[jira] [Commented] (HELIX-596) Message throttling of controller behavior unexpectedly, throttled messages still take the constraint quota

    [ https://issues.apache.org/jira/browse/HELIX-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553697#comment-14553697 ] 

ASF GitHub Bot commented on HELIX-596:
--------------------------------------

GitHub user hangqi opened a pull request:

    https://github.com/apache/helix/pull/28

    [HELIX-596] fix throttled messages still take constraints' quota

    Corresponding review request:  https://reviews.apache.org/r/34345/
    
    Main changes in this pull request:
    
    perMessageThrottleQuotaMap records the quota of every matched constraint for the current message, and the overall throttleMap is updated only if the message has not been throttled. Previously, a message always took the quota of its matched constraints, whether or not it was actually going to be sent out.
    
    @zzhang5 
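
The change described above can be pictured with a minimal sketch. This is not the code in the pull request: only the names perMessageThrottleQuotaMap and throttleMap come from the description, while the class ThrottleSketch, the string constraint keys, and the representation of a message as the list of constraint keys it matches are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the actual MessageThrottleStage code.
class ThrottleSketch {
  // Each message is reduced to the keys of the constraints it matches (an assumption).
  // throttleMap must hold the remaining quota for every key and is updated in place.
  static List<List<String>> throttle(List<List<String>> messages,
                                     Map<String, Integer> throttleMap) {
    List<List<String>> sent = new ArrayList<>();
    for (List<String> matchedKeys : messages) {
      // Tentative quotas for this message only; nothing is written back yet.
      Map<String, Integer> perMessageThrottleQuotaMap = new HashMap<>();
      boolean throttled = false;
      for (String key : matchedKeys) {
        int remaining = throttleMap.get(key) - 1;
        perMessageThrottleQuotaMap.put(key, remaining);
        if (remaining < 0) {
          throttled = true;
        }
      }
      // Commit the decremented quotas only when the message will be sent.
      if (!throttled) {
        throttleMap.putAll(perMessageThrottleQuotaMap);
        sent.add(matchedKeys);
      }
    }
    return sent;
  }

  public static void main(String[] args) {
    Map<String, Integer> throttleMap = new HashMap<>();
    throttleMap.put("node:A", 1);
    throttleMap.put("node:B", 1);
    throttleMap.put("partition:p0", 1);
    throttleMap.put("partition:p1", 1);
    List<List<String>> messages = Arrays.asList(
        Arrays.asList("node:A", "partition:p0"),
        Arrays.asList("node:A", "partition:p1"),
        Arrays.asList("node:B", "partition:p1"));
    System.out.println(throttle(messages, throttleMap).size());
  }
}
```

In this toy run the second message is throttled by node:A and leaves the quota of partition:p1 untouched, so the third message can still be sent.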

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hangqi/helix fix_constrain_quota

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/28.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #28
    
----
commit 9ddbefcacff6b8e229e6413299d53d89f1cbcd43
Author: Hang Qi <ha...@gmail.com>
Date:   2015-05-18T06:06:20Z

    [HELIX-596] fix throttled messages still take constraints' quota

----


> Message throttling of controller behavior unexpectedly, throttled messages still take the constraint quota
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HELIX-596
>                 URL: https://issues.apache.org/jira/browse/HELIX-596
>             Project: Apache Helix
>          Issue Type: Bug
>          Components: helix-core
>    Affects Versions: 0.6.4
>            Reporter: Hang Qi
>             Fix For: master
>
>
> We found some very strange behavior in the controller's message throttling when there are multiple constraints. Here is our setup (we are using helix-0.6.4, with only one resource):
>   - constraint 1: a per-node constraint; we allow only 3 state transitions to happen on one node concurrently.
>   - constraint 2: a per-partition constraint; we define the state transition priorities in the state model, and allow only one state transition to happen on a single partition concurrently.
> We are using the MasterSlave state model. Suppose we have two nodes, A and B, each hosting 8 partitions (p0-p7). Initially both A and B are shut down, and we then start them at the same time (say A slightly earlier than B).
> The expected behavior might be:
>   - p0, p1, p2 on A start Offline -> Slave; p3, p4, p5 on B start Offline -> Slave
> But the actual result is:
>   - p0, p1, p2 on A start Offline -> Slave; nothing happens on B
>   - only after p0, p1, p2 have all transitioned to the Master state do p3, p4, p5 on A start Offline -> Slave, and p0, p1, p2 on B start Offline -> Slave
> Because the Offline -> Slave step might take a long time, this behavior makes it take very long to bring up these two nodes (a long downtime results in a long catch-up time as well), though ideally we should not let both nodes go down at the same time.
> I looked at the controller code. I like the stage- and pipeline-based implementation; it is well designed, and very easy to understand and reason about.
> The logic of MessageThrottleStage#throttle is:
>   - it goes through each message selected by MessageSelectionStage,
>   - for each message, it goes through all selected matched constraints and decreases the quota of each constraint
>      - if any constraint's quota is less than 0, the message is marked as throttled.
>  
> I think there is something wrong here: a message takes the quota of its constraints even if it is not going to be sent out (throttled). That explains our case:
>   - all the messages have been generated at the beginning: (p0, A, Offline->Slave), ... (p7, A, Offline->Slave), (p0, B, Offline->Slave), ..., (p7, B, Offline->Slave)
>   - in MessageThrottleStage#throttle
>     - (p0, A, Offline->Slave), (p1, A, Offline->Slave), (p2, A, Offline->Slave) are good; constraint 1 on A reaches 0, and constraint 2 on p0, p1, p2 reaches 0 as well
>     - (p3, A, Offline->Slave), ... (p7, A, Offline->Slave) are throttled by constraint 1 on A, but still take the quota of constraint 2 on those partitions.
>     - (p0, B, Offline->Slave), ... (p7, B, Offline->Slave) are throttled by constraint 2
>     - thus only (p0, A, Offline->Slave), (p1, A, Offline->Slave), (p2, A, Offline->Slave) have been sent out by the controller.
> Does that make sense, or is there anything else you can think of that could cause this unexpected behavior? And is there any workaround for it? One idea that comes to mind is to update constraint 2 so that only one state transition per partition is allowed, and only for certain state transitions.
> Thanks very much.
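
The trace in the report can be replayed with a small simulation, assuming the throttling works exactly as described there: every matched constraint is decremented first, and the message is throttled if any quota drops below 0. The class name EagerThrottleScenario, the key strings, and the "p0@A" message labels are made up for this sketch; this is not the Helix implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical replay of the scenario in the report; not Helix code.
class EagerThrottleScenario {
  static final int PER_NODE_QUOTA = 3;       // constraint 1
  static final int PER_PARTITION_QUOTA = 1;  // constraint 2

  // Returns the messages that are sent when quota is taken even by throttled messages.
  static List<String> sentMessages() {
    Map<String, Integer> quota = new HashMap<>();
    List<String> sent = new ArrayList<>();
    // Messages are processed in the order given in the report: all of A, then all of B.
    for (String node : new String[] {"A", "B"}) {
      for (int p = 0; p < 8; p++) {
        String nodeKey = "node:" + node;
        String partitionKey = "partition:p" + p;
        int nodeLeft = quota.getOrDefault(nodeKey, PER_NODE_QUOTA) - 1;
        int partitionLeft = quota.getOrDefault(partitionKey, PER_PARTITION_QUOTA) - 1;
        // Both quotas are written back regardless of the outcome below.
        quota.put(nodeKey, nodeLeft);
        quota.put(partitionKey, partitionLeft);
        if (nodeLeft >= 0 && partitionLeft >= 0) {
          sent.add("p" + p + "@" + node);
        }
      }
    }
    return sent;
  }

  public static void main(String[] args) {
    System.out.println(sentMessages()); // prints [p0@A, p1@A, p2@A]
  }
}
```

Under these assumptions the throttled messages for p3-p7 on A use up the per-partition quota, so every message for B is throttled as well, which matches the observed result.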



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)