Posted to dev@bookkeeper.apache.org by "Rakesh R (JIRA)" <ji...@apache.org> on 2012/10/05 08:26:47 UTC

[jira] [Created] (BOOKKEEPER-420) Lock does not guarantee any access order and not giving chance to longest-waiting RW

Rakesh R created BOOKKEEPER-420:
-----------------------------------

             Summary: Lock does not guarantee any access order and not giving chance to longest-waiting RW
                 Key: BOOKKEEPER-420
                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-420
             Project: Bookkeeper
          Issue Type: Sub-task
          Components: bookkeeper-auto-recovery
            Reporter: Rakesh R
             Fix For: 4.2.0


Improve the distributed lock so that every RW gets a fair chance. At present a few RWs can acquire the lock again and again, pushing the other RWs away from rereplication.

+Example:+
Take five RWs: RW1, RW2, RW3, RW4, RW5.

Say L0000000004 is underreplicated and RW1 acquires the lock. Meanwhile all the others add a watcher on this lock. After replication, assume RW2 acquires the lock and all the others (including RW1) add a watcher. Once RW2 releases, RW1 can again be more aggressive and acquire the lock. This will push the others toward starvation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (BOOKKEEPER-420) Lock does not guarantee any access order and not giving chance to longest-waiting RW

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470189#comment-13470189 ] 

Ivan Kelly commented on BOOKKEEPER-420:
---------------------------------------

Has this presented a problem in your testing? I would be wary of adding more complexity if the problem is minor. Also, I think your characterisation of the problem assumes an ordering of operations across all nodes which doesn't exist (i.e. they run in parallel).
                

[jira] [Assigned] (BOOKKEEPER-420) Lock does not guarantee any access order and not giving chance to longest-waiting RW

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R reassigned BOOKKEEPER-420:
-----------------------------------

    Assignee: Rakesh R
    

[jira] [Commented] (BOOKKEEPER-420) Lock does not guarantee any access order and not giving chance to longest-waiting RW

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507911#comment-13507911 ] 

Ivan Kelly commented on BOOKKEEPER-420:
---------------------------------------

[~rakeshr] Can I resolve this issue as a "Won't Fix", or do you think this is still something we need to address?
                

[jira] [Commented] (BOOKKEEPER-420) Lock does not guarantee any access order and not giving chance to longest-waiting RW

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470054#comment-13470054 ] 

Rakesh R commented on BOOKKEEPER-420:
-------------------------------------

Hi,

Just a few thoughts to begin the discussion:

I'm thinking of queuing up the lock requests using ephemeral sequential znodes, to give a fair chance to all the RWs as shown below. Each RW adds a watcher on its predecessor znode instead of on 'urL0000000004' and acts when that znode is deleted.

/ledgers/underreplication/locks/urL0000000004/L_0001, L_0002, L_0003, L_0004, L_0005. The lowest entry always gets the lock and continues to rereplicate.
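
The queueing idea above can be sketched without a ZooKeeper connection, just to show the ordering rule. This is a minimal illustration, not BookKeeper code: the class and method names (FairLockQueueSketch, holdsLock, predecessor) are hypothetical, and the list of children is assumed to have been read from the lock znode already. In a real implementation the entries would presumably be created as ephemeral sequential znodes and each waiter would set an exists-watch on its predecessor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the ordering rule only; no ZooKeeper calls are made.
public class FairLockQueueSketch {

    // Fixed-width, zero-padded sequence names sort correctly as plain strings.
    public static List<String> sorted(List<String> children) {
        List<String> copy = new ArrayList<>(children);
        Collections.sort(copy);
        return copy;
    }

    // The entry with the lowest sequence number is the current lock holder.
    public static boolean holdsLock(List<String> children, String mine) {
        List<String> s = sorted(children);
        return !s.isEmpty() && s.get(0).equals(mine);
    }

    // Every other entry watches only the entry immediately before it,
    // so a release wakes exactly one waiter instead of all of them.
    public static Optional<String> predecessor(List<String> children, String mine) {
        List<String> s = sorted(children);
        int idx = s.indexOf(mine);
        return idx > 0 ? Optional.of(s.get(idx - 1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Children as they might be returned, in arbitrary order.
        List<String> queue = Arrays.asList("L_0003", "L_0001", "L_0002");
        for (String me : sorted(queue)) {
            System.out.println(me + " holdsLock=" + holdsLock(queue, me)
                    + " watches=" + predecessor(queue, me).orElse("(none)"));
        }
    }
}
```

An RW that releases the lock and wants it again would have to create a new entry, which lands at the back of the queue; that is what gives the longest-waiting RW its turn.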
                

[jira] [Commented] (BOOKKEEPER-420) Lock does not guarantee any access order and not giving chance to longest-waiting RW

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508177#comment-13508177 ] 

Rakesh R commented on BOOKKEEPER-420:
-------------------------------------

Yeah Ivan, I agree with you. I will look at this and address it later (if it's really affecting any flows).
                

[jira] [Commented] (BOOKKEEPER-420) Lock does not guarantee any access order and not giving chance to longest-waiting RW

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470220#comment-13470220 ] 

Rakesh R commented on BOOKKEEPER-420:
-------------------------------------

@Ivan
bq. Has this presented a problem in your testing?

What I have noticed is that a bookie which has already acquired the lock revisits it one or more times unnecessarily, without giving the others a chance. After a few cycles I have seen the others acquire the lock (in tests I haven't seen other bookies masked indefinitely). I just thought of fair locking so that all get an equal chance in round-robin fashion, or we could think of some other fair approach.

@Uma @Ivan
I also agree not to make the code messy to handle this, if everyone feels it's OK.
                

[jira] [Commented] (BOOKKEEPER-420) Lock does not guarantee any access order and not giving chance to longest-waiting RW

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470195#comment-13470195 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-420:
------------------------------------------------

@Rakesh, even if we have 2 ledgers, shuffling will help, as the workers may try them alternately and both ledgers will get a chance.

Per your proposal:
I am a bit worried that we may add more lock znodes (number of ledgers to replicate * cluster size in the worst case), which may be unnecessary when you have a good cluster, because all the Bookies participate equally in getting the work and at the same time shuffling helps randomize things. So the same Bookies ending up with the same ledger in a loop will be rare, I feel. The window will be a little larger when you have only one ledger to replicate and very few Bookies are eligible for replication.
                

[jira] [Commented] (BOOKKEEPER-420) Lock does not guarantee any access order and not giving chance to longest-waiting RW

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470096#comment-13470096 ] 

Rakesh R commented on BOOKKEEPER-420:
-------------------------------------

Yeah Uma, the shuffling logic will help if there are more ledgers. The window is narrow and the possibility arises only with a small number of ledgers. I'm just trying to have a fair locking mechanism that gives an equal chance to everyone. Does this sound fine?
                

[jira] [Resolved] (BOOKKEEPER-420) Lock does not guarantee any access order and not giving chance to longest-waiting RW

Posted by "Ivan Kelly (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/BOOKKEEPER-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly resolved BOOKKEEPER-420.
-----------------------------------

    Resolution: Won't Fix
    

[jira] [Commented] (BOOKKEEPER-420) Lock does not guarantee any access order and not giving chance to longest-waiting RW

Posted by "Uma Maheswara Rao G (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/BOOKKEEPER-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470052#comment-13470052 ] 

Uma Maheswara Rao G commented on BOOKKEEPER-420:
------------------------------------------------

As we shuffle the children once we get them, this problem may not occur when you have more than one ledger to replicate. There is a rare chance of it when you have only one ledger to replicate, the eligible target Bookie is slow compared to the others, and none of the other Bookies are eligible to replicate.
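
To make the shuffling argument concrete, here is a minimal sketch of the idea, assuming each worker shuffles the list of underreplicated-ledger znodes before trying to lock them in order. The class and method names (ShuffledPickSketch, shuffledCandidates) and the second and third znode names are made up for illustration; this is not the actual BookKeeper code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: randomize the order in which a worker tries ledgers.
public class ShuffledPickSketch {

    // Returns a shuffled copy so that different workers tend to start
    // with different ledgers; the input list is left untouched.
    public static List<String> shuffledCandidates(List<String> children, Random rnd) {
        List<String> copy = new ArrayList<>(children);
        Collections.shuffle(copy, rnd);
        return copy;
    }

    public static void main(String[] args) {
        List<String> ledgers = Arrays.asList(
                "urL0000000004", "urL0000000007", "urL0000000009");
        // Two workers with independent randomness get independently shuffled orders.
        System.out.println("worker A tries: " + shuffledCandidates(ledgers, new Random()));
        System.out.println("worker B tries: " + shuffledCandidates(ledgers, new Random()));
        // With a single ledger there is nothing to shuffle, so every
        // worker contends for the same lock.
        List<String> single = Collections.singletonList("urL0000000004");
        System.out.println("single ledger: " + shuffledCandidates(single, new Random()));
    }
}
```

With several ledgers the workers spread out over different locks; with a single ledger the shuffle is a no-op, which is the narrow window discussed in this thread.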
                