
Posted to issues@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2011/04/16 00:54:05 UTC

[jira] [Created] (HBASE-3789) Cleanup the locking contention in the master

Cleanup the locking contention in the master
--------------------------------------------

                 Key: HBASE-3789
                 URL: https://issues.apache.org/jira/browse/HBASE-3789
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.90.2
            Reporter: Jean-Daniel Cryans
            Priority: Blocker
             Fix For: 0.92.0


The new master uses a lot of synchronized blocks to be safe, but it only takes a few jstacks to see that there are multiple layers of lock contention when a bunch of regions are moving (like when the balancer runs). The main culprits are regionInTransition in AssignmentManager, ZKAssign that uses ZKW.getZNnodes (basically another set of regions in transition), and locking at the RegionState level.

My understanding is that even though we have multiple threads to handle regions in transition, everything is actually serialized. Most of the time, lock holders are talking to ZK or a region server, which can take a few milliseconds.

A simple example: when AssignmentManager wants to update the timers for all the regions on an RS, it will usually be waiting on another thread that's holding the lock while talking to ZK.
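
To make the contention pattern above concrete, here is a minimal sketch in plain Java. It is not the master's code: the class LockScopeSketch, its methods, and the sleep standing in for a ZK or region server round trip are all made up for illustration. It only shows why a quick timer update ends up queued behind a thread that holds the lock across a slow remote call, and what narrowing the lock scope looks like.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; none of these names come from the HBase code.
final class LockScopeSketch {
    private final Object lock = new Object();
    private final Map<String, Long> stamps = new HashMap<>();

    // Stand-in for a round trip to ZK or a region server.
    private static void slowRemoteCall(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Contended shape: the remote call happens while the lock is held,
    // so every other thread that needs the lock waits out the round trip.
    void transitionHoldingLock(String region, long rpcMillis) {
        synchronized (lock) {
            slowRemoteCall(rpcMillis);
            stamps.put(region, System.currentTimeMillis());
        }
    }

    // Narrower shape: the remote call runs outside the lock, which is
    // taken only for the in-memory update.
    void transitionNarrowLock(String region, long rpcMillis) {
        slowRemoteCall(rpcMillis);
        synchronized (lock) {
            stamps.put(region, System.currentTimeMillis());
        }
    }

    // Analogous to refreshing timers: needs the lock only briefly, but
    // still queues behind any holder that is busy with a remote call.
    int updateTimers() {
        synchronized (lock) {
            long now = System.currentTimeMillis();
            for (Map.Entry<String, Long> e : stamps.entrySet()) {
                e.setValue(now);
            }
            return stamps.size();
        }
    }

    // Runs `threads` workers at once and returns the wall-clock time in ms.
    static long elapsedMillis(boolean narrow, int threads, long rpcMillis) {
        LockScopeSketch sketch = new LockScopeSketch();
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            String region = "region-" + i;
            workers[i] = new Thread(() -> {
                if (narrow) {
                    sketch.transitionNarrowLock(region, rpcMillis);
                } else {
                    sketch.transitionHoldingLock(region, rpcMillis);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Calling LockScopeSketch.elapsedMillis(false, 4, 50) forces the four simulated round trips to run one after another, while elapsedMillis(true, 4, 50) lets them overlap. The exact numbers depend on the machine, so none are claimed here.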

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3789) Cleanup the locking contention in the master

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038229#comment-13038229 ] 

stack commented on HBASE-3789:
------------------------------

You should remove rather than comment out code.

There is more to be done on this still, right J-D?


[jira] [Updated] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3789:
--------------------------------------

    Attachment: HBASE-3789-v2-0.90.patch

New cleaned-up patch that does what I listed in the previous comment, except the splitting part since this is currently a patch for 0.90.

My testing shows that creating the znodes when closing in the master is slower since, duh, it's not done in parallel by the region servers. The patch is still much faster than the current master and people tweaking the number of handlers higher should see good speedups. I haven't seen any bug with this patch.

Up next is more testing and porting to trunk with the handling of splits.


[jira] [Commented] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020473#comment-13020473 ] 

Ted Yu commented on HBASE-3789:
-------------------------------

Good job, J-D.
Can you disclose roughly how many regions were moved during the 1 second balancing?


[jira] [Assigned] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans reassigned HBASE-3789:
-----------------------------------------

    Assignee: Jean-Daniel Cryans


[jira] [Commented] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020476#comment-13020476 ] 

Jean-Daniel Cryans commented on HBASE-3789:
-------------------------------------------

The balancer only sends the unassign messages before returning. About 500. The total time to balance is halved.


[jira] [Commented] (HBASE-3789) Cleanup the locking contention in the master

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048898#comment-13048898 ] 

stack commented on HBASE-3789:
------------------------------

I love this patch.  +1


[jira] [Updated] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3789:
--------------------------------------

    Attachment: HBASE-3789-v3-0.90.patch

Patch without the conf/ stuff in it (duh).


[jira] [Updated] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3789:
--------------------------------------

    Attachment: HBASE-3789.patch

This first patch is a dirty work in progress. 

The first thing I changed is in ZKW: I used a ConcurrentSkipListSet instead of a HashSet, which let me remove all the weird locking in ZKAssign.
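
As a rough sketch of that kind of swap (not the actual ZKW code), the class below keeps a set of tracked paths in a ConcurrentSkipListSet so that callers need no external synchronized block around add, remove, or contains. The class name TrackedNodesSketch, its methods, and the path strings are assumptions made up for this example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical stand-in for a watcher's set of tracked znode paths.
final class TrackedNodesSketch {
    // Thread-safe sorted set: single operations need no external lock.
    private final Set<String> nodes = new ConcurrentSkipListSet<>();

    boolean track(String path) {
        return nodes.add(path);
    }

    boolean untrack(String path) {
        return nodes.remove(path);
    }

    boolean isTracked(String path) {
        return nodes.contains(path);
    }

    int size() {
        return nodes.size();
    }

    // Adds distinct made-up paths from several threads at once and
    // returns how many ended up in the set.
    static int concurrentAdds(int threads, int perThread) {
        TrackedNodesSketch tracked = new TrackedNodesSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            int id = i;
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    tracked.track("/sketch/node-" + id + "-" + j);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return tracked.size();
    }
}
```

Each individual call is safe on its own, but a check-then-act sequence such as isTracked followed by track is still not atomic; using the boolean returned by track or untrack avoids that gap.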

Next up is AssignmentManager, where I removed all the sync done in nodeCreated/DataChanged/ChildrenChanged since it was already handled inside handleRegion(). Also, it was doing ZK operations under that sync, so it's double bad.

Third, I changed RegionState so that the stamp is atomically modifiable, since it doesn't really matter that both the state and the stamp be changed by exactly the same person. What you want in the end is progress. This was also the source of a lot of locking in updateTimers.
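
One way to picture an atomically modifiable stamp is the sketch below, which assumes an AtomicLong for the timestamp and a volatile field for the state. RegionStateSketch, its State values, and its method names are hypothetical and not taken from the patch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of a state object whose timestamp can be
// refreshed without taking a lock.
final class RegionStateSketch {
    // Illustrative states only.
    enum State { OPENING, OPEN, CLOSING, CLOSED }

    private volatile State state;
    private final AtomicLong stamp;

    RegionStateSketch(State state, long now) {
        this.state = state;
        this.stamp = new AtomicLong(now);
    }

    // State and stamp are written one after the other, not as a single
    // atomic unit; a reader may briefly see the new state with the old stamp.
    void update(State newState, long now) {
        this.state = newState;
        this.stamp.set(now);
    }

    // Timer refresh touches only the stamp, so it never blocks on a thread
    // that is busy changing the state.
    void updateTimestamp(long now) {
        this.stamp.set(now);
    }

    State getState() {
        return state;
    }

    long getStamp() {
        return stamp.get();
    }
}
```

The trade-off is that the state and stamp can briefly disagree about who wrote last, which is acceptable only if, as the comment above argues, all that matters is that the stamp keeps moving forward.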

I tested this patch under load while killing region servers (multiple at a time), and then running the balancer. Didn't see a single bug. It still needs more testing, and I need to document the locking and see if there's any other optimization that could be done.

At least, my profiling shows that the master is now only waiting on ZK.




[jira] [Commented] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048905#comment-13048905 ] 

Jean-Daniel Cryans commented on HBASE-3789:
-------------------------------------------

Thanks Stack for the +1.

Also it's important to note that the 0.90 patch breaks rolling restarts because of the way it changes the closing sequence (which is why this is targeted for 0.92). Apply at your own risk :)


[jira] [Updated] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3789:
--------------------------------------

    Attachment:     (was: HBASE-3789-v3-0.90.patch)


[jira] [Updated] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3789:
--------------------------------------

    Attachment: HBASE-3789-v3-0.90.patch

With the previous patch all the tests passed except for hbck. Looking deeper, I see hbck creates its own znodes so now the master doesn't see that. It's not clear to me why it's not using HBA.assign instead of the trickery with the HBCK_CODE_NAME.

This patch modifies hbck so that it uses "normal" tools provided by the master instead of bypassing it.

I'm also working on porting that to trunk. I got the previous patch I posted working but didn't do the hbck stuff yet because it's different.

Also I still didn't touch the splitting code in trunk.


[jira] [Updated] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3789:
--------------------------------------

    Attachment:     (was: HBASE-3789-v3-0.90.patch)


[jira] [Updated] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3789:
--------------------------------------

    Attachment:     (was: HBASE-3789.patch)


[jira] [Resolved] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-3789.
---------------------------------------

      Resolution: Fixed
    Release Note: 
The master now creates the znode when closing a region, which is an incompatible change.
SplitTransaction now waits on the master to delete the znode that it created before it can finish.
The master doesn't keep track of znodes being deleted and created anymore; it was getting out of sync too easily.
    Hadoop Flags: [Incompatible change, Reviewed]

Committed to trunk, thanks for the review Stack.


[jira] [Updated] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3789:
--------------------------------------

    Attachment:     (was: HBASE-3789-v2-0.90.patch)


[jira] [Updated] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3789:
--------------------------------------

    Attachment:     (was: HBASE-3789-trunk-wip.patch)

> Cleanup the locking contention in the master
> --------------------------------------------
>
>                 Key: HBASE-3789
>                 URL: https://issues.apache.org/jira/browse/HBASE-3789
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.2
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3789-trunk.patch, HBASE-3789-v3-0.90.patch

[jira] [Commented] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038231#comment-13038231 ] 

Jean-Daniel Cryans commented on HBASE-3789:
-------------------------------------------

Yes, like I wrote in the first comment, it's a dirty WIP.

> Cleanup the locking contention in the master
> --------------------------------------------
>
>                 Key: HBASE-3789
>                 URL: https://issues.apache.org/jira/browse/HBASE-3789
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.2
>            Reporter: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3789.patch

[jira] [Commented] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050991#comment-13050991 ] 

Hudson commented on HBASE-3789:
-------------------------------

Integrated in HBase-TRUNK #1976 (See [https://builds.apache.org/job/HBase-TRUNK/1976/])
    

> Cleanup the locking contention in the master
> --------------------------------------------
>
>                 Key: HBASE-3789
>                 URL: https://issues.apache.org/jira/browse/HBASE-3789
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.2
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3789-trunk.patch, HBASE-3789-v4-0.90.patch

[jira] [Updated] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3789:
--------------------------------------

    Attachment: HBASE-3789-trunk.patch

Patch for trunk with the split fixes. I had to remove a test because the case it covered isn't an issue anymore (the master now creates the znode when closing a region). Then I had to do a bunch of fixes in AssignmentManager for cases where we report regions that are already split or that skipped steps. Finally, I added the part of the code that waits for the master to delete the znode.

One thing I might clean up further is the latter part of SplitTransaction, which has a few methods that all look the same. Also, I'm not thrilled about having to sleep while waiting on the master, but that was the easiest way.
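
For readers who want a picture of the "sleep to wait on the master" approach, here is a minimal sketch of such a polling loop. It is not the code from the patch: the class name, the method name, and the `znodeExists` check are hypothetical stand-ins (a real implementation would ask ZooKeeper whether the znode is still there), and the timeout is an assumption added so the loop cannot spin forever.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HBase: polls until a znode is gone.
final class ZnodeDeletionWait {

    // Returns true once znodeExists reports false, or false if the
    // timeout expires (or the thread is interrupted) first.
    static boolean waitUntilDeleted(BooleanSupplier znodeExists,
                                    long sleepMs, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (znodeExists.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // the master never deleted the znode in time
            }
            try {
                Thread.sleep(sleepMs); // the sleep mentioned above
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] checks = {0};
        // Simulated znode that disappears after three existence checks.
        boolean deleted = waitUntilDeleted(() -> checks[0]++ < 3, 1L, 10_000L);
        System.out.println("deleted: " + deleted);
    }
}
```

The obvious cost of polling is latency of up to one sleep interval per wait; a watch-based wait would avoid that, at the price of more code.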

> Cleanup the locking contention in the master
> --------------------------------------------
>
>                 Key: HBASE-3789
>                 URL: https://issues.apache.org/jira/browse/HBASE-3789
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.2
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3789-trunk.patch, HBASE-3789-v3-0.90.patch

[jira] [Commented] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039368#comment-13039368 ] 

Jean-Daniel Cryans commented on HBASE-3789:
-------------------------------------------

There's one major issue with my current patch: a race between the master's OpenedRegionHandler and the events thread. It goes like this:

 - RS transitions a region to OPENING
 - RS transitions again to OPENING
 - Master receives the first event, reads ZK and sees OPENING
 - RS transitions to OPENED
 - Master receives the second event, reads ZK and sees OPENED instead of OPENING, kicks off the OpenedRegionHandler
 - The handler will at some point remove the znode's entry from the ZKW.getNodes structure (such a bad method name) before deleting the actual znode
 - Master receives the third event, reads ZK, sees OPENED but finds that getNodes doesn't contain the znode, so it considers this a new region in transition and adds it back to getNodes()
 - The handler deletes the znode
 - The master does a no-op when it figures it cannot transition from OPEN to OPENED

At this point the region is assigned and everything is "fine"... until the master decides for any reason to unassign the region. It sends the unassignment and receives an event, but doesn't process it in nodeChildrenChanged because ZKW.getNodes() already has it. From that point on the master will spin in "Region has been PENDING_CLOSE for too long" until it's put out of its misery.
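
The sequence above can be replayed deterministically with in-memory stand-ins. This is only a sketch under simplifying assumptions, not the master's code: the map `zk`, the set `trackedNodes` (standing in for whatever ZKW.getNodes() tracks), and every method name are hypothetical, and event handling is reduced to the one check that matters for the race.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical replay of the race; none of these names exist in HBase.
final class StaleZnodeCacheDemo {
    // Stand-in for ZooKeeper: region name -> state stored in its znode.
    private final Map<String, String> zk = new HashMap<>();
    // Stand-in for the master-side cache of znodes it believes exist.
    private final Set<String> trackedNodes = new HashSet<>();

    // A znode present in ZK but missing from the cache is treated as a
    // new region in transition and gets cached.
    private void masterHandlesEvent(String region) {
        if (zk.containsKey(region) && !trackedNodes.contains(region)) {
            trackedNodes.add(region);
        }
    }

    // Simplified nodeChildrenChanged: only uncached znodes are processed.
    private boolean masterProcessesNewChild(String region) {
        return trackedNodes.add(region); // false if it was already cached
    }

    // Returns whether the master processes the later close event.
    static boolean replay() {
        StaleZnodeCacheDemo d = new StaleZnodeCacheDemo();
        String region = "region-A";
        d.zk.put(region, "OFFLINE");      // master creates the znode for the assignment
        d.trackedNodes.add(region);
        d.zk.put(region, "OPENING");      // RS transitions to OPENING
        d.zk.put(region, "OPENING");      // RS transitions again to OPENING
        d.masterHandlesEvent(region);     // first event: sees OPENING
        d.zk.put(region, "OPENED");       // RS transitions to OPENED
        d.masterHandlesEvent(region);     // second event: sees OPENED, handler starts
        d.trackedNodes.remove(region);    // handler drops the cache entry first
        d.masterHandlesEvent(region);     // third event: znode re-added to the cache
        d.zk.remove(region);              // handler deletes the actual znode
        // Later the master unassigns and the RS creates the znode itself.
        d.zk.put(region, "CLOSING");
        return d.masterProcessesNewChild(region);
    }

    public static void main(String[] args) {
        System.out.println("close event processed: " + replay()); // prints false
    }
}
```

The stale cache entry left behind by the third event is what makes the later close event look like something the master already knows about.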

The issue here is that the region server is creating the unassigned znode by itself, unlike an assignment where it's the master that does it. Doing that in the master won't fully solve the issue though, because in 0.92 the RS still creates znodes for splits, and there's no way that could be done by the master, as it would basically be like going back to how it used to work.

So this is what Stack and I thought about:

 - The master needs to create the unassigned znode before telling a RS to close a region; the RS will now just update it
 - ZKW needs to stop keeping track of the znodes; it's too easy to get into a situation where we have a mismatch
 - The SplitTransaction will still create the znode, but it will then wait at the very end until it gets deleted by the master. To make sure the master sees the change, it will tickle the znode like we do for OPENING so that the master doesn't miss it
 - The method AssignmentManager.nodeChildrenChanged will only put watchers on znodes and won't keep track of anything
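
The first point in the list above (the master creates the znode, the RS only updates it) can be sketched with an in-memory map in place of ZooKeeper. Everything here is an illustrative assumption: the class, the method names, and the plain "CLOSING"/"CLOSED" labels are hypothetical, and the conditional `replace`/`remove` calls merely mimic the kind of check-then-update that a versioned ZooKeeper write would give.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model of "master creates, RS only updates"; not HBase code.
final class MasterOwnedZnodeSketch {
    // Stand-in for the unassigned znodes: region name -> state label.
    private final ConcurrentMap<String, String> znodes = new ConcurrentHashMap<>();

    // Master side: create the znode before asking the RS to close the region.
    boolean masterCreatesClosing(String region) {
        return znodes.putIfAbsent(region, "CLOSING") == null;
    }

    // RS side: may only move an existing znode from one state to another.
    boolean rsTransitions(String region, String from, String to) {
        return znodes.replace(region, from, to);
    }

    // Master side: delete the znode once it has seen the expected state.
    boolean masterDeletes(String region, String expectedState) {
        return znodes.remove(region, expectedState);
    }

    String stateOf(String region) {
        return znodes.get(region);
    }

    public static void main(String[] args) {
        MasterOwnedZnodeSketch s = new MasterOwnedZnodeSketch();
        // Without a master-created znode the RS has nothing to update.
        System.out.println(s.rsTransitions("region-A", "CLOSING", "CLOSED"));
        System.out.println(s.masterCreatesClosing("region-A"));
        System.out.println(s.rsTransitions("region-A", "CLOSING", "CLOSED"));
        System.out.println(s.masterDeletes("region-A", "CLOSED"));
    }
}
```

Because only the master creates and deletes entries in this model, it never has to guess whether a znode it sees is new, which is the mismatch the second point is trying to get rid of.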

> Cleanup the locking contention in the master
> --------------------------------------------
>
>                 Key: HBASE-3789
>                 URL: https://issues.apache.org/jira/browse/HBASE-3789
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.2
>            Reporter: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3789.patch

[jira] [Updated] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3789:
--------------------------------------

    Attachment: HBASE-3789-v4-0.90.patch

Attaching the 0.90 patch refreshed for that branch. It's not meant for inclusion; I'm leaving it here in case it's of use to anyone. Remember that it breaks rolling restarts if you plan on deploying it.

> Cleanup the locking contention in the master
> --------------------------------------------
>
>                 Key: HBASE-3789
>                 URL: https://issues.apache.org/jira/browse/HBASE-3789
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.2
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3789-trunk.patch, HBASE-3789-v4-0.90.patch

[jira] [Updated] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-3789:
--------------------------------------

    Attachment: HBASE-3789-trunk-wip.patch

I'm posting the equivalent to the v3 patch for trunk. I still need to handle the splitting.

> Cleanup the locking contention in the master
> --------------------------------------------
>
>                 Key: HBASE-3789
>                 URL: https://issues.apache.org/jira/browse/HBASE-3789
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.2
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3789-trunk-wip.patch, HBASE-3789-v2-0.90.patch, HBASE-3789-v3-0.90.patch, HBASE-3789.patch

[jira] [Commented] (HBASE-3789) Cleanup the locking contention in the master

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020471#comment-13020471 ] 

Jean-Daniel Cryans commented on HBASE-3789:
-------------------------------------------

I might also add that with this patch, the balancer command, which usually took 25 seconds to run, now returns in under 1 second.

> Cleanup the locking contention in the master
> --------------------------------------------
>
>                 Key: HBASE-3789
>                 URL: https://issues.apache.org/jira/browse/HBASE-3789
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.2
>            Reporter: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3789.patch