Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2011/03/11 01:35:59 UTC
[jira] Created: (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no
The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no
-------------------------------------------------------------------------------------------
Key: HBASE-3621
URL: https://issues.apache.org/jira/browse/HBASE-3621
Project: HBase
Issue Type: Bug
Reporter: stack
Fix For: 0.90.2
J-D found this debugging a failure on Dmitriy's cluster; we're RPC'ing under a synchronized(regionsInTransition). Fix.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Assigned: (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no
Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Yu reassigned HBASE-3621:
-----------------------------
Assignee: Ted Yu
[jira] [Commented] (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009958#comment-13009958 ]
Hudson commented on HBASE-3621:
-------------------------------
Integrated in HBase-TRUNK #1803 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1803/])
[jira] Updated: (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no
Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Yu updated HBASE-3621:
--------------------------
Attachment: hbase-3621.txt
Initial attempt.
I moved the processing of regions in the PENDING_CLOSE state outside of the synchronized block.
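A minimal sketch of that pattern, assuming a simplified model: the chore holds the regionsInTransition lock only long enough to collect the timed-out PENDING_CLOSE regions, then releases it before making any RPC. RegionState, State, timeoutMs, and sendRegionClose here are hypothetical stand-ins, not the actual AssignmentManager code; the ConcurrentSkipListMap matches the lock type in J-D's thread dump.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: names are illustrative, not HBase's actual classes.
class TimeoutChoreSketch {
  enum State { PENDING_OPEN, OPENING, PENDING_CLOSE }

  static final class RegionState {
    final String regionName;
    final State state;
    final long stamp; // when the region entered its current state, in ms

    RegionState(String regionName, State state, long stamp) {
      this.regionName = regionName;
      this.state = state;
      this.stamp = stamp;
    }
  }

  // Shared with other threads (e.g. the ZK event thread), which also lock it.
  final Map<String, RegionState> regionsInTransition = new ConcurrentSkipListMap<>();
  final long timeoutMs;
  final List<String> closedRegions = new ArrayList<>();
  boolean rpcUnderLock = false;

  TimeoutChoreSketch(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  // Hypothetical stand-in for the blocking close RPC; records whether the lock was held.
  void sendRegionClose(RegionState rs) {
    rpcUnderLock |= Thread.holdsLock(regionsInTransition);
    closedRegions.add(rs.regionName);
  }

  void chore(long now) {
    List<RegionState> timedOutCloses = new ArrayList<>();
    // 1. Hold the lock only long enough to find the timed-out PENDING_CLOSE regions.
    synchronized (regionsInTransition) {
      for (RegionState rs : regionsInTransition.values()) {
        if (rs.state == State.PENDING_CLOSE && now - rs.stamp > timeoutMs) {
          timedOutCloses.add(rs);
        }
      }
    }
    // 2. Make the slow RPCs only after the lock has been released.
    for (RegionState rs : timedOutCloses) {
      sendRegionClose(rs);
    }
  }
}
```

The trade-off is that the collected list can be slightly stale by the time the RPCs go out, so the real handler presumably has to tolerate a region changing state between the two steps.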
[jira] [Resolved] (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack resolved HBASE-3621.
--------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Committed to branch and trunk. Thanks for the patch Ted.
[jira] [Updated] (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3621:
-------------------------
Attachment: 3621-v2.txt
This is what I committed. It's Ted's patch, plus I do the same thing for assign. Ted's patch covered unassigns happening under locks; I noticed that we are also assigning under locks, so I copied Ted's pattern, moving assign out to run outside of the lock on regionsInTransition.
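Because both the assign and unassign paths must now stay outside the lock, one way to catch a regression in either is a guard that refuses to run an RPC while the calling thread owns the regionsInTransition monitor. This is a hypothetical test helper, not part of the committed patch; it relies only on the standard Thread.holdsLock(Object) check.

```java
// Hypothetical helper, not part of HBase: refuses to run an RPC while the calling
// thread holds the given monitor (e.g. the regionsInTransition map).
class RpcUnderLockGuard {
  private final Object ritLock;

  RpcUnderLockGuard(Object ritLock) {
    this.ritLock = ritLock;
  }

  void call(String rpcName, Runnable rpc) {
    // Thread.holdsLock reports whether the *current* thread owns the monitor.
    if (Thread.holdsLock(ritLock)) {
      throw new IllegalStateException(
          rpcName + " issued while holding the regionsInTransition lock");
    }
    rpc.run();
  }
}
```

In a test, a stubbed server manager could route both its open and close calls through call(...), so either path slipping back under the lock would fail immediately instead of showing up later as a stalled ZK event thread.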
[jira] Commented: (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no
Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005441#comment-13005441 ]
Ted Yu commented on HBASE-3621:
-------------------------------
unassign() has the following block, which runs before it makes the RPC call:
{code}
synchronized (regionsInTransition) {
}
{code}
so we don't need to enclose unassign() in the synchronized block of chore().
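The enclosing block in chore() matters because Java monitors are reentrant. If chore() calls unassign() while already inside synchronized (regionsInTransition), leaving unassign()'s own short block does not release the outer one, so the RPC still runs under the lock. A minimal sketch of the two call shapes, with hypothetical method bodies standing in for the real ones:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch of the call shapes discussed above; not HBase's code.
class MonitorReentrancySketch {
  final Map<String, String> regionsInTransition = new ConcurrentSkipListMap<>();
  boolean lockHeldDuringRpc;

  // Shape of unassign(): a short critical section, then the (simulated) RPC.
  void unassign(String region) {
    synchronized (regionsInTransition) {
      regionsInTransition.put(region, "PENDING_CLOSE");
    }
    // Stand-in for sendRegionClose(): record whether this thread still owns the monitor.
    lockHeldDuringRpc = Thread.holdsLock(regionsInTransition);
  }

  // Before: chore() calls unassign() from inside its own synchronized block.
  // The inner block re-enters the same monitor, and exiting it leaves the outer hold intact.
  void choreUnderLock(String region) {
    synchronized (regionsInTransition) {
      unassign(region);
    }
  }

  // After: chore() inspects state under the lock, then calls unassign() once it is released.
  void choreOutsideLock(String region) {
    synchronized (regionsInTransition) {
      // scan regionsInTransition and pick timed-out regions (omitted)
    }
    unassign(region);
  }
}
```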
[jira] Commented: (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no
Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005438#comment-13005438 ]
Jean-Daniel Cryans commented on HBASE-3621:
-------------------------------------------
For example:
{code}
"somenode.prod.twitter.com:60000.timeoutMonitor" daemon prio=10 tid=0x00002aacb8567800 nid=0x772 in Object.wait() [0x0000000045bf1000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
- locked <0x00002aaab2a10da8> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
at $Proxy6.closeRegion(Unknown Source)
at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:589)
at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1093)
at org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1672)
- locked <0x00002aaabf759858> (a java.util.concurrent.ConcurrentSkipListMap)
at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
...
"main-EventThread" daemon prio=10 tid=0x00002aacb850b000 nid=0x761 waiting for monitor entry [0x00000000455eb000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:525)
- waiting to lock <0x00002aaabf759858> (a java.util.concurrent.ConcurrentSkipListMap)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:268)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
{code}
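The ZK event thread (main-EventThread) is blocked waiting for the regionsInTransition lock, which the timeoutMonitor thread holds while it waits on an RPC to a RS that doesn't answer. As a result, all ZK events get severely delayed.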
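A minimal reproduction of that stall, assuming a latch wait is a fair stand-in for an RPC that never returns. The thread names mirror the dump; everything else is illustrative. The event thread ends up BLOCKED on the same monitor, just like main-EventThread above.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CountDownLatch;

// Minimal reproduction of the stall in the thread dump; the thread names mirror the dump,
// everything else is a stand-in.
class LockStallDemo {
  // Returns the event thread's state observed while the "RPC" is still outstanding.
  static Thread.State run() {
    ConcurrentSkipListMap<String, String> rit = new ConcurrentSkipListMap<>();
    CountDownLatch lockAcquired = new CountDownLatch(1);
    CountDownLatch rpcReturns = new CountDownLatch(1);

    Thread timeoutMonitor = new Thread(() -> {
      synchronized (rit) {
        lockAcquired.countDown();
        try {
          rpcReturns.await(); // stands in for an RPC to a RegionServer that doesn't answer
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }, "timeoutMonitor");

    Thread eventThread = new Thread(() -> {
      synchronized (rit) { // like nodeDataChanged(), it needs the same monitor
        rit.put("region", "event processed");
      }
    }, "main-EventThread");

    try {
      timeoutMonitor.start();
      lockAcquired.await();
      eventThread.start();
      // Poll (up to ~5 s) until the event thread is parked on the monitor.
      Thread.State observed = eventThread.getState();
      long deadline = System.nanoTime() + 5_000_000_000L;
      while (observed != Thread.State.BLOCKED && System.nanoTime() < deadline) {
        Thread.sleep(10);
        observed = eventThread.getState();
      }
      rpcReturns.countDown(); // the "RPC" finally returns, releasing the lock
      timeoutMonitor.join();
      eventThread.join();
      return observed;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      rpcReturns.countDown(); // never leave timeoutMonitor stuck if we bail out early
    }
  }
}
```

Calling LockStallDemo.run() should report BLOCKED: the event thread cannot do anything until the simulated RPC returns, however long that takes.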
[jira] Commented: (HBASE-3621) The timeout handler in AssignmentManager does an RPC while holding lock on RIT; a big no-no
Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007974#comment-13007974 ]
Ted Yu commented on HBASE-3621:
-------------------------------
assign() calls the following method:
{code}
serverManager.sendRegionOpen(plan.getDestination(), state.getRegion());
{code}
I think we should move the call to assign() outside of the
{code}
synchronized (regionsInTransition) {
}
{code}
block.
I noticed that the call assign(regionState, false, true) at line 1790 doesn't obtain the lock on regionState, which is inconsistent with the other calls to:
{code}
private void assign(final RegionState state, final boolean setOfflineInZK,
final boolean forceNewPlan) {
{code}
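One way to make the call sites consistent is to route them all through a single helper that takes the per-region lock and never the shared regionsInTransition lock. This is a hypothetical sketch: RegionState and the assign() body are stand-ins, and assignWithRegionLock is an illustrative name, not an AssignmentManager method.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch; RegionState and assign() are stand-ins for the AssignmentManager members.
class AssignLockingSketch {
  static final class RegionState {
    final String regionName;

    RegionState(String regionName) {
      this.regionName = regionName;
    }
  }

  final Map<String, RegionState> regionsInTransition = new ConcurrentSkipListMap<>();
  boolean stateLockHeld;
  boolean ritLockHeld;

  // Stand-in for assign(state, setOfflineInZK, forceNewPlan), which ends with sendRegionOpen().
  // Here it only records which locks the calling thread holds at that point.
  private void assign(RegionState state, boolean setOfflineInZK, boolean forceNewPlan) {
    stateLockHeld = Thread.holdsLock(state);
    ritLockHeld = Thread.holdsLock(regionsInTransition);
  }

  // Single entry point so every call site takes the per-region lock the same way,
  // and none of them holds the shared regionsInTransition lock across the RPC.
  void assignWithRegionLock(RegionState state, boolean setOfflineInZK, boolean forceNewPlan) {
    synchronized (state) {
      assign(state, setOfflineInZK, forceNewPlan);
    }
  }
}
```

This still holds a lock across the RPC, but only the per-region one, so in this sketch only threads working on that same region would have to wait.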