Posted to issues@hbase.apache.org by "Sean Busbey (JIRA)" <ji...@apache.org> on 2015/12/18 01:26:47 UTC
[jira] [Updated] (HBASE-14241) Fix deadlock during cluster shutdown due to concurrent connection close
[ https://issues.apache.org/jira/browse/HBASE-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Busbey updated HBASE-14241:
--------------------------------
Component/s: metrics, master
> Fix deadlock during cluster shutdown due to concurrent connection close
> -----------------------------------------------------------------------
>
> Key: HBASE-14241
> URL: https://issues.apache.org/jira/browse/HBASE-14241
> Project: HBase
> Issue Type: Bug
> Components: master, metrics
> Affects Versions: 1.0.2
> Reporter: Andrew Purtell
> Assignee: Ted Yu
> Priority: Critical
> Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.2, 1.3.0
>
> Attachments: 14241-v2.txt, 14241-v3.txt, 14241-v4.txt, 14241-v5.txt, deadlock.txt.gz
>
>
> Caught while testing branch-1.0, shutting down TestMasterMetricsWrapper.
> Found one Java-level deadlock:
> =============================
> "MASTER_META_SERVER_OPERATIONS-ip-10-32-130-237:55342-0":
>   waiting to lock monitor 0x00007f2a040051c8 (object 0x00000007e36108a8, a org.apache.hadoop.hbase.util.PoolMap),
>   which is held by "M:0;ip-10-32-130-237:55342"
> "M:0;ip-10-32-130-237:55342":
>   waiting to lock monitor 0x00007f2a04005118 (object 0x00000007e3610b00, a org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection),
>   which is held by "MASTER_META_SERVER_OPERATIONS-ip-10-32-130-237:55342-0"
> Full stack dump and deadlock debug output attached.
> Root cause:
> In RpcClientImpl#close(), we obtain the lock on connections first:
> {code}
> synchronized (connections) {
>   for (Connection conn : connections.values()) {
> {code}
> Then markClosed() tries to obtain the lock on the connection object:
> {code}
> if (!conn.isAlive()) {
>   conn.markClosed(new InterruptedIOException("RpcClient is closing"));
>   conn.close();
> {code}
> Another thread, MetaServerShutdownHandler, calls RpcClientImpl$Connection#setupIOstreams(), where:
> {code}
> markClosed(e);
> close();
> {code}
> The lock on the connection object is obtained first, then the lock on connections is attempted, leading to deadlock:
> {code}
> synchronized (connections) {
>   connections.removeValue(remoteId, this);
> }
> {code}
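
The lock-order inversion described above (pool lock then connection lock on one thread, connection lock then pool lock on the other) can be illustrated with a minimal sketch of one common way to break such a cycle: snapshot the pool while holding its lock, release it, and only then close each connection. This is not necessarily what the attached patches do. The names LockOrderSketch, Conn, and closeAll are hypothetical stand-ins, and a plain HashMap takes the place of PoolMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not HBase code.
class LockOrderSketch {

  // Stand-in for RpcClientImpl$Connection.
  static class Conn {
    private final Map<String, Conn> pool;
    private final String id;
    private boolean closed;

    Conn(Map<String, Conn> pool, String id) {
      this.pool = pool;
      this.id = id;
    }

    // Lock order here: connection monitor first, then pool monitor,
    // mirroring the close() path quoted in the report.
    synchronized void close() {
      closed = true;
      synchronized (pool) {
        pool.remove(id, this);
      }
    }

    synchronized boolean isClosed() {
      return closed;
    }
  }

  // Client-wide shutdown: copy the connections under the pool lock,
  // release it, then close each one. No connection monitor is ever
  // requested while the pool monitor is held, so the only order in
  // use is connection -> pool and the cycle cannot form.
  static int closeAll(Map<String, Conn> pool) {
    List<Conn> snapshot;
    synchronized (pool) {
      snapshot = new ArrayList<>(pool.values());
    }
    for (Conn c : snapshot) {
      c.close();
    }
    return snapshot.size();
  }

  public static void main(String[] args) {
    Map<String, Conn> pool = new HashMap<>();
    pool.put("a", new Conn(pool, "a"));
    pool.put("b", new Conn(pool, "b"));
    System.out.println("closed: " + closeAll(pool) + ", remaining: " + pool.size());
  }
}
```

The trade-off in this sketch is that a connection added to the pool after the snapshot is taken would not be closed by closeAll, so a real implementation would also need a way to stop new connections from being created once shutdown has begun.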
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)