Posted to issues@hbase.apache.org by "Sean Busbey (JIRA)" <ji...@apache.org> on 2015/12/18 01:26:47 UTC

[jira] [Updated] (HBASE-14241) Fix deadlock during cluster shutdown due to concurrent connection close

     [ https://issues.apache.org/jira/browse/HBASE-14241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Busbey updated HBASE-14241:
--------------------------------
    Component/s: metrics
                 master

> Fix deadlock during cluster shutdown due to concurrent connection close
> -----------------------------------------------------------------------
>
>                 Key: HBASE-14241
>                 URL: https://issues.apache.org/jira/browse/HBASE-14241
>             Project: HBase
>          Issue Type: Bug
>          Components: master, metrics
>    Affects Versions: 1.0.2
>            Reporter: Andrew Purtell
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.2, 1.3.0
>
>         Attachments: 14241-v2.txt, 14241-v3.txt, 14241-v4.txt, 14241-v5.txt, deadlock.txt.gz
>
>
> Caught while testing branch-1.0, during shutdown of TestMasterMetricsWrapper.
> Found one Java-level deadlock:
> =============================
> "MASTER_META_SERVER_OPERATIONS-ip-10-32-130-237:55342-0":
>   waiting to lock monitor 0x00007f2a040051c8 (object 0x00000007e36108a8, a org.apache.hadoop.hbase.util.PoolMap),
>   which is held by "M:0;ip-10-32-130-237:55342"
> "M:0;ip-10-32-130-237:55342":
>   waiting to lock monitor 0x00007f2a04005118 (object 0x00000007e3610b00, a org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection),
>   which is held by "MASTER_META_SERVER_OPERATIONS-ip-10-32-130-237:55342-0"
> Full stack dump and deadlock debug output attached.
> Root cause:
> In RpcClientImpl#close(), the lock on connections is obtained first:
> {code}
>     synchronized (connections) {
>       for (Connection conn : connections.values()) {
> {code}
> Then, while still holding that lock, markClosed() tries to obtain the lock on the connection object:
> {code}
>         if (!conn.isAlive()) {
>           conn.markClosed(new InterruptedIOException("RpcClient is closing"));
>           conn.close();
> {code}
> Another thread, MetaServerShutdownHandler, calls RpcClientImpl$Connection#setupIOstreams(), where:
> {code}
>         markClosed(e);
>         close();
> {code}
> There the lock on the connection object is obtained first, and then the lock on connections is attempted, leading to deadlock:
> {code}
>       synchronized (connections) {
>         connections.removeValue(remoteId, this);
>       }
> {code}
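
The lock-order inversion described above can be illustrated with a minimal, self-contained sketch. It is not the patch attached to this issue. The names LockOrderSketch, Conn, closeOnError, closeAll and runDemo are hypothetical stand-ins, and a plain HashMap takes the place of PoolMap. The sketch shows one common remedy, assuming both code paths can be changed: every path takes the pool lock before a connection's monitor, so the cycle reported in the thread dump cannot form.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

public class LockOrderSketch {

  /** Hypothetical stand-in for RpcClientImpl$Connection. */
  static final class Conn {
    private final Map<String, Conn> pool;
    private final String id;
    private boolean closed;

    Conn(Map<String, Conn> pool, String id) {
      this.pool = pool;
      this.id = id;
    }

    /** Error path: pool lock first, then this connection's monitor. */
    void closeOnError() {
      synchronized (pool) {
        synchronized (this) {
          closed = true;
          pool.remove(id, this); // removes only if still mapped to this
        }
      }
    }

    synchronized boolean isClosed() {
      return closed;
    }
  }

  /** Shutdown path: same order, pool lock first, then each connection. */
  static void closeAll(Map<String, Conn> pool) {
    synchronized (pool) {
      for (Conn c : pool.values()) {
        synchronized (c) {
          c.closed = true;
        }
      }
      pool.clear();
    }
  }

  private static void awaitQuietly(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Runs both paths concurrently; returns true if both threads finish. */
  static boolean runDemo() {
    Map<String, Conn> pool = new HashMap<>();
    List<Conn> conns = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      Conn c = new Conn(pool, "conn-" + i);
      pool.put("conn-" + i, c);
      conns.add(c);
    }
    CountDownLatch start = new CountDownLatch(1);
    Thread shutdown = new Thread(() -> {
      awaitQuietly(start);
      closeAll(pool);
    });
    Thread handler = new Thread(() -> {
      awaitQuietly(start);
      for (Conn c : conns) {
        c.closeOnError();
      }
    });
    // Daemon threads so a regression cannot keep the JVM alive.
    shutdown.setDaemon(true);
    handler.setDaemon(true);
    shutdown.start();
    handler.start();
    start.countDown();
    try {
      shutdown.join(5000);
      handler.join(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    if (shutdown.isAlive() || handler.isAlive()) {
      return false; // a thread is still blocked after the timeout
    }
    for (Conn c : conns) {
      if (!c.isClosed()) {
        return false;
      }
    }
    synchronized (pool) {
      return pool.isEmpty();
    }
  }

  public static void main(String[] args) {
    System.out.println("finished without deadlock: " + runDemo());
  }
}
```

Swapping the two synchronized blocks in closeOnError (connection monitor first, then the pool) recreates the ordering in the report above and can hang under the same test. Another option, not shown here, is to release the connection's monitor before touching the pool at all.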


