Posted to issues@hbase.apache.org by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org> on 2010/09/08 07:46:32 UTC

[jira] Created: (HBASE-2966) HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock

HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock
-----------------------------------------------------------------------------------------------------------------------

                 Key: HBASE-2966
                 URL: https://issues.apache.org/jira/browse/HBASE-2966
             Project: HBase
          Issue Type: Bug
            Reporter: Kannan Muthukkaruppan


In one case, we noticed that the HBase client program got stuck in a ZooKeeper.exists() call.

One of the threads was stuck in the ZK call while holding an HBase-level lock (regionLockObject in locateRegionInMeta()).

{code} 
"thrift-0-thread-8" prio=10 tid=0x00007f189ca4c000 nid=0x550f in Object.wait() [0x0000000044241000]
   java.lang.Thread.State: WAITING (on object monitor)
                at java.lang.Object.wait(Native Method)
                at java.lang.Object.wait(Object.java:485)
                at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278)
                - locked <0x00007f1903a0c280> (a org.apache.zookeeper.ClientCnxn$Packet)
                at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:804)
                at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
                at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.getRSDirectoryCount(ZooKeeperWrapper.java:765)
                at org.apache.hadoop.hbase.client.HTable.getCurrentNrHRS(HTable.java:173)
                at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:147)
                at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:124)
                at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
                at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.prefetchRegionCache(HConnectionManager.java:734)
                at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:785)
                - locked <0x00007f190d868848> (a java.lang.Object)
                at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:679)
                at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:646)
                at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:472)
                at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
                at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1147)
                at org.apache.hadoop.hbase.client.HTable.get(HTable.java:503)
{code} 

All of the remaining threads are waiting on the regionLockObject lock (held by the thread above), with stacks like:
 
{code}
"thrift-0-thread-7" prio=10 tid=0x00007f189ca4a800 nid=0x550e waiting for monitor entry [0x0000000044141000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:783)
                - waiting to lock <0x00007f190d868848> (a java.lang.Object)
                at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:679)
                at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:646)
                at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:472)
                at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
                at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1147)
                at org.apache.hadoop.hbase.client.HTable.get(HTable.java:503)
{code}
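Taken together, the two dumps show a lock convoy: the thread stuck in ZooKeeper.exists() still holds regionLockObject (<0x00007f190d868848>), so every other caller of locateRegionInMeta() queues up BLOCKED on that same monitor. The sketch below reproduces that shape with plain JDK classes, as an illustration only: a CountDownLatch stands in for the ZK reply that never arrives, the thread names are made up, and the standard ThreadMXBean reports which thread owns the contended lock (the same relationship the dumps show through the matching lock address).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockConvoySketch {

    /** Polls until t reaches the expected state or timeoutMs elapses. */
    static boolean awaitState(Thread t, Thread.State expected, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (t.getState() != expected) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            Thread.sleep(10);
        }
        return true;
    }

    /**
     * Reproduces the convoy and returns a ThreadInfo snapshot of the blocked thread.
     * Both threads are released and joined before returning.
     */
    static ThreadInfo snapshotBlockedWaiter() throws InterruptedException {
        Object regionLock = new Object();                    // stands in for regionLockObject
        CountDownLatch replyArrived = new CountDownLatch(1); // stands in for the missing ZK reply

        Thread holder = new Thread(() -> {
            synchronized (regionLock) {
                try {
                    replyArrived.await();                    // blocking call made while holding the lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder-thread");
        Thread waiter = new Thread(() -> {
            synchronized (regionLock) {
                // the region lookup would happen here once the lock is free
            }
        }, "waiter-thread");

        holder.start();
        awaitState(holder, Thread.State.WAITING, 2000);     // holder now owns regionLock
        waiter.start();
        awaitState(waiter, Thread.State.BLOCKED, 2000);     // waiter is queued on regionLock

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo info = mx.getThreadInfo(waiter.getId());

        replyArrived.countDown();                            // deliver the "reply" so both threads exit
        holder.join();
        waiter.join();
        return info;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadInfo info = snapshotBlockedWaiter();
        System.out.println(info.getThreadName() + " is " + info.getThreadState()
                + " on " + info.getLockName() + ", owned by " + info.getLockOwnerName());
    }
}
```

Releasing the latch is what lets every queued thread proceed; in the reported case nothing ever released the stuck thread, so the whole pool stayed blocked.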

Any ideas?
 
Meanwhile, I'll look further into the ZK logs from the relevant time and report back if I find more information.
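The WAITING frame in the first dump (Object.wait() inside ClientCnxn.submitRequest, on a ClientCnxn$Packet) has the shape of a request/reply handoff: the caller sleeps until another thread marks its packet finished and notifies it. The sketch below is a hypothetical model of that handoff, not ZooKeeper's actual source; the Packet class and its methods are invented for illustration. If the notification never comes, the unbounded wait() never returns, which matches what the first dump shows; the timed overload is included only for contrast.

```java
import java.util.concurrent.TimeUnit;

public class PacketWaitSketch {

    /** Hypothetical request packet; NOT ZooKeeper's real ClientCnxn.Packet. */
    static final class Packet {
        private boolean finished; // guarded by this packet's monitor

        /** Called by the thread that receives the reply. */
        synchronized void finish() {
            finished = true;
            notifyAll();
        }

        /** Unbounded wait, the same shape as the Object.wait() frame in the dump. */
        synchronized void awaitReply() throws InterruptedException {
            while (!finished) {
                wait();
            }
        }

        /** Bounded wait, for contrast: returns false if no reply arrives in time. */
        synchronized boolean awaitReply(long timeout, TimeUnit unit) throws InterruptedException {
            long deadline = System.nanoTime() + unit.toNanos(timeout);
            while (!finished) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false;
                }
                wait(remainingMs);
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Reply delivered by another thread: the caller wakes up and returns.
        Packet answered = new Packet();
        Thread responder = new Thread(answered::finish, "responder");
        responder.start();
        answered.awaitReply();
        responder.join();
        System.out.println("answered packet returned");

        // Reply never delivered: awaitReply() would block forever; the timed overload gives up.
        Packet lost = new Packet();
        System.out.println("lost packet got reply: " + lost.awaitReply(100, TimeUnit.MILLISECONDS));
    }
}
```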
 


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2966) HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908254#action_12908254 ] 

Kannan Muthukkaruppan commented on HBASE-2966:
----------------------------------------------

Todd, Patrick: ZOOKEEPER-846 seems to describe the issue occurring when the ZooKeeper session is closed.

In our case, this was an active HBase client working against an HBase cluster, and I would not expect it to be closing its ZK session. Nor is there a ZK close in the stack I uploaded (unlike in the ZOOKEEPER-846 case).

Thoughts?
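Whether any thread is inside a ZK close can also be checked from within the running client rather than only by reading an uploaded dump. A minimal sketch using the JDK's Thread.getAllStackTraces(); it assumes the frame of interest is org.apache.zookeeper.ZooKeeper.close, and the helper name threadsInFrame is hypothetical. A single check reflects only one instant, so it complements a full stack dump rather than replacing it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FrameSearchSketch {

    /** Returns the names of live threads whose current stack contains className.methodName. */
    static List<String> threadsInFrame(String className, String methodName) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().equals(className) && frame.getMethodName().equals(methodName)) {
                    hits.add(entry.getKey().getName());
                    break; // one hit per thread is enough
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Is any thread currently inside ZooKeeper.close()? An empty list means none at this instant.
        System.out.println(threadsInFrame("org.apache.zookeeper.ZooKeeper", "close"));
    }
}
```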

> HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2966
>                 URL: https://issues.apache.org/jira/browse/HBASE-2966
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>         Attachments: stack.txt
>


[jira] Commented: (HBASE-2966) HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907495#action_12907495 ] 

Patrick Hunt commented on HBASE-2966:
-------------------------------------

Kannan: no. For a version X.Y.Z, moving to a higher Z is just a bug-fix release, so that's fine.
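Patrick's rule can be written as a small check. This is a hypothetical helper, not part of ZooKeeper or HBase, and it assumes plain numeric X.Y.Z version strings (no suffixes such as -alpha).

```java
public class BugFixReleaseCheck {

    /** Parses "X.Y.Z" into three ints; rejects anything else. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected X.Y.Z: " + version);
        }
        return new int[] {
            Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2])
        };
    }

    /** True when 'to' keeps the same X.Y as 'from' and only raises (or keeps) Z. */
    static boolean isBugFixUpgrade(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        return a[0] == b[0] && a[1] == b[1] && b[2] >= a[2];
    }

    public static void main(String[] args) {
        System.out.println(isBugFixUpgrade("3.3.1", "3.3.2")); // true
        System.out.println(isBugFixUpgrade("3.3.1", "3.4.0")); // false
    }
}
```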



[jira] Commented: (HBASE-2966) HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909028#action_12909028 ] 

Kannan Muthukkaruppan commented on HBASE-2966:
----------------------------------------------

Patrick: Makes sense. 

Is 3.3.2 going to be out soon, or should we look into manually applying these fixes to 3.3.1?
(I suppose we need both ZOOKEEPER-846 and ZOOKEEPER-795.)

Todd/Stack et al.: What are your plans for picking up fixes for these ZK issues?



[jira] Commented: (HBASE-2966) HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909073#action_12909073 ] 

stack commented on HBASE-2966:
------------------------------

@Kannan We are ZK guinea pigs; we'll run anything Patrick or Mahadev tell us to. Patrick, is there anything you want us to try, or should we patch something up with the above two issues?



[jira] Commented: (HBASE-2966) HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907100#action_12907100 ] 

Todd Lipcon commented on HBASE-2966:
------------------------------------

I replied on IRC but I guess you missed it. Do you have a stack trace for the other ZK threads? There should be a SendThread and an EventThread.
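The requested stacks can also be collected from inside the process. A minimal sketch using the JDK's Thread.getAllStackTraces(); it assumes the ZK client threads carry "SendThread" and "EventThread" in their names, as the comment implies, and the helper name dumpThreadsNamed is hypothetical. Running `jstack <pid>` against the process gives a full dump without any code changes.

```java
import java.util.Map;

public class ZkThreadDumpSketch {

    /** Formats the stacks of live threads whose names contain any of the given markers. */
    static String dumpThreadsNamed(String... markers) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            for (String marker : markers) {
                if (t.getName().contains(marker)) {
                    // Header line loosely modeled on jstack: quoted name, then state.
                    out.append('"').append(t.getName()).append("\" ").append(t.getState()).append('\n');
                    for (StackTraceElement frame : entry.getValue()) {
                        out.append("        at ").append(frame).append('\n');
                    }
                    break; // print each thread once even if several markers match
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpThreadsNamed("SendThread", "EventThread"));
    }
}
```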



[jira] Commented: (HBASE-2966) HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914096#action_12914096 ] 

Patrick Hunt commented on HBASE-2966:
-------------------------------------

A 3.3.2 fix release is in progress on the ZK dev list if you'd like to follow along; hopefully we'll get it out soon. Meanwhile, you could try the current ZK branch-3.3, which includes fixes for this issue (and others).

> HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2966
>                 URL: https://issues.apache.org/jira/browse/HBASE-2966
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>         Attachments: stack.txt
>
>
> We noticed in one case the HBase client program got stuck on Zookeeper.exists() call.
>  
> One of the threads was stuck here on the ZK call while holding an HBase level lock (regionLockObject in locateRegionInMeta()).
> {code} 
> "thrift-0-thread-8" prio=10 tid=0x00007f189ca4c000 nid=0x550f in Object.wait() [0x0000000044241000]
>    java.lang.Thread.State: WAITING (on object monitor)
>                 at java.lang.Object.wait(Native Method)
>                 at java.lang.Object.wait(Object.java:485)
>                 at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278)
>                 - locked <0x00007f1903a0c280> (a org.apache.zookeeper.ClientCnxn$Packet)
>                 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:804)
>                 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
>                 at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.getRSDirectoryCount(ZooKeeperWrapper.java:765)
>                 at org.apache.hadoop.hbase.client.HTable.getCurrentNrHRS(HTable.java:173)
>                 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:147)
>                 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:124)
>                 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.prefetchRegionCache(HConnectionManager.java:734)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:785)
>                 - locked <0x00007f190d868848> (a java.lang.Object)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:679)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:646)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:472)
>                 at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1147)
>                 at org.apache.hadoop.hbase.client.HTable.get(HTable.java:503)
> {code} 
> The remaining threads are all waiting on the regionLockObject lock (held by the above thread), with stacks like:
>  
> {code}
> "thrift-0-thread-7" prio=10 tid=0x00007f189ca4a800 nid=0x550e waiting for monitor entry [0x0000000044141000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:783)
>                 - waiting to lock <0x00007f190d868848> (a java.lang.Object)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:679)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:646)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:472)
>                 at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1147)
>                 at org.apache.hadoop.hbase.client.HTable.get(HTable.java:503)
> {code}
> Any ideas?
>  
> Meanwhile, I'll look into the ZK logs from the relevant time some more and get back if I have more information.
>  
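
The dumps above have the classic shape of a lock held across a blocking call: the holder is WAITING inside ZooKeeper while every other thrift thread is BLOCKED on the same monitor. Below is a minimal Java sketch of that shape, not HBase code. `LockAcrossBlockingCall` and `regionLock` are hypothetical, and a `CountDownLatch` that is never counted down stands in for a ZooKeeper reply that never arrives (it parks rather than calling `Object.wait()`, but the holder's state is WAITING either way).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model, not HBase code: regionLock stands in for regionLockObject,
// and the latch stands in for a ZooKeeper reply that never arrives.
public class LockAcrossBlockingCall {
    private final Object regionLock = new Object();
    private final CountDownLatch remoteReply = new CountDownLatch(1); // never counted down

    // Blocking "remote" call made while holding the shared lock, with no timeout.
    void lookupUnbounded() {
        synchronized (regionLock) {
            try {
                remoteReply.await();      // parks forever; the monitor stays held
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Same call with a bounded wait: on timeout the lock is released for other callers.
    public boolean lookupBounded(long timeoutMs) {
        synchronized (regionLock) {
            try {
                return remoteReply.await(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    // Reproduces the dump's shape; returns {holder state, second caller state}.
    public static Thread.State[] demoUnbounded() {
        LockAcrossBlockingCall c = new LockAcrossBlockingCall();
        Thread holder = new Thread(c::lookupUnbounded, "holder");
        Thread waiter = new Thread(() -> c.lookupBounded(50), "waiter");
        holder.setDaemon(true);           // daemon threads so the stuck demo cannot keep the JVM alive
        waiter.setDaemon(true);
        holder.start();
        awaitState(holder, Thread.State.WAITING);
        waiter.start();
        awaitState(waiter, Thread.State.BLOCKED);
        return new Thread.State[] { holder.getState(), waiter.getState() };
    }

    // Spins (up to 5 s) until the thread reaches the target state.
    private static void awaitState(Thread t, Thread.State target) {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (t.getState() != target && System.nanoTime() < deadline) {
            Thread.yield();
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demoUnbounded()));
    }
}
```

Note that the waiter's 50 ms timeout never even starts, because the waiter cannot enter the monitor; a bounded wait only frees the lock when the thread holding it is the one using the timeout.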

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2966) HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907524#action_12907524 ] 

Patrick Hunt commented on HBASE-2966:
-------------------------------------

I believe I've found the problem; it's documented on ZOOKEEPER-846:
https://issues.apache.org/jira/browse/ZOOKEEPER-846?focusedCommentId=12907523&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12907523

This is a blocker; we'll be fixing it as part of 3.3.2.



[jira] Updated: (HBASE-2966) HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-2966:
-----------------------------------------

    Attachment: stack.txt

Todd: attached is the client-side program's jstack dump. I see a bunch of "EventThread"s, but no SendThread.
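
A quick way to check a process for the pattern described here (event threads present, no send thread) is to count thread names. The sketch below assumes the ZooKeeper client threads have names containing "EventThread" and "SendThread", as the attached dump suggests; `ZkThreadCensus` and its methods are hypothetical. It works on a plain list of names, so it can be fed either from `Thread.getAllStackTraces()` in-process or from names copied out of a jstack dump.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper: counts thread names by marker to spot "event threads but no send thread".
public class ZkThreadCensus {
    // Names of all live threads in this JVM (Thread.getAllStackTraces() is a standard JDK API).
    public static List<String> liveThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        return names;
    }

    // Case-sensitive substring count; the markers are an assumption about ZK thread naming.
    public static long countContaining(Collection<String> names, String marker) {
        return names.stream().filter(n -> n.contains(marker)).count();
    }

    // True for the pattern seen in the attached dump: event threads exist, yet no send thread.
    public static boolean eventThreadsWithoutSendThread(Collection<String> names) {
        return countContaining(names, "EventThread") > 0
                && countContaining(names, "SendThread") == 0;
    }

    public static void main(String[] args) {
        List<String> names = liveThreadNames();
        System.out.println("EventThread: " + countContaining(names, "EventThread")
                + ", SendThread: " + countContaining(names, "SendThread")
                + ", suspicious: " + eventThreadsWithoutSendThread(names));
    }
}
```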



[jira] Commented: (HBASE-2966) HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907395#action_12907395 ] 

Patrick Hunt commented on HBASE-2966:
-------------------------------------

Do you have the client-side (ZK) logs? They would be helpful.

What version of the ZK client library is being used?



[jira] Commented: (HBASE-2966) HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907420#action_12907420 ] 

Jonathan Gray commented on HBASE-2966:
--------------------------------------

It's ZK 3.3.1.

I don't think we were capturing client-side logs on this run. We're trying to reproduce it now.



[jira] Commented: (HBASE-2966) HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908903#action_12908903 ] 

Patrick Hunt commented on HBASE-2966:
-------------------------------------

Hi Kannan, one of the reasons I asked about the logs was to verify the scenario I found. Since they're not available we can only speculate; however, I feel pretty confident this is the same issue.

Notice that there is no "SendThread" anywhere in the stack dump, although there are a large number of event threads. This indicates that something was going on, perhaps session expirations, given that the "event thread not shutting down" problem (ZOOKEEPER-795, detailed earlier in this JIRA) is triggered by session expiration. Notice also that "exists" is the hanging operation in this JIRA's stack dump (vs. close in ZOOKEEPER-846); however, the underlying problem (the hang) is the same in both cases: both close and exists queue packets to be sent to the server, and they can hang if the queue is not cleaned up properly. One discrepancy is that the SendThread should only shut down on a client-issued close (or on the ZK state becoming closed, which doesn't trigger this bug). If there is no way your code is calling close, then this bug should not be triggered, but without the logs it's hard for me to speculate. Is it possible that close was called due to some network issue (say, error handling for the network instability that caused the session expirations)?
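
The queue-cleanup failure described above can be illustrated with a heavily simplified model of a client connection. This is a hypothetical sketch, not ZooKeeper's `ClientCnxn` and not the ZOOKEEPER-846 fix: callers queue a packet and wait for it to be finished (as `submitRequest` does in the dump), `close(false)` models a shutdown that leaves queued packets behind so their waiters are never woken, and `close(true)` fails every pending packet and notifies its waiter instead. The `maxWaitMs` guard exists only so the demo terminates.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, heavily simplified model of a client connection's outgoing packet queue.
// Not ZooKeeper's ClientCnxn; there is no real send thread or network I/O here.
public class PacketQueueModel {
    public static final class Packet {
        final String op;
        private boolean finished;
        private boolean failed;

        Packet(String op) { this.op = op; }

        public synchronized boolean isFailed() { return failed; }

        // Marks the packet done and wakes any caller blocked in awaitFinished().
        synchronized void complete(boolean asFailure) {
            failed = asFailure;
            finished = true;
            notifyAll();
        }
    }

    private final Deque<Packet> outgoing = new ArrayDeque<>();
    private boolean closed;

    // Caller side: queue a request (e.g. "exists" or "close") for the send thread.
    public synchronized Packet enqueue(String op) {
        Packet p = new Packet(op);
        if (closed) {
            p.complete(true);      // connection already closed: fail fast, never queue
        } else {
            outgoing.add(p);
        }
        return p;
    }

    // Caller side: wait for the reply, like the Object.wait() loop in submitRequest.
    // maxWaitMs only exists so this demo terminates; returns false on timeout.
    public static boolean awaitFinished(Packet p, long maxWaitMs) {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        synchronized (p) {
            while (!p.finished) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;  // an unbounded wait would block here forever
                }
                try {
                    p.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    // Shutdown. drainQueue == false models the bug: queued packets are never completed.
    public void close(boolean drainQueue) {
        Deque<Packet> pending = new ArrayDeque<>();
        synchronized (this) {
            closed = true;
            if (drainQueue) {
                pending.addAll(outgoing);
                outgoing.clear();
            }
        }
        for (Packet p : pending) {
            p.complete(true);      // fail each leftover packet and notify its waiter
        }
    }

    public static void main(String[] args) {
        PacketQueueModel leaky = new PacketQueueModel();
        Packet stuck = leaky.enqueue("exists");
        leaky.close(false);
        System.out.println("leaky close, finished: " + awaitFinished(stuck, 200));

        PacketQueueModel clean = new PacketQueueModel();
        Packet p = clean.enqueue("exists");
        clean.close(true);
        System.out.println("clean close, finished: " + awaitFinished(p, 200) + ", failed: " + p.isFailed());
    }
}
```

With `close(false)`, `awaitFinished` only returns because of the demo's timeout; a real unbounded `wait()` would hang permanently. With `close(true)`, the waiter returns promptly with a failed packet, and later requests fail fast instead of being queued onto a dead connection.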






[jira] Commented: (HBASE-2966) HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907475#action_12907475 ] 

Kannan Muthukkaruppan commented on HBASE-2966:
----------------------------------------------

phunt: I just learned that the client was 3.3.0 but the server was 3.3.1. Could that cause this type of issue?

> HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2966
>                 URL: https://issues.apache.org/jira/browse/HBASE-2966
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>         Attachments: stack.txt
>
>
> We noticed in one case the HBase client program got stuck on Zookeeper.exists() call.
>  
> One of the threads was stuck here on the ZK call while holding an HBase level lock (regionLockObject in locateRegionInMeta()).
> {code} 
> "thrift-0-thread-8" prio=10 tid=0x00007f189ca4c000 nid=0x550f in Object.wait() [0x0000000044241000]
>    java.lang.Thread.State: WAITING (on object monitor)
>                 at java.lang.Object.wait(Native Method)
>                 at java.lang.Object.wait(Object.java:485)
>                 at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278)
>                 - locked <0x00007f1903a0c280> (a org.apache.zookeeper.ClientCnxn$Packet)
>                 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:804)
>                 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
>                 at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.getRSDirectoryCount(ZooKeeperWrapper.java:765)
>                 at org.apache.hadoop.hbase.client.HTable.getCurrentNrHRS(HTable.java:173)
>                 at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:147)
>                 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:124)
>                 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.prefetchRegionCache(HConnectionManager.java:734)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:785)
>                 - locked <0x00007f190d868848> (a java.lang.Object)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:679)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:646)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:472)
>                 at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1147)
>                 at org.apache.hadoop.hbase.client.HTable.get(HTable.java:503)
> {code} 
> All the remaining threads are waiting on the regionLockObject lock (held by the thread above), with stacks like:
>  
> {code}
> "thrift-0-thread-7" prio=10 tid=0x00007f189ca4a800 nid=0x550e waiting for monitor entry [0x0000000044141000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:783)
>                 - waiting to lock <0x00007f190d868848> (a java.lang.Object)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:679)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:646)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:472)
>                 at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
>                 at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1147)
>                 at org.apache.hadoop.hbase.client.HTable.get(HTable.java:503)
> {code}
> Any ideas?
>  
> Meanwhile, I'll look further into the ZK logs from the relevant time and report back if I find more information.
>  
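
The two dumps show a lock convoy. "thrift-0-thread-8" takes the regionLockObject monitor in locateRegionInMeta() and then calls Object.wait() inside ClientCnxn.submitRequest() while waiting for a ZooKeeper reply. Object.wait() releases only the monitor of the object it is called on (the ClientCnxn$Packet), so the region lock stays held, and every other caller of locateRegionInMeta() shows up as BLOCKED on it. The sketch below reproduces that pattern in plain Java; it is not HBase or ZooKeeper code. The names LockConvoyDemo, Packet, regionLock and observeStates() are hypothetical stand-ins, and the ZooKeeper reply is simulated by setting a flag. It uses the standard ThreadMXBean/ThreadInfo API to report which thread owns the contended lock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

public class LockConvoyDemo {
    /** Hypothetical stand-in for a ZooKeeper request packet awaiting a server reply. */
    static final class Packet {
        boolean finished; // guarded by this packet's monitor
    }

    /** Polls until t reaches the wanted state or timeoutMs elapses. */
    static void awaitState(Thread t, Thread.State wanted, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (t.getState() != wanted && System.currentTimeMillis() < deadline) {
            Thread.sleep(5);
        }
    }

    /** Returns {holder state, waiter state, name of the thread owning the lock the waiter is blocked on}. */
    static String[] observeStates() throws InterruptedException {
        final Object regionLock = new Object(); // plays the role of regionLockObject
        final Packet packet = new Packet();

        // Like "thrift-0-thread-8": takes the region lock, then waits for a reply.
        Thread holder = new Thread(() -> {
            synchronized (regionLock) {
                synchronized (packet) {
                    try {
                        while (!packet.finished) {
                            packet.wait(); // releases the packet monitor only, not regionLock
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        }, "holder");

        // Like "thrift-0-thread-7": only needs the region lock for its own lookup.
        Thread waiter = new Thread(() -> {
            synchronized (regionLock) {
                // the region lookup would happen here
            }
        }, "waiter");

        holder.start();
        awaitState(holder, Thread.State.WAITING, 5000);
        waiter.start();
        awaitState(waiter, Thread.State.BLOCKED, 5000);

        ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(waiter.getId());
        String[] observed = {
            holder.getState().name(),
            waiter.getState().name(),
            info == null ? null : info.getLockOwnerName()
        };

        // Simulate the reply finally arriving so both threads can finish.
        synchronized (packet) {
            packet.finished = true;
            packet.notifyAll();
        }
        holder.join();
        waiter.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] s = observeStates();
        System.out.println("holder=" + s[0] + " waiter=" + s[1] + " lock owner=" + s[2]);
    }
}
```

Running main() should report the holder as WAITING and the waiter as BLOCKED, with "holder" named as the lock owner. This mirrors the two thread dumps: as long as the reply never arrives, every thread that needs the region lock stays blocked.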

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2966) HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907462#action_12907462 ] 

Patrick Hunt commented on HBASE-2966:
-------------------------------------

BTW, the large number of event threads in the stack dump is related to the following ZooKeeper issue, which is fixed in 3.3.2 (not yet released):
https://issues.apache.org/jira/browse/ZOOKEEPER-795
However, that's unrelated to this specific issue.

> HBase client stuck on org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1278) holding regionLockObject lock
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2966
>                 URL: https://issues.apache.org/jira/browse/HBASE-2966
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>         Attachments: stack.txt
