Posted to issues@hbase.apache.org by "Prathyusha (Jira)" <ji...@apache.org> on 2021/01/14 19:22:00 UTC

[jira] [Comment Edited] (HBASE-24972) Wait for connection attempt to succeed before performing operations on ZK

    [ https://issues.apache.org/jira/browse/HBASE-24972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265148#comment-17265148 ] 

Prathyusha edited comment on HBASE-24972 at 1/14/21, 7:21 PM:
--------------------------------------------------------------

[~stack] Below is the stack trace from a failure incident we have seen:


 Cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/table/SYSTEM.CATALOG
 StackTrace: 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1337)
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354)
 org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:625)
 ...
 StackTraceId: 429763122


 But yes, I see retries in place wherever we perform write operations. [~sandeep.guggilam] I think these retries should suffice. Any thoughts?



> Wait for connection attempt to succeed before performing operations on ZK
> -------------------------------------------------------------------------
>
>                 Key: HBASE-24972
>                 URL: https://issues.apache.org/jira/browse/HBASE-24972
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sandeep Guggilam
>            Assignee: Prathyusha
>            Priority: Minor
>
> Creating the connection with ZK is asynchronous, and the successful connection event is delivered through the passed-in watcher. When we attempt any operation, we try to create a connection and then perform a read/write ([https://github.com/apache/hbase/blob/979edfe72046b2075adcc869c65ae820e6f3ec2d/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java#L323]) without actually waiting for the notification event ([https://github.com/apache/hbase/blob/979edfe72046b2075adcc869c65ae820e6f3ec2d/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKWatcher.java#L582]).
>  
> It is possible we get ConnectionLoss errors when we perform operations on ZK without waiting for the connection attempt to succeed.
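
For illustration only, here is a minimal sketch of the "wait for the connected event before operating" idea described in the issue. It is not the HBase patch and does not use the ZooKeeper client: ConnectionGate, onConnected and awaitConnected are hypothetical names, and a plain CountDownLatch stands in for the notification that the watcher would deliver.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of HBase or ZooKeeper): callers block on it
 * until an asynchronous "connected" notification has been delivered.
 */
final class ConnectionGate {
    // Starts closed; opened exactly once when the connected event arrives.
    private final CountDownLatch connected = new CountDownLatch(1);

    /** Intended to be called from the watcher callback on a successful connection. */
    void onConnected() {
        connected.countDown();
    }

    /**
     * Blocks until the connected event has arrived or the timeout elapses.
     * Returns true if connected, false if the timeout expired first.
     */
    boolean awaitConnected(long timeout, TimeUnit unit) throws InterruptedException {
        return connected.await(timeout, unit);
    }
}
```

Under this assumption, a read or write path would call awaitConnected(...) before issuing the operation and treat a false return as a connection timeout, instead of proceeding and hitting ConnectionLoss. How long to wait, and how this interacts with the existing retries, would still need to be decided for the real code.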



--
This message was sent by Atlassian Jira
(v8.3.4#803005)