Posted to issues@hbase.apache.org by "Hudson (Jira)" <ji...@apache.org> on 2021/03/23 15:10:01 UTC

[jira] [Commented] (HBASE-25627) HBase replication should have a metric to represent if the source is stuck getting initialized

    [ https://issues.apache.org/jira/browse/HBASE-25627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307154#comment-17307154 ] 

Hudson commented on HBASE-25627:
--------------------------------

Results for branch branch-1
	[build #101 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/101/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/101//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/101//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-1/101//JDK8_Nightly_Build_Report_(Hadoop2)/]




(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> HBase replication should have a metric to represent if the source is stuck getting initialized
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-25627
>                 URL: https://issues.apache.org/jira/browse/HBASE-25627
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>    Affects Versions: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.5, 2.4.3
>            Reporter: Sandeep Pal
>            Assignee: Sandeep Pal
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.3
>
>
> There can be situations where the cluster is not able to talk to the peer cluster's ZK. In that case the logQueue keeps accumulating, but without digging into the logs we cannot tell why the logQueue is accumulating on the source. 
> Since the replication source doesn't even start the shipper in this case, it would be good to have a dedicated metric that shows when the RS cannot talk to the peer's ZK at all. 
>  
> {code:java}
> 2021-03-03 04:02:10,704 DEBUG [peerId] zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper, quorum=zookeeper-0.zookeeper-a.fakeAddress:2181,zookeeper-1.zookeeper-a.fakeAddress:2181,zookeeper-2.zookeeper-a.fakeAddress:2181,zookeeper-3.zookeeper-a.fakeAddress:2181,zookeeper-4.zookeeper-a.fakeAddress:2181, exception=org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
> 	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1119)
> 	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:284)
> 	at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:469)
> 	at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
> 	at org.apache.hadoop.hbase.zookeeper.ZKClusterId.getUUIDForCluster(ZKClusterId.java:96)
> 	at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.getPeerUUID(HBaseReplicationEndpoint.java:104)
> 	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:306)
> {code}
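
The idea in the description can be sketched roughly as follows. This is only an illustration, not the patch attached to the issue: the class `SourceInitGauge`, its methods, and the `Supplier` that stands in for the peer UUID lookup are all hypothetical names, and the retry loop is bounded here so the example terminates, whereas a real source may keep retrying. The gauge is raised when a source starts initializing and lowered only once the peer cluster id has been read, so a value that stays above zero suggests a source that never reached the shipper.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical gauge; this is NOT the HBase metrics API. */
final class SourceInitGauge {
    // Number of sources that have started initializing but not yet finished.
    private final AtomicInteger initializing = new AtomicInteger();

    int get() {
        return initializing.get();
    }

    /**
     * Tries up to maxAttempts times to read the peer cluster id.
     * A null from peerIdLookup stands in for a failed ZooKeeper read.
     * Returns the id on success, or null if every attempt failed.
     */
    String initialize(Supplier<String> peerIdLookup, int maxAttempts) {
        initializing.incrementAndGet();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String peerId = peerIdLookup.get();
            if (peerId != null) {
                // Initialization finished: lower the gauge again.
                initializing.decrementAndGet();
                return peerId;
            }
        }
        // Still stuck: the gauge is deliberately left raised.
        return null;
    }
}
```

With a gauge like this exported per region server, an operator could alert on a non-zero value that persists, instead of inferring the problem from a growing logQueue.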


