Posted to dev@hbase.apache.org by "Sandeep Pal (Jira)" <ji...@apache.org> on 2021/04/07 01:36:00 UTC
[jira] [Created] (HBASE-25741) Replication Source still having the
replication metrics for peer ID which doesn't exist.
Sandeep Pal created HBASE-25741:
-----------------------------------
Summary: Replication Source still having the replication metrics for peer ID which doesn't exist.
Key: HBASE-25741
URL: https://issues.apache.org/jira/browse/HBASE-25741
Project: HBase
Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Sandeep Pal
Assignee: Sandeep Pal
We have observed that replication source metrics for a peer still exist on some region servers even though the peer has been removed. When ReplicationSource encounters a NoNodeException, it calls the `peerRemoved` workflow, which should eventually terminate the source and remove it from the source manager. The problem is that the ReplicationSource thread terminates itself, so the removePeer action never completes and the metrics for that source are left behind forever. The flow is as follows: the replication source tries to clean WALs [here|https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L801]; on NoNodeException it calls [peerRemoved|https://github.com/apache/hbase/blob/b231dd620f107b488b88599e16dc846eb856972c/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java#L244] and terminates the source (itself), which leaves the terminated source registered in the source manager and does not clear its [metrics|https://github.com/apache/hbase/blob/b231dd620f107b488b88599e16dc846eb856972c/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java#L645].
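
The failure mode described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the HBase code: `Manager`, `removePeer`, and `metricsLeak` are hypothetical names, and the model assumes that removal interrupts the source thread, waits for it, and only then cleans up. When the caller is the source thread itself, the wait is interrupted immediately and the cleanup step is never reached.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class SelfTerminationSketch {

    /** Hypothetical stand-in for a source manager; not an HBase class. */
    static class Manager {
        final Map<String, Thread> sources = new ConcurrentHashMap<>();
        final Set<String> metrics = ConcurrentHashMap.newKeySet();

        // Assumed removal order: terminate the source, wait for it, then clean up.
        void removePeer(String peerId) throws InterruptedException {
            Thread source = sources.get(peerId);
            if (source == null) {
                return;
            }
            source.interrupt();      // step 1: ask the source thread to stop
            source.join(1000);       // step 2: throws at once if the caller is the interrupted source
            sources.remove(peerId);  // step 3: cleanup, skipped when step 2 throws
            metrics.remove(peerId);
        }
    }

    /** Runs one scenario and reports whether metrics for the peer are left behind. */
    static boolean metricsLeak(boolean selfRemoval) {
        Manager manager = new Manager();
        String peerId = "peer1";
        Thread source = new Thread(() -> {
            try {
                if (selfRemoval) {
                    manager.removePeer(peerId);   // the source removes its own peer
                } else {
                    Thread.sleep(Long.MAX_VALUE); // idle until interrupted
                }
            } catch (InterruptedException e) {
                // The source thread exits here; nobody finishes the cleanup.
            }
        });
        manager.sources.put(peerId, source);
        manager.metrics.add(peerId);
        source.start();
        try {
            if (selfRemoval) {
                source.join();
            } else {
                manager.removePeer(peerId);       // removal driven by another thread
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return manager.metrics.contains(peerId);
    }

    public static void main(String[] args) {
        System.out.println("external removal leaks metrics: " + metricsLeak(false));
        System.out.println("self removal leaks metrics: " + metricsLeak(true));
    }
}
```

In this model, removal driven by another thread clears the metrics, while removal driven by the source thread itself leaves them in place. One possible remedy in such a design is to make sure the cleanup step runs even when the terminating thread is the caller, for example by performing it before the interrupt or in a `finally` block; whether that fits the actual HBase code would need to be checked against the linked sources.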