Posted to dev@hbase.apache.org by "Ashu Pachauri (JIRA)" <ji...@apache.org> on 2017/08/09 20:59:00 UTC
[jira] [Created] (HBASE-18549) Unclaimed replication queues can go undetected
Ashu Pachauri created HBASE-18549:
-------------------------------------
Summary: Unclaimed replication queues can go undetected
Key: HBASE-18549
URL: https://issues.apache.org/jira/browse/HBASE-18549
Project: HBase
Issue Type: Bug
Components: Replication
Reporter: Ashu Pachauri
Priority: Critical
Fix For: 1.3.2
We have repeatedly seen ZooKeeper issues cause NodeFailoverWorker to silently fail to pick up the replication queues of a dead region server. One example is when the size of the znode for a particular queue exceeds the jute.maxBuffer value.
Other situations may lead to the same failure and likewise go undetected. We need a metric for the number of unclaimed replication queues. Alerting on this metric would help mitigate the problem and identify the underlying issues.
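
As a rough illustration of what such a metric could count (a sketch only, not the actual HBase implementation): assume we already have a snapshot mapping each region server name found under the replication queues znode to its queue IDs, plus the set of currently live region servers. The class name, method name, and server names below are hypothetical, and the snapshot is passed in as plain collections rather than read from ZooKeeper.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of HBase: counts replication queues whose
// owning region server is no longer live, i.e. queues nobody has claimed yet.
final class UnclaimedQueueCounter {

    // queuesByServer: region server name -> IDs of the queues it currently owns.
    // liveServers: names of the region servers that are currently alive.
    static int countUnclaimedQueues(Map<String, List<String>> queuesByServer,
                                    Set<String> liveServers) {
        int unclaimed = 0;
        for (Map.Entry<String, List<String>> entry : queuesByServer.entrySet()) {
            if (!liveServers.contains(entry.getKey())) {
                // The owner is dead and its queues have not been moved.
                unclaimed += entry.getValue().size();
            }
        }
        return unclaimed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> queues = new HashMap<>();
        queues.put("rs1", Collections.singletonList("peer1"));
        queues.put("rs2", Arrays.asList("peer1", "peer1-rs3")); // rs2 is dead
        Set<String> live = new HashSet<>(Collections.singletonList("rs1"));
        System.out.println(countUnclaimedQueues(queues, live));
    }
}
```

Queues of a dead server are briefly unclaimed during a normal failover, so an alert would presumably fire only when the count stays above zero for longer than a failover is expected to take.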
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)