Posted to issues@hbase.apache.org by "Rushabh Shah (Jira)" <ji...@apache.org> on 2021/05/11 19:07:00 UTC

[jira] [Created] (HBASE-25881) Create a chore to update age related metrics.

Rushabh Shah created HBASE-25881:
------------------------------------

             Summary: Create a chore to update age related metrics.
                 Key: HBASE-25881
                 URL: https://issues.apache.org/jira/browse/HBASE-25881
             Project: HBase
          Issue Type: Improvement
            Reporter: Rushabh Shah


We had a case where the logRoller and ReplicationShipper threads were stuck for a day because some other thread was holding the lock.

We did not roll the WAL for 1 day and we did not ship any edits for 1 day.
Still, the oldestWalAge and age of last ship metrics did not spike as they should have.

The way we calculate any age-related metric is to take the difference between the current time and the time at which the event happened, and push that value to the metrics framework. We lose the event timestamp at that point.

If the thread populating the metric is stuck, we carry the same value forward forever, which makes it look like there is no problem in the system. In this case the oldestWalAge metric was stuck at a value of 809 and the age of last ship metric was 0 the whole time, so no PD alert was fired.
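
To make the failure mode concrete, here is a minimal sketch of the alternative: keep the event timestamp and derive the age whenever the metric is read. The class AgeGauge and its method names are hypothetical and are not part of HBase's metrics code; the injectable clock is an assumption made only so the behaviour is easy to test.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical gauge (not an HBase class): stores the event timestamp, not a precomputed age. */
final class AgeGauge {
  private final LongSupplier clock;      // e.g. System::currentTimeMillis
  private final AtomicLong lastEventTs;  // timestamp of the last roll/ship event

  AgeGauge(LongSupplier clock) {
    this.clock = clock;
    this.lastEventTs = new AtomicLong(clock.getAsLong());
  }

  /** Called by the worker thread after each event (for example a WAL roll or a ship). */
  void recordEvent() {
    lastEventTs.set(clock.getAsLong());
  }

  /** Called by the metrics reader; the value keeps growing even if the worker thread is stuck. */
  long ageMillis() {
    return Math.max(0L, clock.getAsLong() - lastEventTs.get());
  }
}
```

With this shape, a worker that stops calling recordEvent() shows up as a steadily increasing age rather than a frozen value.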

From Andrew Purtell:
We have the Chore/ScheduledChore framework. We could be making more use of it. Much of this is legacy, before Chore was formalized as it is today.
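
A rough sketch of what such a chore could look like follows. To stay self-contained it uses the JDK's ScheduledExecutorService as a stand-in for HBase's ScheduledChore; AgeMetricRefresher, gaugeSink, and start are hypothetical names, and the real change would presumably put the body of run() into a chore and write to the existing metrics source instead.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

/** Hypothetical periodic refresher: recomputes the age on its own thread, independent of the worker. */
final class AgeMetricRefresher implements Runnable {
  private final LongSupplier clock;      // e.g. System::currentTimeMillis
  private final LongConsumer gaugeSink;  // stand-in for "set the gauge in the metrics framework"
  private final AtomicLong lastEventTs;

  AgeMetricRefresher(LongSupplier clock, LongConsumer gaugeSink) {
    this.clock = clock;
    this.gaugeSink = gaugeSink;
    this.lastEventTs = new AtomicLong(clock.getAsLong());
  }

  /** Worker thread records only the timestamp of the event. */
  void recordEvent() {
    lastEventTs.set(clock.getAsLong());
  }

  /** One refresh: publish "now minus last event time". */
  @Override
  public void run() {
    gaugeSink.accept(Math.max(0L, clock.getAsLong() - lastEventTs.get()));
  }

  /** Schedules run() at a fixed rate on a daemon thread; the caller shuts the executor down. */
  ScheduledExecutorService start(long periodMillis) {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "age-metric-refresher");
      t.setDaemon(true);
      return t;
    });
    ses.scheduleAtFixedRate(this, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    return ses;
  }
}
```

Because the refresh runs on a separate thread, a logRoller or ReplicationShipper blocked on a lock would no longer freeze the published age.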



--
This message was sent by Atlassian Jira
(v8.3.4#803005)