Posted to issues@hbase.apache.org by "Rahul Kumar (Jira)" <ji...@apache.org> on 2021/05/24 07:26:00 UTC

[jira] [Work started] (HBASE-25881) Create a chore to update age related metrics.

     [ https://issues.apache.org/jira/browse/HBASE-25881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-25881 started by Rahul Kumar.
-------------------------------------------
> Create a chore to update age related metrics.
> ---------------------------------------------
>
>                 Key: HBASE-25881
>                 URL: https://issues.apache.org/jira/browse/HBASE-25881
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>            Reporter: Rushabh Shah
>            Assignee: Rahul Kumar
>            Priority: Major
>
> We had a case where the logRoller and ReplicationShipper threads were stuck for a day because another thread was holding the lock.
> As a result, we did not roll the WAL or ship any edits for a full day.
> Still, the oldestWalAge and age-of-last-ship metrics did not spike as they should have.
> We calculate every age-related metric as the difference between the current time and the time at which the event happened, and we push that value into the metrics framework. The event timestamp is lost at that point.
> If the thread populating the metric is stuck, the metric carries the same value forward indefinitely, which makes it look like there is no problem in the system. In this case the oldestWalAge metric was stuck at 809 and the age-of-last-ship metric was 0 the whole time, so no PD alert fired.
> Andrew Purtell commented:
> We have the Chore/ScheduledChore framework. We could be making more use of it. Much of this is legacy code, written before Chore was formalized as it is today.
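A minimal sketch of the chore-based approach suggested above, assuming a single age metric. The shipper thread only records the event timestamp, and a separate periodic task recomputes the age from the current time. Because the age is re-derived from the stored timestamp on every run, it keeps growing even when the shipper thread is stuck, instead of freezing at the last pushed value. The class name AgeMetricChore and its methods are hypothetical. The JDK's ScheduledExecutorService stands in for HBase's ChoreService/ScheduledChore, and the injectable clock exists only to make the example deterministic.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch: store the event timestamp and let a periodic task
// recompute the age, rather than pushing a precomputed age once per event.
class AgeMetricChore {
    private final LongSupplier clock;
    // Time (ms) of the last successful ship; written only by the shipper thread.
    private final AtomicLong lastShippedTs;
    // Value exposed to the metrics system; written only by the chore.
    private final AtomicLong ageOfLastShipMs = new AtomicLong(0L);

    AgeMetricChore(LongSupplier clock) {
        this.clock = clock;
        this.lastShippedTs = new AtomicLong(clock.getAsLong());
    }

    // Called by the shipper thread after each successful ship.
    void onShipped() {
        lastShippedTs.set(clock.getAsLong());
    }

    // Chore body: derive the age from the stored timestamp on every run,
    // so a stuck shipper shows up as a steadily growing age.
    void chore() {
        ageOfLastShipMs.set(clock.getAsLong() - lastShippedTs.get());
    }

    long getAgeOfLastShipMs() {
        return ageOfLastShipMs.get();
    }

    // Stand-in for registering a ScheduledChore with a ChoreService (assumption).
    ScheduledExecutorService start(long periodMs) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(this::chore, periodMs, periodMs, TimeUnit.MILLISECONDS);
        return ses;
    }

    public static void main(String[] args) {
        AtomicLong now = new AtomicLong(1_000L);   // fake clock, in ms
        AgeMetricChore m = new AgeMetricChore(now::get);
        m.onShipped();                              // ship at t=1000
        now.set(1_500L);
        m.chore();
        System.out.println(m.getAgeOfLastShipMs()); // prints 500
        // Shipper is now stuck: no further onShipped() calls for a day.
        now.set(86_401_000L);
        m.chore();
        System.out.println(m.getAgeOfLastShipMs()); // prints 86400000
    }
}
```

In HBase itself, the recompute step would presumably live in the chore() method of a ScheduledChore subclass registered with a ChoreService; this sketch does not use that API. The key design choice is that the metric value is owned by the chore, not by the thread that might get stuck.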


