Posted to issues@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2014/07/30 20:40:41 UTC

[jira] [Commented] (HBASE-11143) Improve replication metrics

    [ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079734#comment-14079734 ] 

Lars Hofhansl commented on HBASE-11143:
---------------------------------------

Turns out there are more problems (in 0.98 at least):
# ageOfLastShippedOp will not increase when there is nothing to ship, but it stays stuck at whatever the age of the last shipped edit was. If there is nothing to ship we are (by definition) current, so I think I should do the same as I did in 0.94: set the ageOfLastShippedEdit to 0.
# ageOfLastAppliedOp keeps increasing even when there is nothing to replicate. 0.94 does not have this problem, only 0.98 (brought up on the mailing list by [~nidmhbase]).

I'll file a new issue to fix these.
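
To make the intended semantics concrete, here is a minimal sketch of how the shipped-age metric could behave. It is not the actual HBase code: the class ReplicationLagSketch and its methods are hypothetical names made up for illustration, and time is passed in explicitly so the behavior is easy to check. It is also not thread-safe.

```java
// Hypothetical illustration only; none of these names are HBase APIs.
final class ReplicationLagSketch {
    // Write time (ms) of the edit the age is measured against.
    private long lastTimestampMs;
    // Currently reported age (ms) of the last shipped edit.
    private long ageMs;

    ReplicationLagSketch(long startMs) {
        // Assumption: a fresh source has nothing pending, so it starts out current.
        this.lastTimestampMs = startMs;
        this.ageMs = 0L;
    }

    // A batch was shipped: the age is measured from the write time of its last edit.
    void shipped(long lastEditWriteTimeMs, long nowMs) {
        lastTimestampMs = lastEditWriteTimeMs;
        ageMs = nowMs - lastTimestampMs;
    }

    // Shipping failed: keep the old timestamp, so the reported age keeps growing.
    void failed(long nowMs) {
        ageMs = nowMs - lastTimestampMs;
    }

    // Nothing to ship: we are current by definition, so report 0.
    void nothingToShip(long nowMs) {
        lastTimestampMs = nowMs;
        ageMs = 0L;
    }

    long ageOfLastShippedOp() {
        return ageMs;
    }
}
```

With this shape, a RegionServer that has an empty queue reports 0 instead of the age of whatever it shipped last, while a source that cannot ship still shows a growing lag.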

> Improve replication metrics
> ---------------------------
>
>                 Key: HBASE-11143
>                 URL: https://issues.apache.org/jira/browse/HBASE-11143
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.99.0, 0.94.20, 0.98.3
>
>         Attachments: 11143-0.94-v2.txt, 11143-0.94-v3.txt, 11143-0.94.txt, 11143-trunk.txt
>
>
> We are trying to report on replication lag and find that there is no good single metric to do that.
> ageOfLastShippedOp is close, but unfortunately it is increased even when there is nothing to ship on a particular RegionServer.
> I would like to discuss a few options here:
> Add a new metric: replicationQueueTime (or something) with the above meaning. I.e. if we have something to ship we set the age of that last shipped edit, if we fail we increment that last time (just like we do now). But if there is nothing to replicate we set it to current time (and hence that metric is reported as close to 0).
> Alternatively we could change the meaning of ageOfLastShippedOp to do that. That might lead to surprises, but the current behavior is clearly weird when there is nothing to replicate.
> Comments? [~jdcryans], [~stack].
> If the approach sounds good, I'll make a patch for all branches.
> Edit: Also adds a new shippedKBs metric to track the amount of data that is shipped via replication.


