Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/02/14 17:48:00 UTC

[jira] [Commented] (KUDU-3048) Add time/clock synchronization metrics

    [ https://issues.apache.org/jira/browse/KUDU-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037147#comment-17037147 ] 

ASF subversion and git services commented on KUDU-3048:
-------------------------------------------------------

Commit 8808b041c9db0af7642311390d7d9189032cc36a in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=8808b04 ]

[clock] update on Clock interface

This patch refactors Clock-related classes:
  * removed Clock::RegisterMetrics() method
  * HybridClock constructor requires metric entity
  * LogicalClock constructor accepts metric entity as optional
    second parameter
  * LogicalClock constructor is now public
  * LogicalClock::CreateStartingAt() is gone
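
As a rough illustration of the constructor shape listed above, here is a
minimal self-contained sketch. It is not the Kudu code: SketchMetricEntity
and SketchLogicalClock are hypothetical stand-ins, and the assumption that
the first constructor parameter is the starting timestamp is only inferred
from CreateStartingAt() being removed.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a metric entity; NOT Kudu's MetricEntity.
struct SketchMetricEntity {
  std::string id;
};

// Hypothetical logical clock mirroring the shape described in the list:
// a public constructor whose optional second parameter is a metric entity.
// The first parameter (starting timestamp) is an assumption.
class SketchLogicalClock {
 public:
  explicit SketchLogicalClock(
      uint64_t initial_time = 0,
      std::shared_ptr<SketchMetricEntity> metric_entity = nullptr)
      : now_(initial_time), metric_entity_(std::move(metric_entity)) {}

  // A logical clock advances by one on every reading.
  uint64_t Now() { return ++now_; }

  // True if a metric entity was supplied at construction time.
  bool has_metrics() const { return metric_entity_ != nullptr; }

 private:
  uint64_t now_;
  std::shared_ptr<SketchMetricEntity> metric_entity_;
};

// Usage: with the constructor public, no factory method is needed to
// start the clock at a given value, with or without metrics.
inline void SketchUsage() {
  SketchLogicalClock plain(10);
  assert(!plain.has_metrics());
  assert(plain.Now() == 11);

  auto entity = std::make_shared<SketchMetricEntity>();
  entity->id = "sketch-entity";
  SketchLogicalClock instrumented(0, entity);
  assert(instrumented.has_metrics());
}
```

Making the metric entity optional would let tests build a clock without
setting up any metrics infrastructure, while servers pass a real entity.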

I also did other minor refactoring, partly prompted by warnings reported
by ClangTidy on the code I touched.

The motivation for this change is to prepare for follow-up changelists
addressing KUDU-3048 (adding clock metrics for better observability).

Change-Id: Ic4c1944d54bf50e54c06c12e2fb9e57fc452b877
Reviewed-on: http://gerrit.cloudera.org:8080/15215
Tested-by: Kudu Jenkins
Reviewed-by: Volodymyr Verovkin <ve...@cloudera.com>
Reviewed-by: Adar Dembo <ad...@cloudera.com>


> Add time/clock synchronization metrics
> --------------------------------------
>
>                 Key: KUDU-3048
>                 URL: https://issues.apache.org/jira/browse/KUDU-3048
>             Project: Kudu
>          Issue Type: Improvement
>          Components: clock, master, tserver
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>              Labels: clock
>
> For better visibility, it would be great to add metrics reflecting time/clock synchronization parameters:
> * the stats on the max_error sampled while reading the underlying clock
> * the stats on time intervals when the underlying clock was extrapolated instead of using the actual readings: number of such intervals and stats on the interval duration
> * whether hybrid clock timestamps are generated using extrapolated clock readings instead of real ones
> * if using the {{built-in}} time source:
> ** the number of servers used for the true time tracking (good references)
> ** the number of servers not used for the true time tracking (bad references)
> As for the rationale behind the new metrics:
> * max_error shows how far the clock may be from the true time; if it grows large, it may be time to use a different set of NTP servers or to increase the {{\-\-max_clock_sync_error_usec}} flag value
> * the presence of extrapolation intervals for the hybrid clock signals periods when the NTP servers were unavailable; a possible action is to revisit the set of NTP servers
> * if hybrid timestamps have been extrapolated for some time, Kudu masters and tablet servers might crash once the clock error goes beyond the configured threshold, so it's time to start troubleshooting the issue to avoid possible unavailability of the cluster
> The new metrics can be used for monitoring and alerting, allowing for proactive maintenance of a Kudu cluster.
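
To make the alerting rationale above concrete, here is a minimal sketch of
how a monitoring check might classify clock health from such metrics. It is
an illustration under stated assumptions, not part of Kudu: ClockSample,
ClockHealth, EvaluateClockHealth and the warn_fraction knob are all
hypothetical names, and the threshold argument is meant to carry whatever
value the max_clock_sync_error_usec flag is set to.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical health levels for an alerting rule.
enum class ClockHealth { kOk, kWarning, kCritical };

// Hypothetical snapshot of values the proposed metrics could expose.
struct ClockSample {
  int64_t max_error_usec;  // sampled maximum clock error, in microseconds
  bool extrapolating;      // true if timestamps rely on extrapolated readings
};

// Classifies a sample against the configured sync-error threshold.
// warn_fraction is an illustrative knob: the share of the threshold at
// which the check starts warning before the limit is actually reached.
inline ClockHealth EvaluateClockHealth(const ClockSample& sample,
                                       int64_t max_sync_error_usec,
                                       double warn_fraction = 0.5) {
  assert(max_sync_error_usec > 0);
  // At or beyond the threshold, servers might crash: treat as critical.
  if (sample.max_error_usec >= max_sync_error_usec) {
    return ClockHealth::kCritical;
  }
  // Extrapolation, or an error approaching the threshold, is an early
  // signal to look at the NTP servers.
  const double warn_level =
      warn_fraction * static_cast<double>(max_sync_error_usec);
  if (sample.extrapolating ||
      static_cast<double>(sample.max_error_usec) >= warn_level) {
    return ClockHealth::kWarning;
  }
  return ClockHealth::kOk;
}
```

For example, with a threshold of 10000000 microseconds, a sample with
max_error_usec of 1000 and no extrapolation classifies as kOk, while the
same sample with extrapolating set to true classifies as kWarning.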



--
This message was sent by Atlassian Jira
(v8.3.4#803005)