Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/12/14 02:42:00 UTC

[jira] [Commented] (KUDU-1959) Hard to tell when a cluster is done starting up

    [ https://issues.apache.org/jira/browse/KUDU-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458870#comment-17458870 ] 

ASF subversion and git services commented on KUDU-1959:
-------------------------------------------------------

Commit 60d34b68f4c42da04dd0a064db135568c5d75af9 in kudu's branch refs/heads/master from Abhishek Chennaka
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=60d34b6 ]

[KUDU-1959] - Fix the counter in StartupProgressStepsRemainingMetric()

The counter was incremented twice if tablets were not processed
during the startup of a tablet server.

This is a follow-up to 59070bf.

Change-Id: I6570f438dd85aafa16093465ae654ece8d056eb5
Reviewed-on: http://gerrit.cloudera.org:8080/18073
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Tested-by: Kudu Jenkins


> Hard to tell when a cluster is done starting up
> -----------------------------------------------
>
>                 Key: KUDU-1959
>                 URL: https://issues.apache.org/jira/browse/KUDU-1959
>             Project: Kudu
>          Issue Type: Improvement
>          Components: ops-tooling
>            Reporter: Jean-Daniel Cryans
>            Assignee: Abhishek
>            Priority: Major
>              Labels: roadmap-candidate, usability
>
> When restarting a cluster that has a good amount of data, it's hard to tell when it's "done". Right now, these are the things I do:
>  - Run ksck and wait until most tablets are no longer in the "unavailable" or "bootstrapping" state.
>  - Watch the metrics and see when the data under management is close to where it was before restarting (it grows as tablets are bootstrapped).
>  - Look at the tablet server web UIs for tablets and compare how many are done bootstrapping vs. in progress vs. not started.
> Ideas on how to improve this:
>  - In the master's web UI for tablet servers, show how many tablets are running vs. not running (I wouldn't add anything about tombstoned tablets).
>  - Add metrics for tablets in different states.
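
The idea of per-state tablet counts quoted above can be illustrated with a minimal sketch. It is not Kudu code: the state names, the summarize_startup() helper, and the choice to skip tombstoned tablets are all assumptions made for illustration, and the input is a plain list of state strings rather than anything read from a real tablet server.

```python
from collections import Counter

# Hypothetical state names, chosen for illustration only; they are not
# taken from Kudu's source or its metrics.
RUNNING = "RUNNING"
TOMBSTONED = "TOMBSTONED"


def summarize_startup(tablet_states, ignored=(TOMBSTONED,)):
    """Count tablets per state and report how many are running.

    tablet_states is an iterable of state strings, one per tablet.
    States listed in `ignored` are left out of every count.
    """
    counts = Counter(s for s in tablet_states if s not in ignored)
    total = sum(counts.values())
    running = counts[RUNNING]
    # A server with no tablets has nothing left to bootstrap.
    fraction = 1.0 if total == 0 else running / total
    return {
        "counts": dict(counts),
        "total": total,
        "running": running,
        "fraction_running": fraction,
        "done": running == total,
    }


if __name__ == "__main__":
    sample = ["RUNNING", "RUNNING", "BOOTSTRAPPING", "NOT_STARTED", "TOMBSTONED"]
    summary = summarize_startup(sample)
    print(f"{summary['running']}/{summary['total']} tablets running")
```

In this sketch, startup counts as done only when every non-ignored tablet is running; an operator who is satisfied with "most tablets" could compare fraction_running against a threshold instead.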



--
This message was sent by Atlassian Jira
(v8.20.1#820001)