You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Attila Bukor (Code Review)" <ge...@cloudera.org> on 2018/06/02 15:37:10 UTC

[kudu-CR] [tools] KUDU-2461 Add election metrics to ksck

Hello Will Berkeley, Mike Percy, Todd Lipcon,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/10588

to review the following change.


Change subject: [tools] KUDU-2461 Add election metrics to ksck
......................................................................

[tools] KUDU-2461 Add election metrics to ksck

KUDU-2287 added two new metrics:
* failed_elections_since_stable_leader
* time_since_last_leader_heartbeat

This commit adds this information to the ksck consensus matrix.

Consensus matrix of a healthy tablet:
 Config source |   Replicas   | Current term | Config index | Failed elections | Millis since heartbeat | Committed?
---------------+--------------+--------------+--------------+------------------+------------------------+------------
 master        | A   B*  C    |              |              |                  |                        | Yes
 A             | A   B*  C    | 2            | -1           |                  | 486                    | Yes
 B             | A   B*  C    | 2            | -1           |                  |                        | Yes
 C             | A   B*  C    | 2            | -1           |                  | 379                    | Yes

Consensus matrix of an unhealty tablet with leader missing:
 Config source |        Replicas        | Current term | Config index | Failed elections | Millis since heartbeat | Committed?
---------------+------------------------+--------------+--------------+------------------+------------------------+------------
 master        | A   B*  C              |              |              |                  |                        | Yes
 A             | A   B*  C              | 4            | -1           | 3                | 9787                   | Yes
 B             | [config not available] |              |              |                  |                        |
 C             | [config not available] |              |              |                  |                        |

Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
---
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/ksck_results.h
6 files changed, 74 insertions(+), 42 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/10588/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10588
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
Gerrit-Change-Number: 10588
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] KUDU-2461 Add election metrics to ksck

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10588

to look at the new patch set (#4).

Change subject: [tools] KUDU-2461 Add election metrics to ksck
......................................................................

[tools] KUDU-2461 Add election metrics to ksck

KUDU-2287 added two new metrics:
* failed_elections_since_stable_leader
* time_since_last_leader_heartbeat

This commit adds this information to the ksck consensus matrix.

Consensus matrix of a healthy tablet:
 Config source |   Replicas   | Current term | Config index | Failed elections | Millis since heartbeat | Committed?
---------------+--------------+--------------+--------------+------------------+------------------------+------------
 master        | A   B*  C    |              |              |                  |                        | Yes
 A             | A   B*  C    | 2            | -1           |                  | 486                    | Yes
 B             | A   B*  C    | 2            | -1           |                  |                        | Yes
 C             | A   B*  C    | 2            | -1           |                  | 379                    | Yes

Consensus matrix of an unhealty tablet with leader missing:
 Config source |        Replicas        | Current term | Config index | Failed elections | Millis since heartbeat | Committed?
---------------+------------------------+--------------+--------------+------------------+------------------------+------------
 master        | A   B*  C              |              |              |                  |                        | Yes
 A             | A   B*  C              | 4            | -1           | 3                | 9787                   | Yes
 B             | [config not available] |              |              |                  |                        |
 C             | [config not available] |              |              |                  |                        |

Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
---
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/ksck_results.h
6 files changed, 113 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/10588/4
-- 
To view, visit http://gerrit.cloudera.org:8080/10588
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
Gerrit-Change-Number: 10588
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] KUDU-2461 Add election metrics to ksck

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10588 )

Change subject: [tools] KUDU-2461 Add election metrics to ksck
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10588/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10588/4//COMMIT_MSG@24
PS4, Line 24:  Config source |        Replicas        | Current term | Config index | Failed elections | Millis since heartbeat | Committed?
didn't look at the patch yet, but this is starting to get really wide. Maybe we could abbreviate these columns a bit? eg just "term" instead of "current term", etc. Would be nice if it fit in 80-100 chars in the common case.

Another thought: maybe we only show millis since heartbeat and failed elections if they are interesting (eg above some threshold like a few elections, or 15+sec heartbeat or something)



-- 
To view, visit http://gerrit.cloudera.org:8080/10588
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
Gerrit-Change-Number: 10588
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 05 Jun 2018 17:38:11 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] KUDU-2461 Add election metrics to ksck

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10588

to look at the new patch set (#2).

Change subject: [tools] KUDU-2461 Add election metrics to ksck
......................................................................

[tools] KUDU-2461 Add election metrics to ksck

KUDU-2287 added two new metrics:
* failed_elections_since_stable_leader
* time_since_last_leader_heartbeat

This commit adds this information to the ksck consensus matrix.

Consensus matrix of a healthy tablet:
 Config source |   Replicas   | Current term | Config index | Failed elections | Millis since heartbeat | Committed?
---------------+--------------+--------------+--------------+------------------+------------------------+------------
 master        | A   B*  C    |              |              |                  |                        | Yes
 A             | A   B*  C    | 2            | -1           |                  | 486                    | Yes
 B             | A   B*  C    | 2            | -1           |                  |                        | Yes
 C             | A   B*  C    | 2            | -1           |                  | 379                    | Yes

Consensus matrix of an unhealty tablet with leader missing:
 Config source |        Replicas        | Current term | Config index | Failed elections | Millis since heartbeat | Committed?
---------------+------------------------+--------------+--------------+------------------+------------------------+------------
 master        | A   B*  C              |              |              |                  |                        | Yes
 A             | A   B*  C              | 4            | -1           | 3                | 9787                   | Yes
 B             | [config not available] |              |              |                  |                        |
 C             | [config not available] |              |              |                  |                        |

Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
---
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/ksck_results.h
6 files changed, 113 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/10588/2
-- 
To view, visit http://gerrit.cloudera.org:8080/10588
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
Gerrit-Change-Number: 10588
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] KUDU-2461 Add election metrics to ksck

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10588 )

Change subject: [tools] KUDU-2461 Add election metrics to ksck
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10588/4/src/kudu/consensus/metadata.proto
File src/kudu/consensus/metadata.proto:

http://gerrit.cloudera.org:8080/#/c/10588/4/src/kudu/consensus/metadata.proto@161
PS4, Line 161: // Number of failed elections and pre-elections since the last successful election
             :   optional int64 failed_elections = 5;
             : 
             :   // Time in milliseconds since the last leader heartbeat
             :   optional int64 millis_since_heartbeat = 6;
> I don't think I feel good about shoving in these metrics with the cstate. T
I think Will's idea about an RPC endpoint for metrics makes sense because it avoids having to pass the web UI port to ksck.



-- 
To view, visit http://gerrit.cloudera.org:8080/10588
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
Gerrit-Change-Number: 10588
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 22 Jun 2018 00:53:15 +0000
Gerrit-HasComments: Yes

[kudu-CR] [WIP] [tools] KUDU-2461 Add election metrics to ksck

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has abandoned this change. ( http://gerrit.cloudera.org:8080/10588 )

Change subject: [WIP] [tools] KUDU-2461 Add election metrics to ksck
......................................................................


Abandoned

doesn't seem all that important and ksck changed a lot since this patch was originally drafted, so rebase is not clean
-- 
To view, visit http://gerrit.cloudera.org:8080/10588
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
Gerrit-Change-Number: 10588
Gerrit-PatchSet: 6
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [WIP] [tools] KUDU-2461 Add election metrics to ksck

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10588

to look at the new patch set (#5).

Change subject: [WIP] [tools] KUDU-2461 Add election metrics to ksck
......................................................................

[WIP] [tools] KUDU-2461 Add election metrics to ksck

KUDU-2287 added two new metrics:
* failed_elections_since_stable_leader
* time_since_last_leader_heartbeat

This commit adds two ksck configuration options:

* failed_elections_warn_limit (default 3)
* millis_since_heartbeat_warn_limit (default 5000)

If any of them is reached, a warning is added to the ksck output:

Replica 39d3ec2c1f9541e18162a344645a4cdc of
e996c2436f3149eba8f7207bb6281a0d has not heard from the leader in 275991
ms and has called 33 failed election(s) since then.

Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
---
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
11 files changed, 147 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/10588/5
-- 
To view, visit http://gerrit.cloudera.org:8080/10588
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
Gerrit-Change-Number: 10588
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [WIP] [tools] KUDU-2461 Add election metrics to ksck

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10588

to look at the new patch set (#6).

Change subject: [WIP] [tools] KUDU-2461 Add election metrics to ksck
......................................................................

[WIP] [tools] KUDU-2461 Add election metrics to ksck

KUDU-2287 added two new metrics:
* failed_elections_since_stable_leader
* time_since_last_leader_heartbeat

This commit adds two ksck configuration options:

* failed_elections_warn_limit (default 3)
* millis_since_heartbeat_warn_limit (default 5000)

If any of them is reached, a warning is added to the ksck output:

Replica 39d3ec2c1f9541e18162a344645a4cdc of
e996c2436f3149eba8f7207bb6281a0d has not heard from the leader in 275991
ms and has called 33 failed election(s) since then.

Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
---
M src/kudu/consensus/consensus.proto
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
11 files changed, 154 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/10588/6
-- 
To view, visit http://gerrit.cloudera.org:8080/10588
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
Gerrit-Change-Number: 10588
Gerrit-PatchSet: 6
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] KUDU-2461 Add election metrics to ksck

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10588

to look at the new patch set (#3).

Change subject: [tools] KUDU-2461 Add election metrics to ksck
......................................................................

[tools] KUDU-2461 Add election metrics to ksck

KUDU-2287 added two new metrics:
* failed_elections_since_stable_leader
* time_since_last_leader_heartbeat

This commit adds this information to the ksck consensus matrix.

Consensus matrix of a healthy tablet:
 Config source |   Replicas   | Current term | Config index | Failed elections | Millis since heartbeat | Committed?
---------------+--------------+--------------+--------------+------------------+------------------------+------------
 master        | A   B*  C    |              |              |                  |                        | Yes
 A             | A   B*  C    | 2            | -1           |                  | 486                    | Yes
 B             | A   B*  C    | 2            | -1           |                  |                        | Yes
 C             | A   B*  C    | 2            | -1           |                  | 379                    | Yes

Consensus matrix of an unhealty tablet with leader missing:
 Config source |        Replicas        | Current term | Config index | Failed elections | Millis since heartbeat | Committed?
---------------+------------------------+--------------+--------------+------------------+------------------------+------------
 master        | A   B*  C              |              |              |                  |                        | Yes
 A             | A   B*  C              | 4            | -1           | 3                | 9787                   | Yes
 B             | [config not available] |              |              |                  |                        |
 C             | [config not available] |              |              |                  |                        |

Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
---
M src/kudu/consensus/metadata.proto
M src/kudu/consensus/raft_consensus.cc
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/ksck_results.h
6 files changed, 113 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/10588/3
-- 
To view, visit http://gerrit.cloudera.org:8080/10588
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
Gerrit-Change-Number: 10588
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [WIP] [tools] KUDU-2461 Add election metrics to ksck

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/10588 )

Change subject: [WIP] [tools] KUDU-2461 Add election metrics to ksck
......................................................................


Patch Set 5:

tagged it as WIP as I think I should write tests for this, but before doing so, I wanted to ask if this is a good approach.

Having an RPC to list all metrics seems like an overkill to me, so for now I implemented one for consensus metrics.

Will, Todd, Mike, what do you think?


-- 
To view, visit http://gerrit.cloudera.org:8080/10588
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
Gerrit-Change-Number: 10588
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 13 Aug 2018 22:24:37 +0000
Gerrit-HasComments: No

[kudu-CR] [tools] KUDU-2461 Add election metrics to ksck

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10588 )

Change subject: [tools] KUDU-2461 Add election metrics to ksck
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10588/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10588/4//COMMIT_MSG@24
PS4, Line 24:  Config source |        Replicas        | Current term | Config index | Failed elections | Millis since heartbeat | Committed?
> didn't look at the patch yet, but this is starting to get really wide. Mayb
I agree. It's great to have so much useful info but let's only show the election stuff when it's noteworthy, and in that case I think it's worth adding extra callouts below the consensus matrix. e.g. something like

     Config source |   Replicas
    ---------------+--------------
     master        | A   B*  C
     A             | A   B*  C
     B             | A   B*  C
     C             | A   B  C
     WARNING: Replica C has not heard from the leader in 7483ms
     WARNING: Replica C has called 2 failed election(s) since it last heard from a leader.

Maybe they should be combine in one message as well, since it's kind of redundant to say it hasn't heard from a leader and it's called elections, one a few seconds have passed.


http://gerrit.cloudera.org:8080/#/c/10588/4/src/kudu/consensus/metadata.proto
File src/kudu/consensus/metadata.proto:

http://gerrit.cloudera.org:8080/#/c/10588/4/src/kudu/consensus/metadata.proto@161
PS4, Line 161: // Number of failed elections and pre-elections since the last successful election
             :   optional int64 failed_elections = 5;
             : 
             :   // Time in milliseconds since the last leader heartbeat
             :   optional int64 millis_since_heartbeat = 6;
I don't think I feel good about shoving in these metrics with the cstate. This PB is used a lot of other places e.g. TS -> master heartbeats and in consensus itself, and it contains just core consensus info right now. I don't think it's wise to set a precedent of putting things in it just to communicate them to ksck.

While it expands the scope of the patch, I think instead we should introduce a basic mechanism for ksck to grab metrics from the cluster. AFAIK the way to do this is through the /metrics http endpoint but I think I'd rather have an RPC, since it helps fix what's on the wire and it avoids making parts of ksck contingent on the web interface being available.



-- 
To view, visit http://gerrit.cloudera.org:8080/10588
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16fa6f32e6df7698365e908c7b48e63b0e52c745
Gerrit-Change-Number: 10588
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 14 Jun 2018 17:49:52 +0000
Gerrit-HasComments: Yes