Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2020/02/29 02:50:35 UTC

[kudu-CR] ksck: display quiecing-related info

Hello Alexey Serbin, Adar Dembo,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/15323

to review the following change.


Change subject: ksck: display quiecing-related info
......................................................................

ksck: display quiecing-related info

This patch adds quiescing-related info to ksck's "Tablet Server Summary"
section. Specifically, it displays the quiescing state, the number of
tablet leaders, and the number of active scanners[1].

If none of the tablet servers are quiescing, the quiescing state column
is omitted. If none of the tablet servers support the quiescing RPC, all
related columns are omitted.
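
A minimal sketch of the column-selection rule described above. The
`QuiescingInfo` struct and the `ExtraColumns` helper are hypothetical names
used only for illustration; they are not taken from the patch. A missing
value (`std::nullopt`) stands for a tablet server that returned no
quiescing info.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical per-server record (not the patch's actual type).
struct QuiescingInfo {
  bool is_quiescing;
  int num_leaders;
  int num_active_scanners;
};

// Returns the extra column headers to append to the summary table.
std::vector<std::string> ExtraColumns(
    const std::vector<std::optional<QuiescingInfo>>& infos) {
  const bool any_info = std::any_of(
      infos.begin(), infos.end(),
      [](const std::optional<QuiescingInfo>& i) { return i.has_value(); });
  // No server returned quiescing info: omit all related columns.
  if (!any_info) return {};

  const bool any_quiescing = std::any_of(
      infos.begin(), infos.end(),
      [](const std::optional<QuiescingInfo>& i) {
        return i.has_value() && i->is_quiescing;
      });
  std::vector<std::string> cols;
  // The state column appears only if at least one server is quiescing.
  if (any_quiescing) cols.push_back("Quiescing");
  cols.push_back("Tablet Leaders");
  cols.push_back("Active Scanners");
  return cols;
}
```

With this rule, a cluster in which no server is quiescing still gets the
leader and scanner columns, which is the "dynamic" behavior of the state
column that the review discusses.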

I manually tested against a cluster in which no tablet server supported
quiescing, as well as one in which only some tablet servers did[2].

[1] Sample output:
Tablet Server Summary
               UUID               |            Address             | Status  | Location | Quiescing | Tablet Leaders | Active Scanners
----------------------------------+--------------------------------+---------+----------+-----------+----------------+-----------------
 1e8c8c55d0e24110b29caaecdae491ca | ve1318.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       2        |       0
 36e8894c4e6d48c690f64ade8b5fe52d | ve1320.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       7        |       0
 629bbaecfead49f18247d7963cfa98af | ve1319.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       0        |       0
 9dfdd5aac2814353bd50cefca2d77403 | ve1321.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       8        |       0
 9fe2954950ea4f4eaecc4ef97c6eb44a | ve1317.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       6        |       0
 a5dd443f61464c34aca585a905e87926 | ve1322.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       7        |       0
 dffda2ef2d33481993d29009f3f87420 | ve1323.halxg.cloudera.com:7050 | HEALTHY | /default | true      |       6        |       0
 e6c9b1df642a4cf69c47f36480dd4723 | ve1316.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       6        |       0
 efc1275241604b0aa886494f8da9e00b | ve1324.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       7        |       0

[2] Output of partial support for quiescing across the cluster yields "partial"
    results; not the prettiest, but it's also not a scenario we expect often:
I0228 18:36:40.200479 383527 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 629bbaecfead49f18247d7963cfa98af (ve1319.halxg.cloudera.com:7050): Remote error: unsupported feature flags
I0228 18:36:40.200585 383525 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 1e8c8c55d0e24110b29caaecdae491ca (ve1318.halxg.cloudera.com:7050): Remote error: unsupported feature flags
I0228 18:36:40.201057 383526 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 36e8894c4e6d48c690f64ade8b5fe52d (ve1320.halxg.cloudera.com:7050): Remote error: unsupported feature flags
I0228 18:36:40.202527 383528 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 9dfdd5aac2814353bd50cefca2d77403 (ve1321.halxg.cloudera.com:7050): Remote error: unsupported feature flags
I0228 18:36:40.202736 383530 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server a5dd443f61464c34aca585a905e87926 (ve1322.halxg.cloudera.com:7050): Remote error: unsupported feature flags
I0228 18:36:40.202940 383532 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server dffda2ef2d33481993d29009f3f87420 (ve1323.halxg.cloudera.com:7050): Remote error: unsupported feature flags
I0228 18:36:40.203280 383536 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server efc1275241604b0aa886494f8da9e00b (ve1324.halxg.cloudera.com:7050): Remote error: unsupported feature flags
...
Tablet Server Summary
               UUID               |            Address             | Status  | Location | Quiescing | Tablet Leaders | Active Scanners
----------------------------------+--------------------------------+---------+----------+-----------+----------------+-----------------
 1e8c8c55d0e24110b29caaecdae491ca | ve1318.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 36e8894c4e6d48c690f64ade8b5fe52d | ve1320.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 629bbaecfead49f18247d7963cfa98af | ve1319.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 9dfdd5aac2814353bd50cefca2d77403 | ve1321.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 9fe2954950ea4f4eaecc4ef97c6eb44a | ve1317.halxg.cloudera.com:7050 | HEALTHY | /default | true      |       5        |       0
 a5dd443f61464c34aca585a905e87926 | ve1322.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 dffda2ef2d33481993d29009f3f87420 | ve1323.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 e6c9b1df642a4cf69c47f36480dd4723 | ve1316.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       6        |       0
 efc1275241604b0aa886494f8da9e00b | ve1324.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a

Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
---
M src/kudu/integration-tests/tablet_server_quiescing-itest.cc
M src/kudu/rebalance/cluster_status.h
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
M src/kudu/tserver/tserver.proto
12 files changed, 147 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/15323/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15323
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>

[kudu-CR] ksck: display quiecing-related info

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15323 )

Change subject: ksck: display quiecing-related info
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/integration-tests/tablet_server_quiescing-itest.cc
File src/kudu/integration-tests/tablet_server_quiescing-itest.cc:

http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/integration-tests/tablet_server_quiescing-itest.cc@398
PS1, Line 398:         "-----------+----------------+-----------------\n"
             :         " true      |       1        |       0");
             :     ASSERT_TRUE(ts->server()->quiescing());
             : 
             :     // Same with ksck.
             :     ASSERT_OK(RunKuduTool({ "cluster", "ksck", master_addr }, &stdout));
             :     ASSERT_STR_MATCHES(stdout,
             :         ".* Quiescing | Tablet Leaders | Active Scanners\n"
             :         ".*-----------+----------------+-----------------\n"
             :         ".* true      |       1        |      0");
             :     ASSERT_TRUE(ts->server()->quiescing());
> nit: maybe, make the names of the corresponding columns match for both 'tse
Done


http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/integration-tests/tablet_server_quiescing-itest.cc@433
PS1, Line 433: ;
> Instead, maybe output the quiescing information only if --quiescing_info fl
I'm a bit conflicted about this. It's not information that you'd really think to look for without extra context about a rolling restart, but I think it'll be valuable for cases even beyond just debugging a rolling restart. It makes `tserver quiesce status` less valuable for sure, but I don't think it makes it entirely useless since running it without the rest of `ksck` might be preferred (especially on larger clusters).

I'll keep this as is for now. I'm curious whether you think this _isn't_ worth having by default, or whether this comment is more about reducing duplication with `tserver quiesce status`.


http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/tools/ksck_remote.cc
File src/kudu/tools/ksck_remote.cc:

http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/tools/ksck_remote.cc@81
PS1, Line 81: true
> Do we really want to include the information into ksck by default?  Given t
Re: backwards compatibility, I don't think it's unreasonable to just log a message noting the lack of support. I hope the more common case will be that it'll be run against newer versions of Kudu.

Re: the sub-command, I mentioned this on your other comment, but I think it's still valuable to have that tool as separate from ksck, since ksck is a pretty heavy-weight operation. That said, it is our go-to when it comes to understanding what's going on in a cluster, and quiescing info is a huge value add IMO when it comes to understanding performance and workload skew.
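
A rough sketch of the "log a message and carry on" approach to servers that lack support. `FetchResult`, `InfoOrLog`, and `QuiescingCell` are hypothetical names, and the fetch itself is assumed to have happened elsewhere; only the fallback to "n/a" is shown.

```cpp
#include <cassert>
#include <iostream>
#include <optional>
#include <string>

// Hypothetical per-server record (not the patch's actual type).
struct QuiescingInfo {
  bool is_quiescing;
  int num_leaders;
  int num_active_scanners;
};

// Hypothetical outcome of asking one tablet server for its quiescing info.
struct FetchResult {
  bool ok;
  std::string error;   // meaningful only when !ok
  QuiescingInfo info;  // meaningful only when ok
};

// On failure, logs a message and returns no value instead of failing the
// whole check, so the rest of the summary can still be produced.
std::optional<QuiescingInfo> InfoOrLog(const std::string& uuid,
                                       const FetchResult& result) {
  if (!result.ok) {
    std::cerr << "Couldn't fetch quiescing info from tablet server " << uuid
              << ": " << result.error << "\n";
    return std::nullopt;
  }
  return result.info;
}

// Renders the state cell; servers without info show "n/a".
std::string QuiescingCell(const std::optional<QuiescingInfo>& info) {
  if (!info.has_value()) return "n/a";
  return info->is_quiescing ? "true" : "false";
}
```

Keeping the failure local to one row is what yields the mixed "n/a" and real values when only part of the cluster supports the RPC.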


http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/tools/ksck_remote.cc@82
PS1, Line 82: to displ
> What does 'check' mean here?  Simply 'display'?
Done


http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/tools/ksck_remote.cc@298
PS1, Line 298:   quiescing_info_ = qinfo;
> warning: std::move of the variable 'qinfo' of the trivially-copyable type '
Done
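
The Tidy Bot warning quoted above concerns `std::move` applied to a trivially copyable type. A minimal sketch of why such a move buys nothing, assuming a hypothetical `QuiescingInfo` struct of plain fields and a stand-in `FakeTabletServer` class (neither is the patch's real definition):

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical struct of plain fields; trivially copyable by construction.
struct QuiescingInfo {
  bool is_quiescing;
  int num_leaders;
  int num_active_scanners;
};
static_assert(std::is_trivially_copyable<QuiescingInfo>::value,
              "QuiescingInfo is expected to be trivially copyable");

// Stand-in for the class that stores the info; plain assignment suffices.
class FakeTabletServer {
 public:
  void set_quiescing_info(const QuiescingInfo& qinfo) {
    quiescing_info_ = qinfo;  // a copy; std::move would do the same work
  }
  const QuiescingInfo& quiescing_info() const { return quiescing_info_; }

 private:
  QuiescingInfo quiescing_info_{};
};

// For a trivially copyable type, "moving" is a memberwise copy, so the
// source still holds its original values afterwards.
bool SourceUnchangedAfterMove() {
  QuiescingInfo src{true, 5, 0};
  QuiescingInfo dst = std::move(src);
  return src.is_quiescing == dst.is_quiescing &&
         src.num_leaders == dst.num_leaders &&
         src.num_active_scanners == dst.num_active_scanners;
}
```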


http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/tserver/tablet_service.cc@1020
PS1, Line 1020:     case TabletServerFeatures::QUIESCING:
> What about other already supported features?
Done
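
One way to read the question above: a feature-support switch should list every feature the server already handles, not only the new one, so that anything unlisted falls through to "unsupported". A hedged sketch with a hypothetical `Feature` enum follows; the names and values other than the quiescing entry are placeholders and do not come from tserver.proto.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical feature identifiers; kFeatureA and kFeatureB are placeholders
// standing in for features that predate quiescing.
enum class Feature : uint32_t {
  kUnknown = 0,
  kFeatureA = 1,
  kFeatureB = 2,
  kQuiescing = 3,
};

// Every supported feature is listed explicitly; anything else is rejected.
bool SupportsFeature(uint32_t feature) {
  switch (static_cast<Feature>(feature)) {
    case Feature::kFeatureA:
    case Feature::kFeatureB:
    case Feature::kQuiescing:
      return true;
    default:
      return false;
  }
}

// A request is acceptable only if all of its required features are supported.
bool SupportsAll(const std::vector<uint32_t>& required) {
  for (uint32_t f : required) {
    if (!SupportsFeature(f)) return false;
  }
  return true;
}
```

Under this scheme, a request that requires a flag the server does not list is rejected as a whole, which is the kind of situation the "unsupported feature flags" errors in the sample output reflect.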



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 02 Mar 2020 22:23:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] ksck: display quiecing-related info

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has removed a vote on this change.

Change subject: ksck: display quiecing-related info
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] ksck: display quiecing-related info

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15323 )

Change subject: ksck: display quiecing-related info
......................................................................


Patch Set 4:

Just rebased and fixed the merge conflict with c3122b6


-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 04 Mar 2020 08:35:36 +0000
Gerrit-HasComments: No

[kudu-CR] ksck: display quiecing-related info

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15323 )

Change subject: ksck: display quiecing-related info
......................................................................


Patch Set 2: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/integration-tests/tablet_server_quiescing-itest.cc
File src/kudu/integration-tests/tablet_server_quiescing-itest.cc:

http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/integration-tests/tablet_server_quiescing-itest.cc@433
PS1, Line 433: ;
> I'm a bit conflicted about this. It's not information that you'd really thi
I think there is some value in seeing the number of leader replicas and the number of active scanners in ksck output, and I think it's nice to have that by default: it's a nice extension, and we probably shouldn't be constrained here by backwards compatibility of the ksck output (if any).

For the 'quiescing' column, I'm not quite sure about including it in ksck by default.  On the other hand, given its rather dynamic nature (it appears only when there is at least one quiescing server), it looks a bit tricky to me to reason about.

It would be great to understand the use-case here.  To what extent do we want to use ksck to signal ongoing quiescing if it's already covered by a dedicated kudu CLI sub-command?  Is this just to let the operator know that the cluster is in a not-so-regular mode of operation?  If so, then maybe it should be added just as a note/warning somewhere instead of outputting the quiescing status of every server?

Anyway, I don't feel strongly about this.  It would be great to get more feedback on this from other people.


http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/tools/ksck_remote.cc
File src/kudu/tools/ksck_remote.cc:

http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/tools/ksck_remote.cc@81
PS1, Line 81: true
> Re: backwards compatibility, I don't think it's unreasonable to just log a 
Yep, that makes sense to me.  One question: once quiescing status is shown in ksck as in this patch, do we expect people to run the quiesce-specific sub-command anyway, or will this output from ksck be enough to collect all the necessary information?



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 03 Mar 2020 00:27:54 +0000
Gerrit-HasComments: Yes

[kudu-CR] ksck: display quiecing-related info

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15323 )

Change subject: ksck: display quiecing-related info
......................................................................


Patch Set 1:

(5 comments)

The test failures seem relevant to these changes.

http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/integration-tests/tablet_server_quiescing-itest.cc
File src/kudu/integration-tests/tablet_server_quiescing-itest.cc:

http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/integration-tests/tablet_server_quiescing-itest.cc@398
PS1, Line 398:         " Quiescing | Tablet leader count | Active scanner count\n"
             :         "-----------+---------------------+----------------------\n"
             :         " true      |       1             |       0");
             :     ASSERT_TRUE(ts->server()->quiescing());
             : 
             :     // Same with ksck.
             :     ASSERT_OK(RunKuduTool({ "cluster", "ksck", master_addr }, &stdout));
             :     ASSERT_STR_MATCHES(stdout,
             :         ".* Quiescing | Tablet Leaders | Active Scanners\n"
             :         ".* ----------+----------------+----------------\n"
             :         ".* true      |       1        |      0");
nit: maybe, make the names of the corresponding columns match for both 'tserver quiesce' and 'cluster ksck'?


http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/integration-tests/tablet_server_quiescing-itest.cc@433
PS1, Line 433: --noquiescing_info
Instead, maybe output the quiescing information only if --quiescing_info flag is specified?


http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/tools/ksck_remote.cc
File src/kudu/tools/ksck_remote.cc:

http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/tools/ksck_remote.cc@81
PS1, Line 81: true
Do we really want to include the information into ksck by default?  Given the presence of a dedicated sub-command for quiescing and thinking about backwards compatibility, I would expect it is set to 'false' by default, no?


http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/tools/ksck_remote.cc@82
PS1, Line 82: to check
What does 'check' mean here?  Simply 'display'?


http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/tserver/tablet_service.cc@1020
PS1, Line 1020:     default:
What about other already supported features?



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 01 Mar 2020 06:52:38 +0000
Gerrit-HasComments: Yes

[kudu-CR] ksck: display quiecing-related info

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has removed a vote on this change.

Change subject: ksck: display quiecing-related info
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] ksck: display quiecing-related info

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15323 )

Change subject: ksck: display quiecing-related info
......................................................................


Patch Set 4: Code-Review+2


-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 04 Mar 2020 08:35:35 +0000
Gerrit-HasComments: No

[kudu-CR] ksck: display quiecing-related info

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15323

to look at the new patch set (#2).

Change subject: ksck: display quiecing-related info
......................................................................

ksck: display quiecing-related info

This patch adds quiescing-related info to ksck's "Tablet Server Summary"
section. Specifically, it displays the quiescing state, the number of
tablet leaders, and the number of active scanners[1].

If none of the tablet servers are quiescing, the quiescing state column
is omitted. If none of the tablet servers support the quiescing RPC, all
related columns are omitted.

I manually tested against a cluster in which no tablet server supported
quiescing, as well as one in which only some tablet servers did[2].

The info is displayed by default with ksck, since the information may be
invaluable in debugging performance or workload skew. The info can be
omitted by setting `--quiescing_info` to false.

[1] Sample output:
Tablet Server Summary
               UUID               |            Address             | Status  | Location | Quiescing | Tablet Leaders | Active Scanners
----------------------------------+--------------------------------+---------+----------+-----------+----------------+-----------------
 1e8c8c55d0e24110b29caaecdae491ca | ve1318.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       2        |       0
 36e8894c4e6d48c690f64ade8b5fe52d | ve1320.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       7        |       0
 629bbaecfead49f18247d7963cfa98af | ve1319.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       0        |       0
 9dfdd5aac2814353bd50cefca2d77403 | ve1321.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       8        |       0
 9fe2954950ea4f4eaecc4ef97c6eb44a | ve1317.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       6        |       0
 a5dd443f61464c34aca585a905e87926 | ve1322.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       7        |       0
 dffda2ef2d33481993d29009f3f87420 | ve1323.halxg.cloudera.com:7050 | HEALTHY | /default | true      |       6        |       0
 e6c9b1df642a4cf69c47f36480dd4723 | ve1316.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       6        |       0
 efc1275241604b0aa886494f8da9e00b | ve1324.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       7        |       0

[2] Output of partial support for quiescing across the cluster yields "partial"
    results; not the prettiest, but it's also not a scenario we expect often:
I0228 18:36:40.200479 383527 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 629bbaecfead49f18247d7963cfa98af (ve1319.halxg.cloudera.com:7050): Remote error: unsupported feature flags
I0228 18:36:40.200585 383525 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 1e8c8c55d0e24110b29caaecdae491ca (ve1318.halxg.cloudera.com:7050): Remote error: unsupported feature flags
I0228 18:36:40.201057 383526 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 36e8894c4e6d48c690f64ade8b5fe52d (ve1320.halxg.cloudera.com:7050): Remote error: unsupported feature flags
I0228 18:36:40.202527 383528 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 9dfdd5aac2814353bd50cefca2d77403 (ve1321.halxg.cloudera.com:7050): Remote error: unsupported feature flags
I0228 18:36:40.202736 383530 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server a5dd443f61464c34aca585a905e87926 (ve1322.halxg.cloudera.com:7050): Remote error: unsupported feature flags
I0228 18:36:40.202940 383532 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server dffda2ef2d33481993d29009f3f87420 (ve1323.halxg.cloudera.com:7050): Remote error: unsupported feature flags
I0228 18:36:40.203280 383536 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server efc1275241604b0aa886494f8da9e00b (ve1324.halxg.cloudera.com:7050): Remote error: unsupported feature flags
...
Tablet Server Summary
               UUID               |            Address             | Status  | Location | Quiescing | Tablet Leaders | Active Scanners
----------------------------------+--------------------------------+---------+----------+-----------+----------------+-----------------
 1e8c8c55d0e24110b29caaecdae491ca | ve1318.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 36e8894c4e6d48c690f64ade8b5fe52d | ve1320.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 629bbaecfead49f18247d7963cfa98af | ve1319.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 9dfdd5aac2814353bd50cefca2d77403 | ve1321.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 9fe2954950ea4f4eaecc4ef97c6eb44a | ve1317.halxg.cloudera.com:7050 | HEALTHY | /default | true      |       5        |       0
 a5dd443f61464c34aca585a905e87926 | ve1322.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 dffda2ef2d33481993d29009f3f87420 | ve1323.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 e6c9b1df642a4cf69c47f36480dd4723 | ve1316.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       6        |       0
 efc1275241604b0aa886494f8da9e00b | ve1324.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a

Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
---
M src/kudu/integration-tests/tablet_server_quiescing-itest.cc
M src/kudu/rebalance/cluster_status.h
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_tserver.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
M src/kudu/tserver/tserver.proto
13 files changed, 166 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/15323/2
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] ksck: display quiecing-related info

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15323 )

Change subject: ksck: display quiecing-related info
......................................................................


Patch Set 3: Code-Review+2


-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 04 Mar 2020 05:32:21 +0000
Gerrit-HasComments: No

[kudu-CR] ksck: display quiecing-related info

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15323 )

Change subject: ksck: display quiecing-related info
......................................................................

ksck: display quiecing-related info

This patch adds quiescing-related info to ksck's "Tablet Server Summary"
section. Specifically, it displays the quiescing state, the number of
tablet leaders, and the number of active scanners[1].

If none of the tablet servers are quiescing, the quiescing state column
is omitted. If none of the tablet servers support the quiescing RPC, all
related columns are omitted.

I manually tested against a cluster in which no tablet server supported
quiescing, as well as one in which only some tablet servers did[2].

The info is displayed by default with ksck, since the information may be
invaluable in debugging performance or workload skew. The info can be
omitted by setting `--quiescing_info` to false.

[1] Sample output:
Tablet Server Summary
               UUID               |            Address             | Status  | Location | Quiescing | Tablet Leaders | Active Scanners
----------------------------------+--------------------------------+---------+----------+-----------+----------------+-----------------
 1e8c8c55d0e24110b29caaecdae491ca | ve1318.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       2        |       0
 36e8894c4e6d48c690f64ade8b5fe52d | ve1320.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       7        |       0
 629bbaecfead49f18247d7963cfa98af | ve1319.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       0        |       0
 9dfdd5aac2814353bd50cefca2d77403 | ve1321.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       8        |       0
 9fe2954950ea4f4eaecc4ef97c6eb44a | ve1317.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       6        |       0
 a5dd443f61464c34aca585a905e87926 | ve1322.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       7        |       0
 dffda2ef2d33481993d29009f3f87420 | ve1323.halxg.cloudera.com:7050 | HEALTHY | /default | true      |       6        |       0
 e6c9b1df642a4cf69c47f36480dd4723 | ve1316.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       6        |       0
 efc1275241604b0aa886494f8da9e00b | ve1324.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       7        |       0

[2] Output of partial support for quiescing across the cluster yields "partial"
    results; not the prettiest, but it's also not a scenario we expect often:
W0228 18:36:40.200479 383527 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 629bbaecfead49f18247d7963cfa98af (ve1319.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.200585 383525 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 1e8c8c55d0e24110b29caaecdae491ca (ve1318.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.201057 383526 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 36e8894c4e6d48c690f64ade8b5fe52d (ve1320.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.202527 383528 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 9dfdd5aac2814353bd50cefca2d77403 (ve1321.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.202736 383530 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server a5dd443f61464c34aca585a905e87926 (ve1322.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.202940 383532 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server dffda2ef2d33481993d29009f3f87420 (ve1323.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.203280 383536 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server efc1275241604b0aa886494f8da9e00b (ve1324.halxg.cloudera.com:7050): Remote error: unsupported feature flags
...
Tablet Server Summary
               UUID               |            Address             | Status  | Location | Quiescing | Tablet Leaders | Active Scanners
----------------------------------+--------------------------------+---------+----------+-----------+----------------+-----------------
 1e8c8c55d0e24110b29caaecdae491ca | ve1318.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 36e8894c4e6d48c690f64ade8b5fe52d | ve1320.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 629bbaecfead49f18247d7963cfa98af | ve1319.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 9dfdd5aac2814353bd50cefca2d77403 | ve1321.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 9fe2954950ea4f4eaecc4ef97c6eb44a | ve1317.halxg.cloudera.com:7050 | HEALTHY | /default | true      |       5        |       0
 a5dd443f61464c34aca585a905e87926 | ve1322.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 dffda2ef2d33481993d29009f3f87420 | ve1323.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 e6c9b1df642a4cf69c47f36480dd4723 | ve1316.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       6        |       0
 efc1275241604b0aa886494f8da9e00b | ve1324.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a

Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Reviewed-on: http://gerrit.cloudera.org:8080/15323
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/integration-tests/tablet_server_quiescing-itest.cc
M src/kudu/rebalance/cluster_status.h
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_tserver.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
M src/kudu/tserver/tserver.proto
13 files changed, 164 insertions(+), 17 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/15323
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] ksck: display quiecing-related info

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15323 )

Change subject: ksck: display quiecing-related info
......................................................................


Patch Set 2: Verified+1

KUDU-801 strikes again.


-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 02 Mar 2020 23:29:07 +0000
Gerrit-HasComments: No

[kudu-CR] ksck: display quiecing-related info

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15323 )

Change subject: ksck: display quiecing-related info
......................................................................


Patch Set 3: Verified+1

Unrelated flake of maintenance_mode-itest.


-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 03 Mar 2020 22:21:53 +0000
Gerrit-HasComments: No

[kudu-CR] ksck: display quiecing-related info

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15323

to look at the new patch set (#3).

Change subject: ksck: display quiecing-related info
......................................................................

ksck: display quiecing-related info

This patch adds quiescing-related info to ksck's "Tablet Server Summary"
section. Specifically, it displays the quiescing state, the number of
tablet leaders, and the number of active scanners[1].

If none of the tablet servers are quiescing, the quiescing state column
is omitted. If none of the tablet servers support the quiescing RPC, all
related columns are omitted.

I manually tested against a cluster in which no tablet server supported
quiescing, as well as one that only partially supported quiescing[2].

The info is displayed by default with ksck, since the information may be
invaluable in debugging performance or workload skew. The info can be
omitted by setting `--quiescing_info` to false.
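
The column-selection rules described above can be sketched roughly as
follows. This is a minimal illustration, not the code in ksck_results.cc:
the QuiescingInfo and TServerSummary structs, the SummaryColumns function,
and the show_quiescing_info parameter (standing in for the
`--quiescing_info` flag) are all hypothetical names, and a missing
std::optional value is assumed to model a tablet server that does not
support the quiescing RPC. It assumes C++17.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical per-server quiescing info, as fetched over RPC.
struct QuiescingInfo {
  bool is_quiescing;
  int num_leaders;
  int num_active_scanners;
};

// Hypothetical summary row. An empty optional models a tablet server that
// rejected the quiescing RPC (e.g. "unsupported feature flags").
struct TServerSummary {
  std::string uuid;
  std::optional<QuiescingInfo> quiescing_info;
};

// Returns the column headers for the "Tablet Server Summary" table.
// show_quiescing_info stands in for the --quiescing_info flag.
std::vector<std::string> SummaryColumns(
    const std::vector<TServerSummary>& servers, bool show_quiescing_info) {
  std::vector<std::string> cols = {"UUID", "Address", "Status", "Location"};
  if (!show_quiescing_info) return cols;

  bool any_supported = false;
  bool any_quiescing = false;
  for (const auto& s : servers) {
    if (s.quiescing_info) {
      any_supported = true;
      any_quiescing = any_quiescing || s.quiescing_info->is_quiescing;
    }
  }
  // No server supports the quiescing RPC: omit all related columns.
  if (!any_supported) return cols;
  // Only show the state column if at least one server is quiescing.
  if (any_quiescing) cols.push_back("Quiescing");
  cols.push_back("Tablet Leaders");
  cols.push_back("Active Scanners");
  return cols;
}
```

Under these assumptions, a partially upgraded cluster still gets the extra
columns as long as one server responds, which is consistent with the "n/a"
cells in sample [2].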

[1] Sample output:
Tablet Server Summary
               UUID               |            Address             | Status  | Location | Quiescing | Tablet Leaders | Active Scanners
----------------------------------+--------------------------------+---------+----------+-----------+----------------+-----------------
 1e8c8c55d0e24110b29caaecdae491ca | ve1318.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       2        |       0
 36e8894c4e6d48c690f64ade8b5fe52d | ve1320.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       7        |       0
 629bbaecfead49f18247d7963cfa98af | ve1319.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       0        |       0
 9dfdd5aac2814353bd50cefca2d77403 | ve1321.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       8        |       0
 9fe2954950ea4f4eaecc4ef97c6eb44a | ve1317.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       6        |       0
 a5dd443f61464c34aca585a905e87926 | ve1322.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       7        |       0
 dffda2ef2d33481993d29009f3f87420 | ve1323.halxg.cloudera.com:7050 | HEALTHY | /default | true      |       6        |       0
 e6c9b1df642a4cf69c47f36480dd4723 | ve1316.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       6        |       0
 efc1275241604b0aa886494f8da9e00b | ve1324.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       7        |       0

[2] Partial support for quiescing across the cluster yields "partial"
    results; not the prettiest, but it's also not a scenario we expect often:
W0228 18:36:40.200479 383527 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 629bbaecfead49f18247d7963cfa98af (ve1319.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.200585 383525 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 1e8c8c55d0e24110b29caaecdae491ca (ve1318.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.201057 383526 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 36e8894c4e6d48c690f64ade8b5fe52d (ve1320.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.202527 383528 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 9dfdd5aac2814353bd50cefca2d77403 (ve1321.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.202736 383530 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server a5dd443f61464c34aca585a905e87926 (ve1322.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.202940 383532 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server dffda2ef2d33481993d29009f3f87420 (ve1323.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.203280 383536 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server efc1275241604b0aa886494f8da9e00b (ve1324.halxg.cloudera.com:7050): Remote error: unsupported feature flags
...
Tablet Server Summary
               UUID               |            Address             | Status  | Location | Quiescing | Tablet Leaders | Active Scanners
----------------------------------+--------------------------------+---------+----------+-----------+----------------+-----------------
 1e8c8c55d0e24110b29caaecdae491ca | ve1318.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 36e8894c4e6d48c690f64ade8b5fe52d | ve1320.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 629bbaecfead49f18247d7963cfa98af | ve1319.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 9dfdd5aac2814353bd50cefca2d77403 | ve1321.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 9fe2954950ea4f4eaecc4ef97c6eb44a | ve1317.halxg.cloudera.com:7050 | HEALTHY | /default | true      |       5        |       0
 a5dd443f61464c34aca585a905e87926 | ve1322.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 dffda2ef2d33481993d29009f3f87420 | ve1323.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 e6c9b1df642a4cf69c47f36480dd4723 | ve1316.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       6        |       0
 efc1275241604b0aa886494f8da9e00b | ve1324.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a

Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
---
M src/kudu/integration-tests/tablet_server_quiescing-itest.cc
M src/kudu/rebalance/cluster_status.h
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_tserver.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
M src/kudu/tserver/tserver.proto
13 files changed, 167 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/15323/3
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] ksck: display quiecing-related info

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15323 )

Change subject: ksck: display quiecing-related info
......................................................................


Patch Set 3: Code-Review+2

LGTM, maybe Adar has more feedback on PS3.


-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 03 Mar 2020 22:56:50 +0000
Gerrit-HasComments: No

[kudu-CR] ksck: display quiecing-related info

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15323 )

Change subject: ksck: display quiecing-related info
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15323/2/src/kudu/tools/ksck.h
File src/kudu/tools/ksck.h:

http://gerrit.cloudera.org:8080/#/c/15323/2/src/kudu/tools/ksck.h@401
PS2, Line 401:   std::atomic<uint64_t> timestamp_;
Nit: separate from the above with an empty line, so it's clear the comment only applies to quiescing_info_ and not the rest?


http://gerrit.cloudera.org:8080/#/c/15323/2/src/kudu/tools/ksck_remote.cc
File src/kudu/tools/ksck_remote.cc:

http://gerrit.cloudera.org:8080/#/c/15323/2/src/kudu/tools/ksck_remote.cc@290
PS2, Line 290: INFO
Not WARNING?



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 03 Mar 2020 06:29:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] ksck: display quiecing-related info

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15323

to look at the new patch set (#4).

Change subject: ksck: display quiecing-related info
......................................................................

ksck: display quiecing-related info

This patch adds quiescing-related info to ksck's "Tablet Server Summary"
section. Specifically, it displays the quiescing state, the number of
tablet leaders, and the number of active scanners[1].

If none of the tablet servers are quiescing, the quiescing state column
is omitted. If none of the tablet servers support the quiescing RPC, all
related columns are omitted.

I manually tested against a cluster in which no tablet server supported
quiescing, as well as one that only partially supported quiescing[2].

The info is displayed by default with ksck, since the information may be
invaluable in debugging performance or workload skew. The info can be
omitted by setting `--quiescing_info` to false.

[1] Sample output:
Tablet Server Summary
               UUID               |            Address             | Status  | Location | Quiescing | Tablet Leaders | Active Scanners
----------------------------------+--------------------------------+---------+----------+-----------+----------------+-----------------
 1e8c8c55d0e24110b29caaecdae491ca | ve1318.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       2        |       0
 36e8894c4e6d48c690f64ade8b5fe52d | ve1320.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       7        |       0
 629bbaecfead49f18247d7963cfa98af | ve1319.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       0        |       0
 9dfdd5aac2814353bd50cefca2d77403 | ve1321.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       8        |       0
 9fe2954950ea4f4eaecc4ef97c6eb44a | ve1317.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       6        |       0
 a5dd443f61464c34aca585a905e87926 | ve1322.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       7        |       0
 dffda2ef2d33481993d29009f3f87420 | ve1323.halxg.cloudera.com:7050 | HEALTHY | /default | true      |       6        |       0
 e6c9b1df642a4cf69c47f36480dd4723 | ve1316.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       6        |       0
 efc1275241604b0aa886494f8da9e00b | ve1324.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       7        |       0

[2] Partial support for quiescing across the cluster yields "partial"
    results; not the prettiest, but it's also not a scenario we expect often:
W0228 18:36:40.200479 383527 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 629bbaecfead49f18247d7963cfa98af (ve1319.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.200585 383525 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 1e8c8c55d0e24110b29caaecdae491ca (ve1318.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.201057 383526 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 36e8894c4e6d48c690f64ade8b5fe52d (ve1320.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.202527 383528 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server 9dfdd5aac2814353bd50cefca2d77403 (ve1321.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.202736 383530 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server a5dd443f61464c34aca585a905e87926 (ve1322.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.202940 383532 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server dffda2ef2d33481993d29009f3f87420 (ve1323.halxg.cloudera.com:7050): Remote error: unsupported feature flags
W0228 18:36:40.203280 383536 ksck_remote.cc:290] Couldn't fetch quiescing info from tablet server efc1275241604b0aa886494f8da9e00b (ve1324.halxg.cloudera.com:7050): Remote error: unsupported feature flags
...
Tablet Server Summary
               UUID               |            Address             | Status  | Location | Quiescing | Tablet Leaders | Active Scanners
----------------------------------+--------------------------------+---------+----------+-----------+----------------+-----------------
 1e8c8c55d0e24110b29caaecdae491ca | ve1318.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 36e8894c4e6d48c690f64ade8b5fe52d | ve1320.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 629bbaecfead49f18247d7963cfa98af | ve1319.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 9dfdd5aac2814353bd50cefca2d77403 | ve1321.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 9fe2954950ea4f4eaecc4ef97c6eb44a | ve1317.halxg.cloudera.com:7050 | HEALTHY | /default | true      |       5        |       0
 a5dd443f61464c34aca585a905e87926 | ve1322.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 dffda2ef2d33481993d29009f3f87420 | ve1323.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a
 e6c9b1df642a4cf69c47f36480dd4723 | ve1316.halxg.cloudera.com:7050 | HEALTHY | /default | false     |       6        |       0
 efc1275241604b0aa886494f8da9e00b | ve1324.halxg.cloudera.com:7050 | HEALTHY | /default | n/a       | n/a            | n/a

Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
---
M src/kudu/integration-tests/tablet_server_quiescing-itest.cc
M src/kudu/rebalance/cluster_status.h
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/tool_action_cluster.cc
M src/kudu/tools/tool_action_tserver.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/tablet_service.h
M src/kudu/tserver/tserver.proto
13 files changed, 164 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/15323/4
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] ksck: display quiecing-related info

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15323 )

Change subject: ksck: display quiecing-related info
......................................................................


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/integration-tests/tablet_server_quiescing-itest.cc
File src/kudu/integration-tests/tablet_server_quiescing-itest.cc:

http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/integration-tests/tablet_server_quiescing-itest.cc@433
PS1, Line 433: ;
> I think there is some value in seeing number of leader replicas and number 
Yeah, I'm taking a "show it if it's important to know about" approach, and if we're not quiescing at all, it's not important to know about.

Admittedly, this is framed as a note on quiescing servers because that's all we need for the sake of quiescing. But the leader and scanner counts themselves seem useful to know about, regardless of quiescing status.

I chatted with Grant a bit and he agrees that showing this stuff by default seems desirable, albeit at the cost of making ksck even bulkier.


http://gerrit.cloudera.org:8080/#/c/15323/2/src/kudu/tools/ksck.h
File src/kudu/tools/ksck.h:

http://gerrit.cloudera.org:8080/#/c/15323/2/src/kudu/tools/ksck.h@401
PS2, Line 401: 
> Nit: separate from the above with an empty line, so it's clear the comment 
Done


http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/tools/ksck_remote.cc
File src/kudu/tools/ksck_remote.cc:

http://gerrit.cloudera.org:8080/#/c/15323/1/src/kudu/tools/ksck_remote.cc@81
PS1, Line 81: true
> Yep, that makes sense to me.  One question: once quiescing status is observ
They may. Or, if they know that a specific set of servers is being quiesced, they may want to periodically check the status of those servers as leaders and scanners dwindle, without running ksck, since that would return info on the entire cluster.


http://gerrit.cloudera.org:8080/#/c/15323/2/src/kudu/tools/ksck_remote.cc
File src/kudu/tools/ksck_remote.cc:

http://gerrit.cloudera.org:8080/#/c/15323/2/src/kudu/tools/ksck_remote.cc@290
PS2, Line 290: WARN
> Not WARNING?
Done



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibdc650eb3ee30e8993330f2cbd389076ea2bad49
Gerrit-Change-Number: 15323
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 03 Mar 2020 21:15:13 +0000
Gerrit-HasComments: Yes