You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2018/04/13 18:12:28 UTC

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10061


Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................

[tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

This adds checks to ksck that look for experimental, unsafe, and hidden
flags set to non-default values on Kudu masters and tablet servers. If
any are found, ksck generates a table summarizing the different flags and
their values. For example:

WARNING: Some masters have hidden, experimental, or unsafe flags set:
          Flag          |        Value        |         Tags         |                    Master
------------------------+---------------------+----------------------+----------------------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052,localhost:7053
 min_compression_ratio  | 0.80000000000000004 | experimental         | localhost:7052,localhost:7051,localhost:7053
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

The table has one row for each unique (flag, value) pair, listing all
daemons with --flag=value. So, in the above output, there are two rows
for the flag --safe_time_max_lag_ms  because it's set to two different
values on two masters. This makes it easy to scan for concerning flags
and their values.

A new flag --check_unusual_flags controls whether flag checks are done.
It default to true. Flag checks may be worth turning off if a cluster
has a large amount of tablet servers with experimental flags, or if a
newer kudu tool is run against a cluster without GetFlags RPC support,
since in these cases the flag check output will be noisy.

Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
---
M src/kudu/integration-tests/cluster_verifier.cc
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/tool_action_cluster.cc
8 files changed, 323 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/10061/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 7: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck.cc@369
PS5, Line 369:           // Failing to gather flags is only a warning.
> Because the first and most important part is filling out the summary. The f
makes sense, thanks


http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_remote-test.cc@95
PS5, Line 95: 
> If the flag moves to stable we pick a new flag. I don't think it's worth ad
I meant adding this flag only in the test context, but I'm not sure if it's possible to do and changing the flag used for this test if safe_time_max_lag_ms ever gets stable sounds good enough. The comment seems clear enough that whoever breaks this test by graduating this flag should know what to do.



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 25 May 2018 19:57:55 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................

[tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

This adds checks to ksck that look for experimental, unsafe, and hidden
flags set to non-default values on Kudu masters and tablet servers. If
any are found, ksck generates a table summarizing the different flags and
their values. For example:

          Flag          |        Value        |         Tags         |            Master
------------------------+---------------------+----------------------+-------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052,localhost:7053
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

The table has one row for each unique (flag, value) pair, listing all
daemons with --flag=value. So, in the above output, there are two rows
for the flag --safe_time_max_lag_ms  because it's set to two different
values on two masters. This makes it easy to scan for concerning flags
and their values.

Since the output might not scale to a large number of
servers, the CSV of servers is abbreviated, by default, to 3 entries,
with the number of additional servers indicated. The number of entries
before truncation kicks in is controlled by --truncate_server_csv_length.
Additionally, if all checked servers have an unusual --flag=value we call
that out specially. For example, the above table reprinted with
--truncate_server_csv_length=1 would look like

          Flag          |        Value        |         Tags         |             Master
------------------------+---------------------+----------------------+--------------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052 and 1 other server(s)
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

assuming that there are 3 servers checked in total.

Having unusual flags or failing to gather flags isn't considered an
error, since it doesn't indicate the cluster is unhealthy (in the latter
case because the daemon may not support the GetFlags RPC). Instead,
flag checks surface their results in a new warnings section near the
end of the ksck output.

The new warnings section looks like this in context:

==================
Warnings:
==================
Some masters have unsafe, experimental, or hidden flags set
unable to get flag information for tablet server 812db6461bae4f62a651e132f783ab53 (127.0.0.1:7250): could not get status from server: Client connection negotiation failed: client connection to 127.0.0.1:7250: connect: Connection refused (error 61)
Some tablet servers have unsafe, experimental, or hidden flags set
tserver flag check error: 1 of 3 tservers' flags were not available

==================
Errors:
==================
Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors

FAILED
Runtime error: ksck discovered errors

Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Reviewed-on: http://gerrit.cloudera.org:8080/10061
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Reviewed-by: Attila Bukor <ab...@cloudera.com>
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/ksck_results.h
8 files changed, 459 insertions(+), 4 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Andrew Wong: Looks good to me, approved
  Attila Bukor: Looks good to me, but someone else must approve

-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 11
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10061

to look at the new patch set (#6).

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................

[tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

This adds checks to ksck that look for experimental, unsafe, and hidden
flags set to non-default values on Kudu masters and tablet servers. If
any are found, ksck generates a table summarizing the different flags and
their values. For example:

          Flag          |        Value        |         Tags         |            Master
------------------------+---------------------+----------------------+-------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052,localhost:7053
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

The table has one row for each unique (flag, value) pair, listing all
daemons with --flag=value. So, in the above output, there are two rows
for the flag --safe_time_max_lag_ms  because it's set to two different
values on two masters. This makes it easy to scan for concerning flags
and their values.

Since the output might not scale to a large number of
servers, the CSV of servers is abbreviated, by default, to 3 entries,
with the number of additional servers indicated. The number of entries
before truncation kicks in is controlled by --truncate_server_csv_length.
Additionally, if all checked servers have an unusual --flag=value we call
that out specially. For example, the above table reprinted with
--truncate_server_csv_length=2 would look like

          Flag          |        Value        |         Tags         |             Master
------------------------+---------------------+----------------------+--------------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052 and 1 other server(s)
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

assuming that there are 3 servers checked in total.

Having unusual flags or failing to gather flags isn't considered an
error, since it doesn't indicate the cluster is unhealthy (in the latter
case because the daemon may not support the GetFlags RPC). Instead,
flag checks surface their results in a new warnings section near the
end of the ksck output.

The new warnings section looks like this in context:

==================
Warnings:
==================
Some masters have unsafe, experimental, or hidden flags set
unable to get flag information for tablet server 812db6461bae4f62a651e132f783ab53 (127.0.0.1:7250): could not get status from server: Client connection negotiation failed: client connection to 127.0.0.1:7250: connect: Connection refused (error 61)
Some tablet servers have unsafe, experimental, or hidden flags set
tserver flag check error: 1 of 3 tservers' flags were not available

==================
Errors:
==================
Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors

FAILED
Runtime error: ksck discovered errors

Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/ksck_results.h
8 files changed, 457 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/10061/6
-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 6
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 10: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc
File src/kudu/tools/ksck_results.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc@149
PS7, Line 149: n
> It's unsigned in the latest patch set.
damn, it seems I still can't use gerrit properly...



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sun, 27 May 2018 08:37:46 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_remote-test.cc@216
PS7, Line 216:  private:
             :   client::KuduSchema schema_;
             :   Random random_;
             : };
             : 
             : TEST_F(RemoteKsckTest, TestClusterOk) {
             :   ASSERT_OK(ksck_->CheckMasterHealth());
> I thought not but actually the test will usually fail if this is replaced w
Yep, sgtm. Was just curious since it would also hammer in the point that these other tests _aren't_ necessarily doing a full ksck run.


http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc
File src/kudu/tools/ksck_results.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc@149
PS7, Line 149: n
> The block L144-146 guarantees that n > server.size() here.
Hmm... maybe i'm missing something. Doesn't that block return in the `n >= server.size()` case, and mean that over here, we're guaranteed the opposite? `n < server.size()`



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 8
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 25 May 2018 22:22:45 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10061

to look at the new patch set (#5).

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................

[tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

This adds checks to ksck that look for experimental, unsafe, and hidden
flags set to non-default values on Kudu masters and tablet servers. If
any are found, ksck generates a table summarizing the different flags and
their values. For example:

          Flag          |        Value        |         Tags         |                    Master
------------------------+---------------------+----------------------+----------------------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052,localhost:7053
 min_compression_ratio  | 0.80000000000000004 | experimental         | localhost:7052,localhost:7051,localhost:7053
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

The table has one row for each unique (flag, value) pair, listing all
daemons with --flag=value. So, in the above output, there are two rows
for the flag --safe_time_max_lag_ms  because it's set to two different
values on two masters. This makes it easy to scan for concerning flags
and their values.

Having unusual flags or failing to gather flags isn't considered an
error, since it doesn't indicate the cluster is unhealthy (in the latter
case because the daemon may not support the GetFlags RPC). Instead,
flag checks surface their results in a new warnings section near the
end of the ksck output.

Finally, since the output might not scale to a large number of
servers, the CSV of servers is abbreviated, by default, to 3 entries,
with the number of additional servers indicated. The number of entries
before truncation kicks in is controlled by --truncate_server_csv_length.

Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/ksck_results.h
8 files changed, 444 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/10061/5
-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10061/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10061/1//COMMIT_MSG@22
PS1, Line 22: listing all
            : daemons
> is this output going to scale to large clusters? eg if you have 200 tablet 
Mmm I had the same thought which inspired the flag to turn off the flags check. It's a bummer not to see which tablet servers have a flag set. Maybe, if we did something like list the first few tablet servers, and after some threshold abbreviate, e.g.

  my-tserver-0, my-tserver-1, my-tserver-2, and 86 other tablet servers

since users running a lot of tablet servers are most likely using config management software, showing a few addresses might help them figure out why the experimental flag is set. For example, maybe they set an experimental flag on a rack and forgot to remove it from the config group for that rack, and seeing a few hostnames that have rack info encoded in them might help.

Thoughts?



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 13 Apr 2018 20:00:38 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10061

to look at the new patch set (#7).

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................

[tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

This adds checks to ksck that look for experimental, unsafe, and hidden
flags set to non-default values on Kudu masters and tablet servers. If
any are found, ksck generates a table summarizing the different flags and
their values. For example:

          Flag          |        Value        |         Tags         |            Master
------------------------+---------------------+----------------------+-------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052,localhost:7053
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

The table has one row for each unique (flag, value) pair, listing all
daemons with --flag=value. So, in the above output, there are two rows
for the flag --safe_time_max_lag_ms  because it's set to two different
values on two masters. This makes it easy to scan for concerning flags
and their values.

Since the output might not scale to a large number of
servers, the CSV of servers is abbreviated, by default, to 3 entries,
with the number of additional servers indicated. The number of entries
before truncation kicks in is controlled by --truncate_server_csv_length.
Additionally, if all checked servers have an unusual --flag=value we call
that out specially. For example, the above table reprinted with
--truncate_server_csv_length=2 would look like

          Flag          |        Value        |         Tags         |             Master
------------------------+---------------------+----------------------+--------------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052 and 1 other server(s)
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

assuming that there are 3 servers checked in total.

Having unusual flags or failing to gather flags isn't considered an
error, since it doesn't indicate the cluster is unhealthy (in the latter
case because the daemon may not support the GetFlags RPC). Instead,
flag checks surface their results in a new warnings section near the
end of the ksck output.

The new warnings section looks like this in context:

==================
Warnings:
==================
Some masters have unsafe, experimental, or hidden flags set
unable to get flag information for tablet server 812db6461bae4f62a651e132f783ab53 (127.0.0.1:7250): could not get status from server: Client connection negotiation failed: client connection to 127.0.0.1:7250: connect: Connection refused (error 61)
Some tablet servers have unsafe, experimental, or hidden flags set
tserver flag check error: 1 of 3 tservers' flags were not available

==================
Errors:
==================
Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors

FAILED
Runtime error: ksck discovered errors

Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/ksck_results.h
8 files changed, 458 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/10061/7
-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 10: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc
File src/kudu/tools/ksck_results.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc@149
PS7, Line 149: n
> Oh, you were talking about n this whole time? I'll make the truncate server
Aggh yeah, sorry I wasn't clear! Should've highlighted the flag.



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sun, 27 May 2018 02:55:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc
File src/kudu/tools/ksck_results.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc@149
PS7, Line 149: n
> Ugh...I must still be tired from yesterday. The block returns if servers.si
I don't think it's redundant. E.g. server.size() - (-1) > 0 is valid, but n = -1 shouldn't be valid, no?



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 8
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 25 May 2018 22:36:41 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc
File src/kudu/tools/ksck_results.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc@149
PS7, Line 149: n
> I don't think it's redundant. E.g. server.size() - (-1) > 0 is valid, but n
Oh, you were talking about n this whole time? I'll make the truncate server csv flag an unsigned int.



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sat, 26 May 2018 07:09:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/10061/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10061/7//COMMIT_MSG@37
PS7, Line 37: localhost:7052 and 1 other server(s)
> Shouldn't this be not truncated if `--truncate_server_csv_length=2`?
Whoops, meant = 1.


http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck.cc@202
PS7, Line 202:    });
             : 
             :     sh
> nit: extra line here
Well, it's actually intentional, but if you don't like it that much...


http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck.cc@349
PS7, Line 349:    if (FLAGS_consensus) {
             :                 return ts->FetchConsensusState(&health);
             :               }
             :               return Status::OK();
             :             });
> nit: revert here too
Done


http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_remote-test.cc@95
PS5, Line 95: 
> I agree with Attila, it'd be weird for programmers and code reviewers in th
Mmm I see what you guys mean, it works because we are using an internal minicluster. Sorry I was thick and didn't figure that out before.


http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_remote-test.cc@216
PS7, Line 216:   ASSERT_OK(ksck_->CheckMasterHealth());
             :   ASSERT_OK(ksck_->CheckMasterUnusualFlags());
             :   ASSERT_OK(ksck_->CheckMasterConsensus());
             :   ASSERT_OK(ksck_->CheckClusterRunning());
             :   ASSERT_OK(ksck_->FetchTableAndTabletInfo());
             :   ASSERT_OK(ksck_->FetchInfoFromTabletServers());
             :   ASSERT_OK(ksck_->CheckTabletServerUnusualFlags());
> nit: any reason to not use ksck_->Run() here? No checksumming?
I thought not but actually the test will usually fail if this is replaced with ksck_->Run(). The reason is that calling CheckTablesConsistency will frequently find at least one tablet in the state where it has a leader but the leader hasn't asserted its leadership yet, so the other nodes think there's no leader. It's easy to solve by just inserting some rows at the start of the test, and we ought to actually do that in each of these tests as its the simplest way to wait until the tables are fully ready. Then we can use ksck_->RunAndPrintResults() everywhere.

Doing that involves changing every test and doing a workaround for some checksum tests as they insert N rows and expect to checksum N rows, so if we insert rows in the test fixture we need to expect those in the checksum tests.

Is it ok if I do a quick followup for that stuff after this patch?


http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc
File src/kudu/tools/ksck_results.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc@149
PS7, Line 149: n
> Do you think it's worth validating this is positive?
The block L144-146 guarantees that n > server.size() here.


http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc@374
PS7, Line 374: const string& name = flag.first.first;
             :   const string& value = flag.first.second;
             :   flags_table.AddRow({name,
             :                       value,
             :                       FindOrDie(flag_tags_map, name),
             :                       ServerCsv(num_servers, flag.second)});
             :   }
> nit: spacing
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 25 May 2018 21:36:11 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10061/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10061/1//COMMIT_MSG@22
PS1, Line 22: listing all
            : daemons
is this output going to scale to large clusters? eg if you have 200 tablet servers, it's likely that all of them have the same experimental flags set. Maybe that's fine and we'll just have a really long line in the output, but maybe we should just abbreviate to a count like "89/89 tservers"?



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 13 Apr 2018 19:52:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10061

to look at the new patch set (#2).

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................

[tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

This adds checks to ksck that look for experimental, unsafe, and hidden
flags set to non-default values on Kudu masters and tablet servers. If
any are found, ksck generates a table summarizing the different flags and
their values. For example:

WARNING: Some masters have hidden, experimental, or unsafe flags set:
          Flag          |        Value        |         Tags         |                    Master
------------------------+---------------------+----------------------+----------------------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052,localhost:7053
 min_compression_ratio  | 0.80000000000000004 | experimental         | localhost:7052,localhost:7051,localhost:7053
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

The table has one row for each unique (flag, value) pair, listing all
daemons with --flag=value. So, in the above output, there are two rows
for the flag --safe_time_max_lag_ms  because it's set to two different
values on two masters. This makes it easy to scan for concerning flags
and their values.

A new flag --check_unusual_flags controls whether flag checks are done.
It default to true. Flag checks may be worth turning off if a cluster
has a large amount of tablet servers with experimental flags, or if a
newer kudu tool is run against a cluster without GetFlags RPC support,
since in these cases the flag check output will be noisy.

Additionally, since the output might not scale to a large number of
servers, the CSV of servers is abbreviated, by default, to 3 entries,
with the number of additional servers indicated. The number of entries
before truncation kicks in is controlled by --truncate_server_csv_length.

Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
---
M src/kudu/integration-tests/cluster_verifier.cc
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/tool_action_cluster.cc
8 files changed, 346 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/10061/2
-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10061/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10061/7//COMMIT_MSG@37
PS7, Line 37: localhost:7052 and 1 other server(s)
Shouldn't this be not truncated if `--truncate_server_csv_length=2`?


http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck.cc@202
PS7, Line 202:    });
             : 
             :     sh
nit: extra line here


http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck.cc@349
PS7, Line 349:    if (FLAGS_consensus) {
             :                 return ts->FetchConsensusState(&health);
             :               }
             :               return Status::OK();
             :             });
nit: revert here too


http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_remote-test.cc@95
PS5, Line 95: 
> I meant adding this flag only in the test context, but I'm not sure if it's
I agree with Attila, it'd be weird for programmers and code reviewers in the future to have to spend cycles on an update to this test for an update of the flag.


http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_remote-test.cc@216
PS7, Line 216:   ASSERT_OK(ksck_->CheckMasterHealth());
             :   ASSERT_OK(ksck_->CheckMasterUnusualFlags());
             :   ASSERT_OK(ksck_->CheckMasterConsensus());
             :   ASSERT_OK(ksck_->CheckClusterRunning());
             :   ASSERT_OK(ksck_->FetchTableAndTabletInfo());
             :   ASSERT_OK(ksck_->FetchInfoFromTabletServers());
             :   ASSERT_OK(ksck_->CheckTabletServerUnusualFlags());
nit: any reason to not use ksck_->Run() here? No checksumming?


http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc
File src/kudu/tools/ksck_results.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc@149
PS7, Line 149: n
Do you think it's worth validating this is positive?



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 25 May 2018 20:16:41 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 7:

(1 comment)

Sorry for the extra churn

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc
File src/kudu/tools/ksck_results.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc@374
PS7, Line 374: const string& name = flag.first.first;
             :   const string& value = flag.first.second;
             :   flags_table.AddRow({name,
             :                       value,
             :                       FindOrDie(flag_tags_map, name),
             :                       ServerCsv(num_servers, flag.second)});
             :   }
nit: spacing



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 25 May 2018 20:48:15 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 4:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.h
File src/kudu/tools/ksck.h:

http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.h@356
PS4, Line 356: CHECK_NE(state_, KsckFetchState::UNINITIALIZED);
Does state_ reflect all of FetchUnusualFlags, FetchConsensusState, and FetchInfo? It's probably worth documenting its semantics, e.g. if FetchInfo() succeeds but FetchUnusualFlags() hasn't been called, is it still valid to call flags()? Is that an important point to make?

Since we don't set the state in FetchUnusualFlags(), it seems a little weird to have this check here.


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.h@356
PS4, Line 356: state_, KsckFetchState::UNINITIALIZED
nit: reverse the order of these here and L252


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.h@485
PS4, Line 485: Check for "unusual" flags on masters.
             :   // "Unusual" flags are ones tagged hidden, experimental, or unsafe.
Note only non-default unusual flags. I don't think that's mentioned anywhere except the commit message.


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.cc@298
PS4, Line 298: "Some masters have unsafe, experimental, or hidden flags set"));
nit: spacing


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.cc@302
PS4, Line 302: return Status::Incomplete(
nit: spacing


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.cc@436
PS4, Line 436:           "Some tablet servers have unsafe, experimental, or hidden flags set"));
nit: spacing


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck_remote-test.cc@94
PS4, Line 94: // Add a harmless experimental flag so we can detect it with ksck.
            :     FLAGS_safe_time_max_lag_ms = 60000;
Can we put this down in TestCheckUnusualFlags? It's kind of weird seeing L244 without seeing this.


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck_remote-test.cc@229
PS4, Line 229:   ASSERT_OK(ksck_->CheckMasterHealth());
             :   ASSERT_OK(ksck_->CheckMasterHealth());
             :   ASSERT_OK(ksck_->CheckMasterUnusualFlags());
             :   ASSERT_OK(ksck_->CheckMasterConsensus());
             :   ASSERT_OK(ksck_->CheckClusterRunning());
             :   ASSERT_OK(ksck_->FetchTableAndTabletInfo());
             :   ASSERT_OK(ksck_->FetchInfoFromTabletServers());
             :   ASSERT_OK(ksck_->CheckTabletServerUnusualFlags());
Must all of these be run to pass this test?



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 25 May 2018 02:49:24 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc
File src/kudu/tools/ksck_results.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc@149
PS7, Line 149: n
> Aggh yeah, sorry I wasn't clear! Should've highlighted the flag.
Andrew, you +2d, but this flag is still signed, shouldn't it be fixed first?



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sun, 27 May 2018 07:14:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10061

to look at the new patch set (#8).

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................

[tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

This adds checks to ksck that look for experimental, unsafe, and hidden
flags set to non-default values on Kudu masters and tablet servers. If
any are found, ksck generates a table summarizing the different flags and
their values. For example:

          Flag          |        Value        |         Tags         |            Master
------------------------+---------------------+----------------------+-------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052,localhost:7053
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

The table has one row for each unique (flag, value) pair, listing all
daemons with --flag=value. So, in the above output, there are two rows
for the flag --safe_time_max_lag_ms  because it's set to two different
values on two masters. This makes it easy to scan for concerning flags
and their values.

Since the output might not scale to a large number of
servers, the CSV of servers is abbreviated, by default, to 3 entries,
with the number of additional servers indicated. The number of entries
before truncation kicks in is controlled by --truncate_server_csv_length.
Additionally, if all checked servers have an unusual --flag=value we call
that out specially. For example, the above table reprinted with
--truncate_server_csv_length=1 would look like

          Flag          |        Value        |         Tags         |             Master
------------------------+---------------------+----------------------+--------------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052 and 1 other server(s)
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

assuming that there are 3 servers checked in total.

Having unusual flags or failing to gather flags isn't considered an
error, since it doesn't indicate the cluster is unhealthy (in the latter
case because the daemon may not support the GetFlags RPC). Instead,
flag checks surface their results in a new warnings section near the
end of the ksck output.

The new warnings section looks like this in context:

==================
Warnings:
==================
Some masters have unsafe, experimental, or hidden flags set
unable to get flag information for tablet server 812db6461bae4f62a651e132f783ab53 (127.0.0.1:7250): could not get status from server: Client connection negotiation failed: client connection to 127.0.0.1:7250: connect: Connection refused (error 61)
Some tablet servers have unsafe, experimental, or hidden flags set
tserver flag check error: 1 of 3 tservers' flags were not available

==================
Errors:
==================
Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors

FAILED
Runtime error: ksck discovered errors

Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/ksck_results.h
8 files changed, 457 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/10061/8
-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 8
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10061

to look at the new patch set (#3).

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................

[tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

This adds checks to ksck that look for experimental, unsafe, and hidden
flags set to non-default values on Kudu masters and tablet servers. If
any are found, ksck generates a table summarizing the different flags and
their values. For example:

WARNING: Some masters have hidden, experimental, or unsafe flags set:
          Flag          |        Value        |         Tags         |                    Master
------------------------+---------------------+----------------------+----------------------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052,localhost:7053
 min_compression_ratio  | 0.80000000000000004 | experimental         | localhost:7052,localhost:7051,localhost:7053
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

The table has one row for each unique (flag, value) pair, listing all
daemons with --flag=value. So, in the above output, there are two rows
for the flag --safe_time_max_lag_ms  because it's set to two different
values on two masters. This makes it easy to scan for concerning flags
and their values.

A new flag --check_unusual_flags controls whether flag checks are done.
It default to true. Flag checks may be worth turning off if a cluster
has a large amount of tablet servers with experimental flags, or if a
newer kudu tool is run against a cluster without GetFlags RPC support,
since in these cases the flag check output will be noisy.

Additionally, since the output might not scale to a large number of
servers, the CSV of servers is abbreviated, by default, to 3 entries,
with the number of additional servers indicated. The number of entries
before truncation kicks in is controlled by --truncate_server_csv_length.

Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
---
M src/kudu/integration-tests/cluster_verifier.cc
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/tool_action_cluster.cc
8 files changed, 346 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/10061/3
-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc
File src/kudu/tools/ksck_results.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc@149
PS7, Line 149: n
> Hmm... maybe i'm missing something. Doesn't that block return in the `n >= 
Ugh...I must still be tired from yesterday. The block returns if servers.size() <= n, so below, server.size() > n i.e. server.size() - n > 0 and a check is redundant.



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 25 May 2018 22:29:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/10061/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10061/1//COMMIT_MSG@22
PS1, Line 22: are two rows
            : for the
> Mmm I had the same thought which inspired the flag to turn off the flags ch
Done


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.h
File src/kudu/tools/ksck.h:

http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.h@356
PS4, Line 356: state_, KsckFetchState::UNINITIALIZED
> nit: reverse the order of these here and L252
Done


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.h@356
PS4, Line 356: CHECK_NE(state_, KsckFetchState::UNINITIALIZED);
> Does state_ reflect all of FetchUnusualFlags, FetchConsensusState, and Fetc
state_ should reflect the tripartite state of fetching the core info from the server, so we shouldn't touch it in the flags stuff since it's extra. Just for sanity checks I can add another state for flags fetching.


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.h@485
PS4, Line 485: Check for "unusual" flags on masters.
             :   // "Unusual" flags are ones tagged hidden, experimental, or unsafe.
> Note only non-default unusual flags. I don't think that's mentioned anywher
Done


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.cc@298
PS4, Line 298: "Some masters have unsafe, experimental, or hidden flags set"));
> nit: spacing
Done


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.cc@302
PS4, Line 302: return Status::Incomplete(
> nit: spacing
Done


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck.cc@436
PS4, Line 436:           "Some tablet servers have unsafe, experimental, or hidden flags set"));
> nit: spacing
Done


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck_remote-test.cc@94
PS4, Line 94: // Add a harmless experimental flag so we can detect it with ksck.
            :     FLAGS_safe_time_max_lag_ms = 60000;
> Can we put this down in TestCheckUnusualFlags? It's kind of weird seeing L2
We should set it here in SetUp before we start the minicluster. I will make a note that we set an unusual flag in the test itself.

I could also subclass RemoteKsckTest but that feels like overkill.


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck_remote-test.cc@229
PS4, Line 229:   ASSERT_OK(ksck_->CheckMasterHealth());
             :   ASSERT_OK(ksck_->CheckMasterHealth());
             :   ASSERT_OK(ksck_->CheckMasterUnusualFlags());
             :   ASSERT_OK(ksck_->CheckMasterConsensus());
             :   ASSERT_OK(ksck_->CheckClusterRunning());
             :   ASSERT_OK(ksck_->FetchTableAndTabletInfo());
             :   ASSERT_OK(ksck_->FetchInfoFromTabletServers());
             :   ASSERT_OK(ksck_->CheckTabletServerUnusualFlags());
> Must all of these be run to pass this test?
Weeeeell there was an extra CheckMasterHealth in there, and CheckMasterConsensus isn't needed, but otherwise yes.

CheckMasterHealth gathers the info.
CheckMasterUnusualFlags does the master flag check
The next 3 do the info gathering for tservers.
CheckTabletServerUnusualFlags does the tserver flag check.



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 25 May 2018 07:19:46 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck.cc@369
PS5, Line 369:           // Fetch the flags information.
why isn't this part simply above the lock instead of moving the lock to a new block?


http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_remote-test.cc@95
PS5, Line 95:     FLAGS_safe_time_max_lag_ms = 60000;
can we define a test flag instead? moving this flag to stable would break these tests


http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_results.cc
File src/kudu/tools/ksck_results.cc:

http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_results.cc@138
PS5, Line 138:   return JoinStrings(first_n, ", ");
flags are usually defined the same way on each server, especially when a cluster management tool like Cloudera Manager is used, so we should check if all servers are affected and print something like "ALL" instead of listing them.



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 25 May 2018 15:57:20 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10061

to look at the new patch set (#9).

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................

[tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

This adds checks to ksck that look for experimental, unsafe, and hidden
flags set to non-default values on Kudu masters and tablet servers. If
any are found, ksck generates a table summarizing the different flags and
their values. For example:

          Flag          |        Value        |         Tags         |            Master
------------------------+---------------------+----------------------+-------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052,localhost:7053
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

The table has one row for each unique (flag, value) pair, listing all
daemons with --flag=value. So, in the above output, there are two rows
for the flag --safe_time_max_lag_ms  because it's set to two different
values on two masters. This makes it easy to scan for concerning flags
and their values.

Since the output might not scale to a large number of
servers, the CSV of servers is abbreviated, by default, to 3 entries,
with the number of additional servers indicated. The number of entries
before truncation kicks in is controlled by --truncate_server_csv_length.
Additionally, if all checked servers have an unusual --flag=value we call
that out specially. For example, the above table reprinted with
--truncate_server_csv_length=1 would look like

          Flag          |        Value        |         Tags         |             Master
------------------------+---------------------+----------------------+--------------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052 and 1 other server(s)
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

assuming that there are 3 servers checked in total.

Having unusual flags or failing to gather flags isn't considered an
error, since it doesn't indicate the cluster is unhealthy (in the latter
case because the daemon may not support the GetFlags RPC). Instead,
flag checks surface their results in a new warnings section near the
end of the ksck output.

The new warnings section looks like this in context:

==================
Warnings:
==================
Some masters have unsafe, experimental, or hidden flags set
unable to get flag information for tablet server 812db6461bae4f62a651e132f783ab53 (127.0.0.1:7250): could not get status from server: Client connection negotiation failed: client connection to 127.0.0.1:7250: connect: Connection refused (error 61)
Some tablet servers have unsafe, experimental, or hidden flags set
tserver flag check error: 1 of 3 tservers' flags were not available

==================
Errors:
==================
Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors

FAILED
Runtime error: ksck discovered errors

Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/ksck_results.h
8 files changed, 457 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/10061/9
-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 9
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc
File src/kudu/tools/ksck_results.cc:

http://gerrit.cloudera.org:8080/#/c/10061/7/src/kudu/tools/ksck_results.cc@149
PS7, Line 149: n
> Andrew, you +2d, but this flag is still signed, shouldn't it be fixed first
It's unsigned in the latest patch set.



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Sun, 27 May 2018 08:26:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 6:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/10061/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10061/5//COMMIT_MSG@30
PS5, Line 30: rolled by --truncate_server_csv_length.
            : Additionally, if all ch
> nit: mind including this in the above snippet too?
The two are far away from each other in the ksck output so I made a second snippet.


http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck.cc@200
PS5, Line 200: tatus s = master->FetchInfo().AndThen([&]() {
             :           return master->FetchConsensusState();
             :         });
             : 
> nit: revert formatting?
Done


http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck.cc@369
PS5, Line 369:           // Failing to gather flags is only a warning.
> why isn't this part simply above the lock instead of moving the lock to a n
Because the first and most important part is filling out the summary. The flags is ordered after as it's a secondary concern. If the lock part is below the flags part, it looks like the flags part has something to do with the locking, which it doesn't.


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck_remote-test.cc@94
PS4, Line 94: InternalMiniClusterOptions opts;
            : 
> Oh huh. I thought IMCs reflected all in-process updates of flags. Those don
My bad. You are right. We will see the flag change. It may or may not affect the behavior of the server, and that's irrelevant here, which is why I made the mistake of thinking we should set it before start.


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck_remote-test.cc@229
PS4, Line 229:   ASSERT_OK(ksck_->CheckMasterHealth());
             :   ASSERT_OK(ksck_->CheckMasterUnusualFlags());
             :   ASSERT_OK(ksck_->CheckClusterRunning());
             :   ASSERT_OK(ksck_->FetchTableAndTabletInfo());
             :   ASSERT_OK(ksck_->FetchInfoFromTabletServers());
             :   ASSERT_OK(ksck_->CheckTabletServerUnusualFlags());
             :   ASSERT_OK(ksck_->PrintResults());
             : 
> Ah, I would've thought CheckMasterUnusualFlags() and CheckTabletServerUnusu
Done


http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_remote-test.cc@95
PS5, Line 95: 
> can we define a test flag instead? moving this flag to stable would break t
If the flag moves to stable we pick a new flag. I don't think it's worth adding a new flag to the server just for these tests.


http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_results.cc
File src/kudu/tools/ksck_results.cc:

http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_results.cc@138
PS5, Line 138:   DCHECK_LE(servers.size(), server_count);
> flags are usually defined the same way on each server, especially when a cl
Good idea!


http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_results.cc@267
PS5, Line 267: 
> Since these are warnings, let's just print `s.message()`. That way we won't
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 6
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 25 May 2018 19:00:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10061

to look at the new patch set (#4).

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................

[tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

This adds checks to ksck that look for experimental, unsafe, and hidden
flags set to non-default values on Kudu masters and tablet servers. If
any are found, ksck generates a table summarizing the different flags and
their values. For example:

          Flag          |        Value        |         Tags         |                    Master
------------------------+---------------------+----------------------+----------------------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052,localhost:7053
 min_compression_ratio  | 0.80000000000000004 | experimental         | localhost:7052,localhost:7051,localhost:7053
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

The table has one row for each unique (flag, value) pair, listing all
daemons with --flag=value. So, in the above output, there are two rows
for the flag --safe_time_max_lag_ms  because it's set to two different
values on two masters. This makes it easy to scan for concerning flags
and their values.

Having unusual flags or failing to gather flags isn't considered an
error, since it doesn't indicate the cluster is unhealthy (in the latter
case because the daemon may not support the GetFlags RPC). Instead,
flag checks surface their results in a new warnings section near the
end of the ksck output.

Finally, since the output might not scale to a large number of
servers, the CSV of servers is abbreviated, by default, to 3 entries,
with the number of additional servers indicated. The number of entries
before truncation kicks in is controlled by --truncate_server_csv_length.

Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/ksck_results.h
8 files changed, 396 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/10061/4
-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10061 )

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/10061/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10061/5//COMMIT_MSG@30
PS5, Line 30: in a new warnings section near the
            : end of the ksck output.
nit: mind including this in the above snippet too?


http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck.cc@200
PS5, Line 200: tatus s = master->FetchInfo()
             :       .AndThen([&]() {
             :         return master->FetchConsensusState();
             :       });
nit: revert formatting?


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck_remote-test.cc@94
PS4, Line 94: // Add a harmless experimental flag so we can detect it with ksck.
            :     FLAGS_safe_time_max_lag_ms = 60000;
> We should set it here in SetUp before we start the minicluster. I will make
Oh huh. I thought IMCs reflected all in-process updates of flags. Those don't show up in the flags endpoint?


http://gerrit.cloudera.org:8080/#/c/10061/4/src/kudu/tools/ksck_remote-test.cc@229
PS4, Line 229: TEST_F(RemoteKsckTest, TestCheckUnusualFlags) {
             :   ASSERT_OK(ksck_->CheckMasterHealth());
             :   ASSERT_OK(ksck_->CheckMasterUnusualFlags());
             :   ASSERT_OK(ksck_->CheckClusterRunning());
             :   ASSERT_OK(ksck_->FetchTableAndTabletInfo());
             :   ASSERT_OK(ksck_->FetchInfoFromTabletServers());
             :   ASSERT_OK(ksck_->CheckTabletServerUnusualFlags());
             :   ASSERT_OK(ksck_->PrintResults());
> Weeeeell there was an extra CheckMasterHealth in there, and CheckMasterCons
Ah, I would've thought CheckMasterUnusualFlags() and CheckTabletServerUnusualFlags() were standalone. If that's not the case, mind documenting the prerequisites to running them?


http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_results.cc
File src/kudu/tools/ksck_results.cc:

http://gerrit.cloudera.org:8080/#/c/10061/5/src/kudu/tools/ksck_results.cc@267
PS5, Line 267: s.ToString()
Since these are warnings, let's just print `s.message()`. That way we won't confuse users by presenting an error type.



-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 25 May 2018 16:08:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10061

to look at the new patch set (#10).

Change subject: [tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags
......................................................................

[tools] ksck improvements [5/n]: Checks for experimental, unsafe, hidden flags

This adds checks to ksck that look for experimental, unsafe, and hidden
flags set to non-default values on Kudu masters and tablet servers. If
any are found, ksck generates a table summarizing the different flags and
their values. For example:

          Flag          |        Value        |         Tags         |            Master
------------------------+---------------------+----------------------+-------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052,localhost:7053
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

The table has one row for each unique (flag, value) pair, listing all
daemons with --flag=value. So, in the above output, there are two rows
for the flag --safe_time_max_lag_ms  because it's set to two different
values on two masters. This makes it easy to scan for concerning flags
and their values.

Since the output might not scale to a large number of
servers, the CSV of servers is abbreviated, by default, to 3 entries,
with the number of additional servers indicated. The number of entries
before truncation kicks in is controlled by --truncate_server_csv_length.
Additionally, if all checked servers have an unusual --flag=value we call
that out specially. For example, the above table reprinted with
--truncate_server_csv_length=1 would look like

          Flag          |        Value        |         Tags         |             Master
------------------------+---------------------+----------------------+--------------------------------------
 codegen_dump_functions | true                | runtime,experimental | localhost:7052 and 1 other server(s)
 min_compression_ratio  | 0.80000000000000004 | experimental         | all 3 server(s) checked
 safe_time_max_lag_ms   | 40000               | experimental         | localhost:7052
 safe_time_max_lag_ms   | 50000               | experimental         | localhost:7053

assuming that there are 3 servers checked in total.

Having unusual flags or failing to gather flags isn't considered an
error, since it doesn't indicate the cluster is unhealthy (in the latter
case because the daemon may not support the GetFlags RPC). Instead,
flag checks surface their results in a new warnings section near the
end of the ksck output.

The new warnings section looks like this in context:

==================
Warnings:
==================
Some masters have unsafe, experimental, or hidden flags set
unable to get flag information for tablet server 812db6461bae4f62a651e132f783ab53 (127.0.0.1:7250): could not get status from server: Client connection negotiation failed: client connection to 127.0.0.1:7250: connect: Connection refused (error 61)
Some tablet servers have unsafe, experimental, or hidden flags set
tserver flag check error: 1 of 3 tservers' flags were not available

==================
Errors:
==================
Network error: error fetching info from tablet servers: failed to gather info for all tablet servers: 1 of 3 had errors

FAILED
Runtime error: ksck discovered errors

Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
M src/kudu/tools/ksck_results.cc
M src/kudu/tools/ksck_results.h
8 files changed, 459 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/10061/10
-- 
To view, visit http://gerrit.cloudera.org:8080/10061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd6c179e5256b2f2bae2f7486c5e0365ef184706
Gerrit-Change-Number: 10061
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>