You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Abhishek Chennaka (Code Review)" <ge...@cloudera.org> on 2021/10/12 00:28:41 UTC
[kudu-CR] KUDU-1959 - Implement tests for the startup web page for Kudu servers
Abhishek Chennaka has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17915
Change subject: KUDU-1959 - Implement tests for the startup web page for Kudu servers
......................................................................
KUDU-1959 - Implement tests for the startup web page for Kudu servers
This patch implements the tests for the startup page.
For tablet servers,
- We inject latency to bootstrap tablets while reading the webpage every
10 milliseconds and validating the status for each step.
- Fail a data directory and validate the status of each startup step.
For the master,
- We start it up and validate the status of each startup step.
We use mini tablet server and mini master for both these cases.
Change-Id: I80880ecbda6a9f6db85baaf109af7e701111328d
---
M src/kudu/master/master-test.cc
M src/kudu/master/master.cc
M src/kudu/mini-cluster/webui_checker.h
M src/kudu/tserver/tablet_server-test.cc
4 files changed, 173 insertions(+), 5 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/17915/1
--
To view, visit http://gerrit.cloudera.org:8080/17915
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I80880ecbda6a9f6db85baaf109af7e701111328d
Gerrit-Change-Number: 17915
Gerrit-PatchSet: 1
Gerrit-Owner: Abhishek Chennaka <ac...@cloudera.com>
[kudu-CR] KUDU-1959 - Implement tests for the startup web page for Kudu servers
Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17915 )
Change subject: KUDU-1959 - Implement tests for the startup web page for Kudu servers
......................................................................
Patch Set 1:
(8 comments)
http://gerrit.cloudera.org:8080/#/c/17915/1//COMMIT_MSG
Commit Message:
http://gerrit.cloudera.org:8080/#/c/17915/1//COMMIT_MSG@11
PS1, Line 11: - We inject latency to bootstrap tablets while reading the webpage every
: 10 milliseconds and validating the status for each step.
: - Fail a data directory and validate the status of each startup step.
: For the master,
: - We start it up and validate the status of each startup step.
nit: make the spacing consistent?
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/master/master.cc
File src/kudu/master/master.cc:
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/master/master.cc@a227
PS1, Line 227:
:
:
:
:
:
:
nit: seems like this patch is more than just adding tests. Consider separating this into two patches:
- fixing the master timers, which includes master tests
- adding tests for tservers
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/tserver/tablet_server-test.cc@803
PS1, Line 803: bool
This will be racy, so we should use an atomic<bool>
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/tserver/tablet_server-test.cc@811
PS1, Line 811: continue;
nit: maybe sleep here too so we're not a tight loop while starting up. Same below
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/tserver/tablet_server-test.cc@827
PS1, Line 827: {
: is_started = true;
: read_startup_page.join();
: });
nit: in our codebase this is usually formatted as
SCOPED_CLEANUP({
is_started = true;
read_startup_page.join();
});
Same below.
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/tserver/tablet_server-test.cc@846
PS1, Line 846: // Fail a data directory.
: FLAGS_env_inject_eio_globs = fs_manager->GetDataRootDirs()[1];
: FLAGS_env_inject_eio = 1.0;
nit: it's probably worth moving this down to right below calling Start()
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/tserver/tablet_server-test.cc@862
PS1, Line 862: ASSERT_STR_MATCHES(thread_buf.ToString(), "\"init_status\":[100|0]"
: ".*\"read_filesystem_status\":[100|0]"
: ".*\"read_instance_metadatafiles_status\":[100|0]"
: ".*\"read_data_directories_status\":"
: "([0-9]|[1-9][0-9]|100)"
: ".*\"start_tablets_status\":([0-9]|[1-9][0-9]|100)"
: ".*\"start_rpc_server_status\":[100|0]");
nit: maybe define this in an anonymous namespace as a helper function that validates that an input string matches this, maybe returning a bool? Same with the 100 check.
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/tserver/tablet_server-test.cc@890
PS1, Line 890: INSTANTIATE_TEST_SUITE_P(Params, TabletServerStartupWebPageTest,
: ::testing::ValuesIn(BlockManager::block_manager_types()));
Given the above tests are near identical, consider using ::testing::Combine() to cross test block manager types with whether we're injecting disk failures
See TsLocationAssignmentITest for an example
--
To view, visit http://gerrit.cloudera.org:8080/17915
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80880ecbda6a9f6db85baaf109af7e701111328d
Gerrit-Change-Number: 17915
Gerrit-PatchSet: 1
Gerrit-Owner: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 27 Oct 2021 00:39:02 +0000
Gerrit-HasComments: Yes
[kudu-CR] KUDU-1959 - Implement tests for the startup web page for Kudu servers
Posted by "Abhishek Chennaka (Code Review)" <ge...@cloudera.org>.
Abhishek Chennaka has abandoned this change. ( http://gerrit.cloudera.org:8080/17915 )
Change subject: KUDU-1959 - Implement tests for the startup web page for Kudu servers
......................................................................
Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/17915
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I80880ecbda6a9f6db85baaf109af7e701111328d
Gerrit-Change-Number: 17915
Gerrit-PatchSet: 1
Gerrit-Owner: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-1959 - Implement tests for the startup web page for Kudu servers
Posted by "Abhishek Chennaka (Code Review)" <ge...@cloudera.org>.
Abhishek Chennaka has posted comments on this change. ( http://gerrit.cloudera.org:8080/17915 )
Change subject: KUDU-1959 - Implement tests for the startup web page for Kudu servers
......................................................................
Patch Set 1:
(6 comments)
All done and abandoning this patch as this is split now
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/tserver/tablet_server-test.cc@803
PS1, Line 803: bool
> This will be racy, so we should use an atomic<bool>
Done
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/tserver/tablet_server-test.cc@811
PS1, Line 811: continue;
> nit: maybe sleep here too so we're not a tight loop while starting up. Same
Done
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/tserver/tablet_server-test.cc@827
PS1, Line 827: {
: is_started = true;
: read_startup_page.join();
: });
> nit: in our codebase this is usually formatted as
Done
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/tserver/tablet_server-test.cc@846
PS1, Line 846: // Fail a data directory.
: FLAGS_env_inject_eio_globs = fs_manager->GetDataRootDirs()[1];
: FLAGS_env_inject_eio = 1.0;
> nit: it's probably worth moving this down to right below calling Start()
Done, and the same for the "FLAGS_tablet_bootstrap_inject_latency_ms" in the above test.
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/tserver/tablet_server-test.cc@862
PS1, Line 862: ASSERT_STR_MATCHES(thread_buf.ToString(), "\"init_status\":[100|0]"
: ".*\"read_filesystem_status\":[100|0]"
: ".*\"read_instance_metadatafiles_status\":[100|0]"
: ".*\"read_data_directories_status\":"
: "([0-9]|[1-9][0-9]|100)"
: ".*\"start_tablets_status\":([0-9]|[1-9][0-9]|100)"
: ".*\"start_rpc_server_status\":[100|0]");
> nit: maybe define this in an anonymous namespace as a helper function that
Done
http://gerrit.cloudera.org:8080/#/c/17915/1/src/kudu/tserver/tablet_server-test.cc@890
PS1, Line 890: INSTANTIATE_TEST_SUITE_P(Params, TabletServerStartupWebPageTest,
: ::testing::ValuesIn(BlockManager::block_manager_types()));
> Given the above tests are near identical, consider using ::testing::Combine
Guess it makes more sense to cross test block manager types with whether we're injecting tablet bootstrap latency. This approach seems to cover all the cases.
--
To view, visit http://gerrit.cloudera.org:8080/17915
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I80880ecbda6a9f6db85baaf109af7e701111328d
Gerrit-Change-Number: 17915
Gerrit-PatchSet: 1
Gerrit-Owner: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 02 Nov 2021 05:27:56 +0000
Gerrit-HasComments: Yes