You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2018/02/08 07:56:01 UTC
[kudu-CR] WIP [non voter-itest] stress test for the 3-4-3 replacement scheme
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9255
Change subject: WIP [non_voter-itest] stress test for the 3-4-3 replacement scheme
......................................................................
WIP [non_voter-itest] stress test for the 3-4-3 replacement scheme
Using this test, I was able to hit the following issue:
F0208 05:35:22.152496 2721 ts_tablet_manager.cc:563] T 5384471d823e46929029f9ff6ce212a3 P c713ac498df040caa897d3229214baa3: Tablet Copy: Found tablet in TABLET_DATA_COPYING state during StartTabletCopy()
*** Check failure stack trace: ***
@ 0x7f22d7fa92fd google::LogMessage::Fail() at ??:0
@ 0x7f22d7fab1bd google::LogMessage::SendToLog() at ??:0
@ 0x7f22d7fa8e39 google::LogMessage::Flush() at ??:0
@ 0x7f22d7fabc5f google::LogMessageFatal::~LogMessageFatal() at ??:0
@ 0x7f22e1fa6ed7 kudu::tserver::TSTabletManager::RunTabletCopy() at ??:0
@ 0x7f22e1fb296f kudu::tserver::TabletCopyRunnable::Run() at ??:0
@ 0x7f22d9c9a1f5 kudu::ThreadPool::DispatchThread() at ??:0
@ 0x7f22d9ca9069 boost::_mfi::mf0<>::operator()() at ??:0
@ 0x7f22d9ca8fd0 boost::_bi::list1<>::operator()<>() at ??:0
@ 0x7f22d9ca8f7a boost::_bi::bind_t<>::operator()() at ??:0
@ 0x7f22d9ca8d5d boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
@ 0x7f22db98d018 boost::function0<>::operator()() at ??:0
@ 0x7f22d9c8bdfd kudu::Thread::SuperviseThread() at ??:0
@ 0x7f22ddc68184 start_thread at ??:0
WIP: Fix the issue and verify the fix. Probably, the dist-test machines
filled up because I ran this test: both the tablet servers and the master
were very chatty before I added the --minloglevel=2 option.
Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/integration-tests/cluster_verifier.h
M src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
6 files changed, 202 insertions(+), 1 deletion(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/9255/1
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
[kudu-CR] WIP [non voter-itest] stress test for the 3-4-3 replacement scheme
Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9255 )
Change subject: WIP [non_voter-itest] stress test for the 3-4-3 replacement scheme
......................................................................
Patch Set 1:
(4 comments)
Nice torture test
http://gerrit.cloudera.org:8080/#/c/9255/1/src/kudu/integration-tests/cluster_verifier.h
File src/kudu/integration-tests/cluster_verifier.h:
http://gerrit.cloudera.org:8080/#/c/9255/1/src/kudu/integration-tests/cluster_verifier.h@79
PS1, Line 79: DoKsck
nit: maybe rename to RunKsck() since DoKsck() sounds like the name of an internal method
http://gerrit.cloudera.org:8080/#/c/9255/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
File src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc:
http://gerrit.cloudera.org:8080/#/c/9255/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1915
PS1, Line 1915: too chatty, so logging error and fatal messages only
nit: comment punctuation here and below
http://gerrit.cloudera.org:8080/#/c/9255/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1916
PS1, Line 1916: "--raft_prepare_replacement_before_eviction=true",
consider adding --tablet_copy_early_session_timeout_prob=.2 to create more tablet copy failures and chaos
http://gerrit.cloudera.org:8080/#/c/9255/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1988
PS1, Line 1988: tablet_id_
always the same tablet?
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Sat, 10 Feb 2018 03:08:42 +0000
Gerrit-HasComments: Yes
[kudu-CR] KUDU-2274 [itest] stress test for replica replacement
Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9255 )
Change subject: KUDU-2274 [itest] stress test for replica replacement
......................................................................
Patch Set 4:
(16 comments)
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc
File src/kudu/integration-tests/raft_consensus_stress-itest.cc:
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@66
PS3, Line 66: 2
magic number, can we add something like /* ERROR */ or ideally use the constant from glog if possible? Also an explanation as to why we chose this as the default would be helpful for posterity.
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@79
PS3, Line 79: 2
20 for ASAN and 5 for everybody else?
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@113
PS3, Line 113: RaftConsensusITestBase
Since we are not using tablet_id_ etc, can we instead use ExternalMiniClusterITestBase as the base class?
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@117
PS3, Line 117: being
nit: s/being/are/
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@122
PS3, Line 122: this test is not fit to run under TSAN
nit: how about:
this test is not reliable under TSAN; skipping test
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@170
PS3, Line 170: 2
kNumReplicas ?
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@175
PS3, Line 175: TODO(aserbin)
TODO(KUDU-1188)
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@212
PS3, Line 212: if (do_pauses) {
this loop will run hot if do_pauses is set to false
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@241
PS3, Line 241: do_elections = false;
: do_pauses = false;
are these lines needed?
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@265
PS3, Line 265: 0
false
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@266
PS3, Line 266: 0
false
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@276
PS3, Line 276: 1
true
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@277
PS3, Line 277: 1
true
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@284
PS3, Line 284: do_elections = false;
: do_pauses = false;
remove these lines?
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@293
PS3, Line 293: onst auto& s
Consider saving this in a Status last_ksck_status defined next to ksck_failures_in_a_row (outside the loop) and saved on line 273, since it would be confusing to print a success message on failure if the final attempt succeeds.
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@305
PS3, Line 305: //NO_FATALS(v.CheckRowCount(workload.table_name(),
Doesn't CheckRowCount() retry? Leader leases won't help here, I don't think. If we allow writes to time out, then we can actually end up writing more rows than got acknowledged. Since we are using workload.set_timeout_allowed(true), we should be using (ClusterVerifier::AT_LEAST, workload.rows_inserted()), not ClusterVerifier::EXACTLY.
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Wed, 14 Feb 2018 02:19:16 +0000
Gerrit-HasComments: Yes
[kudu-CR] KUDU-2274 [itest] stress test for replica replacement
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9255 )
Change subject: KUDU-2274 [itest] stress test for replica replacement
......................................................................
KUDU-2274 [itest] stress test for replica replacement
This is a stress test for the automatic replica replacement in Kudu.
Parameters of the test are configurable via run-time gflags, so it's
possible to run it in a 'standalone' mode, targeting it to be a sort
of an endurance test.
This test reproduces the race described in KUDU-2274: it triggers
DCHECK() assertions added in 194fd8b169f29aafbd78a47709ac51d2e8354a1a
before the relevant fixes for KUDU-2274 were checked in.
Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Reviewed-on: http://gerrit.cloudera.org:8080/9255
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <mp...@apache.org>
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_verifier.cc
M src/kudu/integration-tests/cluster_verifier.h
A src/kudu/integration-tests/raft_consensus_stress-itest.cc
4 files changed, 305 insertions(+), 4 deletions(-)
Approvals:
Kudu Jenkins: Verified
Mike Percy: Looks good to me, approved
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 7
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
[kudu-CR] [itest] stress test for replica replacement
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9255 )
Change subject: [itest] stress test for replica replacement
......................................................................
Patch Set 2:
(5 comments)
http://gerrit.cloudera.org:8080/#/c/9255/1/src/kudu/integration-tests/cluster_verifier.h
File src/kudu/integration-tests/cluster_verifier.h:
http://gerrit.cloudera.org:8080/#/c/9255/1/src/kudu/integration-tests/cluster_verifier.h@79
PS1, Line 79: the ks
> nit: maybe rename to RunKsck() since DoKsck() sounds like the name of an in
Done
http://gerrit.cloudera.org:8080/#/c/9255/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
File src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc:
http://gerrit.cloudera.org:8080/#/c/9255/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1915
PS1, Line 1915:
> nit: comment punctuation here and below
Done
http://gerrit.cloudera.org:8080/#/c/9255/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1916
PS1, Line 1916:
> consider adding --tablet_copy_early_session_timeout_prob=.2 to create more
That's a good idea!
Done.
http://gerrit.cloudera.org:8080/#/c/9255/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1939
PS1, Line 1939:
> warning: either cast from 'int' to 'long' is ineffective, or there is loss
Done
http://gerrit.cloudera.org:8080/#/c/9255/1/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1988
PS1, Line 1988:
> always the same tablet?
woops, indeed. It will make sense to do that for different tablets.
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 12 Feb 2018 07:23:45 +0000
Gerrit-HasComments: Yes
[kudu-CR] KUDU-2274 [itest] stress test for replica replacement
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/9255
to look at the new patch set (#3).
Change subject: KUDU-2274 [itest] stress test for replica replacement
......................................................................
KUDU-2274 [itest] stress test for replica replacement
This is a stress test for the automatic replica replacement in Kudu.
Parameters of the test is configurable via run-time gflags, so it's
possible to run it in a 'standalone' mode, targeting it to be a sort
of an endurance test.
This test reproduces the race described in KUDU-2274: it triggers
DCHECK() assertions added in 194fd8b169f29aafbd78a47709ac51d2e8354a1a
before the relevant fixes for KUDU-2274 were checked in.
Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_verifier.cc
M src/kudu/integration-tests/cluster_verifier.h
A src/kudu/integration-tests/raft_consensus_stress-itest.cc
4 files changed, 317 insertions(+), 4 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/9255/3
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2274 [itest] stress test for replica replacement
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/9255
to look at the new patch set (#6).
Change subject: KUDU-2274 [itest] stress test for replica replacement
......................................................................
KUDU-2274 [itest] stress test for replica replacement
This is a stress test for the automatic replica replacement in Kudu.
Parameters of the test are configurable via run-time gflags, so it's
possible to run it in a 'standalone' mode, targeting it to be a sort
of an endurance test.
This test reproduces the race described in KUDU-2274: it triggers
DCHECK() assertions added in 194fd8b169f29aafbd78a47709ac51d2e8354a1a
before the relevant fixes for KUDU-2274 were checked in.
Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_verifier.cc
M src/kudu/integration-tests/cluster_verifier.h
A src/kudu/integration-tests/raft_consensus_stress-itest.cc
4 files changed, 305 insertions(+), 4 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/9255/6
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2274 [itest] stress test for replica replacement
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9255 )
Change subject: KUDU-2274 [itest] stress test for replica replacement
......................................................................
Patch Set 5:
(15 comments)
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc
File src/kudu/integration-tests/raft_consensus_stress-itest.cc:
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@66
PS3, Line 66: t
> magic number, can we add something like /* ERROR */ or ideally use the cons
Done
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@79
PS3, Line 79: "
> 20 for ASAN and 5 for everybody else?
Yep, that's the ultimate typo. It should be inversed/swapped.
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@117
PS3, Line 117: int
> nit: s/being/are/
Done
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@122
PS3, Line 122: ShortTimeout = MonoDelta::FromSeconds(
> nit: how about:
well, as it turned out, after fixing the ultimate typo it works for TSAN even without patch for TestWorkload.
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@170
PS3, Line 170:
> kNumReplicas ?
By my understanding (after looking into catalog_manager.cc), FLAGS_max_create_tablets_per_ts limits the number of partitions/tablets, not the number of total replicas. My original idea was to have FLAGS_test_num_tablets_per_server as a proxy for FLAGS_max_create_tablets_per_ts.
However, I updated the signature and now the test speaks in number of replicas per tablet server: I agree it's more convenient.
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@175
PS3, Line 175: le (workload.
> TODO(KUDU-1188)
Done
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@212
PS3, Line 212: const auto ts_
> this loop will run hot if do_pauses is set to false
Right. I moved SleepFor(MonoDelta::FromMilliseconds(250)) out of the if() closure.
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@241
PS3, Line 241: workload.StopAndJoin();
: rows_inserted = wo
> are these lines needed?
Yep, now I think we can get rid of them safely. Maybe, in one of the former revisions they were useful.
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@265
PS3, Line 265:
> false
Done
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@266
PS3, Line 266: e
> false
Done
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@276
PS3, Line 276: T
> true
Done
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@277
PS3, Line 277: r
> true
Done
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@284
PS3, Line 284: FAIL() << Substitute("$0: tablet replicas haven't converged", s.ToString());
: }
> remove these lines?
Done
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@293
PS3, Line 293: FATALS(v.Che
> Consider saving this in a Status last_ksck_status defined next to ksck_fail
Good catch. It seems like just another lost update due to multiple revisions of this test. I corrected the code as necessary.
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@305
PS3, Line 305:
> Doesn't CheckRowCount() retry? Leader leases won't help here, I don't think
Indeed, thank you for pointing at the right direction. I thought the message was about reading less than expected, but as it turned out it was about reading more than expected.
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Wed, 14 Feb 2018 06:56:13 +0000
Gerrit-HasComments: Yes
[kudu-CR] KUDU-2274 [itest] stress test for replica replacement
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/9255 )
Change subject: KUDU-2274 [itest] stress test for replica replacement
......................................................................
Removed reviewer Kudu Jenkins with the following votes:
* Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2274 [itest] stress test for replica replacement
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/9255
to look at the new patch set (#4).
Change subject: KUDU-2274 [itest] stress test for replica replacement
......................................................................
KUDU-2274 [itest] stress test for replica replacement
This is a stress test for the automatic replica replacement in Kudu.
Parameters of the test is configurable via run-time gflags, so it's
possible to run it in a 'standalone' mode, targeting it to be a sort
of an endurance test.
This test reproduces the race described in KUDU-2274: it triggers
DCHECK() assertions added in 194fd8b169f29aafbd78a47709ac51d2e8354a1a
before the relevant fixes for KUDU-2274 were checked in.
Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_verifier.cc
M src/kudu/integration-tests/cluster_verifier.h
A src/kudu/integration-tests/raft_consensus_stress-itest.cc
4 files changed, 317 insertions(+), 4 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/9255/4
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2274 [itest] stress test for replica replacement
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/9255
to look at the new patch set (#5).
Change subject: KUDU-2274 [itest] stress test for replica replacement
......................................................................
KUDU-2274 [itest] stress test for replica replacement
This is a stress test for the automatic replica replacement in Kudu.
Parameters of the test is configurable via run-time gflags, so it's
possible to run it in a 'standalone' mode, targeting it to be a sort
of an endurance test.
This test reproduces the race described in KUDU-2274: it triggers
DCHECK() assertions added in 194fd8b169f29aafbd78a47709ac51d2e8354a1a
before the relevant fixes for KUDU-2274 were checked in.
Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_verifier.cc
M src/kudu/integration-tests/cluster_verifier.h
A src/kudu/integration-tests/raft_consensus_stress-itest.cc
4 files changed, 305 insertions(+), 4 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/9255/5
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2274 [itest] stress test for replica replacement
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9255 )
Change subject: KUDU-2274 [itest] stress test for replica replacement
......................................................................
Patch Set 5: Verified+1
(1 comment)
Unreleated flakes in
DiskReservationITest.TestFillMultipleDisks
DiskReservationITest.TestWalWriteToFullDiskAborts
due to NTP failure
F0214 06:53:26.686384 28842 master_main.cc:74] Check failed: _s.ok() Bad status: Service unavailable: Cannot initialize clock: Error reading clock. Clock considered unsynchronized
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc
File src/kudu/integration-tests/raft_consensus_stress-itest.cc:
http://gerrit.cloudera.org:8080/#/c/9255/3/src/kudu/integration-tests/raft_consensus_stress-itest.cc@113
PS3, Line 113:
> Since we are not using tablet_id_ etc, can we instead use ExternalMiniClust
This test uses tablet_servers_ and the code to populate that. I'm also thinking that we might expand functionality of this test in the future, so I would prefer to keep RaftConsensusITestBase here.
However, if you feel strongly about that, I'll update this to use ExternalMiniClusterITestBase.
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Wed, 14 Feb 2018 19:49:02 +0000
Gerrit-HasComments: Yes
[kudu-CR] KUDU-2274 [itest] stress test for replica replacement
Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9255 )
Change subject: KUDU-2274 [itest] stress test for replica replacement
......................................................................
Patch Set 5: Code-Review+1
(3 comments)
http://gerrit.cloudera.org:8080/#/c/9255/5//COMMIT_MSG
Commit Message:
http://gerrit.cloudera.org:8080/#/c/9255/5//COMMIT_MSG@11
PS5, Line 11: is
are
http://gerrit.cloudera.org:8080/#/c/9255/5/src/kudu/integration-tests/raft_consensus_stress-itest.cc
File src/kudu/integration-tests/raft_consensus_stress-itest.cc:
http://gerrit.cloudera.org:8080/#/c/9255/5/src/kudu/integration-tests/raft_consensus_stress-itest.cc@236
PS5, Line 236:
nit: why not iteration++ here?
http://gerrit.cloudera.org:8080/#/c/9255/5/src/kudu/integration-tests/raft_consensus_stress-itest.cc@277
PS5, Line 277: if (ksck_failures_in_a_row > FLAGS_test_max_ksck_failures) {
: // Suspecting a Raft consensus failure while running ksck with shorter
: // timeout (see above). Run an extra round of ksck with the regular timeout
: // to verify that replicas haven't really converged and, if so, just bail
: // right at this point.
: const auto& s = v.RunKsck();
: if (!s.ok()) {
: FAIL() << Substitute("$0: tablet replicas haven't converged", s.ToString());
: }
: }
isn't this the same as line 288 where we run v.CheckCluster() ?
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 20 Feb 2018 23:42:10 +0000
Gerrit-HasComments: Yes
[kudu-CR] KUDU-2274 [itest] stress test for replica replacement
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9255 )
Change subject: KUDU-2274 [itest] stress test for replica replacement
......................................................................
Patch Set 6:
(3 comments)
http://gerrit.cloudera.org:8080/#/c/9255/5//COMMIT_MSG
Commit Message:
http://gerrit.cloudera.org:8080/#/c/9255/5//COMMIT_MSG@11
PS5, Line 11: ar
> are
Done
http://gerrit.cloudera.org:8080/#/c/9255/5/src/kudu/integration-tests/raft_consensus_stress-itest.cc
File src/kudu/integration-tests/raft_consensus_stress-itest.cc:
http://gerrit.cloudera.org:8080/#/c/9255/5/src/kudu/integration-tests/raft_consensus_stress-itest.cc@236
PS5, Line 236:
> nit: why not iteration++ here?
Because I don't want to count retry due to a ksck failure as an iteration: if RunKsck() fails, the logic starts another loop, but that case is not counted as 'iteration' in sense of this scenario.
http://gerrit.cloudera.org:8080/#/c/9255/5/src/kudu/integration-tests/raft_consensus_stress-itest.cc@277
PS5, Line 277: if (ksck_failures_in_a_row > FLAGS_test_max_ksck_failures) {
: // Suspecting a Raft consensus failure while running ksck with shorter
: // timeout (see above). Run an extra round of ksck with the regular timeout
: // to verify that replicas haven't really converged and, if so, just bail
: // right at this point.
: const auto& s = v.RunKsck();
: if (!s.ok()) {
: FAIL() << Substitute("$0: tablet replicas haven't converged", s.ToString());
: }
: }
> isn't this the same as line 288 where we run v.CheckCluster() ?
Yes, CheckCluster contains RunKsck(). However, it's wrapped into some extra logic for retries. Here I wanted to short-circuit and report an error right away if the ksck has already failed too many times in a row.
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Tue, 20 Feb 2018 23:54:32 +0000
Gerrit-HasComments: Yes
[kudu-CR] [itest] stress test for replica replacement
Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/9255
to look at the new patch set (#2).
Change subject: [itest] stress test for replica replacement
......................................................................
[itest] stress test for replica replacement
This is a stress test for the automatic replica replacement in Kudu.
Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_verifier.cc
M src/kudu/integration-tests/cluster_verifier.h
A src/kudu/integration-tests/raft_consensus_stress-itest.cc
4 files changed, 253 insertions(+), 4 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/9255/2
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2274 [itest] stress test for replica replacement
Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9255 )
Change subject: KUDU-2274 [itest] stress test for replica replacement
......................................................................
Patch Set 6: Code-Review+2
--
To view, visit http://gerrit.cloudera.org:8080/9255
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I036a882f7e9132a5c26013227c50c0699b59ed6e
Gerrit-Change-Number: 9255
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Wed, 21 Feb 2018 01:37:02 +0000
Gerrit-HasComments: No