You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org> on 2021/03/19 23:17:44 UTC

[kudu-CR] [test] KUDU-3266 Fix flakiness in dynamic multi master test

Bankim Bhavsar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17211


Change subject: [test] KUDU-3266 Fix flakiness in dynamic_multi_master test
......................................................................

[test] KUDU-3266 Fix flakiness in dynamic_multi_master test

Flakiness was reported in dynamic_multi_master test after
the introduction of test for recovering dead master,
commit 4b4a8c0f2f.

See KUDU-3266 for the analysis.

This change wraps the check for row count under ASSERT_EVENTUALLY
to ensure the resumed master and the remaining master are given
a chance to communicate Raft messages and become up to date.

Change-Id: Ifac1d95707064b6ac2624d3f52336d6c39afd3c8
---
M src/kudu/master/dynamic_multi_master-test.cc
1 file changed, 13 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/17211/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifac1d95707064b6ac2624d3f52336d6c39afd3c8
Gerrit-Change-Number: 17211
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>

[kudu-CR] [test] KUDU-3266 Fix flakiness in dynamic multi master test

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17211 )

Change subject: [test] KUDU-3266 Fix flakiness in dynamic_multi_master test
......................................................................


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17211/2/src/kudu/master/dynamic_multi_master-test.cc
File src/kudu/master/dynamic_multi_master-test.cc:

http://gerrit.cloudera.org:8080/#/c/17211/2/src/kudu/master/dynamic_multi_master-test.cc@602
PS2, Line 602:         LOG(INFO) << Substitute("Pausing master $0, $1", master->uuid(),
             :                                 master->bound_rpc_hostport().ToString());
nit: if this was added for diagnostic purposes, maybe consider dropping this after the issue is fixed?  I guess this extra logging line doesn't add much from the point of automated testing, right?



-- 
To view, visit http://gerrit.cloudera.org:8080/17211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifac1d95707064b6ac2624d3f52336d6c39afd3c8
Gerrit-Change-Number: 17211
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Mar 2021 00:11:29 +0000
Gerrit-HasComments: Yes

[kudu-CR] [test] KUDU-3266 Fix flakiness in dynamic multi master test

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17211

to look at the new patch set (#2).

Change subject: [test] KUDU-3266 Fix flakiness in dynamic_multi_master test
......................................................................

[test] KUDU-3266 Fix flakiness in dynamic_multi_master test

Flakiness was reported in dynamic_multi_master test after
the introduction of test for recovering dead master,
commit 4b4a8c0f2f.

See KUDU-3266 for the analysis.

This change wraps the check for row count under ASSERT_EVENTUALLY
to ensure the resumed master and the remaining master are given
a chance to communicate Raft messages and become up to date.

Tests:
- Reproduced the issue with ASAN build with dist-test.
- Verified no failures over 100 iterations with the fix
on ASAN build.

Change-Id: Ifac1d95707064b6ac2624d3f52336d6c39afd3c8
---
M src/kudu/master/dynamic_multi_master-test.cc
1 file changed, 13 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/17211/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifac1d95707064b6ac2624d3f52336d6c39afd3c8
Gerrit-Change-Number: 17211
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [test] KUDU-3266 Fix flakiness in dynamic multi master test

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has removed a vote on this change.

Change subject: [test] KUDU-3266 Fix flakiness in dynamic_multi_master test
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/17211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ifac1d95707064b6ac2624d3f52336d6c39afd3c8
Gerrit-Change-Number: 17211
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [test] KUDU-3266 Fix flakiness in dynamic multi master test

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17211 )

Change subject: [test] KUDU-3266 Fix flakiness in dynamic_multi_master test
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17211/2/src/kudu/master/dynamic_multi_master-test.cc
File src/kudu/master/dynamic_multi_master-test.cc:

http://gerrit.cloudera.org:8080/#/c/17211/2/src/kudu/master/dynamic_multi_master-test.cc@602
PS2, Line 602:         LOG(INFO) << Substitute("Pausing master $0, $1", master->uuid(),
             :                                 master->bound_rpc_hostport().ToString());
> nit: if this was added for diagnostic purposes, maybe consider dropping thi
I plan to drop this after a week or so once no flakiness has been observed due to this part of the verification.


http://gerrit.cloudera.org:8080/#/c/17211/2/src/kudu/master/dynamic_multi_master-test.cc@607
PS2, Line 607:         // We can run into table not found error in cases where the
             :         // previously paused master that's leader of prior term resumes
             :         // and the up to date follower doesn't become leader and the resumed
             :         // master from previous term isn't up to date. See KUDU-3266 for details.
             :         ASSERT_EVENTUALLY([&] {
             :           NO_FATALS(cv.CheckRowCount(table_name, ClusterVerifier::EXACTLY, 0));
             :         });
> It might be interesting to explore different ways to ensure consistency at 
Thanks for that suggestion. I'll explore more into RYW mode for master operations.



-- 
To view, visit http://gerrit.cloudera.org:8080/17211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifac1d95707064b6ac2624d3f52336d6c39afd3c8
Gerrit-Change-Number: 17211
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Mar 2021 16:43:54 +0000
Gerrit-HasComments: Yes

[kudu-CR] [test] KUDU-3266 Fix flakiness in dynamic multi master test

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17211 )

Change subject: [test] KUDU-3266 Fix flakiness in dynamic_multi_master test
......................................................................

[test] KUDU-3266 Fix flakiness in dynamic_multi_master test

Flakiness was reported in dynamic_multi_master test after
the introduction of test for recovering dead master,
commit 4b4a8c0f2f.

See KUDU-3266 for the analysis.

This change wraps the check for row count under ASSERT_EVENTUALLY
to ensure the resumed master and the remaining master are given
a chance to communicate Raft messages and become up to date.

Tests:
- Reproduced the issue with ASAN build with dist-test.
- Verified no failures over 100 iterations with the fix
on ASAN build.

Change-Id: Ifac1d95707064b6ac2624d3f52336d6c39afd3c8
Reviewed-on: http://gerrit.cloudera.org:8080/17211
Tested-by: Bankim Bhavsar <ba...@cloudera.com>
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Reviewed-by: Mahesh Reddy <mr...@cloudera.com>
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/master/dynamic_multi_master-test.cc
1 file changed, 13 insertions(+), 3 deletions(-)

Approvals:
  Bankim Bhavsar: Verified
  Andrew Wong: Looks good to me, approved
  Mahesh Reddy: Looks good to me, but someone else must approve
  Alexey Serbin: Looks good to me, but someone else must approve

-- 
To view, visit http://gerrit.cloudera.org:8080/17211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ifac1d95707064b6ac2624d3f52336d6c39afd3c8
Gerrit-Change-Number: 17211
Gerrit-PatchSet: 3
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>

[kudu-CR] [test] KUDU-3266 Fix flakiness in dynamic multi master test

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17211 )

Change subject: [test] KUDU-3266 Fix flakiness in dynamic_multi_master test
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17211/2/src/kudu/master/dynamic_multi_master-test.cc
File src/kudu/master/dynamic_multi_master-test.cc:

http://gerrit.cloudera.org:8080/#/c/17211/2/src/kudu/master/dynamic_multi_master-test.cc@607
PS2, Line 607:         // We can run into table not found error in cases where the
             :         // previously paused master that's leader of prior term resumes
             :         // and the up to date follower doesn't become leader and the resumed
             :         // master from previous term isn't up to date. See KUDU-3266 for details.
             :         ASSERT_EVENTUALLY([&] {
             :           NO_FATALS(cv.CheckRowCount(table_name, ClusterVerifier::EXACTLY, 0));
             :         });
It might be interesting to explore different ways to ensure consistency at least within the same client for master operations, i.e. "read-your-writes" for master operations.

For tablet servers we do this when the client specifies RYW mode by propagating timestamps to the client, having clients supply the highest seen timestamp prior to each scan request, and waiting on the tablet servers until the timestamp is considered safe to scan (similar to a snapshot scan, but with a Kudu-specified timestamp).

This seems like a reasonable fix for now though.



-- 
To view, visit http://gerrit.cloudera.org:8080/17211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifac1d95707064b6ac2624d3f52336d6c39afd3c8
Gerrit-Change-Number: 17211
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 22 Mar 2021 23:54:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] [test] KUDU-3266 Fix flakiness in dynamic multi master test

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17211 )

Change subject: [test] KUDU-3266 Fix flakiness in dynamic_multi_master test
......................................................................


Patch Set 2: Verified+1

> Patch Set 2: Verified-1
> 
> Build Failed 
> 
> http://jenkins.kudu.apache.org/job/kudu-gerrit/23515/ : FAILURE

Unrelated test failure:
HmsConfigurations/AlterTableRandomized.TestRandomSequence/1


-- 
To view, visit http://gerrit.cloudera.org:8080/17211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifac1d95707064b6ac2624d3f52336d6c39afd3c8
Gerrit-Change-Number: 17211
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 21 Mar 2021 16:54:15 +0000
Gerrit-HasComments: No

[kudu-CR] [test] KUDU-3266 Fix flakiness in dynamic multi master test

Posted by "Mahesh Reddy (Code Review)" <ge...@cloudera.org>.
Mahesh Reddy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17211 )

Change subject: [test] KUDU-3266 Fix flakiness in dynamic_multi_master test
......................................................................


Patch Set 2: Code-Review+1

LGTM, the jira with the log messages was really useful in understanding the issue. Btw, the google drive link doesn't seem to open to all (asked me to request access when I clicked on it).


-- 
To view, visit http://gerrit.cloudera.org:8080/17211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifac1d95707064b6ac2624d3f52336d6c39afd3c8
Gerrit-Change-Number: 17211
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <mr...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Mar 2021 00:02:08 +0000
Gerrit-HasComments: No