You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2019/10/02 02:12:18 UTC

[kudu-CR] test: deflake TestBackgroundFailureDuringMaintenanceMode

Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14341


Change subject: test: deflake TestBackgroundFailureDuringMaintenanceMode
......................................................................

test: deflake TestBackgroundFailureDuringMaintenanceMode

We previously weren't accounting for the possibility of a slow
bootstrap, asserting early that we had the expected number of running
replicas.

Without this, the test failed 6/1000 times in debug mode. With it, it
passed 4997/5000 times (failures due to a different issue).

Change-Id: I67eed008a983cd9ed528496777ffe3d30126a55f
---
M src/kudu/integration-tests/maintenance_mode-itest.cc
1 file changed, 7 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/14341/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I67eed008a983cd9ed528496777ffe3d30126a55f
Gerrit-Change-Number: 14341
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>

[kudu-CR] test: deflake TestBackgroundFailureDuringMaintenanceMode

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14341

to look at the new patch set (#2).

Change subject: test: deflake TestBackgroundFailureDuringMaintenanceMode
......................................................................

test: deflake TestBackgroundFailureDuringMaintenanceMode

We previously weren't accounting for the possibility of a slow
bootstrap, asserting early that we had the expected number of running
replicas.

This also fixes the check for 0 failed replicas. Really, we should have
been checking for explicit healthy/failed states, rather than "not
failed".

Without this, the test failed 6/1000 times in debug mode. With it, it
passed 4997/5000 times (failures due to a different issue).

Change-Id: I67eed008a983cd9ed528496777ffe3d30126a55f
---
M src/kudu/integration-tests/maintenance_mode-itest.cc
1 file changed, 38 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/14341/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I67eed008a983cd9ed528496777ffe3d30126a55f
Gerrit-Change-Number: 14341
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] test: deflake TestBackgroundFailureDuringMaintenanceMode

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14341 )

Change subject: test: deflake TestBackgroundFailureDuringMaintenanceMode
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14341/1/src/kudu/integration-tests/maintenance_mode-itest.cc
File src/kudu/integration-tests/maintenance_mode-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14341/1/src/kudu/integration-tests/maintenance_mode-itest.cc@453
PS1, Line 453:   SleepFor(kDurationForSomeHeartbeats);
Do we still need this?



-- 
To view, visit http://gerrit.cloudera.org:8080/14341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67eed008a983cd9ed528496777ffe3d30126a55f
Gerrit-Change-Number: 14341
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 02 Oct 2019 03:56:49 +0000
Gerrit-HasComments: Yes

[kudu-CR] test: deflake TestBackgroundFailureDuringMaintenanceMode

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14341 )

Change subject: test: deflake TestBackgroundFailureDuringMaintenanceMode
......................................................................

test: deflake TestBackgroundFailureDuringMaintenanceMode

We previously weren't accounting for the possibility of a slow
bootstrap, asserting early that we had the expected number of running
replicas.

This also fixes the check for 0 failed replicas. Really, we should have
been checking for explicit healthy/failed states, rather than "not
failed".

Without this, the test failed 6/1000 times in debug mode. With it, it
passed 4997/5000 times (failures due to a different issue).

Change-Id: I67eed008a983cd9ed528496777ffe3d30126a55f
Reviewed-on: http://gerrit.cloudera.org:8080/14341
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Andrew Wong <aw...@cloudera.com>
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/integration-tests/maintenance_mode-itest.cc
1 file changed, 38 insertions(+), 18 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, but someone else must approve
  Andrew Wong: Verified
  Alexey Serbin: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/14341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I67eed008a983cd9ed528496777ffe3d30126a55f
Gerrit-Change-Number: 14341
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] test: deflake TestBackgroundFailureDuringMaintenanceMode

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14341 )

Change subject: test: deflake TestBackgroundFailureDuringMaintenanceMode
......................................................................


Patch Set 2: Verified+1

Failures are due to NTP errors.


-- 
To view, visit http://gerrit.cloudera.org:8080/14341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67eed008a983cd9ed528496777ffe3d30126a55f
Gerrit-Change-Number: 14341
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 02 Oct 2019 05:05:06 +0000
Gerrit-HasComments: No

[kudu-CR] test: deflake TestBackgroundFailureDuringMaintenanceMode

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14341 )

Change subject: test: deflake TestBackgroundFailureDuringMaintenanceMode
......................................................................


Patch Set 2: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/14341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67eed008a983cd9ed528496777ffe3d30126a55f
Gerrit-Change-Number: 14341
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 02 Oct 2019 04:56:44 +0000
Gerrit-HasComments: No

[kudu-CR] test: deflake TestBackgroundFailureDuringMaintenanceMode

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14341 )

Change subject: test: deflake TestBackgroundFailureDuringMaintenanceMode
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14341/1/src/kudu/integration-tests/maintenance_mode-itest.cc
File src/kudu/integration-tests/maintenance_mode-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14341/1/src/kudu/integration-tests/maintenance_mode-itest.cc@a438
PS1, Line 438: 
> I'm curious why this was not failing all the time.  Just before evaluating 
It was because AssertEventuallyNumFailedReplicas(0) only checked for failed tablets, rather than healthy tablets. So if there were a leader election, which I guess was pretty likely over the course of thirty seconds given the low timeout, we'd refresh the health states and see UNKNOWN instead of FAILED.

I've updated the method to be more explicit.


http://gerrit.cloudera.org:8080/#/c/14341/1/src/kudu/integration-tests/maintenance_mode-itest.cc@453
PS1, Line 453:   });
> Do we still need this?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67eed008a983cd9ed528496777ffe3d30126a55f
Gerrit-Change-Number: 14341
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 02 Oct 2019 04:39:24 +0000
Gerrit-HasComments: Yes

[kudu-CR] test: deflake TestBackgroundFailureDuringMaintenanceMode

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14341 )

Change subject: test: deflake TestBackgroundFailureDuringMaintenanceMode
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14341/1/src/kudu/integration-tests/maintenance_mode-itest.cc
File src/kudu/integration-tests/maintenance_mode-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14341/1/src/kudu/integration-tests/maintenance_mode-itest.cc@a438
PS1, Line 438: 
I'm curious why this was not failing all the time.  Just before evaluating this condition the test has been waiting for some time to see tablet copying started, and that means even master has received information on failed replicas.



-- 
To view, visit http://gerrit.cloudera.org:8080/14341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67eed008a983cd9ed528496777ffe3d30126a55f
Gerrit-Change-Number: 14341
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 02 Oct 2019 02:35:52 +0000
Gerrit-HasComments: Yes

[kudu-CR] test: deflake TestBackgroundFailureDuringMaintenanceMode

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has removed a vote on this change.

Change subject: test: deflake TestBackgroundFailureDuringMaintenanceMode
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/14341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I67eed008a983cd9ed528496777ffe3d30126a55f
Gerrit-Change-Number: 14341
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] test: deflake TestBackgroundFailureDuringMaintenanceMode

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14341 )

Change subject: test: deflake TestBackgroundFailureDuringMaintenanceMode
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14341/1/src/kudu/integration-tests/maintenance_mode-itest.cc
File src/kudu/integration-tests/maintenance_mode-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14341/1/src/kudu/integration-tests/maintenance_mode-itest.cc@a438
PS1, Line 438: 
> It was because AssertEventuallyNumFailedReplicas(0) only checked for failed
Ah, I see.  Thank you for making it more robust!



-- 
To view, visit http://gerrit.cloudera.org:8080/14341
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I67eed008a983cd9ed528496777ffe3d30126a55f
Gerrit-Change-Number: 14341
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 02 Oct 2019 06:30:29 +0000
Gerrit-HasComments: Yes