Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2022/08/11 23:47:02 UTC

[kudu-CR] [tests] fix flakiness in TestTabletCopyEncryptedServers

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18842 )


Change subject: [tests] fix flakiness in TestTabletCopyEncryptedServers
......................................................................

[tests] fix flakiness in TestTabletCopyEncryptedServers

The TabletCopyITest.TestTabletCopyEncryptedServers scenario deletes
a tablet, and then checks to see that the tablet data state is
TABLET_DATA_COPYING.  However, it's possible for the remote bootstrap
to complete so quickly that it's already TABLET_DATA_READY at the time
of sampling, so from time to time the test failed with

  src/kudu/integration-tests/tablet_copy-itest.cc:1014: Failure
  Failed
  Bad status: Timed out: Timed out after 30.002s waiting for correct tablet state: Illegal state: State TABLET_DATA_READY unexpected, expected TABLET_DATA_COPYING

This patch updates the assertion to allow both the COPYING and READY
tablet data states.

Without the patch, the test was about 7% flaky [1]. With the patch,
it's not flaky [2].

[1] http://dist-test.cloudera.org/job?job_id=aserbin.1660260668.94650
[2] http://dist-test.cloudera.org/job?job_id=aserbin.1660261249.109365

Change-Id: I22933cc9cb727711ee5fb45c811c2a759958fdfa
---
M src/kudu/integration-tests/tablet_copy-itest.cc
1 file changed, 7 insertions(+), 4 deletions(-)
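
As a rough illustration of the idea described in the commit message above, the sketch below polls a sampled state until it matches any state in an allowed set. All names here (TabletDataState, WaitForTabletDataState, the sample callback) are hypothetical stand-ins for illustration and are not Kudu's actual test helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the tablet data states named in the commit
// message; this is not Kudu's real enum.
enum class TabletDataState { kCopying, kReady, kTombstoned };

// Polls `sample` until it returns one of the `allowed` states or until
// `timeout` elapses. Returns true if an allowed state was observed.
bool WaitForTabletDataState(const std::function<TabletDataState()>& sample,
                            const std::vector<TabletDataState>& allowed,
                            std::chrono::milliseconds timeout) {
  assert(!allowed.empty());
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    const TabletDataState state = sample();
    if (std::find(allowed.begin(), allowed.end(), state) != allowed.end()) {
      return true;
    }
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;
    }
    // Back off briefly before sampling again.
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
}
```

If the copy has already finished by the time of sampling, a call that allows only kCopying times out, while a call that allows both kCopying and kReady returns true right away.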



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/42/18842/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I22933cc9cb727711ee5fb45c811c2a759958fdfa
Gerrit-Change-Number: 18842
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <al...@apache.org>

[kudu-CR] [tests] fix flakiness in TestTabletCopyEncryptedServers

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18842 )

Change subject: [tests] fix flakiness in TestTabletCopyEncryptedServers
......................................................................


Patch Set 1: Verified+1

unrelated dist-test failure (DEBUG):
  Could not submit C++ distributed test job



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22933cc9cb727711ee5fb45c811c2a759958fdfa
Gerrit-Change-Number: 18842
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Comment-Date: Sat, 13 Aug 2022 01:56:28 +0000
Gerrit-HasComments: No

[kudu-CR] [tests] fix flakiness in TestTabletCopyEncryptedServers

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18842 )

Change subject: [tests] fix flakiness in TestTabletCopyEncryptedServers
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18842/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18842/1//COMMIT_MSG@19
PS1, Line 19: This patch updates the assertion to allow both the COPYING and READY
> couldn't we inject latency instead?
I didn't try that option yet. I just found that similar flakiness was fixed this way some time ago, so I thought I'd simply use the same approach: https://github.com/apache/kudu/commit/54839984932bca0c0ba49cdd8fa199a5711e589e

Let me know if you think injecting latency is the preferred way.




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22933cc9cb727711ee5fb45c811c2a759958fdfa
Gerrit-Change-Number: 18842
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Comment-Date: Sat, 20 Aug 2022 02:58:46 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tests] fix flakiness in TestTabletCopyEncryptedServers

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: [tests] fix flakiness in TestTabletCopyEncryptedServers
......................................................................


Removed Verified-1 by Kudu Jenkins (120)

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I22933cc9cb727711ee5fb45c811c2a759958fdfa
Gerrit-Change-Number: 18842
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>

[kudu-CR] [tests] fix flakiness in TestTabletCopyEncryptedServers

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18842 )

Change subject: [tests] fix flakiness in TestTabletCopyEncryptedServers
......................................................................

[tests] fix flakiness in TestTabletCopyEncryptedServers

Change-Id: I22933cc9cb727711ee5fb45c811c2a759958fdfa
Reviewed-on: http://gerrit.cloudera.org:8080/18842
Tested-by: Alexey Serbin <al...@apache.org>
Reviewed-by: Yingchun Lai <ac...@gmail.com>
Reviewed-by: Abhishek Chennaka <ac...@cloudera.com>
Reviewed-by: Attila Bukor <ab...@apache.org>
---
M src/kudu/integration-tests/tablet_copy-itest.cc
1 file changed, 7 insertions(+), 4 deletions(-)

Approvals:
  Alexey Serbin: Verified
  Yingchun Lai: Looks good to me, but someone else must approve
  Abhishek Chennaka: Looks good to me, but someone else must approve
  Attila Bukor: Looks good to me, approved


Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I22933cc9cb727711ee5fb45c811c2a759958fdfa
Gerrit-Change-Number: 18842
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>

[kudu-CR] [tests] fix flakiness in TestTabletCopyEncryptedServers

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/18842 )

Change subject: [tests] fix flakiness in TestTabletCopyEncryptedServers
......................................................................


Patch Set 1: Code-Review+1



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22933cc9cb727711ee5fb45c811c2a759958fdfa
Gerrit-Change-Number: 18842
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Comment-Date: Sat, 13 Aug 2022 03:32:12 +0000
Gerrit-HasComments: No

[kudu-CR] [tests] fix flakiness in TestTabletCopyEncryptedServers

Posted by "Abhishek Chennaka (Code Review)" <ge...@cloudera.org>.
Abhishek Chennaka has posted comments on this change. ( http://gerrit.cloudera.org:8080/18842 )

Change subject: [tests] fix flakiness in TestTabletCopyEncryptedServers
......................................................................


Patch Set 1: Code-Review+1



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22933cc9cb727711ee5fb45c811c2a759958fdfa
Gerrit-Change-Number: 18842
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Comment-Date: Thu, 18 Aug 2022 03:03:56 +0000
Gerrit-HasComments: No

[kudu-CR] [tests] fix flakiness in TestTabletCopyEncryptedServers

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/18842 )

Change subject: [tests] fix flakiness in TestTabletCopyEncryptedServers
......................................................................


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18842/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18842/1//COMMIT_MSG@19
PS1, Line 19: This patch updates the assertion to allow both the COPYING and READY
> I didn't try that option yet: just found that similar flakiness was fixed t
I think we can go this way. It should be okay if we miss the copying state: we still test what we mean to test here.




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22933cc9cb727711ee5fb45c811c2a759958fdfa
Gerrit-Change-Number: 18842
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Comment-Date: Tue, 23 Aug 2022 09:43:53 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tests] fix flakiness in TestTabletCopyEncryptedServers

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18842 )

Change subject: [tests] fix flakiness in TestTabletCopyEncryptedServers
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18842/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18842/1//COMMIT_MSG@19
PS1, Line 19: This patch updates the assertion to allow both the COPYING and READY
> I think we can go this way, it should be okay if we miss the copying, we st
Yep, that makes sense, thanks.

As an afterthought, I guess relying on the injected latency increases the runtime a bit and might still be prone to flakiness in case of scheduler anomalies, etc.
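
To make the trade-off concrete, here is a small deterministic model of the race. It is only a sketch: the function names, the three-state enum, and the timings used with it are hypothetical, and it does not touch any real Kudu flag or test helper.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::milliseconds;

// Hypothetical stand-in for the tablet data states discussed in this thread.
enum class TabletDataState { kCopying, kReady, kTombstoned };

// Models the state a test would observe if it samples `sample_delay` after
// the copy starts and the copy takes `copy_duration` in total (including
// any injected latency).
TabletDataState ObservedState(milliseconds copy_duration,
                              milliseconds sample_delay) {
  assert(copy_duration.count() >= 0 && sample_delay.count() >= 0);
  return sample_delay < copy_duration ? TabletDataState::kCopying
                                      : TabletDataState::kReady;
}

// The original assertion: only the copying state is accepted.
bool StrictCheckPasses(TabletDataState state) {
  return state == TabletDataState::kCopying;
}

// The relaxed assertion: either the copying or the ready state is accepted.
bool RelaxedCheckPasses(TabletDataState state) {
  return state == TabletDataState::kCopying ||
         state == TabletDataState::kReady;
}
```

With, say, 500 ms of injected latency and a 50 ms sampling delay, the strict check passes, but if the sampler is stalled for 2000 ms it observes kReady and the strict check fails. The relaxed check passes in both cases, and it does not add the 500 ms to the test runtime.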




Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22933cc9cb727711ee5fb45c811c2a759958fdfa
Gerrit-Change-Number: 18842
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yingchun Lai <ac...@gmail.com>
Gerrit-Comment-Date: Tue, 23 Aug 2022 15:01:21 +0000
Gerrit-HasComments: Yes