You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2018/03/20 03:12:33 UTC

[kudu-CR] [delete table-itest] fix flake in TestUnknownTabletsAreNotDeleted

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9722


Change subject: [delete_table-itest] fix flake in TestUnknownTabletsAreNotDeleted
......................................................................

[delete_table-itest] fix flake in TestUnknownTabletsAreNotDeleted

Fixed flake in the DeleteTableITest.TestUnknownTabletsAreNotDeleted
scenario which was easily reproducible under OS X: every second
run of the test would fail.

In the essence, the flakiness would be more noticeable if the
master->TS heartbeat interval was longer or the RPC authentication
were set to "required" (it's "optional" by default). However, with 10ms
heartbeat interval in the tserver is able to get a response from the
restarted master before the master re-generated its IPKI records.

Change-Id: Ib6fd439c0ef5fb66b752f7f49175e4c2d818412e
---
M src/kudu/integration-tests/delete_table-itest.cc
1 file changed, 30 insertions(+), 18 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/9722/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib6fd439c0ef5fb66b752f7f49175e4c2d818412e
Gerrit-Change-Number: 9722
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [delete table-itest] fix flake in TestUnknownTabletsAreNotDeleted

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9722 )

Change subject: [delete_table-itest] fix flake in TestUnknownTabletsAreNotDeleted
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9722/1/src/kudu/integration-tests/delete_table-itest.cc
File src/kudu/integration-tests/delete_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9722/1/src/kudu/integration-tests/delete_table-itest.cc@1104
PS1, Line 1104: certificate.
> Because that's how connection negotiation works in case of --rpc-authentica
OK, that makes sense according to the diagram in docs/rpc.md.



-- 
To view, visit http://gerrit.cloudera.org:8080/9722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6fd439c0ef5fb66b752f7f49175e4c2d818412e
Gerrit-Change-Number: 9722
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 20 Mar 2018 17:06:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] [delete table-itest] fix flake in TestUnknownTabletsAreNotDeleted

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9722

to look at the new patch set (#3).

Change subject: [delete_table-itest] fix flake in TestUnknownTabletsAreNotDeleted
......................................................................

[delete_table-itest] fix flake in TestUnknownTabletsAreNotDeleted

Fixed flake in the DeleteTableITest.TestUnknownTabletsAreNotDeleted.
The flake was easily reproducible under macOS.

The scenario involves removing master's data directory along with the
IPKI information.  Once the master re-generates its IPKI system records
and starts using the new TLS server certificate signed by the newly
generated CA private key, the tserver fails to verify the new master's
server certificate using the old CA certificate.

With the RPC authentication set to "optional" and 10ms tserver->master
heartbeat interval, the tserver in most cases was able to establish
a connection to the restarted master before it re-generates its IPKI
records, so no TLS certificate was used for authentication.

Change-Id: Ib6fd439c0ef5fb66b752f7f49175e4c2d818412e
---
M src/kudu/integration-tests/delete_table-itest.cc
1 file changed, 30 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/9722/3
-- 
To view, visit http://gerrit.cloudera.org:8080/9722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib6fd439c0ef5fb66b752f7f49175e4c2d818412e
Gerrit-Change-Number: 9722
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [delete table-itest] fix flake in TestUnknownTabletsAreNotDeleted

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9722 )

Change subject: [delete_table-itest] fix flake in TestUnknownTabletsAreNotDeleted
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/9722/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9722/1//COMMIT_MSG@10
PS1, Line 10: OS X
nit: s/OS X/ macOS/g


http://gerrit.cloudera.org:8080/#/c/9722/1//COMMIT_MSG@10
PS1, Line 10: every second
            : run of the test would fail
Wait, was the flake that it would fail every second run? Or you mean that's about how often we observed it fail on macOS?


http://gerrit.cloudera.org:8080/#/c/9722/1//COMMIT_MSG@13
PS1, Line 13: the
nit: remove


http://gerrit.cloudera.org:8080/#/c/9722/1//COMMIT_MSG@15
PS1, Line 15: However, with 10ms
            : heartbeat interval in the tserver is able to get a response from the
            : restarted master before the master re-generated its IPKI records.
I'm having trouble parsing this sentence. I think you mean "However, with a 10ms heartbeat interval, the tserver is able to get a response from the restarted master before the master regenerates its IPKI records." Is that right? If so, would you mind editing the sentence to , and also explaining how this causes the test to fail? Actually, on my machine, I saw it get stuck in a tight loop, repeatedly failing negotiation; presumably that's due to the 10ms heartbeat interval.


http://gerrit.cloudera.org:8080/#/c/9722/1/src/kudu/integration-tests/delete_table-itest.cc
File src/kudu/integration-tests/delete_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9722/1/src/kudu/integration-tests/delete_table-itest.cc@1097
PS1, Line 1097: NOTE on disabled RPC authentication and encryption:
Ah ok, I understand the issue based on this comment. Could you add an abbreviated version of this explanation to the commit message? I think it would be helpful to have that in the git log.


http://gerrit.cloudera.org:8080/#/c/9722/1/src/kudu/integration-tests/delete_table-itest.cc@1104
PS1, Line 1104: CA private key.
If the tserver does heartbeat before the master regenerates the IPKI stuff, why does the test pass? In that case, does the tablet server count as registered? Shouldn't we refuse to register in that situation?



-- 
To view, visit http://gerrit.cloudera.org:8080/9722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6fd439c0ef5fb66b752f7f49175e4c2d818412e
Gerrit-Change-Number: 9722
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 20 Mar 2018 06:36:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] [delete table-itest] fix flake in TestUnknownTabletsAreNotDeleted

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9722

to look at the new patch set (#2).

Change subject: [delete_table-itest] fix flake in TestUnknownTabletsAreNotDeleted
......................................................................

[delete_table-itest] fix flake in TestUnknownTabletsAreNotDeleted

Fixed flake in the DeleteTableITest.TestUnknownTabletsAreNotDeleted.
The flake was easily reproducible under macOS.

The scenario involves removing master's data directory along with the
IPKI information.  Once the master re-generates its IPKI system records
and starts using the new TLS server certificate signed by the newly
generated CA private key, the tserver fails to verify the new master's
server certificate using the old CA certificate.

The scenario always fails if the RPC authentication is set to
"required".  With the RPC authentication set to "optional" and 10ms
tserver->master heartbeat interval, the tserver is able to establish
a connection to the restarted master before it re-generates its IPKI
records, so no TLS certificate is used for authentication.

Change-Id: Ib6fd439c0ef5fb66b752f7f49175e4c2d818412e
---
M src/kudu/integration-tests/delete_table-itest.cc
1 file changed, 29 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/9722/2
-- 
To view, visit http://gerrit.cloudera.org:8080/9722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib6fd439c0ef5fb66b752f7f49175e4c2d818412e
Gerrit-Change-Number: 9722
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [delete table-itest] fix flake in TestUnknownTabletsAreNotDeleted

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9722 )

Change subject: [delete_table-itest] fix flake in TestUnknownTabletsAreNotDeleted
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/9722/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9722/1//COMMIT_MSG@10
PS1, Line 10: OS X
> nit: s/OS X/ macOS/g
Done


http://gerrit.cloudera.org:8080/#/c/9722/1//COMMIT_MSG@10
PS1, Line 10: every second
            : run of the test would fail
> Wait, was the flake that it would fail every second run? Or you mean that's
I meant it was failing in 50% of all runs.  I removed this useless information.


http://gerrit.cloudera.org:8080/#/c/9722/1//COMMIT_MSG@13
PS1, Line 13: the
> nit: remove
Done


http://gerrit.cloudera.org:8080/#/c/9722/1//COMMIT_MSG@15
PS1, Line 15: However, with 10ms
            : heartbeat interval in the tserver is able to get a response from the
            : restarted master before the master re-generated its IPKI records.
> I'm having trouble parsing this sentence. I think you mean "However, with a
I added the comment from the test.  It should be clearer now.


http://gerrit.cloudera.org:8080/#/c/9722/1/src/kudu/integration-tests/delete_table-itest.cc
File src/kudu/integration-tests/delete_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9722/1/src/kudu/integration-tests/delete_table-itest.cc@1097
PS1, Line 1097: NOTE on disabled RPC authentication and encryption:
> Ah ok, I understand the issue based on this comment. Could you add an abbre
Done


http://gerrit.cloudera.org:8080/#/c/9722/1/src/kudu/integration-tests/delete_table-itest.cc@1104
PS1, Line 1104: CA private key.
> If the tserver does heartbeat before the master regenerates the IPKI stuff,
Because that's how connection negotiation works in case of --rpc-authentication=optional and --rpc-encryption=optional: the client and the server negotiates on the common 'maximum'  to have the most secure connection, and CA-signed cert is preferred over other means of authentication.

Once connection is established, all RPC calls are considered legit if they pass the blanket-style authz checks.  So, we cannot and should not refuse to register in that situation.  If it's necessary to secure the cluster, just set --rpc-authentication=required and --rpc-encryption=required for masters and tablet servers.



-- 
To view, visit http://gerrit.cloudera.org:8080/9722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6fd439c0ef5fb66b752f7f49175e4c2d818412e
Gerrit-Change-Number: 9722
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 20 Mar 2018 15:53:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] [delete table-itest] fix flake in TestUnknownTabletsAreNotDeleted

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Dan Burkert from this change.  ( http://gerrit.cloudera.org:8080/9722 )

Change subject: [delete_table-itest] fix flake in TestUnknownTabletsAreNotDeleted
......................................................................


Removed reviewer Dan Burkert.
-- 
To view, visit http://gerrit.cloudera.org:8080/9722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ib6fd439c0ef5fb66b752f7f49175e4c2d818412e
Gerrit-Change-Number: 9722
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [delete table-itest] fix flake in TestUnknownTabletsAreNotDeleted

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9722 )

Change subject: [delete_table-itest] fix flake in TestUnknownTabletsAreNotDeleted
......................................................................

[delete_table-itest] fix flake in TestUnknownTabletsAreNotDeleted

Fixed flake in the DeleteTableITest.TestUnknownTabletsAreNotDeleted.
The flake was easily reproducible under macOS.

The scenario involves removing master's data directory along with the
IPKI information.  Once the master re-generates its IPKI system records
and starts using the new TLS server certificate signed by the newly
generated CA private key, the tserver fails to verify the new master's
server certificate using the old CA certificate.

With the RPC authentication set to "optional" and 10ms tserver->master
heartbeat interval, the tserver in most cases was able to establish
a connection to the restarted master before it re-generates its IPKI
records, so no TLS certificate was used for authentication.

Change-Id: Ib6fd439c0ef5fb66b752f7f49175e4c2d818412e
Reviewed-on: http://gerrit.cloudera.org:8080/9722
Tested-by: Kudu Jenkins
Reviewed-by: Will Berkeley <wd...@gmail.com>
---
M src/kudu/integration-tests/delete_table-itest.cc
1 file changed, 30 insertions(+), 18 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Will Berkeley: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/9722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib6fd439c0ef5fb66b752f7f49175e4c2d818412e
Gerrit-Change-Number: 9722
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>