Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2019/03/14 23:06:03 UTC

[kudu-CR] KUDU-2576: TlsSocketTest.TestRecvFailure is flaky

Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12758 )


Change subject: KUDU-2576: TlsSocketTest.TestRecvFailure is flaky
......................................................................

KUDU-2576: TlsSocketTest.TestRecvFailure is flaky

TestRecvFailure wanted an interleaving of the client (main test) thread
and the server thread roughly like the following:

Server: enters echo (recv -> write) loop
Server: blocking recv
Client: stop the server
Client: blocking write to match server's blocking recv
Server: blocking write
Client: blocking recv to match server's blocking write
Client: blocking recv
Server: exits echo loop and closes connection because it was stopped
Client: blocking recv fails because connection is closed

A sleep was used in the client thread to try to ensure that the server
reached the blocking recv call before the client shut the server down.
However, under TSAN, occasionally the client was able to stop the server
before the server entered the echo loop, so the server closed the
connection before any data could be sent, failing the test.

This patch moves the client call to stop the server to after the first
recv-write succeeds, guaranteeing the server is in the echo loop.
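
For readers without the diff at hand, the reordering can be sketched as
follows. This is a minimal illustration, not the code in
tls_socket-test.cc: it assumes a plain POSIX socketpair instead of TLS
sockets, and EchoUntilStopped, RunRecvFailureScenario and ScenarioResult
are hypothetical names invented for the example. The client completes one
full round trip before setting the stop flag, so the server is known to be
inside its echo loop when it is stopped. With raw sockets the final recv
reports the closed connection by returning 0 or -1, whereas the real test
sees a failed status from its socket wrapper.

```cpp
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical result type for this sketch (not part of Kudu).
struct ScenarioResult {
  bool first_echo_ok = false;      // round trip before the stop succeeded
  int echoes_after_stop = 0;       // echoes received after the stop (0 or 1)
  bool final_recv_failed = false;  // last recv saw EOF or an error
};

// Hypothetical stand-in for the test's echo server: recv -> write until
// stopped, then close the connection.
void EchoUntilStopped(int fd, const std::atomic<bool>& stop) {
  char buf[64];
  while (!stop.load()) {
    ssize_t n = recv(fd, buf, sizeof(buf), 0);  // blocking recv
    if (n <= 0) break;
    if (send(fd, buf, static_cast<size_t>(n), 0) < 0) break;  // blocking write
  }
  close(fd);
}

ScenarioResult RunRecvFailureScenario() {
  // A write to an already-closed peer should return an error, not kill us.
  signal(SIGPIPE, SIG_IGN);

  int fds[2];
  int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
  assert(rc == 0);
  (void)rc;

  std::atomic<bool> stop(false);
  std::thread server([&] { EchoUntilStopped(fds[1], stop); });

  const char msg[] = "ping";
  char buf[64];
  ScenarioResult result;

  // First round trip. Its success proves the server entered the echo loop.
  bool sent = send(fds[0], msg, sizeof(msg), 0) ==
              static_cast<ssize_t>(sizeof(msg));
  result.first_echo_ok = sent && recv(fds[0], buf, sizeof(buf), 0) > 0;

  // Only now stop the server (the patch's reordering; no sleep needed).
  stop.store(true);

  // Wake the server if it is blocked in recv. This may fail if the server
  // already noticed the stop flag and closed, which is fine here.
  ssize_t ignored = send(fds[0], msg, sizeof(msg), 0);
  (void)ignored;

  // Drain at most one more echo, then observe the closed connection.
  ssize_t n;
  while ((n = recv(fds[0], buf, sizeof(buf), 0)) > 0) {
    ++result.echoes_after_stop;
  }
  result.final_recv_failed = (n <= 0);

  server.join();
  close(fds[0]);
  return result;
}
```

Calling RunRecvFailureScenario() from a test body and checking that
first_echo_ok and final_recv_failed are both true mirrors the intent of
the patched test. The sketch tolerates either ordering of the stop flag
and the second write, which is why it counts the echoes after the stop
instead of requiring exactly one.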

Without this patch, I saw 10/2000 runs fail in TSAN with 8 stress
threads. 8 were due to KUDU-2576. With this patch, I saw 4/2000, all of
which were due to a different issue that will be addressed in a
follow-up.

Change-Id: If95576ddc9e1e23f2db904d5b22bc3b9c1522ea4
---
M src/kudu/security/tls_socket-test.cc
1 file changed, 39 insertions(+), 28 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/12758/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12758
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If95576ddc9e1e23f2db904d5b22bc3b9c1522ea4
Gerrit-Change-Number: 12758
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2576: TlsSocketTest.TestRecvFailure is flaky

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/12758 )

Change subject: KUDU-2576: TlsSocketTest.TestRecvFailure is flaky
......................................................................


Patch Set 2: Verified+1

Unrelated failure is KUDU-2658.



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If95576ddc9e1e23f2db904d5b22bc3b9c1522ea4
Gerrit-Change-Number: 12758
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 14 Mar 2019 23:50:46 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2576: TlsSocketTest.TestRecvFailure is flaky

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12758

to look at the new patch set (#2).

Change subject: KUDU-2576: TlsSocketTest.TestRecvFailure is flaky
......................................................................

KUDU-2576: TlsSocketTest.TestRecvFailure is flaky

TestRecvFailure wanted an interleaving of client thread and server thread
similar to the following:

Server: enters echo (recv -> write) loop
Server: blocking recv
Client: stop the server
Client: blocking write to match server's blocking recv
Server: blocking write
Client: blocking recv to match server's blocking write
Client: blocking recv
Server: exits echo loop and closes connection because it was stopped
Client: blocking recv fails because connection is closed

A sleep was used in the client thread to try to ensure that the server
reached the blocking recv call before the client shut the server down.
However, under TSAN, occasionally the client was able to stop the server
before the server entered the echo loop, so the server closed the
connection before any data could be sent, failing the test.

This patch moves the client call to stop the server to after the first
recv-write succeeds, guaranteeing the server is in the echo loop.

Without this patch, I saw 10/2000 runs fail in TSAN with 8 stress
threads. 8 were due to KUDU-2576. With this patch, I saw 4/2000, all of
which were due to a different issue that will be addressed in a
follow-up.

Change-Id: If95576ddc9e1e23f2db904d5b22bc3b9c1522ea4
---
M src/kudu/security/tls_socket-test.cc
1 file changed, 39 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/12758/2

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If95576ddc9e1e23f2db904d5b22bc3b9c1522ea4
Gerrit-Change-Number: 12758
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-2576: TlsSocketTest.TestRecvFailure is flaky

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has removed a vote on this change.

Change subject: KUDU-2576: TlsSocketTest.TestRecvFailure is flaky
......................................................................


Removed Verified-1 by Kudu Jenkins (120)

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: If95576ddc9e1e23f2db904d5b22bc3b9c1522ea4
Gerrit-Change-Number: 12758
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-2576: TlsSocketTest.TestRecvFailure is flaky

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/12758 )

Change subject: KUDU-2576: TlsSocketTest.TestRecvFailure is flaky
......................................................................


Patch Set 2: Code-Review+2



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If95576ddc9e1e23f2db904d5b22bc3b9c1522ea4
Gerrit-Change-Number: 12758
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 14 Mar 2019 23:19:38 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2576: TlsSocketTest.TestRecvFailure is flaky

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12758 )

Change subject: KUDU-2576: TlsSocketTest.TestRecvFailure is flaky
......................................................................


Patch Set 2: Code-Review+2



Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If95576ddc9e1e23f2db904d5b22bc3b9c1522ea4
Gerrit-Change-Number: 12758
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 14 Mar 2019 23:16:31 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2576: TlsSocketTest.TestRecvFailure is flaky

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12758 )

Change subject: KUDU-2576: TlsSocketTest.TestRecvFailure is flaky
......................................................................

KUDU-2576: TlsSocketTest.TestRecvFailure is flaky

TestRecvFailure wanted an interleaving of client thread and server thread
similar to the following:

Server: enters echo (recv -> write) loop
Server: blocking recv
Client: stop the server
Client: blocking write to match server's blocking recv
Server: blocking write
Client: blocking recv to match server's blocking write
Client: blocking recv
Server: exits echo loop and closes connection because it was stopped
Client: blocking recv fails because connection is closed

A sleep was used in the client thread to try to ensure that the server
reached the blocking recv call before the client shut the server down.
However, under TSAN, occasionally the client was able to stop the server
before the server entered the echo loop, so the server closed the
connection before any data could be sent, failing the test.

This patch moves the client call to stop the server to after the first
recv-write succeeds, guaranteeing the server is in the echo loop.

Without this patch, I saw 10/2000 runs fail in TSAN with 8 stress
threads. 8 were due to KUDU-2576. With this patch, I saw 4/2000, all of
which were due to a different issue that will be addressed in a
follow-up.

Change-Id: If95576ddc9e1e23f2db904d5b22bc3b9c1522ea4
Reviewed-on: http://gerrit.cloudera.org:8080/12758
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Tested-by: Will Berkeley <wd...@gmail.com>
---
M src/kudu/security/tls_socket-test.cc
1 file changed, 39 insertions(+), 28 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Alexey Serbin: Looks good to me, approved
  Will Berkeley: Verified


Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If95576ddc9e1e23f2db904d5b22bc3b9c1522ea4
Gerrit-Change-Number: 12758
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>