You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2017/11/20 20:14:35 UTC

[kudu-CR](branch-1.4.x) KUDU-2218. tls socket: properly handle temporary socket errors in Writev

Hello Kudu Jenkins,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/8602

to review the following change.


Change subject: KUDU-2218. tls_socket: properly handle temporary socket errors in Writev
......................................................................

KUDU-2218. tls_socket: properly handle temporary socket errors in Writev

This fixes a bug which caused RaftConsensusITest.TestLargeBatches to
fail when run under stress, as in the following command line:

taskset -c 0-4 \
 build/latest/bin/raft_consensus-itest \
   --gtest_filter=\*LargeBat\* \
   --stress-cpu-threads=8

This would produce an error like:
Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:878

This means that we were retrying a write after getting EAGAIN, but with
a different buffer than the first time.

I tracked this down to mishandling of temporary socket errors in
TlsSocket::Writev(). In the case that we successfully write part of the
io vector but hit such an error trying to write a later element in the
vector, we were still propagating the error back up to the caller. The
caller didn't realize that part of the write was successful, and thus it
would retry the write from the beginning.

The fix is to fix the above, but also to enable partial writes in
TlsContext. The new test fails if either of the above two changes are
backed out.

Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Reviewed-on: http://gerrit.cloudera.org:8080/8570
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
(cherry picked from commit 64eb9f37b171419ed12a3795efe28faf2fd33b3d)
---
M src/kudu/security/tls_context.cc
M src/kudu/security/tls_socket-test.cc
M src/kudu/security/tls_socket.cc
3 files changed, 219 insertions(+), 70 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/02/8602/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.4.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8602
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR](branch-1.4.x) KUDU-2218. tls socket: properly handle temporary socket errors in Writev

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8602 )

Change subject: KUDU-2218. tls_socket: properly handle temporary socket errors in Writev
......................................................................


Patch Set 2: Verified+1

Known bug in encoder: https://issues.apache.org/jira/browse/KUDU-2119

That's fixed in Kudu 1.6


-- 
To view, visit http://gerrit.cloudera.org:8080/8602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.4.x
Gerrit-MessageType: comment
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8602
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 20 Nov 2017 23:54:02 +0000
Gerrit-HasComments: No

[kudu-CR](branch-1.4.x) KUDU-2218. tls socket: properly handle temporary socket errors in Writev

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8602 )

Change subject: KUDU-2218. tls_socket: properly handle temporary socket errors in Writev
......................................................................

KUDU-2218. tls_socket: properly handle temporary socket errors in Writev

This fixes a bug which caused RaftConsensusITest.TestLargeBatches to
fail when run under stress, as in the following command line:

taskset -c 0-4 \
 build/latest/bin/raft_consensus-itest \
   --gtest_filter=\*LargeBat\* \
   --stress-cpu-threads=8

This would produce an error like:
Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:878

This means that we were retrying a write after getting EAGAIN, but with
a different buffer than the first time.

I tracked this down to mishandling of temporary socket errors in
TlsSocket::Writev(). In the case that we successfully write part of the
io vector but hit such an error trying to write a later element in the
vector, we were still propagating the error back up to the caller. The
caller didn't realize that part of the write was successful, and thus it
would retry the write from the beginning.

The fix is to fix the above, but also to enable partial writes in
TlsContext. The new test fails if either of the above two changes are
backed out.

Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Reviewed-on: http://gerrit.cloudera.org:8080/8570
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
(cherry picked from commit 64eb9f37b171419ed12a3795efe28faf2fd33b3d)
Reviewed-on: http://gerrit.cloudera.org:8080/8602
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/security/tls_context.cc
M src/kudu/security/tls_socket-test.cc
M src/kudu/security/tls_socket.cc
3 files changed, 219 insertions(+), 70 deletions(-)

Approvals:
  Alexey Serbin: Verified
  Todd Lipcon: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/8602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.4.x
Gerrit-MessageType: merged
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8602
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR](branch-1.4.x) KUDU-2218. tls socket: properly handle temporary socket errors in Writev

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/8602 )

Change subject: KUDU-2218. tls_socket: properly handle temporary socket errors in Writev
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/8602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.4.x
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8602
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR](branch-1.4.x) KUDU-2218. tls socket: properly handle temporary socket errors in Writev

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8602 )

Change subject: KUDU-2218. tls_socket: properly handle temporary socket errors in Writev
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.4.x
Gerrit-MessageType: comment
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8602
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 22 Nov 2017 06:43:01 +0000
Gerrit-HasComments: No

[kudu-CR](branch-1.4.x) KUDU-2218. tls socket: properly handle temporary socket errors in Writev

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Jean-Daniel Cryans, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8602

to look at the new patch set (#2).

Change subject: KUDU-2218. tls_socket: properly handle temporary socket errors in Writev
......................................................................

KUDU-2218. tls_socket: properly handle temporary socket errors in Writev

This fixes a bug which caused RaftConsensusITest.TestLargeBatches to
fail when run under stress, as in the following command line:

taskset -c 0-4 \
 build/latest/bin/raft_consensus-itest \
   --gtest_filter=\*LargeBat\* \
   --stress-cpu-threads=8

This would produce an error like:
Network error: failed to write to TLS socket: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:878

This means that we were retrying a write after getting EAGAIN, but with
a different buffer than the first time.

I tracked this down to mishandling of temporary socket errors in
TlsSocket::Writev(). In the case that we successfully write part of the
io vector but hit such an error trying to write a later element in the
vector, we were still propagating the error back up to the caller. The
caller didn't realize that part of the write was successful, and thus it
would retry the write from the beginning.

The fix is to fix the above, but also to enable partial writes in
TlsContext. The new test fails if either of the above two changes are
backed out.

Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Reviewed-on: http://gerrit.cloudera.org:8080/8570
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
(cherry picked from commit 64eb9f37b171419ed12a3795efe28faf2fd33b3d)
---
M src/kudu/security/tls_context.cc
M src/kudu/security/tls_socket-test.cc
M src/kudu/security/tls_socket.cc
3 files changed, 219 insertions(+), 70 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/02/8602/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.4.x
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8602
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR](branch-1.4.x) KUDU-2218. tls socket: properly handle temporary socket errors in Writev

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8602 )

Change subject: KUDU-2218. tls_socket: properly handle temporary socket errors in Writev
......................................................................


Patch Set 2:

> Known bug in encoder: https://issues.apache.org/jira/browse/KUDU-2119
 > 
 > That's fixed in Kudu 1.6

Uh-oh, I mean that's unrelated flake in org.apache.kudu.client.TestKuduClient.testCloseShortlyAfterOpen


-- 
To view, visit http://gerrit.cloudera.org:8080/8602
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.4.x
Gerrit-MessageType: comment
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8602
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 20 Nov 2017 23:57:29 +0000
Gerrit-HasComments: No