Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2017/11/03 01:06:12 UTC

[kudu-CR] [tls_socket] workaround for TLS short read

Hello Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8328

to look at the new patch set (#2).

Change subject: [tls_socket] workaround for TLS short read
......................................................................

[tls_socket] workaround for TLS short read

Added a workaround for the TLS short read issue, which appears on Linux
kernels 3.x and later (the version of the OpenSSL library is not
relevant).  The issue does not appear on OS X or on Linux 2.x kernels.

The workaround is simple: retry the read/write operation if
SSL_{read,write} fails with SSL_ERROR_WANT_{READ,WRITE} on a socket
in blocking I/O mode.
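
For illustration only, the retry idea can be sketched as below.  This is
not the code from the patch: TlsIoResult, TlsIoOutcome, TlsIoAttempt and
RetryBlockingTlsIo are hypothetical names, the attempt callback stands in
for a call to SSL_read() or SSL_write() followed by SSL_get_error(), and
the cap on the number of attempts is an assumption of the sketch.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for what SSL_get_error() reports. Real code would
// map SSL_ERROR_WANT_READ / SSL_ERROR_WANT_WRITE to kWantRetry.
enum class TlsIoResult { kOk, kWantRetry, kFatal };

struct TlsIoOutcome {
  TlsIoResult result;
  int bytes;  // Bytes transferred; meaningful only when result == kOk.
};

// One attempt at the operation. In real code this would wrap SSL_read() or
// SSL_write() plus SSL_get_error() on a non-positive return value.
using TlsIoAttempt = std::function<TlsIoOutcome()>;

// Repeats a blocking TLS operation while it reports "want read/write".
// max_attempts bounds the loop (an assumption of this sketch); the last
// outcome is returned so the caller can still see a persistent kWantRetry.
inline TlsIoOutcome RetryBlockingTlsIo(const TlsIoAttempt& attempt,
                                       int max_attempts) {
  assert(max_attempts > 0);
  TlsIoOutcome outcome{TlsIoResult::kFatal, 0};
  for (int i = 0; i < max_attempts; ++i) {
    outcome = attempt();
    if (outcome.result != TlsIoResult::kWantRetry) {
      break;  // Either success or a genuine error: stop retrying.
    }
  }
  return outcome;
}
```

A caller would pass a callback that performs a single SSL_read() on the
blocking socket and treat any outcome other than kOk as a failure once
the helper returns.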

I haven't found the exact reason behind the issue, but so far it
seems that since kernel 3.0 the behavior of some system call has
changed and it now exhibits interruptible behavior (like returning
EINTR) in some scenarios, one of which is resuming the process on
SIGCONT after SIGSTOP.

The essence of the issue: since we set SSL_MODE_AUTO_RETRY in the
TLS context and use blocking I/O during negotiation, we should not
expect SSL_read() to return SSL_ERROR_WANT_READ:
  https://www.openssl.org/docs/man1.0.2/ssl/SSL_CTX_set_mode.html
  https://www.openssl.org/docs/man1.0.2/ssl/SSL_read.html

This changelist fixes the flakiness in the
ClientStressTest.TestUniqueClientIds scenario.  Prior to this fix,
the test failure ratio observed with dist-test for TSAN builds was
about 6% in multiple 1K runs.  After the fix, no failures were observed.

The test was failing with errors like the following:
  Bad status: IO error: Could not connect to the cluster: \
    Client connection negotiation failed: client connection to \
    IP:port: Read zero bytes on a blocking Recv() call: \
    Transferred 0 of 4 bytes

Change-Id: I72b2050d1aa683731faa02b2adb360d46cd0f94c
---
M src/kudu/security/tls_socket.cc
M src/kudu/security/tls_socket.h
M src/kudu/util/net/socket.h
3 files changed, 73 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/8328/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8328
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I72b2050d1aa683731faa02b2adb360d46cd0f94c
Gerrit-Change-Number: 8328
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>