You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2017/04/04 21:37:55 UTC

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Hello Dan Burkert, Henry Robinson, Alexey Serbin,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/6552

to review the following change.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................

KUDU-1964. security: avoid calling ERR_clear_error() defensively

This changes our various code which wraps OpenSSL to avoid calling
ERR_clear_error() defensively. Instead, we now make sure to call
ERR_clear_error() after any case where we receive an error return value
from an OpenSSL library call.

For cases where we use the existing wrapper macros like
OPENSSL_RET_NOT_OK we are already ensured of this since the
GetOpenSSLErrors() call clears the error queue.

This provides a performance boost, since ERR_clear_error() ends up
taking several library-wide mutexes in OpenSSL 1.0.x (apparently
improved in OpenSSL 1.1, but that's not available on current OSes).

To ensure that we don't accidentally leave an OpenSSL error lying around
after any functions, I added a scoped helper which is active in debug
builds. This performs a DCHECK that the error queue is empty on scope
entry and exit.

To benchmark, I tested rpc-bench with encryption enabled using OpenSSL 1.0.1e
(RHEl6):

Master, SSL off:
----------------
I0404 12:49:04.811158  7250 rpc-bench.cc:84] Mode:            Async
I0404 12:49:04.811172  7250 rpc-bench.cc:88] Client reactors:  16
I0404 12:49:04.811175  7250 rpc-bench.cc:89] Call concurrency: 60
I0404 12:49:04.811178  7250 rpc-bench.cc:92] Worker threads:   1
I0404 12:49:04.811182  7250 rpc-bench.cc:93] Server reactors:  4
I0404 12:49:04.811183  7250 rpc-bench.cc:94] ----------------------------------
I0404 12:49:04.811187  7250 rpc-bench.cc:95] Reqs/sec:         446998
I0404 12:49:04.811193  7250 rpc-bench.cc:96] User CPU per req: 9.97575us
I0404 12:49:04.811197  7250 rpc-bench.cc:97] Sys CPU per req:  12.2342us
I0404 12:49:04.811202  7250 rpc-bench.cc:98] Ctx Sw. per req:  0.604032

Master, SSL on:
--------------
I0404 12:57:10.241720  9949 rpc-bench.cc:86] Mode:            Async
I0404 12:57:10.241736  9949 rpc-bench.cc:90] Client reactors:  16
I0404 12:57:10.241739  9949 rpc-bench.cc:91] Call concurrency: 60
I0404 12:57:10.241742  9949 rpc-bench.cc:94] Worker threads:   1
I0404 12:57:10.241745  9949 rpc-bench.cc:95] Server reactors:  4
I0404 12:57:10.241747  9949 rpc-bench.cc:96] Encryption:       1
I0404 12:57:10.241760  9949 rpc-bench.cc:97] ----------------------------------
I0404 12:57:10.241762  9949 rpc-bench.cc:98] Reqs/sec:         56761.3
I0404 12:57:10.241778  9949 rpc-bench.cc:99] User CPU per req: 39.7792us
I0404 12:57:10.241783  9949 rpc-bench.cc:100] Sys CPU per req:  106.916us
I0404 12:57:10.241786  9949 rpc-bench.cc:101] Ctx Sw. per req:  5.98631

Note the high number of context switches and system CPU. I gathered stack
traces of context switches using "perf record -g -e cs":

  Overhead       Samples    Command      Shared Object                         Symbol
  ........  ............  .........  .................  .............................

   100.00%        180979  rpc-bench  [kernel.kallsyms]  [k] perf_event_task_sched_out
            |
            --- perf_event_task_sched_out
                schedule
               |
               |--83.17%-- futex_wait_queue_me
               |          futex_wait
               |          do_futex
               |          sys_futex
               |          system_call_fastpath
               |          |
               |          |--83.11%-- __lll_lock_wait
               |          |          |
               |          |          |--42.33%-- int_thread_get
               |          |          |
               |          |          |--29.36%-- CRYPTO_add_lock
               |          |          |
               |          |          |--28.28%-- int_thread_get_item
               |          |           --0.03%-- [...]
               |          |
               |          |--16.89%-- pthread_cond_wait@@GLIBC_2.3.2
               |          |          |
               |          |          |--100.00%-- kudu::rpc::ServicePool::RunThread()
               |          |          |          kudu::Thread::SuperviseThread(void*)
               |          |          |          start_thread
               |          |           --0.00%-- [...]
               |           --0.01%-- [...]
               |
               |--16.80%-- schedule_hrtimeout_range
               |          |
               |          |--100.00%-- ep_poll
               |          |          sys_epoll_wait
               |          |          system_call_fastpath
               |          |          epoll_wait
               |          |          |
               |          |           --100.00%-- ev_run
               |          |                     kudu::rpc::ReactorThread::RunThread()
               |          |                     kudu::Thread::SuperviseThread(void*)
               |          |                     start_thread
               |           --0.00%-- [...]
                --0.03%-- [...]

With this patch
---------------
I0404 14:13:59.529119 27892 rpc-bench.cc:86] Mode:            Async
I0404 14:13:59.529134 27892 rpc-bench.cc:90] Client reactors:  16
I0404 14:13:59.529137 27892 rpc-bench.cc:91] Call concurrency: 60
I0404 14:13:59.529140 27892 rpc-bench.cc:94] Worker threads:   1
I0404 14:13:59.529142 27892 rpc-bench.cc:95] Server reactors:  4
I0404 14:13:59.529145 27892 rpc-bench.cc:96] Encryption:       1
I0404 14:13:59.529155 27892 rpc-bench.cc:97] ----------------------------------
I0404 14:13:59.529158 27892 rpc-bench.cc:98] Reqs/sec:         280819
I0404 14:13:59.529173 27892 rpc-bench.cc:99] User CPU per req: 18.2016us
I0404 14:13:59.529177 27892 rpc-bench.cc:100] Sys CPU per req:  19.9601us
I0404 14:13:59.529181 27892 rpc-bench.cc:101] Ctx Sw. per req:  1.20247
(about 5x improved throughput)

The SSL-related context switches no longer show up:

   100.00%        186653  rpc-bench  [kernel.kallsyms]  [k] perf_event_task_sched_out
            |
            --- perf_event_task_sched_out
                schedule
               |
               |--81.87%-- schedule_hrtimeout_range
               |          |
               |          |--100.00%-- ep_poll
               |          |          sys_epoll_wait
               |          |          system_call_fastpath
               |          |          epoll_wait
               |          |          |
               |          |           --100.00%-- ev_run
               |          |                     kudu::rpc::ReactorThread::RunThread()
               |          |                     kudu::Thread::SuperviseThread(void*)
               |          |                     start_thread
               |           --0.00%-- [...]
               |
               |--18.11%-- futex_wait_queue_me
               |          futex_wait
               |          do_futex
               |          sys_futex
               |          system_call_fastpath
               |          |
               |          |--98.68%-- pthread_cond_wait@@GLIBC_2.3.2
               |          |          |
               |          |          |--100.00%-- kudu::rpc::ServicePool::RunThread()
               |          |          |          kudu::Thread::SuperviseThread(void*)
               |          |          |          start_thread
               |          |           --0.00%-- [...]
               |          |
               |          |--1.02%-- base::internal::SpinLockDelay(int volatile*, int, int)
               |          |          |
               |          |          |--99.82%-- base::SpinLock::SlowLock()
               |          |          |          |
               |          |          |          |--98.09%-- kudu::rpc::LifoServiceQueue::BlockingGet(std::unique_ptr<kudu::rpc::InboundCall, std::default_
               |          |          |          |          kudu::rpc::ServicePool::RunThread()
               |          |          |          |          kudu::Thread::SuperviseThread(void*)
               |          |          |          |          start_thread
               |          |          |          |
               |          |          |           --1.91%-- kudu::rpc::LifoServiceQueue::Put(kudu::rpc::InboundCall*, boost::optional<kudu::rpc::InboundCal
               |          |          |                     kudu::rpc::ServicePool::QueueInboundCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDelet
               |          |          |                     kudu::rpc::Messenger::QueueInboundCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter
               |          |          |                     kudu::rpc::Connection::HandleIncomingCall(gscoped_ptr<kudu::rpc::InboundTransfer, kudu::Default
               |          |          |                     void ev::base<ev_io, ev::io>::method_thunk<kudu::rpc::Connection, &(kudu::rpc::Connection::Read
               |          |          |                     ev_invoke_pending
               |          |          |                     ev_run
               |          |          |                     kudu::rpc::ReactorThread::RunThread()
               |          |          |                     kudu::Thread::SuperviseThread(void*)
               |          |          |                     start_thread
               |          |           --0.18%-- [...]
               |           --0.29%-- [...]
                --0.03%-- [...]

Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
---
M src/kudu/security/openssl_util.cc
M src/kudu/security/openssl_util.h
M src/kudu/security/tls_context.cc
M src/kudu/security/tls_handshake.cc
M src/kudu/security/tls_socket.cc
5 files changed, 75 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/6552/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6552

to look at the new patch set (#3).

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................

KUDU-1964. security: avoid calling ERR_clear_error() defensively

This changes our various code which wraps OpenSSL to avoid calling
ERR_clear_error() defensively. Instead, we now make sure to call
ERR_clear_error() after any case where we receive an error return value
from an OpenSSL library call.

For cases where we use the existing wrapper macros like
OPENSSL_RET_NOT_OK we are already ensured of this since the
GetOpenSSLErrors() call clears the error queue.

This provides a performance boost, since ERR_clear_error() ends up
taking several library-wide mutexes in OpenSSL 1.0.x (apparently
improved in OpenSSL 1.1, but that's not available on current OSes).

To ensure that we don't accidentally leave an OpenSSL error lying around
after any functions, I added a scoped helper which is active in debug
builds. This performs a DCHECK that the error queue is empty on scope
entry and exit.

To benchmark, I tested rpc-bench with encryption enabled using OpenSSL 1.0.1e
(RHEl6):

Master, SSL off:
----------------
I0404 12:49:04.811158  7250 rpc-bench.cc:84] Mode:            Async
I0404 12:49:04.811172  7250 rpc-bench.cc:88] Client reactors:  16
I0404 12:49:04.811175  7250 rpc-bench.cc:89] Call concurrency: 60
I0404 12:49:04.811178  7250 rpc-bench.cc:92] Worker threads:   1
I0404 12:49:04.811182  7250 rpc-bench.cc:93] Server reactors:  4
I0404 12:49:04.811183  7250 rpc-bench.cc:94] ----------------------------------
I0404 12:49:04.811187  7250 rpc-bench.cc:95] Reqs/sec:         446998
I0404 12:49:04.811193  7250 rpc-bench.cc:96] User CPU per req: 9.97575us
I0404 12:49:04.811197  7250 rpc-bench.cc:97] Sys CPU per req:  12.2342us
I0404 12:49:04.811202  7250 rpc-bench.cc:98] Ctx Sw. per req:  0.604032

Master, SSL on:
--------------
I0404 12:57:10.241720  9949 rpc-bench.cc:86] Mode:            Async
I0404 12:57:10.241736  9949 rpc-bench.cc:90] Client reactors:  16
I0404 12:57:10.241739  9949 rpc-bench.cc:91] Call concurrency: 60
I0404 12:57:10.241742  9949 rpc-bench.cc:94] Worker threads:   1
I0404 12:57:10.241745  9949 rpc-bench.cc:95] Server reactors:  4
I0404 12:57:10.241747  9949 rpc-bench.cc:96] Encryption:       1
I0404 12:57:10.241760  9949 rpc-bench.cc:97] ----------------------------------
I0404 12:57:10.241762  9949 rpc-bench.cc:98] Reqs/sec:         56761.3
I0404 12:57:10.241778  9949 rpc-bench.cc:99] User CPU per req: 39.7792us
I0404 12:57:10.241783  9949 rpc-bench.cc:100] Sys CPU per req:  106.916us
I0404 12:57:10.241786  9949 rpc-bench.cc:101] Ctx Sw. per req:  5.98631

Note the high number of context switches and system CPU. I gathered stack
traces of context switches using "perf record -g -e cs":

  Overhead       Samples    Command      Shared Object                         Symbol
  ........  ............  .........  .................  .............................

   100.00%        180979  rpc-bench  [kernel.kallsyms]  [k] perf_event_task_sched_out
            |
            --- perf_event_task_sched_out
                schedule
               |
               |--83.17%-- futex_wait_queue_me
               |          futex_wait
               |          do_futex
               |          sys_futex
               |          system_call_fastpath
               |          |
               |          |--83.11%-- __lll_lock_wait
               |          |          |
               |          |          |--42.33%-- int_thread_get
               |          |          |
               |          |          |--29.36%-- CRYPTO_add_lock
               |          |          |
               |          |          |--28.28%-- int_thread_get_item
               |          |           --0.03%-- [...]
               |          |
               |          |--16.89%-- pthread_cond_wait@@GLIBC_2.3.2
               |          |          |
               |          |          |--100.00%-- kudu::rpc::ServicePool::RunThread()
               |          |          |          kudu::Thread::SuperviseThread(void*)
               |          |          |          start_thread
               |          |           --0.00%-- [...]
               |           --0.01%-- [...]
               |
               |--16.80%-- schedule_hrtimeout_range
               |          |
               |          |--100.00%-- ep_poll
               |          |          sys_epoll_wait
               |          |          system_call_fastpath
               |          |          epoll_wait
               |          |          |
               |          |           --100.00%-- ev_run
               |          |                     kudu::rpc::ReactorThread::RunThread()
               |          |                     kudu::Thread::SuperviseThread(void*)
               |          |                     start_thread
               |           --0.00%-- [...]
                --0.03%-- [...]

With this patch
---------------
I0404 14:13:59.529119 27892 rpc-bench.cc:86] Mode:            Async
I0404 14:13:59.529134 27892 rpc-bench.cc:90] Client reactors:  16
I0404 14:13:59.529137 27892 rpc-bench.cc:91] Call concurrency: 60
I0404 14:13:59.529140 27892 rpc-bench.cc:94] Worker threads:   1
I0404 14:13:59.529142 27892 rpc-bench.cc:95] Server reactors:  4
I0404 14:13:59.529145 27892 rpc-bench.cc:96] Encryption:       1
I0404 14:13:59.529155 27892 rpc-bench.cc:97] ----------------------------------
I0404 14:13:59.529158 27892 rpc-bench.cc:98] Reqs/sec:         280819
I0404 14:13:59.529173 27892 rpc-bench.cc:99] User CPU per req: 18.2016us
I0404 14:13:59.529177 27892 rpc-bench.cc:100] Sys CPU per req:  19.9601us
I0404 14:13:59.529181 27892 rpc-bench.cc:101] Ctx Sw. per req:  1.20247
(about 5x improved throughput)

The SSL-related context switches no longer show up:

   100.00%        186653  rpc-bench  [kernel.kallsyms]  [k] perf_event_task_sched_out
            |
            --- perf_event_task_sched_out
                schedule
               |
               |--81.87%-- schedule_hrtimeout_range
               |          |
               |          |--100.00%-- ep_poll
               |          |          sys_epoll_wait
               |          |          system_call_fastpath
               |          |          epoll_wait
               |          |          |
               |          |           --100.00%-- ev_run
               |          |                     kudu::rpc::ReactorThread::RunThread()
               |          |                     kudu::Thread::SuperviseThread(void*)
               |          |                     start_thread
               |           --0.00%-- [...]
               |
               |--18.11%-- futex_wait_queue_me
               |          futex_wait
               |          do_futex
               |          sys_futex
               |          system_call_fastpath
               |          |
               |          |--98.68%-- pthread_cond_wait@@GLIBC_2.3.2
               |          |          |
               |          |          |--100.00%-- kudu::rpc::ServicePool::RunThread()
               |          |          |          kudu::Thread::SuperviseThread(void*)
               |          |          |          start_thread
               |          |           --0.00%-- [...]
               |          |
               |          |--1.02%-- base::internal::SpinLockDelay(int volatile*, int, int)
               |          |          |
               |          |          |--99.82%-- base::SpinLock::SlowLock()
               |          |          |          |
               |          |          |          |--98.09%-- kudu::rpc::LifoServiceQueue::BlockingGet(std::unique_ptr<kudu::rpc::InboundCall, std::default_
               |          |          |          |          kudu::rpc::ServicePool::RunThread()
               |          |          |          |          kudu::Thread::SuperviseThread(void*)
               |          |          |          |          start_thread
               |          |          |          |
               |          |          |           --1.91%-- kudu::rpc::LifoServiceQueue::Put(kudu::rpc::InboundCall*, boost::optional<kudu::rpc::InboundCal
               |          |          |                     kudu::rpc::ServicePool::QueueInboundCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDelet
               |          |          |                     kudu::rpc::Messenger::QueueInboundCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter
               |          |          |                     kudu::rpc::Connection::HandleIncomingCall(gscoped_ptr<kudu::rpc::InboundTransfer, kudu::Default
               |          |          |                     void ev::base<ev_io, ev::io>::method_thunk<kudu::rpc::Connection, &(kudu::rpc::Connection::Read
               |          |          |                     ev_invoke_pending
               |          |          |                     ev_run
               |          |          |                     kudu::rpc::ReactorThread::RunThread()
               |          |          |                     kudu::Thread::SuperviseThread(void*)
               |          |          |                     start_thread
               |          |           --0.18%-- [...]
               |           --0.29%-- [...]
                --0.03%-- [...]

Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
---
M src/kudu/security/ca/cert_management.cc
M src/kudu/security/crypto.cc
M src/kudu/security/openssl_util.cc
M src/kudu/security/openssl_util.h
M src/kudu/security/tls_context.cc
M src/kudu/security/tls_handshake.cc
M src/kudu/security/tls_socket.cc
7 files changed, 102 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/6552/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................


KUDU-1964. security: avoid calling ERR_clear_error() defensively

This changes our various code which wraps OpenSSL to avoid calling
ERR_clear_error() defensively. Instead, we now make sure to call
ERR_clear_error() after any case where we receive an error return value
from an OpenSSL library call.

For cases where we use the existing wrapper macros like
OPENSSL_RET_NOT_OK we are already ensured of this since the
GetOpenSSLErrors() call clears the error queue.

This provides a performance boost, since ERR_clear_error() ends up
taking several library-wide mutexes in OpenSSL 1.0.x (apparently
improved in OpenSSL 1.1, but that's not available on current OSes).

To ensure that we don't accidentally leave an OpenSSL error lying around
after any functions, I added a scoped helper which is active in debug
builds. This performs a DCHECK that the error queue is empty on scope
entry and exit.

To benchmark, I tested rpc-bench with encryption enabled using OpenSSL 1.0.1e
(RHEl6):

Master, SSL off:
----------------
I0404 12:49:04.811158  7250 rpc-bench.cc:84] Mode:            Async
I0404 12:49:04.811172  7250 rpc-bench.cc:88] Client reactors:  16
I0404 12:49:04.811175  7250 rpc-bench.cc:89] Call concurrency: 60
I0404 12:49:04.811178  7250 rpc-bench.cc:92] Worker threads:   1
I0404 12:49:04.811182  7250 rpc-bench.cc:93] Server reactors:  4
I0404 12:49:04.811183  7250 rpc-bench.cc:94] ----------------------------------
I0404 12:49:04.811187  7250 rpc-bench.cc:95] Reqs/sec:         446998
I0404 12:49:04.811193  7250 rpc-bench.cc:96] User CPU per req: 9.97575us
I0404 12:49:04.811197  7250 rpc-bench.cc:97] Sys CPU per req:  12.2342us
I0404 12:49:04.811202  7250 rpc-bench.cc:98] Ctx Sw. per req:  0.604032

Master, SSL on:
--------------
I0404 12:57:10.241720  9949 rpc-bench.cc:86] Mode:            Async
I0404 12:57:10.241736  9949 rpc-bench.cc:90] Client reactors:  16
I0404 12:57:10.241739  9949 rpc-bench.cc:91] Call concurrency: 60
I0404 12:57:10.241742  9949 rpc-bench.cc:94] Worker threads:   1
I0404 12:57:10.241745  9949 rpc-bench.cc:95] Server reactors:  4
I0404 12:57:10.241747  9949 rpc-bench.cc:96] Encryption:       1
I0404 12:57:10.241760  9949 rpc-bench.cc:97] ----------------------------------
I0404 12:57:10.241762  9949 rpc-bench.cc:98] Reqs/sec:         56761.3
I0404 12:57:10.241778  9949 rpc-bench.cc:99] User CPU per req: 39.7792us
I0404 12:57:10.241783  9949 rpc-bench.cc:100] Sys CPU per req:  106.916us
I0404 12:57:10.241786  9949 rpc-bench.cc:101] Ctx Sw. per req:  5.98631

Note the high number of context switches and system CPU. I gathered stack
traces of context switches using "perf record -g -e cs":

  Overhead       Samples    Command      Shared Object                         Symbol
  ........  ............  .........  .................  .............................

   100.00%        180979  rpc-bench  [kernel.kallsyms]  [k] perf_event_task_sched_out
            |
            --- perf_event_task_sched_out
                schedule
               |
               |--83.17%-- futex_wait_queue_me
               |          futex_wait
               |          do_futex
               |          sys_futex
               |          system_call_fastpath
               |          |
               |          |--83.11%-- __lll_lock_wait
               |          |          |
               |          |          |--42.33%-- int_thread_get
               |          |          |
               |          |          |--29.36%-- CRYPTO_add_lock
               |          |          |
               |          |          |--28.28%-- int_thread_get_item
               |          |           --0.03%-- [...]
               |          |
               |          |--16.89%-- pthread_cond_wait@@GLIBC_2.3.2
               |          |          |
               |          |          |--100.00%-- kudu::rpc::ServicePool::RunThread()
               |          |          |          kudu::Thread::SuperviseThread(void*)
               |          |          |          start_thread
               |          |           --0.00%-- [...]
               |           --0.01%-- [...]
               |
               |--16.80%-- schedule_hrtimeout_range
               |          |
               |          |--100.00%-- ep_poll
               |          |          sys_epoll_wait
               |          |          system_call_fastpath
               |          |          epoll_wait
               |          |          |
               |          |           --100.00%-- ev_run
               |          |                     kudu::rpc::ReactorThread::RunThread()
               |          |                     kudu::Thread::SuperviseThread(void*)
               |          |                     start_thread
               |           --0.00%-- [...]
                --0.03%-- [...]

With this patch
---------------
I0404 14:13:59.529119 27892 rpc-bench.cc:86] Mode:            Async
I0404 14:13:59.529134 27892 rpc-bench.cc:90] Client reactors:  16
I0404 14:13:59.529137 27892 rpc-bench.cc:91] Call concurrency: 60
I0404 14:13:59.529140 27892 rpc-bench.cc:94] Worker threads:   1
I0404 14:13:59.529142 27892 rpc-bench.cc:95] Server reactors:  4
I0404 14:13:59.529145 27892 rpc-bench.cc:96] Encryption:       1
I0404 14:13:59.529155 27892 rpc-bench.cc:97] ----------------------------------
I0404 14:13:59.529158 27892 rpc-bench.cc:98] Reqs/sec:         280819
I0404 14:13:59.529173 27892 rpc-bench.cc:99] User CPU per req: 18.2016us
I0404 14:13:59.529177 27892 rpc-bench.cc:100] Sys CPU per req:  19.9601us
I0404 14:13:59.529181 27892 rpc-bench.cc:101] Ctx Sw. per req:  1.20247
(about 5x improved throughput)

The SSL-related context switches no longer show up:

   100.00%        186653  rpc-bench  [kernel.kallsyms]  [k] perf_event_task_sched_out
            |
            --- perf_event_task_sched_out
                schedule
               |
               |--81.87%-- schedule_hrtimeout_range
               |          |
               |          |--100.00%-- ep_poll
               |          |          sys_epoll_wait
               |          |          system_call_fastpath
               |          |          epoll_wait
               |          |          |
               |          |           --100.00%-- ev_run
               |          |                     kudu::rpc::ReactorThread::RunThread()
               |          |                     kudu::Thread::SuperviseThread(void*)
               |          |                     start_thread
               |           --0.00%-- [...]
               |
               |--18.11%-- futex_wait_queue_me
               |          futex_wait
               |          do_futex
               |          sys_futex
               |          system_call_fastpath
               |          |
               |          |--98.68%-- pthread_cond_wait@@GLIBC_2.3.2
               |          |          |
               |          |          |--100.00%-- kudu::rpc::ServicePool::RunThread()
               |          |          |          kudu::Thread::SuperviseThread(void*)
               |          |          |          start_thread
               |          |           --0.00%-- [...]
               |          |
               |          |--1.02%-- base::internal::SpinLockDelay(int volatile*, int, int)
               |          |          |
               |          |          |--99.82%-- base::SpinLock::SlowLock()
               |          |          |          |
               |          |          |          |--98.09%-- kudu::rpc::LifoServiceQueue::BlockingGet(std::unique_ptr<kudu::rpc::InboundCall, std::default_
               |          |          |          |          kudu::rpc::ServicePool::RunThread()
               |          |          |          |          kudu::Thread::SuperviseThread(void*)
               |          |          |          |          start_thread
               |          |          |          |
               |          |          |           --1.91%-- kudu::rpc::LifoServiceQueue::Put(kudu::rpc::InboundCall*, boost::optional<kudu::rpc::InboundCal
               |          |          |                     kudu::rpc::ServicePool::QueueInboundCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDelet
               |          |          |                     kudu::rpc::Messenger::QueueInboundCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter
               |          |          |                     kudu::rpc::Connection::HandleIncomingCall(gscoped_ptr<kudu::rpc::InboundTransfer, kudu::Default
               |          |          |                     void ev::base<ev_io, ev::io>::method_thunk<kudu::rpc::Connection, &(kudu::rpc::Connection::Read
               |          |          |                     ev_invoke_pending
               |          |          |                     ev_run
               |          |          |                     kudu::rpc::ReactorThread::RunThread()
               |          |          |                     kudu::Thread::SuperviseThread(void*)
               |          |          |                     start_thread
               |          |           --0.18%-- [...]
               |           --0.29%-- [...]
                --0.03%-- [...]

Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Reviewed-on: http://gerrit.cloudera.org:8080/6552
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/security/ca/cert_management.cc
M src/kudu/security/crypto.cc
M src/kudu/security/openssl_util.cc
M src/kudu/security/openssl_util.h
M src/kudu/security/tls_context.cc
M src/kudu/security/tls_handshake.cc
M src/kudu/security/tls_socket.cc
7 files changed, 111 insertions(+), 19 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6552/3/src/kudu/security/ca/cert_management.cc
File src/kudu/security/ca/cert_management.cc:

PS3, Line 114: Status::RuntimeError(
> Would it make sense to add GetOpenSSLErrors() here as well to clean up the 
Done


http://gerrit.cloudera.org:8080/#/c/6552/3/src/kudu/security/openssl_util.h
File src/kudu/security/openssl_util.h:

PS3, Line 67:   kudu::security::internal::ScopedCheckNoPendingSSLErrors _no_ssl_errors(__PRETTY_FUNCTION__)
> nit: would it make sense to put this under
constructing the empty/unused object on the stack should already optimize itself out


-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/tls_handshake.cc
File src/kudu/security/tls_handshake.cc:

PS1, Line 96: ERR_clear_error();
> I also checked SSL_get_error() code, from which I saw that in current versi
ah, I see. well, in that case we'd get the CHECK failure from SCOPED_OPENSSL_NO_PENDING_ERRORS at least.


-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6552/3/src/kudu/security/ca/cert_management.cc
File src/kudu/security/ca/cert_management.cc:

PS3, Line 114: Status::RuntimeError(
Would it make sense to add GetOpenSSLErrors() here as well to clean up the error stack or you think it's better to do that in a separate changelist?


http://gerrit.cloudera.org:8080/#/c/6552/3/src/kudu/security/openssl_util.h
File src/kudu/security/openssl_util.h:

PS3, Line 67:   kudu::security::internal::ScopedCheckNoPendingSSLErrors _no_ssl_errors(__PRETTY_FUNCTION__)
nit: would it make sense to put this under
#ifndef NDEBUG to avoid creating no-op objects?


-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6552/1//COMMIT_MSG
Commit Message:

PS1, Line 19: apparently
            : improved in OpenSSL 1.1, but that's not available on current OSes
> seems like extra complexity because then we have two different code paths t
Yah, this seems like a good change to make regardless of OpenSSL version (btw, we wouldn't support 1.1 even if it were in a released OS; see KUDU-1889).  I knew preemptively clearing the error was a bit of a hack way back when this was introduced, in that it obfuscates openssl callsites that aren't checking and clearing the error.


http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/openssl_util.cc
File src/kudu/security/openssl_util.cc:

Line 110:   ERR_clear_error();
Should this be moved into the if block?


http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/openssl_util.h
File src/kudu/security/openssl_util.h:

Line 41:   CHECK_GT((call), 0)
Perhaps we should assert that the error stack is empty here, before making the call?  (and likewise in the other helper macros).


-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Dan Burkert, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6552

to look at the new patch set (#4).

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................

KUDU-1964. security: avoid calling ERR_clear_error() defensively

This changes our various code which wraps OpenSSL to avoid calling
ERR_clear_error() defensively. Instead, we now make sure to call
ERR_clear_error() after any case where we receive an error return value
from an OpenSSL library call.

For cases where we use the existing wrapper macros like
OPENSSL_RET_NOT_OK we are already ensured of this since the
GetOpenSSLErrors() call clears the error queue.

This provides a performance boost, since ERR_clear_error() ends up
taking several library-wide mutexes in OpenSSL 1.0.x (apparently
improved in OpenSSL 1.1, but that's not available on current OSes).

To ensure that we don't accidentally leave an OpenSSL error lying around
after any functions, I added a scoped helper which is active in debug
builds. This performs a DCHECK that the error queue is empty on scope
entry and exit.

To benchmark, I tested rpc-bench with encryption enabled using OpenSSL 1.0.1e
(RHEl6):

Master, SSL off:
----------------
I0404 12:49:04.811158  7250 rpc-bench.cc:84] Mode:            Async
I0404 12:49:04.811172  7250 rpc-bench.cc:88] Client reactors:  16
I0404 12:49:04.811175  7250 rpc-bench.cc:89] Call concurrency: 60
I0404 12:49:04.811178  7250 rpc-bench.cc:92] Worker threads:   1
I0404 12:49:04.811182  7250 rpc-bench.cc:93] Server reactors:  4
I0404 12:49:04.811183  7250 rpc-bench.cc:94] ----------------------------------
I0404 12:49:04.811187  7250 rpc-bench.cc:95] Reqs/sec:         446998
I0404 12:49:04.811193  7250 rpc-bench.cc:96] User CPU per req: 9.97575us
I0404 12:49:04.811197  7250 rpc-bench.cc:97] Sys CPU per req:  12.2342us
I0404 12:49:04.811202  7250 rpc-bench.cc:98] Ctx Sw. per req:  0.604032

Master, SSL on:
--------------
I0404 12:57:10.241720  9949 rpc-bench.cc:86] Mode:            Async
I0404 12:57:10.241736  9949 rpc-bench.cc:90] Client reactors:  16
I0404 12:57:10.241739  9949 rpc-bench.cc:91] Call concurrency: 60
I0404 12:57:10.241742  9949 rpc-bench.cc:94] Worker threads:   1
I0404 12:57:10.241745  9949 rpc-bench.cc:95] Server reactors:  4
I0404 12:57:10.241747  9949 rpc-bench.cc:96] Encryption:       1
I0404 12:57:10.241760  9949 rpc-bench.cc:97] ----------------------------------
I0404 12:57:10.241762  9949 rpc-bench.cc:98] Reqs/sec:         56761.3
I0404 12:57:10.241778  9949 rpc-bench.cc:99] User CPU per req: 39.7792us
I0404 12:57:10.241783  9949 rpc-bench.cc:100] Sys CPU per req:  106.916us
I0404 12:57:10.241786  9949 rpc-bench.cc:101] Ctx Sw. per req:  5.98631

Note the high number of context switches and system CPU. I gathered stack
traces of context switches using "perf record -g -e cs":

  Overhead       Samples    Command      Shared Object                         Symbol
  ........  ............  .........  .................  .............................

   100.00%        180979  rpc-bench  [kernel.kallsyms]  [k] perf_event_task_sched_out
            |
            --- perf_event_task_sched_out
                schedule
               |
               |--83.17%-- futex_wait_queue_me
               |          futex_wait
               |          do_futex
               |          sys_futex
               |          system_call_fastpath
               |          |
               |          |--83.11%-- __lll_lock_wait
               |          |          |
               |          |          |--42.33%-- int_thread_get
               |          |          |
               |          |          |--29.36%-- CRYPTO_add_lock
               |          |          |
               |          |          |--28.28%-- int_thread_get_item
               |          |           --0.03%-- [...]
               |          |
               |          |--16.89%-- pthread_cond_wait@@GLIBC_2.3.2
               |          |          |
               |          |          |--100.00%-- kudu::rpc::ServicePool::RunThread()
               |          |          |          kudu::Thread::SuperviseThread(void*)
               |          |          |          start_thread
               |          |           --0.00%-- [...]
               |           --0.01%-- [...]
               |
               |--16.80%-- schedule_hrtimeout_range
               |          |
               |          |--100.00%-- ep_poll
               |          |          sys_epoll_wait
               |          |          system_call_fastpath
               |          |          epoll_wait
               |          |          |
               |          |           --100.00%-- ev_run
               |          |                     kudu::rpc::ReactorThread::RunThread()
               |          |                     kudu::Thread::SuperviseThread(void*)
               |          |                     start_thread
               |           --0.00%-- [...]
                --0.03%-- [...]

With this patch
---------------
I0404 14:13:59.529119 27892 rpc-bench.cc:86] Mode:            Async
I0404 14:13:59.529134 27892 rpc-bench.cc:90] Client reactors:  16
I0404 14:13:59.529137 27892 rpc-bench.cc:91] Call concurrency: 60
I0404 14:13:59.529140 27892 rpc-bench.cc:94] Worker threads:   1
I0404 14:13:59.529142 27892 rpc-bench.cc:95] Server reactors:  4
I0404 14:13:59.529145 27892 rpc-bench.cc:96] Encryption:       1
I0404 14:13:59.529155 27892 rpc-bench.cc:97] ----------------------------------
I0404 14:13:59.529158 27892 rpc-bench.cc:98] Reqs/sec:         280819
I0404 14:13:59.529173 27892 rpc-bench.cc:99] User CPU per req: 18.2016us
I0404 14:13:59.529177 27892 rpc-bench.cc:100] Sys CPU per req:  19.9601us
I0404 14:13:59.529181 27892 rpc-bench.cc:101] Ctx Sw. per req:  1.20247
(about 5x improved throughput)

The SSL-related context switches no longer show up:

   100.00%        186653  rpc-bench  [kernel.kallsyms]  [k] perf_event_task_sched_out
            |
            --- perf_event_task_sched_out
                schedule
               |
               |--81.87%-- schedule_hrtimeout_range
               |          |
               |          |--100.00%-- ep_poll
               |          |          sys_epoll_wait
               |          |          system_call_fastpath
               |          |          epoll_wait
               |          |          |
               |          |           --100.00%-- ev_run
               |          |                     kudu::rpc::ReactorThread::RunThread()
               |          |                     kudu::Thread::SuperviseThread(void*)
               |          |                     start_thread
               |           --0.00%-- [...]
               |
               |--18.11%-- futex_wait_queue_me
               |          futex_wait
               |          do_futex
               |          sys_futex
               |          system_call_fastpath
               |          |
               |          |--98.68%-- pthread_cond_wait@@GLIBC_2.3.2
               |          |          |
               |          |          |--100.00%-- kudu::rpc::ServicePool::RunThread()
               |          |          |          kudu::Thread::SuperviseThread(void*)
               |          |          |          start_thread
               |          |           --0.00%-- [...]
               |          |
               |          |--1.02%-- base::internal::SpinLockDelay(int volatile*, int, int)
               |          |          |
               |          |          |--99.82%-- base::SpinLock::SlowLock()
               |          |          |          |
               |          |          |          |--98.09%-- kudu::rpc::LifoServiceQueue::BlockingGet(std::unique_ptr<kudu::rpc::InboundCall, std::default_
               |          |          |          |          kudu::rpc::ServicePool::RunThread()
               |          |          |          |          kudu::Thread::SuperviseThread(void*)
               |          |          |          |          start_thread
               |          |          |          |
               |          |          |           --1.91%-- kudu::rpc::LifoServiceQueue::Put(kudu::rpc::InboundCall*, boost::optional<kudu::rpc::InboundCal
               |          |          |                     kudu::rpc::ServicePool::QueueInboundCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDelet
               |          |          |                     kudu::rpc::Messenger::QueueInboundCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter
               |          |          |                     kudu::rpc::Connection::HandleIncomingCall(gscoped_ptr<kudu::rpc::InboundTransfer, kudu::Default
               |          |          |                     void ev::base<ev_io, ev::io>::method_thunk<kudu::rpc::Connection, &(kudu::rpc::Connection::Read
               |          |          |                     ev_invoke_pending
               |          |          |                     ev_run
               |          |          |                     kudu::rpc::ReactorThread::RunThread()
               |          |          |                     kudu::Thread::SuperviseThread(void*)
               |          |          |                     start_thread
               |          |           --0.18%-- [...]
               |           --0.29%-- [...]
                --0.03%-- [...]

Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
---
M src/kudu/security/ca/cert_management.cc
M src/kudu/security/crypto.cc
M src/kudu/security/openssl_util.cc
M src/kudu/security/openssl_util.h
M src/kudu/security/tls_context.cc
M src/kudu/security/tls_handshake.cc
M src/kudu/security/tls_socket.cc
7 files changed, 111 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/6552/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/openssl_util.h
File src/kudu/security/openssl_util.h:

Line 41:   CHECK_GT((call), 0)
> Done
I did this in r2 but actually having second thoughts, because it's reasonable to do something like:

unique_ptr<Foo> foo(SSL_ctx_new());
OPENSSL_RET_IF_NULL(foo.get())

but if we peek_error expecting no error here 'before the call', it actaully might be 'after the call'. So I'm gonna revert these.


-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/openssl_util.cc
File src/kudu/security/openssl_util.cc:

PS1, Line 79:   if (!ctx) {
            :     ERR_clear_error();
            :     return Status::RuntimeError("SSL library appears uninitialized (cannot create SSL_CTX)");
            :   }
> Consider using OPENSSL_RET_IF_NULL() macro instead: it's more idiomatic and
I don't think we want to spit back the internal error here (it's pretty ugly) since this gets propagated up to the user library.


PS1, Line 110: ERR_clear_error();
> Why is it necessary here?  As I understand, DoInitializeOpenSSL() is the on
we actually expect the above SSL_CTX_new() call to cause an error (we expect that SSL is _not_ initialized yet at this point). So we want to clear the error that we got. I'll move it into an 'else'


http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/openssl_util.h
File src/kudu/security/openssl_util.h:

Line 41:   CHECK_GT((call), 0)
> Perhaps we should assert that the error stack is empty here, before making 
Done


PS1, Line 174: ERR_peek_last_error
> Why ERR_peek_last_error() instead of ERR_peek_error()?  Please add a commen
Done


http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/tls_handshake.cc
File src/kudu/security/tls_handshake.cc:

PS1, Line 96: ERR_clear_error();
> Consider removing this, otherwise the code below would not be able to extra
wouldn't your code end up generating warnings on every WANT_READ/WANT_WRITE return, which are normal?

I see your point though that GetSSLErrorDescription itself may need to look at the error queue and the clear_error breaks that.

I checked the SSL code and it seems that SSL_get_error() itself guarantees to return either SSL_ERROR_SYSCALL or SSL_ERROR_SSL in the case that the error queue is non-empty, so there's no need to clear it here.


PS1, Line 132:     ERR_clear_error();
             :     return Status::NotAuthorized("SSL_get_verify_result()", X509_verify_cert_error_string(rc));
> Consider replacing with:
k, did something like that (kept a prefix on the error string)


http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/tls_socket.cc
File src/kudu/security/tls_socket.cc:

PS1, Line 57: ERR_clear_error();
> I don't think it's needed.  Even if it were needed, it would be better plac
yea, agreed it's not needed since GetSSLErrorDescription clears in the case that there could have been some error


-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/openssl_util.cc
File src/kudu/security/openssl_util.cc:

PS1, Line 79:   if (!ctx) {
            :     ERR_clear_error();
            :     return Status::RuntimeError("SSL library appears uninitialized (cannot create SSL_CTX)");
            :   }
Consider using OPENSSL_RET_IF_NULL() macro instead: it's more idiomatic and allows to see some information on the error.


PS1, Line 110: ERR_clear_error();
Why is it necessary here?  As I understand, DoInitializeOpenSSL() is the only method which can put errors into the per-thread error stack here, and we it seems we do pick up our errors from the error stack.  And ERR_clear_error() has been called already in the very beginning of this function.


http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/openssl_util.h
File src/kudu/security/openssl_util.h:

PS1, Line 174: ERR_peek_last_error
Why ERR_peek_last_error() instead of ERR_peek_error()?  Please add a comment if it's crucial.


http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/tls_handshake.cc
File src/kudu/security/tls_handshake.cc:

PS1, Line 96: ERR_clear_error();
Consider removing this, otherwise the code below would not be able to extract error information from the error stack.

Probably, the code should be rewritten like this:

int ssl_err = SSL_get_error(ssl_.get(), rc);
// WANT_READ and WANT_WRITE indicate that the handshake is not yet complete.
if (ssl_err != SSL_ERROR_WANT_READ && ssl_err != SSL_ERROR_WANT_WRITE) {
  return Status::RuntimeError("TLS Handshake error", GetSSLErrorDescription(ssl_err));
}

if (ERR_peak_error() != 0) {
  const string err_str = GetOpenSSLErrors();
  DLOG(WARNING) << "SSL errors: " << err_str;
}


http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/tls_socket.cc
File src/kudu/security/tls_socket.cc:

PS1, Line 57: ERR_clear_error();
I don't think it's needed.  Even if it were needed, it would be better place for this inside the if (...) {} block below.


-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/tls_handshake.cc
File src/kudu/security/tls_handshake.cc:

PS1, Line 132:     ERR_clear_error();
             :     return Status::NotAuthorized("SSL_get_verify_result()", X509_verify_cert_error_string(rc));
Consider replacing with:

return Status::NotAuthorized(X509_verify_cert_error_string(rc), GetOpenSSLErrors());

It could be useful to get more information from the error stack (for troubleshooting purposes at least).


-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6552/1//COMMIT_MSG
Commit Message:

PS1, Line 19: apparently
            : improved in OpenSSL 1.1, but that's not available on current OSes
> Just passing through, but could we condition the behavior on the version of
seems like extra complexity because then we have two different code paths to test. Also no supported operating systems are actually shipping 1.1 yet.


-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6552/1//COMMIT_MSG
Commit Message:

PS1, Line 19: apparently
            : improved in OpenSSL 1.1, but that's not available on current OSes
Just passing through, but could we condition the behavior on the version of OpenSSL we're using at runtime?


-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................


Patch Set 1:

I also tested this on an Impala cluster where the clients were hitting this contention a lot. Query throughput (of short queries) just about doubled.

-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/openssl_util.cc
File src/kudu/security/openssl_util.cc:

PS1, Line 79:   if (!ctx) {
            :     ERR_clear_error();
            :     return Status::RuntimeError("SSL library appears uninitialized (cannot create SSL_CTX)");
            :   }
> I don't think we want to spit back the internal error here (it's pretty ugl
Oh, I see: this is used only to check whether the library is initialized.  Indeed, in this context it does not make much sense to give more information on the exact error.


http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/tls_handshake.cc
File src/kudu/security/tls_handshake.cc:

PS1, Line 96: ERR_clear_error();
> wouldn't your code end up generating warnings on every WANT_READ/WANT_WRITE
I also checked SSL_get_error() code, from which I saw that in current version nothing should be on the error stack in case of SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE.  The 'if (ERR_peek_error() != 0) { ... }' closure is to spot a problem if we switch to some other version where the implementation changes and we start having some errors on the stack for SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE.

:)


-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6552/1/src/kudu/security/openssl_util.h
File src/kudu/security/openssl_util.h:

Line 41:   CHECK_GT((call), 0)
> I did this in r2 but actually having second thoughts, because it's reasonab
fair enough.


-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................


Patch Set 4: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6552/4/src/kudu/security/ca/cert_management.cc
File src/kudu/security/ca/cert_management.cc:

Line 73:   SCOPED_OPENSSL_NO_PENDING_ERRORS;
good catch!


http://gerrit.cloudera.org:8080/#/c/6552/3/src/kudu/security/openssl_util.h
File src/kudu/security/openssl_util.h:

PS3, Line 67:   kudu::security::internal::ScopedCheckNoPendingSSLErrors _no_ssl_errors(__PRETTY_FUNCTION__)
> constructing the empty/unused object on the stack should already optimize i
ok, that makes sense


-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1964. security: avoid calling ERR clear error() defensively

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6552

to look at the new patch set (#2).

Change subject: KUDU-1964. security: avoid calling ERR_clear_error() defensively
......................................................................

KUDU-1964. security: avoid calling ERR_clear_error() defensively

This changes our various code which wraps OpenSSL to avoid calling
ERR_clear_error() defensively. Instead, we now make sure to call
ERR_clear_error() after any case where we receive an error return value
from an OpenSSL library call.

For cases where we use the existing wrapper macros like
OPENSSL_RET_NOT_OK we are already ensured of this since the
GetOpenSSLErrors() call clears the error queue.

This provides a performance boost, since ERR_clear_error() ends up
taking several library-wide mutexes in OpenSSL 1.0.x (apparently
improved in OpenSSL 1.1, but that's not available on current OSes).

To ensure that we don't accidentally leave an OpenSSL error lying around
after any functions, I added a scoped helper which is active in debug
builds. This performs a DCHECK that the error queue is empty on scope
entry and exit.

To benchmark, I tested rpc-bench with encryption enabled using OpenSSL 1.0.1e
(RHEl6):

Master, SSL off:
----------------
I0404 12:49:04.811158  7250 rpc-bench.cc:84] Mode:            Async
I0404 12:49:04.811172  7250 rpc-bench.cc:88] Client reactors:  16
I0404 12:49:04.811175  7250 rpc-bench.cc:89] Call concurrency: 60
I0404 12:49:04.811178  7250 rpc-bench.cc:92] Worker threads:   1
I0404 12:49:04.811182  7250 rpc-bench.cc:93] Server reactors:  4
I0404 12:49:04.811183  7250 rpc-bench.cc:94] ----------------------------------
I0404 12:49:04.811187  7250 rpc-bench.cc:95] Reqs/sec:         446998
I0404 12:49:04.811193  7250 rpc-bench.cc:96] User CPU per req: 9.97575us
I0404 12:49:04.811197  7250 rpc-bench.cc:97] Sys CPU per req:  12.2342us
I0404 12:49:04.811202  7250 rpc-bench.cc:98] Ctx Sw. per req:  0.604032

Master, SSL on:
--------------
I0404 12:57:10.241720  9949 rpc-bench.cc:86] Mode:            Async
I0404 12:57:10.241736  9949 rpc-bench.cc:90] Client reactors:  16
I0404 12:57:10.241739  9949 rpc-bench.cc:91] Call concurrency: 60
I0404 12:57:10.241742  9949 rpc-bench.cc:94] Worker threads:   1
I0404 12:57:10.241745  9949 rpc-bench.cc:95] Server reactors:  4
I0404 12:57:10.241747  9949 rpc-bench.cc:96] Encryption:       1
I0404 12:57:10.241760  9949 rpc-bench.cc:97] ----------------------------------
I0404 12:57:10.241762  9949 rpc-bench.cc:98] Reqs/sec:         56761.3
I0404 12:57:10.241778  9949 rpc-bench.cc:99] User CPU per req: 39.7792us
I0404 12:57:10.241783  9949 rpc-bench.cc:100] Sys CPU per req:  106.916us
I0404 12:57:10.241786  9949 rpc-bench.cc:101] Ctx Sw. per req:  5.98631

Note the high number of context switches and system CPU. I gathered stack
traces of context switches using "perf record -g -e cs":

  Overhead       Samples    Command      Shared Object                         Symbol
  ........  ............  .........  .................  .............................

   100.00%        180979  rpc-bench  [kernel.kallsyms]  [k] perf_event_task_sched_out
            |
            --- perf_event_task_sched_out
                schedule
               |
               |--83.17%-- futex_wait_queue_me
               |          futex_wait
               |          do_futex
               |          sys_futex
               |          system_call_fastpath
               |          |
               |          |--83.11%-- __lll_lock_wait
               |          |          |
               |          |          |--42.33%-- int_thread_get
               |          |          |
               |          |          |--29.36%-- CRYPTO_add_lock
               |          |          |
               |          |          |--28.28%-- int_thread_get_item
               |          |           --0.03%-- [...]
               |          |
               |          |--16.89%-- pthread_cond_wait@@GLIBC_2.3.2
               |          |          |
               |          |          |--100.00%-- kudu::rpc::ServicePool::RunThread()
               |          |          |          kudu::Thread::SuperviseThread(void*)
               |          |          |          start_thread
               |          |           --0.00%-- [...]
               |           --0.01%-- [...]
               |
               |--16.80%-- schedule_hrtimeout_range
               |          |
               |          |--100.00%-- ep_poll
               |          |          sys_epoll_wait
               |          |          system_call_fastpath
               |          |          epoll_wait
               |          |          |
               |          |           --100.00%-- ev_run
               |          |                     kudu::rpc::ReactorThread::RunThread()
               |          |                     kudu::Thread::SuperviseThread(void*)
               |          |                     start_thread
               |           --0.00%-- [...]
                --0.03%-- [...]

With this patch
---------------
I0404 14:13:59.529119 27892 rpc-bench.cc:86] Mode:            Async
I0404 14:13:59.529134 27892 rpc-bench.cc:90] Client reactors:  16
I0404 14:13:59.529137 27892 rpc-bench.cc:91] Call concurrency: 60
I0404 14:13:59.529140 27892 rpc-bench.cc:94] Worker threads:   1
I0404 14:13:59.529142 27892 rpc-bench.cc:95] Server reactors:  4
I0404 14:13:59.529145 27892 rpc-bench.cc:96] Encryption:       1
I0404 14:13:59.529155 27892 rpc-bench.cc:97] ----------------------------------
I0404 14:13:59.529158 27892 rpc-bench.cc:98] Reqs/sec:         280819
I0404 14:13:59.529173 27892 rpc-bench.cc:99] User CPU per req: 18.2016us
I0404 14:13:59.529177 27892 rpc-bench.cc:100] Sys CPU per req:  19.9601us
I0404 14:13:59.529181 27892 rpc-bench.cc:101] Ctx Sw. per req:  1.20247
(about 5x improved throughput)

The SSL-related context switches no longer show up:

   100.00%        186653  rpc-bench  [kernel.kallsyms]  [k] perf_event_task_sched_out
            |
            --- perf_event_task_sched_out
                schedule
               |
               |--81.87%-- schedule_hrtimeout_range
               |          |
               |          |--100.00%-- ep_poll
               |          |          sys_epoll_wait
               |          |          system_call_fastpath
               |          |          epoll_wait
               |          |          |
               |          |           --100.00%-- ev_run
               |          |                     kudu::rpc::ReactorThread::RunThread()
               |          |                     kudu::Thread::SuperviseThread(void*)
               |          |                     start_thread
               |           --0.00%-- [...]
               |
               |--18.11%-- futex_wait_queue_me
               |          futex_wait
               |          do_futex
               |          sys_futex
               |          system_call_fastpath
               |          |
               |          |--98.68%-- pthread_cond_wait@@GLIBC_2.3.2
               |          |          |
               |          |          |--100.00%-- kudu::rpc::ServicePool::RunThread()
               |          |          |          kudu::Thread::SuperviseThread(void*)
               |          |          |          start_thread
               |          |           --0.00%-- [...]
               |          |
               |          |--1.02%-- base::internal::SpinLockDelay(int volatile*, int, int)
               |          |          |
               |          |          |--99.82%-- base::SpinLock::SlowLock()
               |          |          |          |
               |          |          |          |--98.09%-- kudu::rpc::LifoServiceQueue::BlockingGet(std::unique_ptr<kudu::rpc::InboundCall, std::default_
               |          |          |          |          kudu::rpc::ServicePool::RunThread()
               |          |          |          |          kudu::Thread::SuperviseThread(void*)
               |          |          |          |          start_thread
               |          |          |          |
               |          |          |           --1.91%-- kudu::rpc::LifoServiceQueue::Put(kudu::rpc::InboundCall*, boost::optional<kudu::rpc::InboundCal
               |          |          |                     kudu::rpc::ServicePool::QueueInboundCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDelet
               |          |          |                     kudu::rpc::Messenger::QueueInboundCall(gscoped_ptr<kudu::rpc::InboundCall, kudu::DefaultDeleter
               |          |          |                     kudu::rpc::Connection::HandleIncomingCall(gscoped_ptr<kudu::rpc::InboundTransfer, kudu::Default
               |          |          |                     void ev::base<ev_io, ev::io>::method_thunk<kudu::rpc::Connection, &(kudu::rpc::Connection::Read
               |          |          |                     ev_invoke_pending
               |          |          |                     ev_run
               |          |          |                     kudu::rpc::ReactorThread::RunThread()
               |          |          |                     kudu::Thread::SuperviseThread(void*)
               |          |          |                     start_thread
               |          |           --0.18%-- [...]
               |           --0.29%-- [...]
                --0.03%-- [...]

Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
---
M src/kudu/security/openssl_util.cc
M src/kudu/security/openssl_util.h
M src/kudu/security/tls_context.cc
M src/kudu/security/tls_handshake.cc
M src/kudu/security/tls_socket.cc
5 files changed, 102 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/6552/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3b4421f4aae4d0e5a2d938881f9eea4e07ff2b10
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>