Posted to common-issues@hadoop.apache.org by "Daryn Sharp (JIRA)" <ji...@apache.org> on 2018/10/02 21:01:00 UTC

[jira] [Commented] (HADOOP-15813) Enable more reliable SSL connection reuse

    [ https://issues.apache.org/jira/browse/HADOOP-15813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636110#comment-16636110 ] 

Daryn Sharp commented on HADOOP-15813:
--------------------------------------

This is one of the patches we've used internally for ~3 months to make the KMS almost usable under moderately high load.  Testing certain low-latency software that rapidly opened many files would destroy the KMS.

There is still something wrong at the Java level that I cannot quite figure out.  The client sends a TLS close_notify.  The server responds with a close_notify, then does a shutdown & close of the socket, which lingers in TIME_WAIT2.  Odd.  On the client side, the socket lingers in CLOSE_WAIT state with 31 bytes (the close_notify message) in the recv buffer.  The keep-alive cache cleaner won't detect the closed socket for at least another ~5s.   However...

Even during a steady stream of multiple requests/sec, the client appears to reuse connections unreliably and to initiate premature TLS shutdowns.  I observed an RM that renews only a single-digit number of KMS tokens per second yet typically has ~20-100+ sockets in CLOSE_WAIT.

At any rate, this patch is a definite improvement but not a silver bullet.

> Enable more reliable SSL connection reuse
> -----------------------------------------
>
>                 Key: HADOOP-15813
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15813
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>    Affects Versions: 2.6.0
>            Reporter: Daryn Sharp
>            Priority: Major
>         Attachments: HADOOP-15813.patch
>
>
> The Java keep-alive cache relies on instance equivalence of the SSL socket factory.  In many Java versions, SSLContext#getSocketFactory always returns a new instance, which completely breaks the cache.  As a result, clients flood a service with lingering per-request connections, which can lead to port exhaustion.  The Hadoop SSLFactory should cache the socket factory associated with the context.
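
For illustration only, here is a minimal sketch of the caching idea in the description above.  It is not the attached patch and not the actual Hadoop SSLFactory code.  The class name CachedSocketFactoryHolder is hypothetical, and the sketch assumes the SSLContext is fully initialized before the first call.  The only JDK calls used are SSLContext#getDefault and SSLContext#getSocketFactory.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical helper: hands out one SSLSocketFactory instance per SSLContext
// so that callers comparing factories by identity see the same object.
class CachedSocketFactoryHolder {
    private final SSLContext context;
    private volatile SSLSocketFactory cached;

    CachedSocketFactoryHolder(SSLContext context) {
        this.context = context;
    }

    // Creates the factory on first use, then always returns that instance.
    SSLSocketFactory getSocketFactory() {
        SSLSocketFactory factory = cached;
        if (factory == null) {
            synchronized (this) {
                factory = cached;
                if (factory == null) {
                    factory = context.getSocketFactory();
                    cached = factory;
                }
            }
        }
        return factory;
    }

    public static void main(String[] args) throws Exception {
        SSLContext ctx = SSLContext.getDefault();
        CachedSocketFactoryHolder holder = new CachedSocketFactoryHolder(ctx);

        // Whether the raw context returns the same instance depends on the JDK.
        System.out.println("raw context, same instance: "
            + (ctx.getSocketFactory() == ctx.getSocketFactory()));

        // The holder returns the identical instance on every call.
        System.out.println("cached holder, same instance: "
            + (holder.getSocketFactory() == holder.getSocketFactory()));
    }
}
```

A caller would then pass holder.getSocketFactory() to each connection (for example via HttpsURLConnection#setSSLSocketFactory) rather than asking the context for a factory per request.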


