Posted to common-issues@hadoop.apache.org by "kartheek muthyala (JIRA)" <ji...@apache.org> on 2017/02/07 06:20:41 UTC

[jira] [Commented] (HADOOP-13836) Securing Hadoop RPC using SSL

    [ https://issues.apache.org/jira/browse/HADOOP-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855409#comment-15855409 ] 

kartheek muthyala commented on HADOOP-13836:
--------------------------------------------

[~daryn], Thank you for the insightful feedback. :)

When SSL encrypts the data buffers, the length of the data on the wire differs from the length of the actual data sent. For example, a 10-byte data packet can grow to 16 bytes after encryption, depending on the cipher used. Hadoop RPC relies on reading a length prefix from the channel to know in advance how much data to expect. So in the current readAndProcess, when we replace the socket channel with SSLServerSocketChannel, channelRead might return partial data from which neither the data length nor the data itself can be decoded. For example, a call to SSLSocketChannel.read() might yield only 3 plaintext bytes even though 8 bytes were read from the channel; those 3 bytes cannot be decoded into the data length, because today we use 4 bytes to carry it. This mismatch between wire length and plaintext length is what led me to modify readAndProcess to loop until enough data is available. This could probably be simplified by another class that extends SSLServerSocketChannel and buffers at a layer below readAndProcess, which might avoid the extra readAndProcess logic. I will create an improvement on top of this jira to verify whether that abstraction is possible. But even with this extra interface, we would still have to loop for the data, because of the same data-length issue.
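To make the partial-read handling concrete, here is a rough sketch (illustrative class and variable names, not the actual patch) of keeping state across read calls until the 4-byte length prefix and then the full payload have been decrypted:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Minimal sketch only; the SSL-wrapping channel is assumed to implement
// ReadableByteChannel and may return fewer plaintext bytes than the
// ciphertext it consumed from the wire, so partial reads are the norm.
public class SslFrameReader {
  private final ReadableByteChannel ch;  // hypothetical SSL-wrapping channel
  private final ByteBuffer lengthBuf = ByteBuffer.allocate(4);
  private ByteBuffer dataBuf;            // allocated once the length is known

  public SslFrameReader(ReadableByteChannel ch) { this.ch = ch; }

  // Attempts to make progress; returns a complete payload, or null if more
  // bytes are needed. On a non-blocking channel a 0-byte read means "go
  // back to the selector" -- looping in place here is what would spin.
  public ByteBuffer tryRead() throws IOException {
    if (dataBuf == null) {
      if (ch.read(lengthBuf) < 0) throw new IOException("EOF before length");
      if (lengthBuf.hasRemaining()) return null;  // still < 4 plaintext bytes
      lengthBuf.flip();
      dataBuf = ByteBuffer.allocate(lengthBuf.getInt());
    }
    if (ch.read(dataBuf) < 0) throw new IOException("EOF mid-payload");
    if (dataBuf.hasRemaining()) return null;      // payload still incomplete
    dataBuf.flip();
    ByteBuffer complete = dataBuf;
    dataBuf = null;
    lengthBuf.clear();                            // ready for the next frame
    return complete;
  }
}
{code}

The buffering wrapper class mentioned above would essentially move this state machine below readAndProcess, so the server code would see only whole frames.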


Multi-threaded clients generating requests faster than read will indefinitely tie up a reader
- I am not sure the reader gets tied up indefinitely; the requests will get processed eventually.
Clients sending a slow trickle of bytes will tie up a reader until a request is fully read.
- This problem exists even today, when large data packets are sent and we use ChannelIO on the server to process them.
Clients stalled mid-request will cause the reader to go into a spin loop.
- The connection timeout on a stalled client would close the channel and break the spin loop (see the sketch below).
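To illustrate the timeout argument (the names and threshold here are hypothetical, not taken from the Hadoop code base): a sweeper closes connections that have made no read progress within a deadline, so a reader looping on a stalled client eventually gets an exception on the closed channel and exits.

{code:java}
import java.io.IOException;
import java.nio.channels.SocketChannel;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch only. A background thread closes connections that
// have shown no read progress within the deadline; the reader's loop on
// such a channel then fails fast instead of spinning forever.
class IdleConnectionSweeper implements Runnable {
  static final long IDLE_TIMEOUT_MS = 60_000;  // hypothetical threshold

  static class Conn {
    final SocketChannel channel;
    volatile long lastProgressMs;              // updated on each successful read
    Conn(SocketChannel c) {
      channel = c;
      lastProgressMs = System.currentTimeMillis();
    }
  }

  private final ConcurrentLinkedQueue<Conn> conns = new ConcurrentLinkedQueue<>();

  void register(Conn c) { conns.add(c); }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      long now = System.currentTimeMillis();
      for (Conn c : conns) {
        if (now - c.lastProgressMs > IDLE_TIMEOUT_MS) {
          try {
            c.channel.close();                 // breaks the reader's loop
          } catch (IOException ignored) { }
          conns.remove(c);
        }
      }
      try { Thread.sleep(5_000); } catch (InterruptedException e) { return; }
    }
  }
}
{code}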


[~wheat9], the performance study quoted in the link was done on a setup where clients interface with frontend machines that support HTTPS. They point out that "On our production frontend machines, SSL/TLS accounts for less than 1% of the CPU load, less than 10KB of memory per connection and less than 2% of network overhead.", so it is roughly 3% overall for them as well, including the network overhead due to handshaking. I am not sure this is an apples-to-apples comparison with the setup on which I took my performance numbers. The CPU speed for encoding and decoding, the SSL protocol version used, the network bandwidth between the machines, the workload characteristics, etc., may have differed between the two setups.

> Securing Hadoop RPC using SSL
> -----------------------------
>
>                 Key: HADOOP-13836
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13836
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>            Reporter: kartheek muthyala
>            Assignee: kartheek muthyala
>         Attachments: HADOOP-13836.patch, HADOOP-13836-v2.patch, HADOOP-13836-v3.patch, HADOOP-13836-v4.patch, Secure IPC OSS Proposal-1.pdf, SecureIPC Performance Analysis-OSS.pdf
>
>
> Today, RPC connections in Hadoop are encrypted using the Simple Authentication & Security Layer (SASL), with Kerberos ticket-based or DIGEST-MD5 checksum-based authentication protocols. This proposal is about enhancing this cipher suite with SSL/TLS-based encryption and authentication. SSL/TLS is a proposed Internet Engineering Task Force (IETF) standard that provides data security and integrity between two endpoints in a network. The protocol has made its way into a number of applications such as web browsing, email, internet faxing, messaging, VoIP, etc. Supporting this cipher suite at the core of Hadoop would synergize well with the applications on top and also bolster industry adoption of Hadoop.
> The Server and Client code in Hadoop IPC should support the following modes of communication:
> 1. Plain
> 2. SASL encryption with an underlying authentication
> 3. SSL based encryption and authentication (x509 certificate)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org