Posted to common-issues@hadoop.apache.org by "Jerry Chen (JIRA)" <ji...@apache.org> on 2016/03/04 04:18:40 UTC

[jira] [Commented] (HADOOP-12725) RPC encryption benchmark and optimization prototypes

    [ https://issues.apache.org/jira/browse/HADOOP-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15179231#comment-15179231 ] 

Jerry Chen commented on HADOOP-12725:
-------------------------------------

We need to be aware that there are several aspects to Hadoop RPC encryption optimization. The discussion so far has focused on the GSSAPI used in the SASL Kerberos mechanism, and on optimizing GSSAPI internally.

For a Hadoop client, the Kerberos mechanism is usually only the first step of authentication, used to gain access to the system; different use cases then follow different patterns in the subsequent steps. For example, a MapReduce job performs Kerberos authentication only at job submission and then uses DIGEST-MD5 authentication with a delegation token in all of its tasks; HBase or other services may follow a different pattern. For the MapReduce case, it is therefore at least as important to also optimize the DIGEST-MD5 auth-conf implementation. From our experiments with Spark RPC encryption, DIGEST-MD5 does not support the AES algorithm; and taking one of its ciphers, 3DES, as an example: 3DES is very slow, with a throughput of possibly only 10 - 20 Mb/s.
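
As an illustration, here is a minimal sketch of creating a DIGEST-MD5 SASL client with auth-conf QOP through the standard javax.security.sasl API (the protocol and server names below are hypothetical). The encryption cipher is negotiated inside the mechanism itself, and the JDK implementation only offers the RC4 and DES/3DES families here, which is why 3DES is the practical ceiling:

{code:java}
import java.util.HashMap;
import java.util.Map;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;

public class DigestMd5QopProbe {
  public static void main(String[] args) throws Exception {
    Map<String, String> props = new HashMap<>();
    // Request confidentiality (encryption); this is what Hadoop asks for
    // when hadoop.rpc.protection is set to "privacy".
    props.put(Sasl.QOP, "auth-conf");

    SaslClient client = Sasl.createSaslClient(
        new String[] {"DIGEST-MD5"},
        null,                      // authorization id
        "hdfs",                    // protocol (hypothetical)
        "namenode.example.com",    // server name (hypothetical)
        props,
        callbacks -> {
          // In Hadoop the name/password callbacks would be filled in from
          // the delegation token identifier and password.
        });

    // After the challenge/response exchange completes, the negotiated
    // cipher can be inspected. The JDK mechanism only offers the RC4 and
    // DES/3DES families here -- no AES.
    // client.evaluateChallenge(challengeFromServer);
    // System.out.println(client.getNegotiatedProperty(Sasl.QOP));
  }
}
{code}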

So RPC encryption optimization should be considered as a whole from the beginning.

As far as I see now, there might be two approaches:
1. Optimize each mechanism individually (GSSAPI, DIGEST-MD5, ...).
    The discussion above fits into this approach, but covers only the Kerberos-related GSSAPI.

2. Optimize on top of the individual mechanisms and build our own auth-conf layer with AES-NI optimization.
In HADOOP-10768, Andrew Purtell described this approach: "One could wrap the initial payloads with whatever encryption was negotiated during connection initiation until completing additional key exchange and negotiation steps, then switch to an alternate means of applying a symmetric cipher to RPC payloads." HDFS-6606 also took this approach to optimize data transfer encryption.

Option #2 has the advantage that the Hadoop RPC implementation keeps control over all of the optimizations and does not depend on optimizations inside the underlying mechanisms.
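
To make option #2 concrete, below is a hypothetical sketch (not an existing Hadoop API) of such an auth-conf layer: after the SASL handshake completes and a fresh AES key and IV have been exchanged inside SASL-wrapped tokens, the connection switches from saslClient.wrap()/unwrap() to an AES-CTR cipher, which benefits from AES-NI on modern CPUs:

{code:java}
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch, not an existing Hadoop class: once the SASL
// handshake is done and a fresh AES key + IV have been exchanged inside
// SASL-wrapped (auth-conf) tokens, RPC payloads bypass
// saslClient.wrap()/unwrap() and are protected with AES-CTR instead.
public class AesRpcWrapper {
  private final Cipher encryptor;
  private final Cipher decryptor;

  public AesRpcWrapper(byte[] key, byte[] iv) throws Exception {
    SecretKeySpec keySpec = new SecretKeySpec(key, "AES");
    encryptor = Cipher.getInstance("AES/CTR/NoPadding");
    encryptor.init(Cipher.ENCRYPT_MODE, keySpec, new IvParameterSpec(iv));
    decryptor = Cipher.getInstance("AES/CTR/NoPadding");
    decryptor.init(Cipher.DECRYPT_MODE, keySpec, new IvParameterSpec(iv));
  }

  // Takes over from saslClient.wrap(...) for outgoing RPC payloads.
  public byte[] wrap(byte[] plaintext) {
    return encryptor.update(plaintext);
  }

  // Takes over from saslClient.unwrap(...) for incoming RPC payloads.
  public byte[] unwrap(byte[] ciphertext) {
    return decryptor.update(ciphertext);
  }
}
{code}

A real implementation would use separate key/IV pairs per direction and add integrity protection on top, since CTR mode alone provides only confidentiality.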




> RPC encryption benchmark and optimization prototypes
> ----------------------------------------------------
>
>                 Key: HADOOP-12725
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12725
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Kai Zheng
>            Assignee: Wei Zhou
>
> This would implement a benchmark tool to measure and compare the performance of Hadoop IPC/RPC calls when security is enabled and different SASL QOP (Quality of Protection) levels are enforced. Given the data collected by this benchmark, it would then be possible to know whether there is any performance concern when considering enforcing the privacy, integrity, or authentication protection level, and to optimize accordingly.
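
A minimal timing harness for such a benchmark might look like the sketch below; obtaining the RPC proxy is elided, and the hadoop.rpc.protection values (authentication, integrity, privacy) map to the SASL QOP values auth, auth-int, and auth-conf respectively:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Minimal timing harness; obtaining the RPC proxy is elided here.
public class RpcQopBenchmark {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Pick the protection level under test, e.g. "privacy" for auth-conf.
    conf.set("hadoop.rpc.protection", args.length > 0 ? args[0] : "privacy");

    // Hypothetical: proxy = RPC.getProxy(SomeProtocol.class, ..., conf);
    final int iterations = 10000;
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      // proxy.echo(payload); // representative RPC call under test
    }
    long elapsedMs = (System.nanoTime() - start) / 1000000L;
    System.out.println(iterations + " calls in " + elapsedMs + " ms");
  }
}
{code}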



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)