Posted to common-dev@hadoop.apache.org by Erik Krogen <ek...@linkedin.com> on 2018/11/01 21:29:03 UTC

Re: [DISCUSS] Hadoop RPC encryption performance improvements

Hey Wei-Chiu,


We (LinkedIn) are definitely interested in the progression of this feature. Surveying HADOOP-10768 vs. HADOOP-13836, we feel that HADOOP-10768 is a change more in line with Hadoop's direction. For example, it re-uses the existing SASL layer, maintains consistency with the encryption used for data transfer, and avoids the need to set up client key/trust stores. Given that it is such a security-critical piece of code, I think we should make sure to get some additional sets of eyes on the patch and ensure that all of Daryn's concerns are addressed fully, but the approach seems valid.


Though we are interested in the Netty SSL approach, it is very difficult to make any judgements on it at this time with so little information available. How fundamental of a code change will this be? Is it fully backwards compatible? Will switching to a new RPC engine introduce the possibility of a whole new range of performance issues and/or bugs? We can appreciate the point that outsourcing such security-critical concerns to another widely used and battle-tested framework could be a big potential benefit, but we are worried about the associated risks. More detailed information may help to assuage these concerns.


One additional point we would like to make is that right now, it seems that different approaches are using different benchmarks. For example, HADOOP-13836 posted results from Terasort, and HADOOP-10768 posted results from RPCCallBenchmark. Clearly the performance of the approach is crucial in making the decision and we should ensure that any comparisons made are apples-to-apples with the same test setup.
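
(As a concrete point of reference for such a comparison: whichever benchmark is chosen, the runs should pin the RPC protection level the same way for every approach under test. The sketch below uses the standard `hadoop.rpc.protection` property in core-site.xml; the value shown is only illustrative.)

```xml
<!-- core-site.xml: "privacy" is the SASL QOP that encrypts RPC payloads;
     "authentication" would be the unencrypted baseline to compare against. -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>
```

Running the same benchmark once at the baseline setting and once with encryption enabled (or the SSL equivalent for the other approaches), on the same cluster, would give the apples-to-apples numbers we are after.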

Thanks,
Erik Krogen
LinkedIn

________________________________
From: Wei-Chiu Chuang <we...@apache.org>
Sent: Wednesday, October 31, 2018 6:43 AM
To: Hadoop Common; Hdfs-dev
Subject: Re: [DISCUSS] Hadoop RPC encryption performance improvements

Ping. Anyone? Cloudera is interested in moving forward with the RPC
encryption improvements, but I'd just like to get consensus on which
approach to go with.

Otherwise I'll pick HADOOP-10768 since it's ready for commit, and I've
spent time on testing it.

On Thu, Oct 25, 2018 at 11:04 AM Wei-Chiu Chuang <we...@apache.org> wrote:

> Folks,
>
> I would like to invite all to discuss the various Hadoop RPC encryption
> performance improvements. As you probably know, Hadoop RPC encryption
> currently relies on Java SASL and has _really_ bad performance (in terms
> of number of RPCs per second, around 15-20% of the throughput without SASL).
>
> There have been some attempts to address this, most notably HADOOP-10768
> <https://issues.apache.org/jira/browse/HADOOP-10768> (Optimize Hadoop RPC
> encryption performance) and HADOOP-13836
> <https://issues.apache.org/jira/browse/HADOOP-13836> (Securing Hadoop RPC
> using SSL), but neither attempt has been progressing.
>
> During the recent Hadoop contributor meetup, Daryn Sharp mentioned he's
> working on another approach that leverages Netty for its SSL encryption
> and then integrates Netty with Hadoop RPC, so that Hadoop RPC automatically
> benefits from Netty's SSL encryption performance.
>
> So there are at least 3 attempts to address this issue as I see it. Do we
> have consensus on:
> 1. whether this is an important problem, and
> 2. which approach we want to move forward with?
>
> --
> A very happy Hadoop contributor
>


--
A very happy Hadoop contributor