Posted to dev@hbase.apache.org by Gary Helmling <gh...@gmail.com> on 2010/07/03 01:15:29 UTC

Proposed feature branch for HBase security

Hi folks,

With Yahoo's latest security release on github (
http://github.com/yahoo/hadoop-common/tree/yahoo-hadoop-0.20.104), it looks
like we now have a real-world usable version of secure Hadoop, based on
0.20.  This is exciting stuff, because now we have something solid to start
working towards implementing similar security controls in HBase (HBASE-1697,
HBASE-2014, HBASE-2016, HBASE-2420)!

However, this is going to be a large undertaking, with a strong dependency
on the secure Hadoop branch (more on that in a bit --  unfortunately the
fragmented hadoop-0.20 world is already leaking through).  So I'd like to
propose a feature branch in the HBase svn repo for security work, to:

1) ensure that changes towards implementing secure HBase have an ASF home
2) provide more visibility and granularity for review (esp. JIRA &
reviewboard usage)
3) ease interaction/integration with other branched changes underway (master
rewrite)

I've already started pushing some preliminary changes up to github (
http://github.com/ghelmling/hbase/tree/security), and will continue to do
so, but I'd like to avoid both massive patch sets accumulating too many
changes and making interested committers & contributors go digging to see
what the current state is.

On the secure Hadoop branch dependency -- I've integrated the
org.apache.hadoop.ipc changes into o.a.h.hbase.ipc.* (HBASE-2742) and run
into a couple complications:

* Hadoop RPC version rolled from 3 to 4 (apparently 0.20-append also does
this!)
* various bits in the updated HBaseClient, HBaseServer, etc. now depend on
the security implementation, so building and running on top of non-secure
Hadoop will not be possible.

I'd like to post the diff on review.hbase.org for more review and feedback,
but that begs the question of where the changes should go?

Longer term, I think we need to dump Hadoop RPC (AVRO-405 seems promising
here) so that HBase internals aren't so intertwined with Hadoop
implementation details, but that's its own large-scale project which we
shouldn't couple to security.

So, to sum up, thoughts on:

a) creating a "security" feature branch in svn?
b) RPC related changes, specifically cross Hadoop branch incompatibility due
to version increment and Hadoop security dependencies?

Thanks,
Gary

Re: Proposed feature branch for HBase security

Posted by Stack <sa...@gmail.com>.
+1 on a branch integrating security. If I can help, just say.

On b.) I am game for moving to nio avro, but not if it gets in the way of a.).




Re: Proposed feature branch for HBase security

Posted by Gary Helmling <gh...@gmail.com>.
> > * Hadoop RPC version rolled from 3 to 4 (apparently 0.20-append also does
> > this!)
>
> Can you explain further which version you're talking about here? Our HBase
> IPC is already wire-incompatible with Hadoop IPC, so the version numbers
> needn't match up, right?

Correct, this is really only a complication in producing a combined Hadoop
0.20 append+security branch (see Andy Purtell's recent email), so not really
relevant here.  For the HBase RPC update, we can of course handle the
versioning however we want.
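
As a rough illustration of what handling the versioning independently could
look like, here is a minimal sketch of a connection preamble check. The class
name, the magic bytes and the version value are all hypothetical placeholders
chosen for the example; none of them are taken from the actual HBase or
Hadoop IPC code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: not part of HBase or Hadoop.
final class RpcPreamble {
    // Placeholder magic bytes and version number, assumed for illustration.
    static final byte[] MAGIC = "hrpc".getBytes(StandardCharsets.US_ASCII);
    static final byte HBASE_RPC_VERSION = 4;

    // Client side: send the magic bytes followed by a one-byte version.
    static void write(DataOutputStream out, byte version) throws IOException {
        out.write(MAGIC);
        out.writeByte(version);
        out.flush();
    }

    // Server side: reject the connection unless magic and version both match.
    static void check(DataInputStream in, byte expected) throws IOException {
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IOException("Not an RPC connection");
        }
        byte version = in.readByte();
        if (version != expected) {
            throw new IOException("RPC version mismatch: client=" + version
                + ", server=" + expected);
        }
    }

    // Round-trips a preamble through an in-memory buffer; true if accepted.
    static boolean accepts(byte clientVersion, byte serverVersion) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            write(new DataOutputStream(wire), clientVersion);
            check(new DataInputStream(
                new ByteArrayInputStream(wire.toByteArray())), serverVersion);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("same version accepted: "
            + accepts(HBASE_RPC_VERSION, HBASE_RPC_VERSION));
        System.out.println("older client accepted: "
            + accepts((byte) 3, HBASE_RPC_VERSION));
    }
}
```

Because the check only compares against a constant owned by the HBase side,
the number can be bumped on HBase's own schedule regardless of what a given
Hadoop branch does with its RPC version.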


> > * various bits in the updated HBaseClient, HBaseServer, etc. now depend on
> > the security implementation, so building and running on top of non-secure
> > Hadoop will not be possible.
>
> Which classes do we depend on? Can we copy-paste those over into our tree?
> It's more of a maintenance pain in some ways, but in other ways it allows us
> to fix bugs, etc, without waiting on Hadoop releases.

The main new classes I brought over are:

org.apache.hadoop.security.SaslRpcClient -> o.a.h.hbase.security.HBaseSaslRpcClient
org.apache.hadoop.security.SaslRpcServer -> o.a.h.hbase.security.HBaseSaslRpcServer

which depend on:

o.a.hadoop.security.SaslInputStream
o.a.hadoop.security.SaslOutputStream
o.a.hadoop.security.token.Token
o.a.hadoop.security.token.TokenIdentifier

other RPC changes depend on:

o.a.hadoop.security.UserGroupInformation (incompatible interface changes)
o.a.hadoop.security.authorize.ServiceAuthorizationManager
o.a.hadoop.security.authorize.ProxyUsers

which further expands into the org.apache.hadoop.security package and
sub-packages.

My goal here has really just been to get bootstrapped on top of secure
Hadoop, so I've been opting for the minimal changes/duplication necessary.
We could duplicate more and strip out unnecessary bits to fully insulate our
changes and achieve cross Hadoop version compatibility.  Cross Hadoop
version compatibility will be critical before any of this could go into
trunk, but I don't think anything here would restrict taking that option
later on.  I'm just trying to defer that decision/work in case something
better comes along in the interim.  :)  In any case, secure RPC changes
wouldn't really have any value in themselves without a secure HBase
implementation, so this seems best fitted as a first step in a security
branch.
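
For what it's worth, one common way to approach cross Hadoop version
compatibility later on is to probe the classpath at runtime and only enable
the secure code path when the security classes are actually present. The
sketch below is an assumption about how that could look, not something that
exists in the branch; the SecureHadoopProbe class and its methods are
hypothetical, and only the Hadoop class names come from the list above.

```java
// Hypothetical helper: not part of HBase or Hadoop.
final class SecureHadoopProbe {
    // Hadoop security classes named earlier in this message.
    static final String[] SECURITY_CLASSES = {
        "org.apache.hadoop.security.SaslInputStream",
        "org.apache.hadoop.security.SaslOutputStream",
        "org.apache.hadoop.security.token.Token",
        "org.apache.hadoop.security.token.TokenIdentifier",
    };

    // True if the named class can be located, without initializing it.
    static boolean isClassPresent(String name) {
        try {
            Class.forName(name, false, SecureHadoopProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // True only if every named class is available on the classpath.
    static boolean allPresent(String... names) {
        for (String name : names) {
            if (!isClassPresent(name)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("secure Hadoop classes available: "
            + allPresent(SECURITY_CLASSES));
    }
}
```

A probe like this only answers whether the classes exist; it does not help
with the incompatible UserGroupInformation interface changes, which would
still need a shim or duplicated code.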


> > I'd like to post the diff on review.hbase.org for more review and feedback,
> > but that begs the question of where the changes should go?
>
> I can set up review.hbase.org to fetch from your github repo, so that diffs
> against your github branch will upload properly. Is that useful?

At this point, I think it would help to have some concrete changes to
refer to, so I'll follow up by posting to rb.  Whether or not the github
setup would be useful depends, I think, on whether there is consensus for
basing security development in an svn feature branch vs. externally (i.e.
github).

Thanks for the feedback and thoughts.

--gh

Re: Proposed feature branch for HBase security

Posted by Todd Lipcon <to...@cloudera.com>.
On Fri, Jul 2, 2010 at 4:15 PM, Gary Helmling <gh...@gmail.com> wrote:

> Hi folks,
>
> With Yahoo's latest security release on github (
> http://github.com/yahoo/hadoop-common/tree/yahoo-hadoop-0.20.104), it looks
> like we now have a real-world usable version of secure Hadoop, based on
> 0.20.  This is exciting stuff, because now we have something solid to start
> working towards implementing similar security controls in HBase (HBASE-1697,
> HBASE-2014, HBASE-2016, HBASE-2420)!
>
> However, this is going to be a large undertaking, with a strong dependency
> on the secure Hadoop branch (more on that in a bit --  unfortunately the
> fragmented hadoop-0.20 world is already leaking through).  So I'd like to
> propose a feature branch in the HBase svn repo for security work, to:
>
> 1) ensure that changes towards implementing secure HBase have an ASF home
> 2) provide more visibility and granularity for review (esp. JIRA &
> reviewboard usage)
> 3) ease interaction/integration with other branched changes underway (master
> rewrite)
>
> I've already started pushing some preliminary changes up to github (
> http://github.com/ghelmling/hbase/tree/security), and will continue to do
> so, but I'd like to avoid both massive patch sets accumulating too many
> changes and making interested committers & contributors go digging to see
> what the current state is.
>
> On the secure Hadoop branch dependency -- I've integrated the
> org.apache.hadoop.ipc changes into o.a.h.hbase.ipc.* (HBASE-2742) and run
> into a couple complications:
>
> * Hadoop RPC version rolled from 3 to 4 (apparently 0.20-append also does
> this!)
>

Can you explain further which version you're talking about here? Our HBase
IPC is already wire-incompatible with Hadoop IPC, so the version numbers
needn't match up, right?


> * various bits in the updated HBaseClient, HBaseServer, etc. now depend on
> the security implementation, so building and running on top of non-secure
> Hadoop will not be possible.
>
>
Which classes do we depend on? Can we copy-paste those over into our tree?
It's more of a maintenance pain in some ways, but in other ways it allows us
to fix bugs, etc, without waiting on Hadoop releases.


> I'd like to post the diff on review.hbase.org for more review and feedback,
> but that begs the question of where the changes should go?
>
>
I can set up review.hbase.org to fetch from your github repo, so that diffs
against your github branch will upload properly. Is that useful?


> Longer term, I think we need to dump Hadoop RPC (AVRO-405 seems promising
> here) so that HBase internals aren't so intertwined with Hadoop
> implementation details, but that's its own large-scale project which we
> shouldn't couple to security.
>
> So, to sum up, thoughts on:
>
> a) creating a "security" feature branch in svn?
> b) RPC related changes, specifically cross Hadoop branch incompatibility due
> to version increment and Hadoop security dependencies?
>
> Thanks,
> Gary
>



-- 
Todd Lipcon
Software Engineer, Cloudera