Posted to dev@hbase.apache.org by Stack <st...@duboce.net> on 2012/03/02 20:24:34 UTC

DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Should we make it so hbase 0.96.0 requires at least hadoop 1.0.0?
This would mean we would no longer support running on older versions
such as branch-0.20-append (and perhaps stuff like CDH2?)?

Requiring at least Hadoop 1.0.0 means we can presume security and
append.  We also narrow the set of hadoops we need to support,
simplifying things for ourselves some.

What you lot think?
St.Ack

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Andrew Purtell <ap...@apache.org>.
+1

We're going to have enough issues with 0.23, 0.24, ...

 
Best regards,


    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)



>________________________________
> From: Stack <st...@duboce.net>
>To: HBase Dev List <de...@hbase.apache.org> 
>Sent: Friday, March 2, 2012 11:24 AM
>Subject: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?
> 
>Should we make it so hbase 0.96.0 requires at least hadoop 1.0.0?
>This would mean we would no longer support running on older versions
>such as branch-0.20-append (and perhaps stuff like CDH2?)?
>
>Requiring Hadoop 1.0.0 at least means we can presume security and
>append.  We also narrow the set of hadoops we need to support
>simplifying things for ourselves some.
>
>What you lot think?
>St.Ack
>
>
>

RE: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by "Ramkrishna.S.Vasudevan" <ra...@huawei.com>.
+1  on using Hadoop 1.0.0.

-----Original Message-----
From: Huanyou Chang [mailto:mapbased@mapbased.com] 
Sent: Tuesday, March 06, 2012 4:27 PM
To: dev@hbase.apache.org
Subject: Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase
0.96.0?

+1

2012/3/3 Stack <st...@duboce.net>

> Should we make it so hbase 0.96.0 requires at least hadoop 1.0.0?
> This would mean we would no longer support running on older versions
> such as branch-0.20-append (and perhaps stuff like CDH2?)?
>
> Requiring Hadoop 1.0.0 at least means we can presume security and
> append.  We also narrow the set of hadoops we need to support
> simplifying things for ourselves some.
>
> What you lot think?
> St.Ack
>


Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Huanyou Chang <ma...@mapbased.com>.
+1

2012/3/3 Stack <st...@duboce.net>

> Should we make it so hbase 0.96.0 requires at least hadoop 1.0.0?
> This would mean we would no longer support running on older versions
> such as branch-0.20-append (and perhaps stuff like CDH2?)?
>
> Requiring Hadoop 1.0.0 at least means we can presume security and
> append.  We also narrow the set of hadoops we need to support
> simplifying things for ourselves some.
>
> What you lot think?
> St.Ack
>

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Jesse Yates <je...@gmail.com>.
+1 as well.
Will definitely be nice to take some load off Jenkins too :)


-------------------
Jesse Yates
240-888-2200
@jesse_yates
jyates.github.com


On Fri, Mar 2, 2012 at 11:29 AM, lars hofhansl <lh...@yahoo.com> wrote:

> +1
>
>
> ________________________________
>  From: Stack <st...@duboce.net>
> To: HBase Dev List <de...@hbase.apache.org>
> Sent: Friday, March 2, 2012 11:24 AM
> Subject: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?
>
> Should we make it so hbase 0.96.0 requires at least hadoop 1.0.0?
> This would mean we would no longer support running on older versions
> such as branch-0.20-append (and perhaps stuff like CDH2?)?
>
> Requiring Hadoop 1.0.0 at least means we can presume security and
> append.  We also narrow the set of hadoops we need to support
> simplifying things for ourselves some.
>
> What you lot think?
> St.Ack
>

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by lars hofhansl <lh...@yahoo.com>.
+1


________________________________
 From: Stack <st...@duboce.net>
To: HBase Dev List <de...@hbase.apache.org> 
Sent: Friday, March 2, 2012 11:24 AM
Subject: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?
 
Should we make it so hbase 0.96.0 requires at least hadoop 1.0.0?
This would mean we would no longer support running on older versions
such as branch-0.20-append (and perhaps stuff like CDH2?)?

Requiring Hadoop 1.0.0 at least means we can presume security and
append.  We also narrow the set of hadoops we need to support
simplifying things for ourselves some.

What you lot think?
St.Ack

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Ioan Eugen Stan <st...@gmail.com>.
On 02.03.2012 21:24, Stack wrote:
> Should we make it so hbase 0.96.0 requires at least hadoop 1.0.0?
> This would mean we would no longer support running on older versions
> such as branch-0.20-append (and perhaps stuff like CDH2?)?
>
> Requiring Hadoop 1.0.0 at least means we can presume security and
> append.  We also narrow the set of hadoops we need to support
> simplifying things for ourselves some.
>
> What you lot think?
> St.Ack

+1, simple is better

-- 
Ioan Eugen Stan
http://ieugen.blogspot.com

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Mikhail Bautin <ba...@gmail.com>.
@Stack, Jonathan: thank you for your replies.

After some more internal discussion, we decided it might not be too hard
for us to implement stubs in our version of HDFS to accommodate the new API
requirements on the HBase side.

Putting some of the HDFS multi-version support plumbing in HFileSystem
sounds like a good idea going forward, though, even if we are removing
support for some of the versions.

Thanks,
--Mikhail

On Thu, Mar 8, 2012 at 9:08 AM, Stack <st...@duboce.net> wrote:

> On Thu, Mar 8, 2012 at 12:31 AM, Jonathan Hsieh <jo...@cloudera.com> wrote:
> > I feel some sympathy towards the existing user argument (we have plenty
> to
> > deal with) -- a compromise may be to have hbase core tested and focused
> on
> > a small number of hdfs versions (apache hadoop 1.0.0 and apache hadoop
> > 0.23.x are my first suggestions) and to have an interface that isolates
> all
> > the  the reflection checks that are currently sprinkled throughout the
> code
> > base into an interface which can be targeted to support other specific
> > HDFS/DFS flavors.  This would be saner and could explicitly be tested.
> >
>
> HBASE-5074 introduces HFilesystem, the hbase filesystem. In this new
> layer, HBASE-5074 does the new checksum facility.  It includes faking
> a call that is in a new hdfs that is not in older versions.  Perhaps
> its here that we should move all of our reflectioneering so its
> contained and grokable?
>
> St.Ack
>

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Stack <st...@duboce.net>.
On Thu, Mar 8, 2012 at 12:31 AM, Jonathan Hsieh <jo...@cloudera.com> wrote:
> I feel some sympathy towards the existing user argument (we have plenty to
> deal with) -- a compromise may be to have hbase core tested and focused on
> a small number of hdfs versions (apache hadoop 1.0.0 and apache hadoop
> 0.23.x are my first suggestions) and to have an interface that isolates all
> the  the reflection checks that are currently sprinkled throughout the code
> base into an interface which can be targeted to support other specific
> HDFS/DFS flavors.  This would be saner and could explicitly be tested.
>

HBASE-5074 introduces HFileSystem, the hbase filesystem. In this new
layer, HBASE-5074 does the new checksum facility.  It includes faking
a call that is in newer hdfs but not in older versions.  Perhaps
it's here that we should move all of our reflectioneering so it's
contained and grokable?
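The "faking a call that is in newer hdfs" approach above can be sketched as a runtime probe: check via reflection whether the underlying class exposes the newer method, and fall back when it does not. A minimal illustration follows; the class name `HdfsFeatureProbe` is hypothetical, and `String` stands in for a Hadoop `FileSystem` class purely so the sketch is self-contained.

```java
import java.lang.reflect.Method;

// Illustrative sketch only -- not HBase's actual HFileSystem code.
// Probes at runtime whether a class exposes a newer API method, so
// callers can degrade gracefully on older versions.
public class HdfsFeatureProbe {

    // True if clazz has a public method with this name and parameter types.
    static boolean hasMethod(Class<?> clazz, String name, Class<?>... paramTypes) {
        try {
            Method m = clazz.getMethod(name, paramTypes);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false; // older version of the class: feature unavailable
        }
    }

    public static void main(String[] args) {
        // String.isEmpty() exists; a made-up method does not.
        System.out.println("isEmpty present: "
                + hasMethod(String.class, "isEmpty"));
        System.out.println("noSuchCall present: "
                + hasMethod(String.class, "noSuchCall"));
    }
}
```

Concentrating such probes in one layer (as the HFileSystem suggestion proposes) keeps the version checks in a single, testable place instead of scattered through the code base.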

St.Ack

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Jonathan Hsieh <jo...@cloudera.com>.
I'm +1 on moving to a min version of Hadoop 1.0.0 in 0.96.

The support-all-flavors stance, especially on branches as opposed to
releases, requires us to maintain shims for different versions, and thus
to expend energy managing this complexity instead of improving HBase's
core.

I'm not convinced about the new user argument -- if folks are completely
new, I'd imagine they'd most likely start by going with the herd and
picking a DFS that most folks use (such as apache hadoop 1.0.0, a cdh
version, or possibly a mapr version).  In the case of cdh/mapr or an internal
custom build, it would be the responsibility of the packager to maintain and
support their own idiosyncrasies or limitations.

I feel some sympathy towards the existing user argument (we have plenty to
deal with) -- a compromise may be to have hbase core tested and focused on
a small number of hdfs versions (apache hadoop 1.0.0 and apache hadoop
0.23.x are my first suggestions) and to isolate all the reflection checks
that are currently sprinkled throughout the code base into an interface
which can be targeted to support other specific HDFS/DFS flavors.  This
would be saner and could explicitly be tested.
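The isolation-interface compromise described above amounts to a classic compatibility-shim pattern: callers program against one interface, and each supported DFS flavor gets its own implementation. A small hypothetical sketch, with illustrative names not taken from the HBase code base:

```java
// Hypothetical sketch of the "isolation interface" idea: all
// version-specific DFS behavior lives behind one interface, with one
// implementation per supported flavor.
interface DfsCompat {
    boolean supportsAppend();
    boolean supportsSecurity();
}

// Apache Hadoop 1.0.0: both append and security are present.
class Hadoop100Compat implements DfsCompat {
    public boolean supportsAppend()   { return true; }
    public boolean supportsSecurity() { return true; }
}

// branch-0.20-append: append exists, security does not.
class LegacyAppendBranchCompat implements DfsCompat {
    public boolean supportsAppend()   { return true; }
    public boolean supportsSecurity() { return false; }
}

public class CompatDemo {
    // Callers never mention a concrete Hadoop version; only the code that
    // selects the implementation knows which flavor is underneath.
    static String describe(DfsCompat c) {
        return "append=" + c.supportsAppend()
             + " security=" + c.supportsSecurity();
    }

    public static void main(String[] args) {
        System.out.println(describe(new Hadoop100Compat()));
        System.out.println(describe(new LegacyAppendBranchCompat()));
    }
}
```

Because each flavor is a named class rather than a scattering of inline reflection checks, each shim can be unit-tested on its own, which is the "could explicitly be tested" point above.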

My guess is that this problem isn't just for the user/security API -- I
believe there may be performance improvements and api improvements in newer
versions of HDFS that we may want to take advantage of and that would need
to be discovered via reflection as well.

Jon.

On Wed, Mar 7, 2012 at 11:42 AM, Mikhail Bautin <
bautin.mailing.lists@gmail.com> wrote:

> The current support for multiple versions of HDFS is in my opinion actually
> one of the strengths of HBase, and the project will lose that advantage if
> we cut support for earlier versions of Hadoop. I think HBase should only
> require the simplest possible universally available subset of HDFS API, and
> security should be an optional feature, discovered through reflection or
> enabled in some other ways.
>
> We have a custom version of Hadoop at Facebook that is not planning to
> implement security any time soon. This version of Hadoop runs underneath
> what we believe to be some of the largest existing production HBase
> deployments. We are currently running the 0.89-fb version of HBase in
> production, but are considering moving to a more recent version of HBase at
> some point, and it would be great to be able to do that independently of
> changing the underlying Hadoop distribution for migration complexity
> reasons. Currently we are able to run public HBase trunk on our version of
> Hadoop, but once in a while we have to satisfy new dependences on Hadoop
> features that are added to HBase. If the changes proposed in this thread
> happen, we would have to pull in a lot more security-related dependencies
> into our version of Hadoop and, most likely, implement a lot of no-op
> stubs. However, that may not be a trivial project, and it certainly would
> not add any clarity or value to our Hadoop codebase or HBase / HDFS
> interaction.
>
> I imagine there are other custom flavors of Hadoop out there where HBase
> support would be desirable. For example, does MapR implement the same
> security API as Hadoop 1.0.0 does? Restricting HBase to a smaller subset of
> Hadoop versions complicates life for existing users, and makes HBase a less
> likely choice for new users, who could go with something like Hypertable
> where they have an extra abstraction layer between the database and the
> underlying distributed file system implementation.
>
> Thanks,
> --Mikhail
>
> On Wed, Mar 7, 2012 at 10:20 AM, Devaraj Das <dd...@hortonworks.com> wrote:
>
> > Given that the token/ugi APIs are being used in other ecosystem
> components
> > too (like Hive, HCatalog & Oozie), and in general, that security model
> will
> > probably hold for other projects too, I think that its not an unfair
> > expectation from Hadoop that it should maintain compatibility on
> UGI/Token*
> > interfaces (*smile*).
> >
> > On Mar 6, 2012, at 11:57 AM, Arun C Murthy wrote:
> >
> > > Andy - could you please start a discussion?
> > >
> > > We could, at the very least, mark UGI as LimitedPrivate for HBase and
> > work with you guys to maintain compatibility for the future. Makes sense?
> > >
> > > thanks,
> > > Arun
> > >
> > > On Mar 6, 2012, at 10:21 AM, Andrew Purtell wrote:
> > >
> > >> After that, I believe we can merge the security sources in. However we
> > may have an issue going forward because UGI is an unstable/private API.
> > Needs sorting out with core at some point.
> > >>
> > >> Best regards,
> > >>
> > >>   - Andy
> > >>
> > >>
> > >> On Mar 6, 2012, at 9:55 AM, Stack <st...@duboce.net> wrote:
> > >>
> > >>> On Tue, Mar 6, 2012 at 9:10 AM, Andrew Purtell <ap...@apache.org>
> > wrote:
> > >>>> ...however we can't easily build a single artifact because the
> secure
> > RPC engine, as it interacts with the Hadoop auth framework, must use
> > UserGroupInformation.
> > >>>>
> > >>>
> > >>> OK.  So security story needs a bit of work.  Sounds like we have
> > >>> enough votes though to require hadoop 1.0.0 at least in 0.96.
> > >>>
> > >>> St.Ack
> > >
> > > --
> > > Arun C. Murthy
> > > Hortonworks Inc.
> > > http://hortonworks.com/
> > >
> > >
> >
> >
>



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Andrew Purtell <ap...@yahoo.com>.
I have no strong opinion either way: separate profile or merge could be made to work. I'm happy to maintain security-related sources as a module as long as the necessary accommodations are made by other devs; e.g. don't break our sources by changing the coprocessor API or RPC without also fixing up the security module, or at least making it straightforward for us to do those fixups. 

Best regards,

    - Andy


On Mar 7, 2012, at 11:42 AM, Mikhail Bautin <ba...@gmail.com> wrote:

> The current support for multiple versions of HDFS is in my opinion actually
> one of the strengths of HBase, and the project will lose that advantage if
> we cut support for earlier versions of Hadoop. I think HBase should only
> require the simplest possible universally available subset of HDFS API, and
> security should be an optional feature, discovered through reflection or
> enabled in some other ways.
> 
> We have a custom version of Hadoop at Facebook that is not planning to
> implement security any time soon. This version of Hadoop runs underneath
> what we believe to be some of the largest existing production HBase
> deployments. We are currently running the 0.89-fb version of HBase in
> production, but are considering moving to a more recent version of HBase at
> some point, and it would be great to be able to do that independently of
> changing the underlying Hadoop distribution for migration complexity
> reasons. Currently we are able to run public HBase trunk on our version of
> Hadoop, but once in a while we have to satisfy new dependences on Hadoop
> features that are added to HBase. If the changes proposed in this thread
> happen, we would have to pull in a lot more security-related dependencies
> into our version of Hadoop and, most likely, implement a lot of no-op
> stubs. However, that may not be a trivial project, and it certainly would
> not add any clarity or value to our Hadoop codebase or HBase / HDFS
> interaction.
> 
> I imagine there are other custom flavors of Hadoop out there where HBase
> support would be desirable. For example, does MapR implement the same
> security API as Hadoop 1.0.0 does? Restricting HBase to a smaller subset of
> Hadoop versions complicates life for existing users, and makes HBase a less
> likely choice for new users, who could go with something like Hypertable
> where they have an extra abstraction layer between the database and the
> underlying distributed file system implementation.
> 
> Thanks,
> --Mikhail
> 
> On Wed, Mar 7, 2012 at 10:20 AM, Devaraj Das <dd...@hortonworks.com> wrote:
> 
>> Given that the token/ugi APIs are being used in other ecosystem components
>> too (like Hive, HCatalog & Oozie), and in general, that security model will
>> probably hold for other projects too, I think that its not an unfair
>> expectation from Hadoop that it should maintain compatibility on UGI/Token*
>> interfaces (*smile*).
>> 
>> On Mar 6, 2012, at 11:57 AM, Arun C Murthy wrote:
>> 
>>> Andy - could you please start a discussion?
>>> 
>>> We could, at the very least, mark UGI as LimitedPrivate for HBase and
>> work with you guys to maintain compatibility for the future. Makes sense?
>>> 
>>> thanks,
>>> Arun
>>> 
>>> On Mar 6, 2012, at 10:21 AM, Andrew Purtell wrote:
>>> 
>>>> After that, I believe we can merge the security sources in. However we
>> may have an issue going forward because UGI is an unstable/private API.
>> Needs sorting out with core at some point.
>>>> 
>>>> Best regards,
>>>> 
>>>>  - Andy
>>>> 
>>>> 
>>>> On Mar 6, 2012, at 9:55 AM, Stack <st...@duboce.net> wrote:
>>>> 
>>>>> On Tue, Mar 6, 2012 at 9:10 AM, Andrew Purtell <ap...@apache.org>
>> wrote:
>>>>>> ...however we can't easily build a single artifact because the secure
>> RPC engine, as it interacts with the Hadoop auth framework, must use
>> UserGroupInformation.
>>>>>> 
>>>>> 
>>>>> OK.  So security story needs a bit of work.  Sounds like we have
>>>>> enough votes though to require hadoop 1.0.0 at least in 0.96.
>>>>> 
>>>>> St.Ack
>>> 
>>> --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>> 
>>> 
>> 
>> 

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Ted Yu <yu...@gmail.com>.
+1 on what Mikhail said below.

On Wed, Mar 7, 2012 at 11:42 AM, Mikhail Bautin <
bautin.mailing.lists@gmail.com> wrote:

> The current support for multiple versions of HDFS is in my opinion actually
> one of the strengths of HBase, and the project will lose that advantage if
> we cut support for earlier versions of Hadoop. I think HBase should only
> require the simplest possible universally available subset of HDFS API, and
> security should be an optional feature, discovered through reflection or
> enabled in some other ways.
>
> We have a custom version of Hadoop at Facebook that is not planning to
> implement security any time soon. This version of Hadoop runs underneath
> what we believe to be some of the largest existing production HBase
> deployments. We are currently running the 0.89-fb version of HBase in
> production, but are considering moving to a more recent version of HBase at
> some point, and it would be great to be able to do that independently of
> changing the underlying Hadoop distribution for migration complexity
> reasons. Currently we are able to run public HBase trunk on our version of
> Hadoop, but once in a while we have to satisfy new dependences on Hadoop
> features that are added to HBase. If the changes proposed in this thread
> happen, we would have to pull in a lot more security-related dependencies
> into our version of Hadoop and, most likely, implement a lot of no-op
> stubs. However, that may not be a trivial project, and it certainly would
> not add any clarity or value to our Hadoop codebase or HBase / HDFS
> interaction.
>
> I imagine there are other custom flavors of Hadoop out there where HBase
> support would be desirable. For example, does MapR implement the same
> security API as Hadoop 1.0.0 does? Restricting HBase to a smaller subset of
> Hadoop versions complicates life for existing users, and makes HBase a less
> likely choice for new users, who could go with something like Hypertable
> where they have an extra abstraction layer between the database and the
> underlying distributed file system implementation.
>
> Thanks,
> --Mikhail
>
> On Wed, Mar 7, 2012 at 10:20 AM, Devaraj Das <dd...@hortonworks.com> wrote:
>
> > Given that the token/ugi APIs are being used in other ecosystem
> components
> > too (like Hive, HCatalog & Oozie), and in general, that security model
> will
> > probably hold for other projects too, I think that its not an unfair
> > expectation from Hadoop that it should maintain compatibility on
> UGI/Token*
> > interfaces (*smile*).
> >
> > On Mar 6, 2012, at 11:57 AM, Arun C Murthy wrote:
> >
> > > Andy - could you please start a discussion?
> > >
> > > We could, at the very least, mark UGI as LimitedPrivate for HBase and
> > work with you guys to maintain compatibility for the future. Makes sense?
> > >
> > > thanks,
> > > Arun
> > >
> > > On Mar 6, 2012, at 10:21 AM, Andrew Purtell wrote:
> > >
> > >> After that, I believe we can merge the security sources in. However we
> > may have an issue going forward because UGI is an unstable/private API.
> > Needs sorting out with core at some point.
> > >>
> > >> Best regards,
> > >>
> > >>   - Andy
> > >>
> > >>
> > >> On Mar 6, 2012, at 9:55 AM, Stack <st...@duboce.net> wrote:
> > >>
> > >>> On Tue, Mar 6, 2012 at 9:10 AM, Andrew Purtell <ap...@apache.org>
> > wrote:
> > >>>> ...however we can't easily build a single artifact because the
> secure
> > RPC engine, as it interacts with the Hadoop auth framework, must use
> > UserGroupInformation.
> > >>>>
> > >>>
> > >>> OK.  So security story needs a bit of work.  Sounds like we have
> > >>> enough votes though to require hadoop 1.0.0 at least in 0.96.
> > >>>
> > >>> St.Ack
> > >
> > > --
> > > Arun C. Murthy
> > > Hortonworks Inc.
> > > http://hortonworks.com/
> > >
> > >
> >
> >
>

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Stack <st...@duboce.net>.
On Wed, Mar 7, 2012 at 11:42 AM, Mikhail Bautin
<ba...@gmail.com> wrote:
> The current support for multiple versions of HDFS is in my opinion actually
> one of the strengths of HBase, and the project will lose that advantage if
> we cut support for earlier versions of Hadoop.

It just gets a little tough to keep up when the span to support is
broad: branch-0.20-append up through 0.23.x.  I'm not sure if it's
tenable keeping it up after we get beyond a certain breadth.

The issue that prompted this discussion was in part HBASE-5419,
"FileAlreadyExistsException has moved from mapred to fs package", a
helpful patch by Dhruba to get us off a deprecated class.  Its
application will break our building against hadoops older than 1.0.0
(I believe).

I suppose we can keep up the (hacky) reflection, but at a certain stage its
maintenance becomes "difficult".

> I think HBase should only
> require the simplest possible universally available subset of HDFS API

This notion.  I like.  How would we ensure we keep to a narrow subset
(excepting security for the moment)?  If we want to use an exotic hdfs
api, we go there via reflection?

>, and
> security should be an optional feature, discovered through reflection or
> enabled in some other ways.
>

If we can't assume 1.0.0, and you've made a point that we can't and
shouldn't (because we'd be leaving behind our biggest deploy -- which
would just be silly), then security is done via the modularization
route that has been discussed previously and that has had some work
applied (you fellas good w/ that?).


> If the changes proposed in this thread
> happen, we would have to pull in a lot more security-related dependencies
> into our version of Hadoop and, most likely, implement a lot of no-op
> stubs.

Let's not make you have to do this.

How do you suggest we ensure we minimize you or anyone else having to
address '...new dependencies on Hadoop features that are added to
HBase'?


>
> I imagine there are other custom flavors of Hadoop out there where HBase
> support would be desirable. For example, does MapR implement the same
> security API as Hadoop 1.0.0 does?

I don't know.  I hoped the lads over there would speak up if this were
a suggestion that would mess them up.

> Restricting HBase to a smaller subset of
> Hadoop versions complicates life for existing users...

Agreed, but I was thinking that some versions of Hadoop are so old and odd
-- e.g. branch-0.20-append -- that we could leave them behind by the
time we get to 0.96, which I SWAG to be autumn of this year.

Good on you Mikhail,
St.Ack

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Mikhail Bautin <ba...@gmail.com>.
The current support for multiple versions of HDFS is in my opinion actually
one of the strengths of HBase, and the project will lose that advantage if
we cut support for earlier versions of Hadoop. I think HBase should only
require the simplest possible universally available subset of HDFS API, and
security should be an optional feature, discovered through reflection or
enabled in some other ways.

We have a custom version of Hadoop at Facebook that is not planning to
implement security any time soon. This version of Hadoop runs underneath
what we believe to be some of the largest existing production HBase
deployments. We are currently running the 0.89-fb version of HBase in
production, but are considering moving to a more recent version of HBase at
some point, and it would be great to be able to do that independently of
changing the underlying Hadoop distribution for migration complexity
reasons. Currently we are able to run public HBase trunk on our version of
Hadoop, but once in a while we have to satisfy new dependencies on Hadoop
features that are added to HBase. If the changes proposed in this thread
happen, we would have to pull in a lot more security-related dependencies
into our version of Hadoop and, most likely, implement a lot of no-op
stubs. However, that may not be a trivial project, and it certainly would
not add any clarity or value to our Hadoop codebase or HBase / HDFS
interaction.

I imagine there are other custom flavors of Hadoop out there where HBase
support would be desirable. For example, does MapR implement the same
security API as Hadoop 1.0.0 does? Restricting HBase to a smaller subset of
Hadoop versions complicates life for existing users, and makes HBase a less
likely choice for new users, who could go with something like Hypertable
where they have an extra abstraction layer between the database and the
underlying distributed file system implementation.

Thanks,
--Mikhail

On Wed, Mar 7, 2012 at 10:20 AM, Devaraj Das <dd...@hortonworks.com> wrote:

> Given that the token/ugi APIs are being used in other ecosystem components
> too (like Hive, HCatalog & Oozie), and in general, that security model will
> probably hold for other projects too, I think that its not an unfair
> expectation from Hadoop that it should maintain compatibility on UGI/Token*
> interfaces (*smile*).
>
> On Mar 6, 2012, at 11:57 AM, Arun C Murthy wrote:
>
> > Andy - could you please start a discussion?
> >
> > We could, at the very least, mark UGI as LimitedPrivate for HBase and
> work with you guys to maintain compatibility for the future. Makes sense?
> >
> > thanks,
> > Arun
> >
> > On Mar 6, 2012, at 10:21 AM, Andrew Purtell wrote:
> >
> >> After that, I believe we can merge the security sources in. However we
> may have an issue going forward because UGI is an unstable/private API.
> Needs sorting out with core at some point.
> >>
> >> Best regards,
> >>
> >>   - Andy
> >>
> >>
> >> On Mar 6, 2012, at 9:55 AM, Stack <st...@duboce.net> wrote:
> >>
> >>> On Tue, Mar 6, 2012 at 9:10 AM, Andrew Purtell <ap...@apache.org>
> wrote:
> >>>> ...however we can't easily build a single artifact because the secure
> RPC engine, as it interacts with the Hadoop auth framework, must use
> UserGroupInformation.
> >>>>
> >>>
> >>> OK.  So security story needs a bit of work.  Sounds like we have
> >>> enough votes though to require hadoop 1.0.0 at least in 0.96.
> >>>
> >>> St.Ack
> >
> > --
> > Arun C. Murthy
> > Hortonworks Inc.
> > http://hortonworks.com/
> >
> >
>
>

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Andrew Purtell <ap...@yahoo.com>.
Thanks Devaraj and Arun. 

The security related APIs should be promoted to public/stable given their increasing adoption. 

At least on the HBase side, I'll take the pain once to rework our related sources if the APIs on their way to stability make one more change. However, it would be preferable to avoid further need for hacks. Use of reflection can ride over an API in transition, but it can also punt breakage due to API change to runtime, where we'd least like to see it for the first time. 

Best regards,

    - Andy


On Mar 7, 2012, at 10:20 AM, Devaraj Das <dd...@hortonworks.com> wrote:

> Given that the token/ugi APIs are being used in other ecosystem components too (like Hive, HCatalog & Oozie), and in general, that security model will probably hold for other projects too, I think that its not an unfair expectation from Hadoop that it should maintain compatibility on UGI/Token* interfaces (*smile*). 
> 
> On Mar 6, 2012, at 11:57 AM, Arun C Murthy wrote:
> 
>> Andy - could you please start a discussion? 
>> 
>> We could, at the very least, mark UGI as LimitedPrivate for HBase and work with you guys to maintain compatibility for the future. Makes sense?
>> 
>> thanks,
>> Arun
>> 
>> On Mar 6, 2012, at 10:21 AM, Andrew Purtell wrote:
>> 
>>> After that, I believe we can merge the security sources in. However we may have an issue going forward because UGI is an unstable/private API. Needs sorting out with core at some point. 
>>> 
>>> Best regards,
>>> 
>>>  - Andy
>>> 
>>> 
>>> On Mar 6, 2012, at 9:55 AM, Stack <st...@duboce.net> wrote:
>>> 
>>>> On Tue, Mar 6, 2012 at 9:10 AM, Andrew Purtell <ap...@apache.org> wrote:
>>>>> ...however we can't easily build a single artifact because the secure RPC engine, as it interacts with the Hadoop auth framework, must use UserGroupInformation.
>>>>> 
>>>> 
>>>> OK.  So security story needs a bit of work.  Sounds like we have
>>>> enough votes though to require hadoop 1.0.0 at least in 0.96.
>>>> 
>>>> St.Ack
>> 
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>> 
>> 
> 

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Devaraj Das <dd...@hortonworks.com>.
Given that the token/ugi APIs are being used in other ecosystem components too (like Hive, HCatalog & Oozie), and in general, that security model will probably hold for other projects too, I think it's not an unfair expectation from Hadoop that it should maintain compatibility on UGI/Token* interfaces (*smile*). 

On Mar 6, 2012, at 11:57 AM, Arun C Murthy wrote:

> Andy - could you please start a discussion? 
> 
> We could, at the very least, mark UGI as LimitedPrivate for HBase and work with you guys to maintain compatibility for the future. Makes sense?
> 
> thanks,
> Arun
> 
> On Mar 6, 2012, at 10:21 AM, Andrew Purtell wrote:
> 
>> After that, I believe we can merge the security sources in. However we may have an issue going forward because UGI is an unstable/private API. Needs sorting out with core at some point. 
>> 
>> Best regards,
>> 
>>   - Andy
>> 
>> 
>> On Mar 6, 2012, at 9:55 AM, Stack <st...@duboce.net> wrote:
>> 
>>> On Tue, Mar 6, 2012 at 9:10 AM, Andrew Purtell <ap...@apache.org> wrote:
>>>> ...however we can't easily build a single artifact because the secure RPC engine, as it interacts with the Hadoop auth framework, must use UserGroupInformation.
>>>> 
>>> 
>>> OK.  So security story needs a bit of work.  Sounds like we have
>>> enough votes though to require hadoop 1.0.0 at least in 0.96.
>>> 
>>> St.Ack
> 
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
> 


Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Gary Helmling <gh...@gmail.com>.
> Andy - could you please start a discussion?
>
> We could, at the very least, mark UGI as LimitedPrivate for HBase and work with you guys to maintain compatibility for the future. Makes sense?
>

That would probably help for internal usage of UGI in the secure RPC
engine.  As Andy points out, we do already encapsulate UGI in our own
org.apache.hadoop.hbase.security.User class (which uses reflection to
account for the API incompatibilities) outside of the RPC engine.  We
do also make direct use of some other Hadoop security classes to
implement secure RPC:

org.apache.hadoop.security.authorize.PolicyProvider
org.apache.hadoop.security.authorize.Service
org.apache.hadoop.security.authorize.ServiceAuthorizationManager
org.apache.hadoop.security.SaslInputStream
org.apache.hadoop.security.SaslOutputStream
org.apache.hadoop.security.token.SecretManager
org.apache.hadoop.security.token.Token
org.apache.hadoop.security.token.TokenIdentifier

If we require Hadoop 1.0.0 then these others should at least be
available, though I don't know the API stability of each.  If we
don't, then the best way towards a single build for release seems
continuing towards modularization so that the security classes can be
built in a separate jar and included in the classpath when enabled.
Handling all of these interactions through reflection does not seem
desirable (or sane) to me.

--gh
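[Editor's note: a minimal sketch of the reflection approach Gary describes, where UGI is probed at runtime rather than referenced at compile time. This is illustrative only, not the actual org.apache.hadoop.hbase.security.User code; the class and method names probed for are assumptions.]

```java
// Illustrative probe: detect at runtime whether a UserGroupInformation
// variant exposing the secure-era API is on the classpath, instead of
// taking a compile-time dependency on it.
public class UgiProbe {

    /** Returns true if a UGI class exposing a static getCurrentUser() is present. */
    public static boolean hasSecureUgi() {
        try {
            Class<?> ugi =
                Class.forName("org.apache.hadoop.security.UserGroupInformation");
            // Probe reflectively for the method; older nonsecure variants
            // expose a different API, so absence means "not the secure UGI".
            ugi.getMethod("getCurrentUser");
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("secure UGI available: " + hasSecureUgi());
    }
}
```

On a classpath without Hadoop (or with a pre-security Hadoop), the probe simply returns false, which is what lets a single wrapper class compile and run against either variant.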

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Arun C Murthy <ac...@hortonworks.com>.
Andy - could you please start a discussion? 

We could, at the very least, mark UGI as LimitedPrivate for HBase and work with you guys to maintain compatibility for the future. Makes sense?

thanks,
Arun

On Mar 6, 2012, at 10:21 AM, Andrew Purtell wrote:

> After that, I believe we can merge the security sources in. However we may have an issue going forward because UGI is an unstable/private API. Needs sorting out with core at some point. 
> 
> Best regards,
> 
>    - Andy
> 
> 
> On Mar 6, 2012, at 9:55 AM, Stack <st...@duboce.net> wrote:
> 
>> On Tue, Mar 6, 2012 at 9:10 AM, Andrew Purtell <ap...@apache.org> wrote:
>>> ...however we can't easily build a single artifact because the secure RPC engine, as it interacts with the Hadoop auth framework, must use UserGroupInformation.
>>> 
>> 
>> OK.  So security story needs a bit of work.  Sounds like we have
>> enough votes though to require hadoop 1.0.0 at least in 0.96.
>> 
>> St.Ack

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Andrew Purtell <ap...@yahoo.com>.
After that, I believe we can merge the security sources in. However we may have an issue going forward because UGI is an unstable/private API. Needs sorting out with core at some point. 

Best regards,

    - Andy


On Mar 6, 2012, at 9:55 AM, Stack <st...@duboce.net> wrote:

> On Tue, Mar 6, 2012 at 9:10 AM, Andrew Purtell <ap...@apache.org> wrote:
>> ...however we can't easily build a single artifact because the secure RPC engine, as it interacts with the Hadoop auth framework, must use UserGroupInformation.
>> 
> 
> OK.  So security story needs a bit of work.  Sounds like we have
> enough votes though to require hadoop 1.0.0 at least in 0.96.
> 
> St.Ack

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Stack <st...@duboce.net>.
On Tue, Mar 6, 2012 at 9:10 AM, Andrew Purtell <ap...@apache.org> wrote:
> ...however we can't easily build a single artifact because the secure RPC engine, as it interacts with the Hadoop auth framework, must use UserGroupInformation.
>

OK.  So security story needs a bit of work.  Sounds like we have
enough votes though to require hadoop 1.0.0 at least in 0.96.

St.Ack

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Andrew Purtell <ap...@apache.org>.
The UserGroupInformation API is incompatible between secure and nonsecure versions **of Hadoop** (among other issues). This leads to two issues:

  - Runtime exceptions. We indeed do use reflection to do run time detection of which variant is available.

  - Compile time errors. We can't do anything about this. Hence the separate profile.


And just FYI security has two components: the totally optional coprocessor-based access controller, and the secure RPC engine as a plug in option. If you don't enable either you won't see any runtime errors; however we can't easily build a single artifact because the secure RPC engine, as it interacts with the Hadoop auth framework, must use UserGroupInformation.


Best regards,


    - Andy


Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)



>________________________________
> From: Stack <st...@duboce.net>
>To: dev@hbase.apache.org 
>Sent: Monday, March 5, 2012 11:59 PM
>Subject: Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?
> 
>On Fri, Mar 2, 2012 at 12:57 PM, Nicolas Spiegelberg
><ns...@fb.com> wrote:
>> I'm wondering why HDFS security support should be mandatory?  Append makes
>> sense because there's no way to have a durable system without it.
>> Security is currently an optional feature & implemented as an HBase
>> co-processor (vs core), correct?  Is there a problem (other than minor
>> inconvenience) with using introspection APIs for security in the core and
>> then warning if security is enabled but the API is unreachable?
>>
>
>We could try and do that.
>
>The proposal is about pulling up the bottom end on the Hadoops we
>will run on going forward.  If all Hadoops from 1.0.0 on have
>security, and we can depend on that being the case going forward, then
>we could do things like ship a single artifact rather than the two we
>currently ship: one that does not depend on a secure Hadoop and
>another that requires it.
>
>I forgot that 0.22 hadoop doesn't have security.  Would suggest that
>we drop support for it too in 0.96 hbase.
>
>St.Ack
>
>
>

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Stack <st...@duboce.net>.
On Fri, Mar 2, 2012 at 12:57 PM, Nicolas Spiegelberg
<ns...@fb.com> wrote:
> I'm wondering why HDFS security support should be mandatory?  Append makes
> sense because there's no way to have a durable system without it.
> Security is currently an optional feature & implemented as an HBase
> co-processor (vs core), correct?  Is there a problem (other than minor
> inconvenience) with using introspection APIs for security in the core and
> then warning if security is enabled but the API is unreachable?
>

We could try and do that.

The proposal is about pulling up the bottom end on the Hadoops we
will run on going forward.  If all Hadoops from 1.0.0 on have
security, and we can depend on that being the case going forward, then
we could do things like ship a single artifact rather than the two we
currently ship: one that does not depend on a secure Hadoop and
another that requires it.

I forgot that 0.22 hadoop doesn't have security.  Would suggest that
we drop support for it too in 0.96 hbase.

St.Ack

Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Nicolas Spiegelberg <ns...@fb.com>.
I'm wondering why HDFS security support should be mandatory?  Append makes
sense because there's no way to have a durable system without it.
Security is currently an optional feature & implemented as an HBase
co-processor (vs core), correct?  Is there a problem (other than minor
inconvenience) with using introspection APIs for security in the core and
then warning if security is enabled but the API is unreachable?

Nicolas

On 3/2/12 3:50 PM, "Ted Yu" <yu...@gmail.com> wrote:

>Hadoop 0.22 currently doesn't support security.
>
>FYI
>
>On Fri, Mar 2, 2012 at 11:24 AM, Stack <st...@duboce.net> wrote:
>
>> Should we make it so hbase 0.96.0 requires at least hadoop 1.0.0?
>> This would mean we would no longer support running on older versions
>> such as branch-0.20-append (and perhaps stuff like CDH2?)?
>>
>> Requiring Hadoop 1.0.0 at least means we can presume security and
>> append.  We also narrow the set of hadoops we need to support
>> simplifying things for ourselves some.
>>
>> What you lot think?
>> St.Ack
>>
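[Editor's note: the "introspection plus warning" idea Nicolas raises above could look roughly like the sketch below. This is a hypothetical shim, not HBase code; the warning text and the graceful-degradation behavior are assumptions for illustration.]

```java
import java.lang.reflect.Method;

// Sketch: reach the Hadoop security API through reflection so the core
// compiles without it, and warn (rather than fail) if security is enabled
// but the API is unreachable on the running classpath.
public class SecurityShim {

    /** Returns the current UGI user, or null if the security API is absent. */
    public static Object currentUser(boolean securityEnabled) {
        try {
            Class<?> ugi =
                Class.forName("org.apache.hadoop.security.UserGroupInformation");
            Method getCurrentUser = ugi.getMethod("getCurrentUser");
            return getCurrentUser.invoke(null);
        } catch (ReflectiveOperationException e) {
            if (securityEnabled) {
                // Security was requested but the API isn't there: warn loudly
                // instead of throwing, per the suggestion above.
                System.err.println("WARN: security is enabled but the Hadoop "
                    + "security API is unreachable; proceeding without it");
            }
            return null;  // degrade gracefully on pre-security Hadoops
        }
    }

    public static void main(String[] args) {
        Object user = currentUser(true);
        System.out.println(user == null ? "no secure UGI" : "secure UGI ok");
    }
}
```

The tradeoff against the separate-profile approach discussed elsewhere in the thread is that reflection moves the failure from compile time to a runtime warning, at the cost of losing compile-time checking of the security calls.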


Re: DISCUSS: Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?

Posted by Ted Yu <yu...@gmail.com>.
Hadoop 0.22 currently doesn't support security.

FYI

On Fri, Mar 2, 2012 at 11:24 AM, Stack <st...@duboce.net> wrote:

> Should we make it so hbase 0.96.0 requires at least hadoop 1.0.0?
> This would mean we would no longer support running on older versions
> such as branch-0.20-append (and perhaps stuff like CDH2?)?
>
> Requiring Hadoop 1.0.0 at least means we can presume security and
> append.  We also narrow the set of hadoops we need to support
> simplifying things for ourselves some.
>
> What you lot think?
> St.Ack
>