Posted to dev@hbase.apache.org by erwin x <er...@gmail.com> on 2012/06/17 14:29:16 UTC

HBase security research

Hi all,

I am investigating how HBase can be used to store sensitive/confidential
information. This research is part of my master's thesis in computing
science at a university.

The research focuses mainly on confidentiality, for example:
 - Describing the location of the data within the distributed system
 - Role-based access control
 - Fine-grained access control (at column/row level)
 - Built-in encryption based on the role
 - The impact on performance, and validation of the above security measures.
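To make the role-based, column-level access control item concrete, here is a minimal sketch in Java. The names RoleColumnPolicy, grantColumn, assignRole and canRead are hypothetical and not part of HBase, and the "*" qualifier meaning "whole family" is an assumption of this sketch. It only models the decision "may this user read this column?" with default deny, so a user without an authorized role (for example an administrator) is refused.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical role-based, column-level read policy (illustration only, not an HBase API).
 * Users are assigned roles; roles are granted "family:qualifier" columns.
 * A qualifier of "*" is assumed here to grant the whole column family.
 */
final class RoleColumnPolicy {
    private final Map<String, Set<String>> roleColumns = new HashMap<>();
    private final Map<String, Set<String>> userRoles = new HashMap<>();

    void grantColumn(String role, String family, String qualifier) {
        roleColumns.computeIfAbsent(role, r -> new HashSet<>()).add(family + ":" + qualifier);
    }

    void assignRole(String user, String role) {
        userRoles.computeIfAbsent(user, u -> new HashSet<>()).add(role);
    }

    /** True if any of the user's roles grants this column or its whole family; otherwise deny. */
    boolean canRead(String user, String family, String qualifier) {
        String column = family + ":" + qualifier;
        String wholeFamily = family + ":*";
        for (String role : userRoles.getOrDefault(user, Collections.emptySet())) {
            Set<String> granted = roleColumns.getOrDefault(role, Collections.emptySet());
            if (granted.contains(column) || granted.contains(wholeFamily)) {
                return true;
            }
        }
        return false; // default deny: no authorized role, no access
    }

    public static void main(String[] args) {
        RoleColumnPolicy policy = new RoleColumnPolicy();
        policy.grantColumn("doctor", "medical", "*");
        policy.grantColumn("billing", "medical", "invoice");
        policy.assignRole("alice", "doctor");
        policy.assignRole("bob", "billing");

        System.out.println(policy.canRead("alice", "medical", "diagnosis")); // true
        System.out.println(policy.canRead("bob", "medical", "diagnosis"));   // false
        System.out.println(policy.canRead("bob", "medical", "invoice"));     // true
        System.out.println(policy.canRead("admin", "medical", "invoice"));   // false: no role
    }
}
```

Row-level restrictions could presumably be modeled the same way, for example by keying grants on a row-key prefix as well.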

My questions are:

1) Are the above features interesting for HBase?
2) Should I propose my changes and results in the HBase Jira?

This research assumes that the data is so sensitive that even
administrators, developers or other potentially malicious parties must
not be able to see the data unless they have an authorized role.

If I observed correctly (correct me if I am wrong), security in HBase
currently focuses primarily on authentication and discretionary access
control, and assumes that no malicious user has access to the underlying
system (for example HDFS, the hard drives, or a shell), because data can
still be read that way. My research focuses on extending HBase security
with more authorization and confidentiality features.
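One way to address this threat model is to encrypt cell values on the client with a key per role, so that anyone reading the files on HDFS or the raw disks sees only ciphertext. The sketch below assumes such a scheme: RoleCellCipher and its methods are hypothetical, it uses only the standard javax.crypto AES/GCM API, and it leaves key distribution and storage (the hard part in practice) out of scope. The row/column coordinates are passed as associated data so that a ciphertext cannot be silently moved to another cell.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/**
 * Hypothetical client-side, per-role encryption of cell values (illustration only).
 * Stored format: 12-byte IV followed by AES-GCM ciphertext and 16-byte tag.
 */
final class RoleCellCipher {
    private static final int IV_BYTES = 12;   // 96-bit nonce, the usual choice for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private final SecureRandom random = new SecureRandom();
    // role -> key; a real system would fetch keys from a key store (assumption, out of scope)
    private final Map<String, SecretKey> roleKeys = new HashMap<>();

    void createRoleKey(String role) throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        roleKeys.put(role, generator.generateKey());
    }

    /** aad carries the cell coordinates (row/column) so ciphertexts cannot be swapped between cells. */
    byte[] encrypt(String role, byte[] plaintext, byte[] aad) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv); // fresh random nonce per value
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, keyFor(role), new GCMParameterSpec(TAG_BITS, iv));
        cipher.updateAAD(aad);
        byte[] sealed = cipher.doFinal(plaintext); // ciphertext followed by tag
        byte[] stored = Arrays.copyOf(iv, IV_BYTES + sealed.length);
        System.arraycopy(sealed, 0, stored, IV_BYTES, sealed.length);
        return stored;
    }

    /** Throws GeneralSecurityException if the role's key or the aad does not match. */
    byte[] decrypt(String role, byte[] stored, byte[] aad) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, keyFor(role),
                new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
        cipher.updateAAD(aad);
        return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
    }

    private SecretKey keyFor(String role) throws GeneralSecurityException {
        SecretKey key = roleKeys.get(role);
        if (key == null) {
            throw new GeneralSecurityException("no key for role " + role);
        }
        return key;
    }

    public static void main(String[] args) throws GeneralSecurityException {
        RoleCellCipher cells = new RoleCellCipher();
        cells.createRoleKey("doctor");
        cells.createRoleKey("billing");
        byte[] aad = "row1/medical:diagnosis".getBytes(StandardCharsets.UTF_8);
        byte[] stored = cells.encrypt("doctor", "flu".getBytes(StandardCharsets.UTF_8), aad);

        System.out.println(new String(cells.decrypt("doctor", stored, aad), StandardCharsets.UTF_8)); // flu
        try {
            cells.decrypt("billing", stored, aad);
        } catch (GeneralSecurityException expected) {
            System.out.println("billing cannot decrypt"); // GCM tag check fails with the wrong key
        }
    }
}
```

A trade-off of this design is that the servers can no longer filter or compare on encrypted values, and in this sketch row keys and column names remain in plaintext.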

Thanks in advance!

Kind regards,
erwinx

Re: HBase security research

Posted by Enis Söztutar <en...@hortonworks.com>.

Hi,

This sounds interesting. I cannot comment on the research / paper aspects
of your work, but from my experience, implementing something for research
and contributing it to an open source project involve different
considerations. If you are interested in getting your work into the
project, the best way is probably not to fork the project now and, some
time later, point to the forked code, which by then will contain a lot of
core changes. It would be extremely difficult to merge that code back.

Again, if you are interested in contributing back, I would suggest opening
a JIRA describing the problem you are trying to solve, with a high-level
roadmap of the changes. Then, if there is some interest, the PMC can even
create a branch in SVN, or you can manage a branch yourself. You can then
open subtasks for each discrete change you propose, and work on those.
This will also ensure that at least some parts of your work can be
merged back, and that the committers can easily evaluate your patches.

Cheers,
Enis

On Mon, Jun 18, 2012 at 1:25 AM, erwin x <er...@gmail.com> wrote:

> [...]

Re: HBase security research

Posted by erwin x <er...@gmail.com>.

Hi all,

Thank you for all your enthusiastic responses!

I already started hacking the HBase security source code a little bit,
but I think a separate project using coprocessors is indeed a good
idea. I can then build a separate solution for the research problem,
reusing as much of the existing (security) code as possible.

It is possible that, in the meantime, some features will be implemented
by other developers during my research. Although I could perhaps help
develop them actively, that would make it difficult to do independent
research, as the requirements are a bit different from those of current
solutions, and it takes some time to build a complete solution for the
research problem (roughly described in my first mail).

Therefore, I think I will propose the project once it is completed,
so you can see whether there are some interesting features/results.

I had already seen various Jira issues related to my research, but not
HBASE-6222, as it was created this weekend :) Thank you for mentioning
it. If some issues are still open at the end of my research, I can
perhaps relate them to my research.

I will now continue with my research and will get back to you
when I have some updates. It is good to know that there is
some interest in the research :)

Kind regards,
erwinx

On Sun, Jun 17, 2012 at 7:30 PM, Andrew Purtell <an...@gmail.com> wrote:

> [...]

Re: HBase security research

Posted by Andrew Purtell <an...@gmail.com>.

I meant HBASE-1697 (not HBASE-1637), apologies for that.

On Jun 17, 2012, at 10:28 AM, Andrew Purtell <an...@gmail.com> wrote:

> [...]

Re: HBase security research

Posted by Andrew Purtell <an...@gmail.com>.

I'd also encourage you to read HBASE-1637 and its subtasks to see what has already gone in and how it was implemented, essentially as Joey had suggested. If you reimplement something, the first question that will be asked is what part of the HBase code can be reused, i.e., incremental development is preferred where possible.

Your work sounds interesting and also challenging, as it seems you may have to substantially hack the DFSClient as well as HBase.

    - Andy

On Jun 17, 2012, at 8:40 AM, Jonathan Hsieh <jo...@cloudera.com> wrote:

> [...]

Re: HBase security research

Posted by Jonathan Hsieh <jo...@cloudera.com>.

Hi erwinx,

Sounds interesting to me!

If your purpose is research or a paper, I'm always a fan of spending
some time defining the problem you are trying to solve (something
constrained to 2 pages would be good). I personally find it helpful, and
it would help us greatly if you ask us for implementation advice! After
that, I'd follow Joey's advice as an implementation avenue: start
hacking using the coprocessor interface.

Does your goal also include potential integration as part of HBase?

The threat model sketch you are assuming sounds interesting. Up to this
point, our threat model roughly gives the attacker only the ability to
make arbitrary RPCs and to sniff client traffic, and assumes the attacker
does not have credentials to get to the underlying HDFS file system.

There are a few issues on the bug/feature tracker that may be related to
what you are looking into. Here are some links to get started. It would
be nice to frame what you are trying to solve in relation to those. :)

https://issues.apache.org/jira/browse/HBASE-6222 Key value visibility tags.
https://issues.apache.org/jira/browse/HBASE-1697 DAC umbrella

Jon.

On Sun, Jun 17, 2012 at 5:29 AM, erwin x <er...@gmail.com> wrote:

> [...]

-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: HBase security research

Posted by Joey Echeverria <jo...@cloudera.com>.

Hey Erwinx,

Your research sounds very interesting, and the features you describe
are definitely useful for some use cases. The easiest way to implement
these features would be to use coprocessors, which let you extend HBase
without modifying the core. My recommendation would be to build
the features in a standalone project and propose JIRAs for any changes
to the core required to support the project.

This would give the community more time to evaluate the changes, and
would isolate the core changes in smaller JIRAs that are easier to
integrate. Eventually you could propose merging the project into HBase
once there's proven demand in the community.

Either way, please keep the dev list up-to-date on your progress.

Good on you!

-Joey

On Sun, Jun 17, 2012 at 8:29 AM, erwin x <er...@gmail.com> wrote:
> [...]

-- 
Joey Echeverria
Principal Solutions Architect
Cloudera, Inc.