Posted to common-dev@hadoop.apache.org by Giovanni Tusa <gi...@gmail.com> on 2009/07/22 18:00:48 UTC

Current security implementation in Hadoop

Hi all,

At my company we recently began some projects in which we would like to
actively use Hadoop.
For one of them, we have implemented a feed service whose feed entries
(potentially a lot of data, because many different users can post feeds)
are stored in the server filesystem (for the moment on only one server)
using Hadoop.

Such a system needs some security improvements, and I'm wondering what kind
of security mechanisms Hadoop currently provides.
For example, only authenticated and authorized users, that is, users with
valid access credentials, should be able to reach our service from the
browser and read data, because the data will normally be sensitive. The
same data should also be encrypted during the server-to-client transfer.
I read some old posts and discussions in the Hadoop project concerning
security, but I am a little confused about what is actually implemented
for authentication, authorization and encryption, in which version of
Hadoop, and whether anything currently provided matches my needs.

If someone could point me in the right direction, your help would be much
appreciated.

Giovanni

Re: Current security implementation in Hadoop

Posted by Ted Dunning <te...@gmail.com>.
Your architecture is a bit unusual in that you seem to be proposing that
users get direct access to the hadoop storage layer.

More common is to have a controller layer that mediates requests to store or
read data.

With that layer of abstraction, you can deal with some of the problems
associated with file update.  See the recent hbase work, for instance.

Even with that layer of abstraction and the recent massive improvements to
HBase, Hadoop still tends to be much better for batch processing than for
real-time support of ad hoc user data reads and writes.  Depending on the
data you have and the update patterns, you might be much happier with a
clustered key-value store like Voldemort or Cassandra.  Voldemort especially
has very nice capabilities for dumping large amounts of data from hadoop
into a large store.  It also works to support real-time (ish) random reads
and writes.
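
The controller layer mentioned above can be sketched roughly as follows. This
is only an illustration, assuming Java 8 or later: FeedController, FeedStore
and InMemoryStore are hypothetical names and not part of Hadoop, and the
in-memory map stands in for whatever storage sits behind the controller. The
user name is assumed to have been authenticated already by the web tier.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical controller that mediates every read and write, so that
// clients never talk to the storage layer directly.
class FeedController {

    // Hypothetical storage abstraction; a real one might wrap a filesystem client.
    interface FeedStore {
        String read(String path);
        void write(String path, String data);
    }

    // In-memory stand-in for the storage layer, for illustration only.
    static class InMemoryStore implements FeedStore {
        private final Map<String, String> files = new HashMap<>();
        public String read(String path) { return files.get(path); }
        public void write(String path, String data) { files.put(path, data); }
    }

    private final FeedStore store;
    // path -> users allowed to read it (assumed permission model)
    private final Map<String, Set<String>> readers = new HashMap<>();

    FeedController(FeedStore store) {
        this.store = store;
    }

    // Allow the given user to read the given path.
    void grantRead(String path, String user) {
        readers.computeIfAbsent(path, k -> new HashSet<>()).add(user);
    }

    // Check authorization first, then delegate to the store.
    String read(String user, String path) {
        Set<String> allowed = readers.get(path);
        if (user == null || allowed == null || !allowed.contains(user)) {
            throw new SecurityException("user " + user + " may not read " + path);
        }
        return store.read(path);
    }

    // Any authenticated user may write; the writer gains read access.
    void write(String user, String path, String data) {
        if (user == null) {
            throw new SecurityException("unauthenticated write");
        }
        store.write(path, data);
        grantRead(path, user);
    }

    public static void main(String[] args) {
        FeedController controller = new FeedController(new InMemoryStore());
        controller.write("alice", "/feeds/1", "hello");
        System.out.println(controller.read("alice", "/feeds/1"));
        try {
            controller.read("bob", "/feeds/1");
        } catch (SecurityException e) {
            System.out.println("denied: " + e.getMessage());
        }
    }
}
```

The point of the design is that the permission check lives in one place, in
front of the store, so the storage backend can be swapped without touching
the access rules.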

On Thu, Jul 23, 2009 at 6:44 AM, Giovanni Tusa <gi...@gmail.com> wrote:

> Could you also suggest me some other useful links, maybe with examples if
> any, on how to implement such a mechanism?
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Current security implementation in Hadoop

Posted by Giovanni Tusa <gi...@gmail.com>.
Hi,
thank you both for the links and advice. Looking at the work in progress
on security, it seems that currently the only way to have authentication
is to implement it at the web server level. In my case the service is
deployed under Tomcat, so I think I should manage client authentication
with Tomcat and then, if needed, forward the obtained credentials in some
way from my service downstream to Hadoop, where I could define
service-level authorization and/or user permissions. Is that right?
Could you also suggest some other useful links, with examples if there are
any, on how to implement such a mechanism?

Thank you again,
Giovanni

Re: Current security implementation in Hadoop

Posted by Dhruba Borthakur <dh...@gmail.com>.
Also, you can look at work in progress:
http://issues.apache.org/jira/browse/HADOOP-4487

thanks,
dhruba

Re: Current security implementation in Hadoop

Posted by Andrey Pankov <ap...@iponweb.net>.
Maybe this would be interesting for you,

http://www.hadoop.iponweb.net/Home/hdfs-over-webdav/authentication-and-permissions

-- 
Andrey Pankov