Posted to hdfs-dev@hadoop.apache.org by Erik fang <fm...@gmail.com> on 2013/08/20 09:07:02 UTC

Implement directory/table level access control in HDFS

Hi folks,


HDFS has a POSIX-like permission model, using R, W, and X bits for owner, group, and other
to control access. It works well most of the time, except in two cases:

1. Data needs to be shared among users

Groups can be used for access control, but the users have to be in the same
group as the data; the group here stands for the sharing relationship
between users and data. If many sharing relationships exist, there are
many groups, which are hard to manage.

2. Hive

Hive uses a table-based access control model: a user can have SELECT, UPDATE,
CREATE, and DROP privileges on a table, which correspond to R/W permissions in
HDFS. However, Hive's table-based authorization doesn't match HDFS's
POSIX-like model. For Hive users accessing HDFS, group permissions can be
deployed, which introduces either many groups or big groups containing many
sharing relationships.

Inspired by the way an RDBMS manages its data, directory-level access control
based on authorized user impersonation can be implemented as an extension to
the POSIX-like permission model.

It consists of:

1. ACLFileSystem

2. an authorization manager, which holds the access control information and a
secret shared with the namenode

3. an authenticator embedded in the namenode

Take Hive as an example, with user DW as the owner of the data. The procedure is:

1. The user submits a Hive query or an HCatalog job to access DW's data. From
the query we can derive the tables/partitions to be read and written and the
corresponding HDFS paths. Then an RPC call to the authorization manager is
invoked, sending

{user, tablename, table_path, w/r}

2. The authorization manager checks whether the request is allowed. If it is,
it replies with an encrypted table path:

{realuser, encrypted(tablepath+w/r)}

realuser here stands for the owner of the requested data.
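To make step 2 concrete, here is a minimal sketch of such an authorization manager, assuming an AES shared secret and an in-memory ACL table. The class and method names (AuthorizationManager, allow, authorize) and the AES scheme are illustrative assumptions, not an existing HDFS or Hive API:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Illustrative authorization manager for step 2. The ACL table maps
// "user:table:op" to the owning user (realuser); an allowed request gets
// back {realuser, encrypted(tablepath+w/r)}.
class AuthorizationManager {
    private final SecretKeySpec sharedSecret;                 // secret shared with the namenode
    private final Map<String, String> aclTable = new HashMap<>();

    AuthorizationManager(byte[] secret) {
        this.sharedSecret = new SecretKeySpec(secret, "AES"); // 16-byte key for this sketch
    }

    /** Grants user the given op ("r" or "w") on table, owned by realUser. */
    void allow(String user, String table, String op, String realUser) {
        aclTable.put(user + ":" + table + ":" + op, realUser);
    }

    /** Returns {realuser, encrypted(tablepath+op)}, or null if the request is denied. */
    String[] authorize(String user, String table, String tablePath, String op) {
        String realUser = aclTable.get(user + ":" + table + ":" + op);
        if (realUser == null) return null;                    // no matching grant
        try {
            Cipher c = Cipher.getInstance("AES");             // default mode; a sketch, not hardened crypto
            c.init(Cipher.ENCRYPT_MODE, sharedSecret);
            byte[] token = c.doFinal((tablePath + "+" + op).getBytes(StandardCharsets.UTF_8));
            return new String[] { realUser, Base64.getEncoder().encodeToString(token) };
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }
}
```

In a real deployment the ACL table would live in persistent storage (or the Hive metastore, as suggested below), but the reply shape is the point here.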

3. ACLFileSystem extends FileSystem. When an open(path) call is invoked,
it replaces the path with encrypted(tablepath+w/r) and invokes the namenode
RPC call, such as

open(realuser, encrypted(tablepath+w/r), null)

If the user is requesting a partition path, the RPC call can be invoked as

open(realuser, encrypted(tablepath+w/r), path_suffix)
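One detail in step 3 is how a requested partition path maps onto the authorized table path plus the path_suffix argument. A small hypothetical helper (AclPathResolver is not part of any existing FileSystem API) could do the split:

```java
// Illustrative helper for step 3: splits a requested path into the
// authorized table path plus the path_suffix argument of the RPC call.
class AclPathResolver {
    /**
     * Returns the suffix below tablePath for requestedPath, null if the
     * whole table is requested, and throws if the path is not covered
     * by the token at all.
     */
    static String suffixFor(String tablePath, String requestedPath) {
        if (requestedPath.equals(tablePath)) {
            return null;                                            // open(realuser, token, null)
        }
        if (requestedPath.startsWith(tablePath + "/")) {
            return requestedPath.substring(tablePath.length() + 1); // partition under the table
        }
        throw new IllegalArgumentException("path not covered by token: " + requestedPath);
    }
}
```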

4. The namenode picks up the RPC call and decrypts encrypted(tablepath+w/r)
with the shared secret to verify that it has not been forged. If it is
genuine, the namenode checks the w/r operation, joins the tablepath and
path_suffix, and executes the call as the owner of the path, for example user DW.
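The namenode-side check in step 4 can be sketched as follows. To keep the example self-contained, a seal() helper stands in for the authorization manager's encryption from step 2; the class name and the AES scheme are assumptions, not existing namenode code:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Illustrative namenode-side check for step 4: decrypt the token with the
// shared secret, verify the requested operation, and join the table path
// with the optional path_suffix before acting as realuser.
class ImpersonationChecker {
    private final SecretKeySpec sharedSecret;

    ImpersonationChecker(byte[] secret) {
        this.sharedSecret = new SecretKeySpec(secret, "AES");
    }

    /** Stand-in for the authorization manager's encryption (step 2), so the round trip is testable. */
    static String seal(byte[] secret, String tablePath, String op) {
        try {
            Cipher c = Cipher.getInstance("AES");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(secret, "AES"));
            return Base64.getEncoder().encodeToString(
                c.doFinal((tablePath + "+" + op).getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the full path to access as realuser, or null if the token is fake or the op mismatches. */
    String resolve(String encryptedToken, String requestedOp, String pathSuffix) {
        try {
            Cipher c = Cipher.getInstance("AES");
            c.init(Cipher.DECRYPT_MODE, sharedSecret);
            String plain = new String(c.doFinal(Base64.getDecoder().decode(encryptedToken)),
                                      StandardCharsets.UTF_8);
            int sep = plain.lastIndexOf('+');
            if (sep < 0 || !plain.substring(sep + 1).equals(requestedOp)) {
                return null;                              // operation not covered by the token
            }
            String tablePath = plain.substring(0, sep);
            return pathSuffix == null ? tablePath : tablePath + "/" + pathSuffix;
        } catch (Exception e) {
            return null;                                  // cannot decrypt: token is forged
        }
    }
}
```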


A delegation token or something similar can be used as the shared secret, and
the authorization manager can be integrated into the Hive metastore.

In general, I propose an HDFS user impersonation mechanism and an
authorization mechanism based on HDFS user impersonation.

If the community is interested, I will file a JIRA for HDFS user
impersonation and another for the authorization manager soon.


Thoughts?

Thanks a lot
Erik.fang

Re: Implement directory/table level access control in HDFS

Posted by Erik fang <fm...@gmail.com>.
HDFS-5126 <https://issues.apache.org/jira/browse/HDFS-5126> has been
created for HDFS user impersonation, and I will develop a prototype in a
few weeks.

Thanks,
Erik.fang




