Posted to common-dev@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2008/10/09 02:17:44 UTC

[jira] Issue Comment Edited: (HADOOP-4348) Adding service-level authorization to Hadoop

    [ https://issues.apache.org/jira/browse/HADOOP-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638149#action_12638149 ] 

aw edited comment on HADOOP-4348 at 10/8/08 5:16 PM:
-------------------------------------------------------------------

We have two grids.  We only want data to flow one way.  If a user has an account on both grids, we would need to put IP-level protections in place in order to enforce the data flow policy (since user-level protection alone would let the data flow both ways).  The question comes down to whether Hadoop should do this or whether a firewall rule should be put in place.

In my mind, this is better handled by Hadoop, since there is a very good chance that you may only want certain users to have these restrictions.  For example, normal users have to abide by the policy, but an automated task that copies data could be free to go any direction.

Additionally, in a homogeneous set of nodes, it may be that the hadoop-site.xml file is the same for everything but the namenode configuration option, so that it can be easily templated by a configuration management system.  I'm highly concerned about putting something like ACLs in the hadoop-site.xml file: it would greatly increase the amount of work an ops team running many grids would have to do.  If a GUI is put on top of the ACLs (as mentioned by the scheduling team), then I *definitely* do not want it mucking with hadoop-site.xml.  Corruption of that file would likely mean my entire grid goes down vs. users just losing access.
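
To make the templating concern concrete, here is a minimal sketch of what such a template might look like.  The placeholder syntax and variable name are illustrative assumptions (whatever the configuration management system renders), not an actual template from any deployment:

```xml
<!-- hadoop-site.xml template: identical on every node of every grid. -->
<!-- Only the {{ namenode_host }} variable differs per grid            -->
<!-- (hypothetical templating syntax).                                 -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://{{ namenode_host }}:8020</value>
  </property>
</configuration>
```

Embedding per-user ACLs in this file would break that one-variable symmetry and force per-grid (or per-change) regeneration of the whole file.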

      was (Author: aw):
    We have two grids.  We only want data to flow one way.  If a user has an account on both grids, we would need to put IP-level protections in place in order to enforce the data flow policy (since only user-level protection would let the data flow both ways). The question comes down to whether Hadoop should do this or should there be a firewall rule in place.

In my mind, this is better handled by Hadoop, since there is a very good chance that you may only want certain users to have these restrictions.  For example, normal users have to abide by the policy, but an automated task that copies data could be free to go any direction.

Additionally, in a homogeneous set of nodes, it may be that the hadoop-site.xml file is the same for everything but the namenode, which can be easily templated by a configuration management system.  I'm highly concerned that putting something like ACLs in the hadoop-site.xml file.  It would greatly increase the amount of work an ops team of many grids would have to do.  If there is a GUI put on top of ACLs (as mentioned by the scheduling team), then I *definitely* do not want it mucking with hadoop-site.xml.  Corruption of that file would likely mean my entire grid goes down vs. users just losing access.
  
> Adding service-level authorization to Hadoop
> --------------------------------------------
>
>                 Key: HADOOP-4348
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4348
>             Project: Hadoop Core
>          Issue Type: New Feature
>            Reporter: Kan Zhang
>            Assignee: Arun C Murthy
>             Fix For: 0.20.0
>
>
> Service-level authorization is the initial check done by a Hadoop service to find out if a connecting client is a pre-defined user of that service. If not, the connection or service request will be declined. This feature allows services to limit access to a clearly defined group of users. For example, service-level authorization allows "world-readable" files on an HDFS cluster to be readable only by the pre-defined users of that cluster, not by anyone who can connect to the cluster. It also allows an M/R cluster to define its group of users so that only those users can submit jobs to it.
> Here is an initial list of requirements I came up with.
>     1. Users of a cluster are defined by a flat list of usernames and groups. A client is a user of the cluster if and only if her username is listed in the flat list or one of her groups is explicitly listed in the flat list. Nested groups are not supported.
>     2. The flat list is stored in a conf file and pushed to every cluster node so that services can access them.
>     3. Services will monitor the modification of the conf file periodically (5 mins interval by default) and reload the list if needed.
>     4. Checking against the flat list is done as early as possible and before any other authorization checking. Both HDFS and M/R clusters will implement this feature.
>     5. This feature can be switched off and is off by default.
> I'm aware of interest in pulling user data from LDAP. For this JIRA, I suggest we implement it using a conf file. Additional data sources may be supported via new JIRAs.
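
Requirements 1-3 and 5 above can be sketched in a few lines.  This is an illustrative Python sketch, not the actual Hadoop implementation; the conf-file format ("user NAME" / "group NAME" lines) is a hypothetical one chosen only to show the flat-list check, the periodic reload, and the off-by-default switch:

```python
import os
import time

RELOAD_INTERVAL = 5 * 60  # requirement 3: re-check the conf file every 5 minutes


class ServiceAuthorizer:
    """Flat allow-list of usernames and groups, loaded from a conf file
    and reloaded when the file's modification time changes."""

    def __init__(self, conf_path, enabled=False):
        # requirement 5: the feature is switched off by default
        self.conf_path = conf_path
        self.enabled = enabled
        self.users = set()
        self.groups = set()
        self.last_mtime = None
        self.last_check = 0.0
        if enabled:
            self._reload()

    def _reload(self):
        users, groups = set(), set()
        with open(self.conf_path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                # hypothetical line format: "user alice" or "group analysts"
                kind, name = line.split(None, 1)
                (users if kind == "user" else groups).add(name)
        self.users, self.groups = users, groups
        self.last_mtime = os.path.getmtime(self.conf_path)

    def _maybe_reload(self):
        now = time.time()
        if now - self.last_check < RELOAD_INTERVAL:
            return
        self.last_check = now
        mtime = os.path.getmtime(self.conf_path)
        if mtime != self.last_mtime:
            self._reload()

    def is_authorized(self, username, user_groups):
        """Requirement 1: authorized iff the username, or one of the
        user's groups, appears in the flat list (no nested groups)."""
        if not self.enabled:
            return True  # feature off: every connection is allowed
        self._maybe_reload()
        return username in self.users or bool(self.groups & set(user_groups))
```

Per requirement 4, a service would call is_authorized() as the first step of connection handling, before any finer-grained (e.g. file-level) authorization.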

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.