Posted to mapreduce-user@hadoop.apache.org by bigbibguy father <bi...@gmail.com> on 2011/10/01 04:19:46 UTC

Hadoop Security - TaskTracker and Active Directory

We are planning to enable secure Hadoop using Kerberos.

Our users reside in Active Directory. We read that there are two options
for using Kerberos to secure Hadoop.

1) Run Kerberos on a machine local to the cluster and create the service
principals there.
2) Use Active Directory itself as the Kerberos KDC and create the service
principals in Active Directory as well.

It seems Cloudera and the industry in general recommend option 1, running a
local KDC for authenticating service principals:
https://ccp.cloudera.com/display/CDHDOC/Integrating+Hadoop+Security+with+Active+Directory

I read that the TaskTrackers run tasks as the user who submitted the job.
In that case, don't the TaskTracker nodes need to talk to Active
Directory to get user details such as the GID?
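
As a rough illustration of the lookup in question: on a Linux node, resolving a user's UID, primary GID and group memberships normally goes through the operating system's name service, which can be backed by LDAP or Active Directory. The sketch below uses Python's standard pwd, grp and os modules. The helper name lookup_user is hypothetical, and this is not a description of how the TaskTracker itself performs the lookup.

```python
import grp
import os
import pwd


def lookup_user(username):
    """Resolve a user's UID, primary GID and group names via the OS name service."""
    entry = pwd.getpwnam(username)  # raises KeyError if the user is unknown
    gids = os.getgrouplist(username, entry.pw_gid)
    groups = []
    for gid in gids:
        try:
            groups.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            groups.append(str(gid))  # group ID that has no name entry
    return {"uid": entry.pw_uid, "gid": entry.pw_gid, "groups": groups}


if __name__ == "__main__":
    # Demonstration only: look up the account that owns UID 0 (normally "root").
    print(lookup_user(pwd.getpwuid(0).pw_name))
```

Whether such a call reaches Active Directory depends entirely on how the node's name service is configured, which is the point of the question.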

So does this mean that every node (TaskTrackers, JobTracker and NameNode)
will be interacting with Active Directory anyway?

If so, option 1 doesn't seem to be superior, since each node has to talk to
two KDCs: the local Kerberos KDC for authenticating service principals, and
Active Directory for user details and group information.

Please correct me if I am wrong in my assumptions.

Thanks and Regards,

BBG

Re: Hadoop Security - TaskTracker and Active Directory

Posted by Devaraj Das <dd...@hortonworks.com>.
Doing everything in Active Directory should work as well. What I said earlier was based more on the Yahoo deployment of security. Let us know how it goes.



Re: Hadoop Security - TaskTracker and Active Directory

Posted by bigbibguy father <bi...@gmail.com>.
Thanks Devaraj for responding.

In our case, the LDAP server is the corporate Active Directory server,
which holds the user IDs and their attributes.

Cluster nodes contact the KDC to get a TGT and service tickets for the NN and
JT, and keep them until the expiry time (7 days). Cluster nodes contact the
LDAP server for each task. So if I understand correctly, the LDAP traffic from
the cluster nodes (around 1000) will be much higher than the authentication
traffic from those nodes.
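
To make that comparison concrete, here is a back-of-the-envelope sketch. The node count, the 7-day ticket lifetime and the three tickets per node (a TGT plus service tickets for the NN and JT) come from the paragraph above. The figure of 500 tasks per node per day is purely an assumed number for illustration, and the model assumes one LDAP lookup per task with no caching on the node.

```python
def daily_requests(nodes, tickets_per_node, ticket_lifetime_days, tasks_per_node_per_day):
    """Return (kerberos_requests_per_day, ldap_lookups_per_day) for the cluster.

    Assumes each ticket is fetched once per lifetime and each task triggers
    exactly one uncached LDAP lookup.
    """
    kerberos_per_day = nodes * tickets_per_node / ticket_lifetime_days
    ldap_per_day = nodes * tasks_per_node_per_day
    return kerberos_per_day, ldap_per_day


# tasks_per_node_per_day=500 is an assumption, not a measured value.
kerberos, ldap = daily_requests(
    nodes=1000,
    tickets_per_node=3,
    ticket_lifetime_days=7,
    tasks_per_node_per_day=500,
)
print(f"Kerberos requests/day: {kerberos:.0f}, LDAP lookups/day: {ldap}")
```

Any local caching of user and group lookups would reduce the LDAP figure, so the real ratio depends on the node configuration.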

Why not also use Active Directory as the KDC for authenticating the service
principals (cluster nodes)?

That way, we do not have to manage a separate KDC and worry about its
availability and health.

We also plan to have one Active Directory server in the same datacenter as
the cluster, but outside the cluster firewall, so that LDAP queries have a
higher SLA.

The benefits associated with the local KDC option are listed below, with my
analysis added after each benefit.


   - It requires less configuration with Active Directory. - *But cluster
   nodes need to talk to Active Directory for the user details, so some
   configuration with Active Directory is needed anyway.*
   - It is comparatively easy to script the creation of many principals and
   keytabs. A principal and keytab must be created for every daemon in the
   cluster, and in a large cluster this can be extremely onerous to do directly
   in Active Directory. - *This is a one-time job, and we may be able to
   script it with AD as well.*
   - There is no need to involve central Active Directory administrators in
   order to get service principals created. - *We get to manage the OU
   containing the service principals.*
   - It allows for incremental configuration. The Hadoop administrator can
   completely configure and verify the functionality of the cluster
   independently of integrating with Active Directory. - *This benefit is good
   to have, and it is not available in the Active Directory-only option.*
   - It can serve to shield the corporate Active Directory server(s) from
   the many machines in a Hadoop cluster all requesting Kerberos tickets
   simultaneously. During cluster start-up, Hadoop will effectively be acting
   as a distributed denial of service attack on the central Active Directory
   server, which could adversely affect the performance of the Active Directory
   server. - *The service principal authentication traffic is not that
   frequent, so these spikes should not be much of a problem for our
   highly available Active Directory.*
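
On the scripting point in the second bullet, a minimal sketch of what such a script might look like for the local KDC option is below. It only generates command strings for MIT Kerberos kadmin (addprinc -randkey and ktadd -k). The service names, host names, realm and keytab directory are hypothetical placeholders, and the equivalent Active Directory tooling is different and is not shown here.

```python
def kadmin_commands(hosts, services, realm, keytab_dir):
    """Build kadmin commands that create one principal and keytab per daemon per host."""
    commands = []
    for host in hosts:
        for service in services:
            principal = f"{service}/{host}@{realm}"
            keytab = f"{keytab_dir}/{service}.{host}.keytab"
            # Create the principal with a random key, then export it to a keytab.
            commands.append(f"addprinc -randkey {principal}")
            commands.append(f"ktadd -k {keytab} {principal}")
    return commands


if __name__ == "__main__":
    # All names below are placeholders for illustration.
    hosts = ["node1.example.com", "node2.example.com"]
    for command in kadmin_commands(hosts, ["hdfs", "mapred"], "HADOOP.EXAMPLE.COM", "/tmp/keytabs"):
        print(command)
```

The printed lines could then be reviewed before being fed to kadmin, which keeps the one-time job auditable.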



The drawback of the local KDC option, however, is that we need to maintain that
KDC server and make sure it is highly available, with a backup server.



Thanks and Regards,
BBG





Re: Hadoop Security - TaskTracker and Active Directory

Posted by Devaraj Das <dd...@hortonworks.com>.
The cluster KDC should be set up to trust the Active Directory KDC (a cross-realm trust, in Kerberos terms). This handles user authentication when a user talks directly to a server in the cluster (e.g., user->namenode).
The GID and other user attributes are usually stored in LDAP. The cluster nodes are set up to talk to the cluster-specific LDAP server.
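
To illustrate how the two pieces fit together, here is a small sketch of the idea: a principal from a trusted realm is accepted and reduced to a short user name, which is what would then be looked up in LDAP for the GID and other attributes. The function name, realm names and logic are assumptions for illustration only. Hadoop has its own configurable principal-mapping rules, which this does not reproduce.

```python
def short_name(principal, cluster_realm, trusted_realms):
    """Map a Kerberos principal to a short user name if its realm is trusted."""
    name, sep, realm = principal.rpartition("@")
    if not sep or not name:
        raise ValueError(f"principal has no realm: {principal!r}")
    if realm != cluster_realm and realm not in trusted_realms:
        raise PermissionError(f"realm {realm!r} is not trusted by the cluster")
    # Drop any instance/host component, e.g. "hdfs/nn1.example.com" -> "hdfs".
    return name.split("/", 1)[0]


if __name__ == "__main__":
    # Realm names are placeholders: one for the cluster KDC, one for Active Directory.
    print(short_name("alice@CORP.EXAMPLE.COM", "HADOOP.EXAMPLE.COM", {"CORP.EXAMPLE.COM"}))
```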
