Posted to common-dev@hadoop.apache.org by Benyi Wang <be...@gmail.com> on 2012/02/08 19:43:38 UTC

Fwd: Hadoop Active Directory Integration

Can anyone answer my questions?

Thanks a lot.

---------- Forwarded message ----------
From: Benyi Wang <be...@gmail.com>
Date: Mon, Feb 6, 2012 at 11:07 PM
Subject: Hadoop Active Directory Integration
To: common-user@hadoop.apache.org


Hi,

I have some questions about integrating Hadoop with Active Directory:

   1. When using Active Directory, do we still need to create a Linux
   account for each user on each Linux node?
   2. What happens if I enable queue ACLs and use the fair scheduler? Will
   the task trackers send every ACL check to Active Directory? Can I list
   user accounts or AD security groups in mapred-queue-acls.xml? Do I need
   to create those groups on the Linux nodes?
   3. Has anyone configured Hadoop AD integration across multiple networks?
   For example, my company has three networks: corp, lab, and prod. A user
   in the "corp" network can log on to a Windows server in lab or prod.
   Suppose we want to use a local MIT KDC and set up a "one-way cross-realm
   trust from this realm to the Active Directory realm", as described in
   https://ccp.cloudera.com/display/CDHDOC/Integrating+Hadoop+Security+with+Active+Directory.
   How should Kerberos be set up in such an environment?
   4. Is it right that, once AD is set up, a Windows user can submit a
   mapred job remotely?
   5. What about authorization? Can Hadoop be configured so that only users
   in specified AD security groups can submit jobs?

Thanks.

Ben

Re: Hadoop Active Directory Integration

Posted by Benyi Wang <be...@gmail.com>.
Thanks for your answers.

But I am still not sure that I understand. Let me try again to clarify what
I really want to know:

Our cluster is built on CentOS Linux, without Kerberos or Active Directory
integration. There is a Linux LDAP server, which is not integrated with the
corporate Active Directory. I first have to ask the admin team to create a
Linux account on the LDAP server; then I can ssh to a Linux box that we call
the client node. From the client node, I can submit mapred jobs. This is
probably what you mean by LDAP via PAM: Hadoop's user/group system calls
are redirected to the LDAP server by the OS. Is this right?

We will have to wait months before the admin team can synchronize Active
Directory with the Linux LDAP server. Until that happens, is it possible
for a new user who is registered in AD to use our Hadoop cluster without
an account being created on the Linux LDAP server?

When a client sends a request to the NN or JT using Kerberos authentication,
does the Kerberos ticket include enough user/group information that Hadoop
doesn't need to make user/group system calls, in other words, doesn't need
the LDAP server?

There is a slide called "kerberos dataflow" in
http://www.slideshare.net/hadoopusergroup/hadoop-security-preview. It says
that user joe needs to get a TGT and a service ticket, and can then connect
to the NN using the service ticket. Does this mean that a Windows user can
get a service ticket and connect to the NN without a Linux account on the
Hadoop Linux cluster? Assume the Windows user is in an AD security group
that is allowed to access the Hadoop Linux cluster.

We need this because we don't want our business users to log on to our
cluster, but we do want to allow them to submit jobs remotely.

I hope I have made it clear this time.

Thanks.

On Wed, Feb 8, 2012 at 10:54 AM, Patrick Angeles
<pa...@gmail.com> wrote:

> On Wed, Feb 8, 2012 at 1:43 PM, Benyi Wang <be...@gmail.com> wrote:
>
> > Can anyone answer my questions?
> >
> > Thanks a lot.
> >
> > ---------- Forwarded message ----------
> > From: Benyi Wang <be...@gmail.com>
> > Date: Mon, Feb 6, 2012 at 11:07 PM
> > Subject: Hadoop Active Directory Integration
> > To: common-user@hadoop.apache.org
> >
> >
> > Hi,
> >
> > I have questions about Hadoop Active Directory Integration:
> >
> >   1. When using Active Directory, do we still need to create a Linux
> >   account for each user on each Linux node?
> >
>
> Yes. You can do LDAP integration via PAM.
>
>
> >   2. What about if I enable queue acls and use fairscheduler? Will task
> >   trackers send all ACLs check to Active directory? Can I list the user
> >   accounts or AD security groups in mapred-queue-acls.xml? Do I need to
> >   create those groups in Linux node?
> >
> The fairscheduler runs entirely on the JT.  Those groups need to resolve on
> the JT (and NN) machines.
>
>
> >   3. Does someone configure Hadoop AD integration in multiple networks?
> >   for example, my company have three networks:  corp,  lab, and prod. A user
> >   in "corp" network can log on a window server in lab or prod. If we want to
> >   use local MIT KDC and set up "one-way cross-realm trust from this realm
> >   to the Active Directory realm" in
> >   https://ccp.cloudera.com/display/CDHDOC/Integrating+Hadoop+Security+with+Active+Directory.
> >   How to set up Kerberos in such a environment?
> >
>
> You can have a local KDC and realm per cluster, and set up one-way
> cross-realm trust on each realm to your corp AD.
>
>
> >   4. Is this right? If AD is setup, a window user can remotely submit a
> >   mapred job?
> >
> I've never tried this, but my guess is it won't just work.
>
>
> >   5. What about the authorization? Can hadoop configure so that only users
> >   in the specified security groups in AD can submit jobs.
> >
> You can do this via ACLs.
>
>
> >
> > Thanks.
> >
> > Ben
> >
>

Re: Hadoop Active Directory Integration

Posted by Patrick Angeles <pa...@gmail.com>.
On Wed, Feb 8, 2012 at 1:43 PM, Benyi Wang <be...@gmail.com> wrote:

> Can anyone answer my questions?
>
> Thanks a lot.
>
> ---------- Forwarded message ----------
> From: Benyi Wang <be...@gmail.com>
> Date: Mon, Feb 6, 2012 at 11:07 PM
> Subject: Hadoop Active Directory Integration
> To: common-user@hadoop.apache.org
>
>
> Hi,
>
> I have questions about Hadoop Active Directory Integration:
>
>   1. When using Active Directory, do we still need to create a Linux
>   account for each user on each Linux node?
>

Yes. You can do LDAP integration via PAM.


>   2. What about if I enable queue acls and use fairscheduler? Will task
>   trackers send all ACLs check to Active directory? Can I list the user
>   accounts or AD security groups in mapred-queue-acls.xml? Do I need to
>   create those groups in Linux node?
>
The fairscheduler runs entirely on the JT.  Those groups need to resolve on
the JT (and NN) machines.
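
To illustrate what "resolve" means here, below is a minimal Python sketch.
It assumes the common setup in which group membership comes from the
operating system (local files, or LDAP through NSS/PAM). The function name
resolve_groups is made up for this example and is not part of Hadoop; you
would run something like it on the JT and NN hosts to see whether a user
and the groups named in an ACL are visible there.

```python
import grp
import os
import pwd


def resolve_groups(username):
    """Return the group names the OS reports for username, or None.

    None means the account does not resolve on this machine at all,
    which is the situation to rule out on the JT and NN hosts.
    """
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        return None  # no such account visible through NSS

    names = []
    for gid in os.getgrouplist(username, entry.pw_gid):
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # The gid has no group entry; keep the number so it is visible.
            names.append(str(gid))
    return names


if __name__ == "__main__":
    # "joe" is a placeholder account name.
    print(resolve_groups("joe"))
```

If this prints None on the JT or NN for a user who exists in AD, that
user's groups cannot be matched there until the account is made visible
to the OS.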


>   3. Does someone configure Hadoop AD integration in multiple networks?
>   for example, my company have three networks:  corp,  lab, and prod. A user
>   in "corp" network can log on a window server in lab or prod. If we want to
>   use local MIT KDC and set up "one-way cross-realm trust from this realm
>   to the Active Directory realm" in
>   https://ccp.cloudera.com/display/CDHDOC/Integrating+Hadoop+Security+with+Active+Directory.
>   How to set up Kerberos in such a environment?
>

You can have a local KDC and realm per cluster, and set up one-way
cross-realm trust on each realm to your corp AD.
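
With cross-realm trust, a user arrives at the cluster with a principal from
the AD realm, and the cluster has to turn that principal into a short user
name. The Python sketch below only illustrates the idea; it is not Hadoop's
actual mapping code (Hadoop uses configurable auth_to_local rules for
this). The function name short_name and the realm names are hypothetical.

```python
def short_name(principal, trusted_realms):
    """Map a Kerberos principal to a short user name, or None.

    Keeps only the first component of the principal (before any "/host"
    part) and accepts it only if the realm is one the cluster trusts.
    """
    name, sep, realm = principal.rpartition("@")
    if not sep or realm not in trusted_realms:
        return None  # no realm given, or a realm we do not trust
    return name.split("/", 1)[0]


if __name__ == "__main__":
    # Hypothetical realms: the corporate AD realm and a per-cluster realm.
    trusted = {"CORP.EXAMPLE.COM", "LAB.EXAMPLE.COM"}
    print(short_name("joe@CORP.EXAMPLE.COM", trusted))
    print(short_name("hdfs/nn1.lab.example.com@LAB.EXAMPLE.COM", trusted))
```

Note that the ticket only establishes who the user is. The resulting short
name still has to resolve to an account and groups on the cluster side,
which is why the answer to question 1 is yes.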


>   4. Is this right? If AD is setup, a window user can remotely submit a
>   mapred job?
>
I've never tried this, but my guess is it won't just work.


>   5. What about the authorization? Can hadoop configure so that only users
>   in the specified security groups in AD can submit jobs.
>
You can do this via ACLs.
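
As a rough illustration of how such an ACL is evaluated: a Hadoop ACL value
is a comma-separated list of users, then a space, then a comma-separated
list of groups, with "*" meaning everyone. The Python sketch below mimics
that check; it is not Hadoop's code, and the function names, user names and
group names are invented for the example.

```python
def parse_acl(acl):
    """Split an ACL string into (users, groups); None means everyone."""
    if acl.strip() == "*":
        return None
    users_part, _, groups_part = acl.partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.strip().split(",") if g}
    return users, groups


def is_allowed(acl, user, user_groups):
    """Return True if user, or any of the user's groups, is in the ACL."""
    parsed = parse_acl(acl)
    if parsed is None:
        return True  # "*" allows all users
    users, groups = parsed
    return user in users or bool(groups & set(user_groups))


if __name__ == "__main__":
    # A leading space means "no individual users, only these groups".
    acl = " hadoop-users"
    print(is_allowed(acl, "joe", ["hadoop-users", "staff"]))
    print(is_allowed(acl, "carol", ["sales"]))
```

The group names in the ACL are matched against whatever groups the JT
resolves for the submitting user, so an AD security group only helps if it
shows up as a group on the JT machine.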


>
> Thanks.
>
> Ben
>