Posted to mapreduce-user@hadoop.apache.org by rohit sarewar <ro...@gmail.com> on 2013/02/28 16:41:22 UTC

Hadoop Security

Hi Hadoopers,
I am trying to learn how Kerberos can be implemented in Hadoop.
I have gone through this doc:
https://issues.apache.org/jira/browse/HADOOP-4487
I have also gone through some basic Kerberos material (http://web.mit.edu/kerberos/,
https://www.youtube.com/watch?v=KD2Q-2ToloE).

From these resources I have arrived at an understanding, which I have
represented in a diagram.
*Scenario: a user logs on to his computer, is authenticated by Kerberos,
and submits a MapReduce job.*
(Please read the contents below the diagram; it needs hardly 5 minutes of
your time.)

[image: Inline image 3]

I would like to explain the above diagram and ask questions about a
few of the steps (highlighted in yellow below).
Numbers on a yellow background represent the entire flow (numbers 1 to 19).
DT (red background) represents the *Delegation Token*.
BAT (green background) represents the *Block Access Token*.
JT (brown background) represents the *Job Token*.

*Steps 1, 2, 3 and 4 represent:*
Request for a TGT (Ticket Granting Ticket).
Request for a service ticket for the NameNode.
Question 1) Where should the KDC be located? Can it be on the machine where my
NameNode or JobTracker runs?

*Steps 5, 6, 7, 8 and 9 represent:*
Show the service ticket to the NameNode and get an acknowledgement.
The NameNode issues a *Delegation Token* (red).
The user specifies the token renewer (in this case, the JobTracker).
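
To make steps 5 to 9 concrete, here is a minimal Python sketch of what a delegation token could look like. It assumes the token is an identifier (owner, renewer, issue time, maximum lifetime) plus an HMAC computed with a master key that only the NameNode holds, as the Hadoop security design is commonly described. The names `MASTER_KEY`, `issue_delegation_token` and `verify_delegation_token` are hypothetical, not Hadoop APIs, and the 7-day lifetime is only an illustrative value.

```python
import hashlib
import hmac
import json
import os
import time

# Hypothetical stand-in for the master key that only the NameNode holds.
MASTER_KEY = os.urandom(20)


def issue_delegation_token(owner, renewer, lifetime_s=7 * 24 * 3600, now=None):
    """Build a toy token: an identifier plus an HMAC over that identifier."""
    now = int(time.time()) if now is None else now
    identifier = json.dumps(
        {"owner": owner, "renewer": renewer, "issue": now, "max": now + lifetime_s},
        sort_keys=True,
    ).encode()
    authenticator = hmac.new(MASTER_KEY, identifier, hashlib.sha1).digest()
    return identifier, authenticator


def verify_delegation_token(identifier, authenticator, now=None):
    """Recompute the HMAC and check the expiry, as the issuer would."""
    now = int(time.time()) if now is None else now
    expected = hmac.new(MASTER_KEY, identifier, hashlib.sha1).digest()
    if not hmac.compare_digest(expected, authenticator):
        return False  # identifier was altered or signed with another key
    return now <= json.loads(identifier)["max"]
```

In this toy model, whoever presents the identifier together with the matching authenticator is accepted as the owner until the token expires, which is why the renewer named at issue time matters.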

Question 2) The user submits this *Delegation Token* along with the job to the
JobTracker. *Will the Delegation Token be shared with the TaskTrackers?*

*Steps 10, 11, 12, 13 and 14 represent:*
Request a service ticket for the JobTracker and get it from the KDC.
Show this ticket to the JobTracker and get an ACK from the JobTracker.
Submit *Job + Delegation Token* to the JobTracker.

*Steps 15, 16 and 17 represent:*
Generate the Block Access Token and distribute it to all DataNodes.
Send the block ID and Block Access Token to the JobTracker, which
passes them on to the TaskTrackers.

Who asks the NameNode for the Block Access Token and block ID, the
JobTracker or the TaskTracker?
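
As a way to reason about the Block Access Token, here is a minimal Python sketch. It assumes a design in which the NameNode and the DataNodes share a secret key, the NameNode signs a small payload (user, block ID, access modes, expiry), and a DataNode checks that signature locally when a reader presents the token. The names `SHARED_KEY`, `namenode_issue` and `datanode_allows` and the `|`-separated payload format are hypothetical, not Hadoop's actual wire format.

```python
import hashlib
import hmac
import os

# Hypothetical secret shared by the NameNode and every DataNode.
SHARED_KEY = os.urandom(20)


def namenode_issue(user, block_id, modes, expiry):
    """NameNode side: sign (user, block, modes, expiry) with the shared key."""
    payload = f"{user}|{block_id}|{','.join(sorted(modes))}|{expiry}".encode()
    return payload, hmac.new(SHARED_KEY, payload, hashlib.sha1).digest()


def datanode_allows(payload, mac, block_id, mode, now):
    """DataNode side: verify locally, with no call back to the NameNode."""
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha1).digest()
    if not hmac.compare_digest(expected, mac):
        return False  # forged or tampered token
    _user, tok_block, tok_modes, tok_expiry = payload.decode().split("|")
    return (
        tok_block == str(block_id)
        and mode in tok_modes.split(",")
        and now <= int(tok_expiry)
    )
```

Under these assumptions only the key needs to reach the DataNodes; the token itself travels with whichever process actually reads or writes the block.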

Sorry, I skipped number 18 by mistake.
*Step 19 represents:*
The JobTracker generates a *Job Token* (brown) and passes it to the TaskTrackers.

Can I conclude that there will be one Delegation Token per user, which will
be distributed throughout the cluster, and
one Job Token per job? So a user will have only one *Delegation
Token* and many Job Tokens (equal to the number of jobs he has submitted).

*Please tell me if I missed something or was wrong at any point in my
explanation.*

Thanks for your help.

Regards
Rohit Sarewar

Re: Hadoop Security

Posted by Anurag Tangri <an...@yahoo.com>.

Hi Rohit,
Nice compilation.

The KDC is not a heavyweight application, so it can go on the NN/JT host, but in a big cluster it would be preferable to set it up on another node, such as the one running the Secondary NameNode.

Thanks,
Anurag Tangri
