You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-commits@hadoop.apache.org by om...@apache.org on 2011/03/04 05:07:10 UTC
svn commit: r1077361 - in
/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs:
mapred_tutorial.xml site.xml
Author: omalley
Date: Fri Mar 4 04:07:10 2011
New Revision: 1077361
URL: http://svn.apache.org/viewvc?rev=1077361&view=rev
Log:
commit f4e5bb34ebed5b85153d7e3855af370ef7371517
Author: Devaraj Das <dd...@yahoo-inc.com>
Date: Wed Mar 24 17:27:04 2010 -0700
MAPREDUCE:1624 from https://issues.apache.org/jira/secure/attachment/12439738/job-creds.2.patch
+++ b/YAHOO-CHANGES.txt
+ MAPREDUCE-1624. Documents the job credentials and associated details
+ to do with delegation tokens (ddas)
+
Modified:
hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/site.xml
Modified: hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml?rev=1077361&r1=1077360&r2=1077361&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml (original)
+++ hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml Fri Mar 4 04:07:10 2011
@@ -1592,6 +1592,83 @@
</li>
</ul>
</section>
+ <section>
+ <title>Job Credentials</title>
+ <p>In a secure cluster, the user is authenticated via Kerberos'
+ kinit command. Because of scalability concerns, we don't ship
+ the client's Kerberos tickets with MapReduce jobs. Instead, we
+ acquire delegation tokens from each HDFS NameNode that the job
+ will use and store them in the job as part of job submission.
+ The delegation tokens are automatically obtained
+ for the HDFS that holds the staging directories, where the
+ job files are written, and for any HDFS systems referenced by
+ FileInputFormats, FileOutputFormats, DistCp, and the
+ distributed cache.
+ Other applications must set the configuration
+ "mapreduce.job.hdfs-servers" to list all NameNodes that tasks
+ might need to talk to during job execution. This is a
+ comma-separated list of file system names, such as
+ "hdfs://nn1/,hdfs://nn2/".
+ These tokens are passed to the JobTracker
+ as part of the job submission as <a href="ext:api/org/apache/hadoop/
+ security/credentials">Credentials</a>. </p>
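[Editor's note: a minimal sketch of the "mapreduce.job.hdfs-servers" setting described above. It assumes a JobConf is being prepared for submission; the NameNode URIs are the illustrative ones from the text.]

```java
import org.apache.hadoop.mapred.JobConf;

public class HdfsServersExample {
    public static JobConf prepare() {
        JobConf conf = new JobConf();
        // Comma-separated list of additional file systems whose
        // delegation tokens should be fetched at job submission,
        // beyond those the framework discovers automatically.
        conf.set("mapreduce.job.hdfs-servers", "hdfs://nn1/,hdfs://nn2/");
        return conf;
    }
}
```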
+
+ <p>Similar to HDFS delegation tokens, we also have MapReduce delegation tokens. The
+ MapReduce tokens are provided so that tasks can spawn jobs if they wish to. The tasks authenticate
+ to the JobTracker via the MapReduce delegation tokens. The delegation token can
+ be obtained via the API in <a href="api/org/apache/hadoop/mapred/jobclient/getdelegationtoken">
+ JobClient.getDelegationToken</a>. The obtained token must then be added to the
+ Credentials object in the JobConf used for job submission. The API
+ <a href="ext:api/org/apache/hadoop/security/credentials/addtoken">Credentials.addToken</a>
+ can be used for this. </p>
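[Editor's note: a sketch of the token-fetching flow just described, using the JobClient.getDelegationToken and Credentials.addToken APIs named in the text. The renewer name and token alias are hypothetical.]

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.security.token.Token;

public class FetchMRDelegationToken {
    public static void attachToken(JobConf job) throws Exception {
        JobClient client = new JobClient(job);
        // Obtain a MapReduce delegation token; "renewer" is a
        // hypothetical principal allowed to renew the token.
        Token<?> token = client.getDelegationToken(new Text("renewer"));
        // Add it to the credentials carried in the JobConf so that
        // tasks of the submitted job can authenticate to the JobTracker.
        job.getCredentials().addToken(new Text("mr-token"), token);
    }
}
```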
+
+ <p>The credentials are sent to the JobTracker as part of the job submission process.
+ The JobTracker persists the tokens and secrets in its filesystem (typically HDFS)
+ in a file within mapred.system.dir/JOBID. The TaskTracker localizes the file as part
+ of job localization. The framework sets an environment variable,
+ HADOOP_TOKEN_FILE_LOCATION, in each task to point to the
+ localized file. To launch jobs from within tasks or to perform any HDFS operations,
+ tasks must set the configuration "mapreduce.job.credentials.binary" to point to
+ this token file.</p>
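[Editor's note: a sketch of the in-task wiring described above. It assumes code running inside a task that wants to spawn a job or contact HDFS.]

```java
import org.apache.hadoop.mapred.JobConf;

public class UseLocalizedTokens {
    public static void pointAtTokenFile(JobConf conf) {
        // The framework exports the localized token file's path in
        // this environment variable for every task.
        String tokenFile = System.getenv("HADOOP_TOKEN_FILE_LOCATION");
        if (tokenFile != null) {
            // Jobs spawned from this task (and HDFS operations) will
            // read their credentials from the localized file.
            conf.set("mapreduce.job.credentials.binary", tokenFile);
        }
    }
}
```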
+
+ <p>The HDFS delegation tokens passed to the JobTracker during job submission
+ are cancelled by the JobTracker when the job completes. This is the default behavior
+ unless mapreduce.job.complete.cancel.delegation.tokens is set to false in the
+ JobConf. For jobs whose tasks in turn spawn jobs, this should be set to false.
+ Applications sharing JobConf objects between multiple jobs on the JobClient side
+ should also set mapreduce.job.complete.cancel.delegation.tokens to false,
+ because the Credentials object within the JobConf will then be shared.
+ All jobs will end up sharing the same tokens, and hence the tokens should not be
+ cancelled when the jobs in the sequence finish.</p>
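[Editor's note: the opt-out described above is a single configuration setting; a sketch for a job whose tasks spawn further jobs:]

```java
import org.apache.hadoop.mapred.JobConf;

public class KeepTokensAlive {
    public static void configure(JobConf conf) {
        // Keep the delegation tokens valid after this job completes,
        // so jobs spawned by its tasks (or later jobs sharing this
        // JobConf) can still use them.
        conf.setBoolean("mapreduce.job.complete.cancel.delegation.tokens",
                        false);
    }
}
```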
+
+ <p>Apart from the HDFS delegation tokens, arbitrary secrets can also be
+ passed during job submission so that tasks can access other third-party services.
+ The APIs
+ <a href="ext:api/org/apache/hadoop/mapred/jobconf/getcredentials">
+ JobConf.getCredentials</a> or <a href="ext:api/org/apache/
+ hadoop/mapreduce/jobcontext/getcredentials">JobContext.getCredentials()</a>
+ should be used to get the credentials object and then
+ <a href="ext:api/org/apache/hadoop/security/credentials/addsecretkey">
+ Credentials.addSecretKey</a> should be used to add secrets.</p>
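[Editor's note: a sketch of shipping an arbitrary secret with a job via the JobConf.getCredentials and Credentials.addSecretKey APIs named above. The alias "my.service.password" and the secret bytes are hypothetical.]

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;

public class AddThirdPartySecret {
    public static void addSecret(JobConf job, byte[] secret) {
        // The secret travels with the job's other credentials and is
        // available to tasks under the same alias.
        job.getCredentials().addSecretKey(
            new Text("my.service.password"), secret);
    }
}
```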
+
+ <p>For applications written using the old MapReduce API, the Mapper/Reducer classes
+ need to implement <a href="api/org/apache/hadoop/mapred/jobconfigurable">
+ JobConfigurable</a> in order to get access to the credentials in the tasks.
+ A reference to the JobConf passed to
+ <a href="api/org/apache/hadoop/mapred/jobconfigurable/configure">
+ JobConfigurable.configure</a> should be stored. In the new MapReduce API,
+ a similar thing can be done in the
+ <a href="api/org/apache/hadoop/mapreduce/mapper/setup">Mapper.setup</a>
+ method.
+ The API <a href="ext:api/org/apache/hadoop/mapred/jobconf/getcredentials">
+ JobConf.getCredentials()</a> or the API <a href="ext:api/org/apache/
+ hadoop/mapreduce/jobcontext/getcredentials">JobContext.getCredentials()</a>
+ should be used to get the credentials reference (depending
+ on whether the old MapReduce API or the new MapReduce API is used).
+ Tasks can access the secrets using the APIs in <a href="ext:api/
+ org/apache/hadoop/security/credentials">Credentials</a>.</p>
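[Editor's note: a sketch of the new-API task-side access described above, retrieving a secret in Mapper.setup via JobContext.getCredentials and Credentials.getSecretKey. The alias must match the one used at submission time; "my.service.password" here is hypothetical.]

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SecretAwareMapper
    extends Mapper<Object, Object, Object, Object> {

    private byte[] serviceSecret;

    @Override
    protected void setup(Context context) {
        // getCredentials() exposes the tokens and secrets shipped
        // with the job; secrets are looked up by their alias.
        serviceSecret = context.getCredentials()
            .getSecretKey(new Text("my.service.password"));
    }
}
```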
+
+
+ </section>
</section>
<section>
Modified: hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/site.xml?rev=1077361&r1=1077360&r2=1077361&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/site.xml Fri Mar 4 04:07:10 2011
@@ -155,6 +155,20 @@ See http://forrest.apache.org/docs/linki
<compressioncodec href="CompressionCodec.html" />
</compress>
</io>
+ <security href="security/">
+ <credentials href="Credentials.html">
+ <addtoken href="#addToken(org.apache.hadoop.io.Text,org.apache.hadoop.security.token.Token)" />
+ <addsecretkey href="#addSecretKey(org.apache.hadoop.io.Text,byte[])" />
+ </credentials>
+ </security>
+ <mapreduce href="mapreduce/">
+ <mapper href="Mapper.html">
+ <setup href="#setup(org.apache.hadoop.mapreduce.Mapper.Context)" />
+ </mapper>
+ <jobcontext href="JobContext.html">
<getcredentials href="#getCredentials()" />
+ </jobcontext>
+ </mapreduce>
<mapred href="mapred/">
<clusterstatus href="ClusterStatus.html" />
<counters href="Counters.html" />
@@ -178,6 +192,7 @@ See http://forrest.apache.org/docs/linki
<jobclient href="JobClient.html">
<runjob href="#runJob(org.apache.hadoop.mapred.JobConf)" />
<submitjob href="#submitJob(org.apache.hadoop.mapred.JobConf)" />
+ <getdelegationtoken href="#getDelegationToken(org.apache.hadoop.io.Text)" />
</jobclient>
<jobconf href="JobConf.html">
<setnummaptasks href="#setNumMapTasks(int)" />
@@ -203,6 +218,7 @@ See http://forrest.apache.org/docs/linki
<setqueuename href="#setQueueName(java.lang.String)" />
<getjoblocaldir href="#getJobLocalDir()" />
<getjar href="#getJar()" />
+ <getcredentials href="#getCredentials()" />
</jobconf>
<jobconfigurable href="JobConfigurable.html">
<configure href="#configure(org.apache.hadoop.mapred.JobConf)" />