Posted to common-commits@hadoop.apache.org by om...@apache.org on 2011/03/04 05:07:10 UTC

svn commit: r1077361 - in /hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs: mapred_tutorial.xml site.xml

Author: omalley
Date: Fri Mar  4 04:07:10 2011
New Revision: 1077361

URL: http://svn.apache.org/viewvc?rev=1077361&view=rev
Log:
commit f4e5bb34ebed5b85153d7e3855af370ef7371517
Author: Devaraj Das <dd...@yahoo-inc.com>
Date:   Wed Mar 24 17:27:04 2010 -0700

    MAPREDUCE:1624 from https://issues.apache.org/jira/secure/attachment/12439738/job-creds.2.patch
    
    +++ b/YAHOO-CHANGES.txt
    +    MAPREDUCE-1624. Documents the job credentials and associated details
    +    to do with delegation tokens (ddas)
    +

Modified:
    hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
    hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/site.xml

Modified: hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml?rev=1077361&r1=1077360&r2=1077361&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml (original)
+++ hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml Fri Mar  4 04:07:10 2011
@@ -1592,6 +1592,83 @@
             </li>
           </ul>
         </section>
+        <section>
+          <title>Job Credentials</title>
+          <p>In a secure cluster, the user is authenticated via the Kerberos
+             kinit command. Because of scalability concerns, we don't push
+             the client's Kerberos tickets in MapReduce jobs. Instead, we
+             acquire delegation tokens from each HDFS NameNode that the job
+             will use and store them in the job as part of job submission.
+             The delegation tokens are automatically obtained
+             for the HDFS that holds the staging directories, where the
+             job files are written, and any HDFS systems referenced by
+             FileInputFormats, FileOutputFormats, DistCp, and the
+             distributed cache.
+             Other applications need to set the configuration
+             "mapreduce.job.hdfs-servers" for all NameNodes that tasks might 
+             need to talk to during the job execution. This is a comma-separated
+             list of file system names, such as "hdfs://nn1/,hdfs://nn2/".
+             These tokens are passed to the JobTracker
+             as part of the job submission as <a href="ext:api/org/apache/hadoop/
+             security/credentials">Credentials</a>. </p> 
+
+          <p>Similar to HDFS delegation tokens, we also have MapReduce delegation tokens. The
+             MapReduce tokens are provided so that tasks can spawn jobs if they wish to. The tasks authenticate
+             to the JobTracker via the MapReduce delegation tokens. The delegation token can
+             be obtained via the API in <a href="api/org/apache/hadoop/mapred/jobclient/getdelegationtoken">
+             JobClient.getDelegationToken</a>. The obtained token must then be added to the
+             credentials in the JobConf used for job submission. The API  
+             <a href="ext:api/org/apache/hadoop/security/credentials/addtoken">Credentials.addToken</a>
+             can be used for this. </p>
+
+          <p>The credentials are sent to the JobTracker as part of the job submission process.
+             The JobTracker persists the tokens and secrets in its filesystem (typically HDFS) 
+             in a file within mapred.system.dir/JOBID. The TaskTracker localizes the file as part
+             of job localization. Tasks see an environment variable called
+             HADOOP_TOKEN_FILE_LOCATION and the framework sets this to point to the
+             localized file. In order to launch jobs from tasks or for doing any HDFS operation,
+             tasks must set the configuration "mapreduce.job.credentials.binary" to point to
+             this token file.</p> 
+
+          <p>The HDFS delegation tokens passed to the JobTracker during job submission
+             are canceled by the JobTracker when the job completes. This is the default behavior
+             unless mapreduce.job.complete.cancel.delegation.tokens is set to false in the 
+             JobConf. For jobs whose tasks in turn spawn jobs, this should be set to false.
+             Applications sharing JobConf objects between multiple jobs on the JobClient side 
+             should look at setting mapreduce.job.complete.cancel.delegation.tokens to false. 
+             This is because the Credentials object within the JobConf will then be shared. 
+             All jobs will end up sharing the same tokens, and hence the tokens should not be 
+             canceled when the jobs in the sequence finish.</p>
+
+          <p>Apart from the HDFS delegation tokens, arbitrary secrets can also be 
+             passed during the job submission for tasks to access third-party services.
+             The APIs 
+             <a href="ext:api/org/apache/hadoop/mapred/jobconf/getcredentials">
+             JobConf.getCredentials</a> or <a href="ext:api/org/apache/
+              hadoop/mapreduce/jobcontext/getcredentials">JobContext.getCredentials()</a>
+             should be used to get the credentials object and then
+             <a href="ext:api/org/apache/hadoop/security/credentials/addsecretkey">
+             Credentials.addSecretKey</a> should be used to add secrets.</p>
+
+          <p>For applications written using the old MapReduce API, the Mapper/Reducer classes 
+             need to implement <a href="api/org/apache/hadoop/mapred/jobconfigurable">
+             JobConfigurable</a> in order to get access to the credentials in the tasks.
+             A reference to the JobConf passed in the 
+             <a href="api/org/apache/hadoop/mapred/jobconfigurable/configure">
+             JobConfigurable.configure</a> should be stored. In the new MapReduce API, 
+             a similar thing can be done in the 
+             <a href="api/org/apache/hadoop/mapreduce/mapper/setup">Mapper.setup</a>
+             method.
+             The API <a href="ext:api/org/apache/hadoop/mapred/jobconf/getcredentials">
+              JobConf.getCredentials()</a> or the API <a href="ext:api/org/apache/
+              hadoop/mapreduce/jobcontext/getcredentials">JobContext.getCredentials()</a>
+              should be used to get the credentials reference (depending
+              on whether the old MapReduce API or the new MapReduce API is used). 
+              Tasks can access the secrets using the APIs in <a href="ext:api/
+              org/apache/hadoop/security/credentials">Credentials</a>.</p>
+
+             
+        </section>
       </section>
 
       <section>

Modified: hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/site.xml?rev=1077361&r1=1077360&r2=1077361&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/site.xml Fri Mar  4 04:07:10 2011
@@ -155,6 +155,20 @@ See http://forrest.apache.org/docs/linki
                 <compressioncodec href="CompressionCodec.html" />
               </compress>
             </io>
+            <security href="security/">
+              <credentials href="Credentials.html">
+                <addtoken href="#addToken(org.apache.hadoop.io.Text,org.apache.hadoop.security.token.Token)" />
+                <addsecretkey href="#addSecretKey(org.apache.hadoop.io.Text,byte[])" />
+              </credentials> 
+            </security>
+            <mapreduce href="mapreduce/">
+              <mapper href="Mapper.html">
+                <setup href="#setup(org.apache.hadoop.mapreduce.Mapper.Context)" />
+              </mapper>
+              <jobcontext href="JobContext.html">
+                <getcredentials href="#getCredentials()" />
+              </jobcontext>
+            </mapreduce>
             <mapred href="mapred/">
               <clusterstatus href="ClusterStatus.html" />
               <counters href="Counters.html" />
@@ -178,6 +192,7 @@ See http://forrest.apache.org/docs/linki
               <jobclient href="JobClient.html">
                 <runjob href="#runJob(org.apache.hadoop.mapred.JobConf)" />
                 <submitjob href="#submitJob(org.apache.hadoop.mapred.JobConf)" />
+                <getdelegationtoken href="#getDelegationToken(org.apache.hadoop.io.Text)" />
               </jobclient>
               <jobconf href="JobConf.html">
                 <setnummaptasks href="#setNumMapTasks(int)" />
@@ -203,6 +218,7 @@ See http://forrest.apache.org/docs/linki
                 <setqueuename href="#setQueueName(java.lang.String)" />
                 <getjoblocaldir href="#getJobLocalDir()" />
                 <getjar href="#getJar()" />
+                <getcredentials href="#getCredentials()" />
               </jobconf>
               <jobconfigurable href="JobConfigurable.html">
                 <configure href="#configure(org.apache.hadoop.mapred.JobConf)" />