Posted to commits@knox.apache.org by km...@apache.org on 2015/10/26 16:54:15 UTC

svn commit: r1710635 [2/11] - in /knox: site/ site/books/knox-0-3-0/ site/books/knox-0-4-0/ site/books/knox-0-5-0/ site/books/knox-0-6-0/ site/books/knox-0-7-0/ site/images/ trunk/markbook/src/main/java/org/apache/hadoop/gateway/markbook/ trunk/markboo...

Modified: knox/site/books/knox-0-3-0/knox-0-3-0.html
URL: http://svn.apache.org/viewvc/knox/site/books/knox-0-3-0/knox-0-3-0.html?rev=1710635&r1=1710634&r2=1710635&view=diff
==============================================================================
--- knox/site/books/knox-0-3-0/knox-0-3-0.html (original)
+++ knox/site/books/knox-0-3-0/knox-0-3-0.html Mon Oct 26 15:54:14 2015
@@ -16,7 +16,7 @@
 --><p><link href="book.css" rel="stylesheet"/></p>
 <div id="logo" style="width:100%; text-align:center">
   <!--img src="knox-logo.gif" alt="Knox"/-->
-</div><p><br>  <img src="knox-logo.gif" alt="Knox"/>  <img src="apache-incubator-logo.png" align="right" alt="Incubator"/></p><h1><a id="Apache+Knox+Gateway+0.3.x+(Incubator)+User's+Guide"></a>Apache Knox Gateway 0.3.x (Incubator) User&rsquo;s Guide</h1><h2><a id="Table+Of+Contents"></a>Table Of Contents</h2>
+</div><p><br>  <img src="knox-logo.gif" alt="Knox"/>  <img src="apache-incubator-logo.png" align="right" alt="Incubator"/></p><h1><a id="Apache+Knox+Gateway+0.3.x+(Incubator)+User's+Guide">Apache Knox Gateway 0.3.x (Incubator) User&rsquo;s Guide</a> <a href="#Apache+Knox+Gateway+0.3.x+(Incubator)+User's+Guide"><img src="markbook-section-link.png"/></a></h1><h2><a id="Table+Of+Contents">Table Of Contents</a> <a href="#Table+Of+Contents"><img src="markbook-section-link.png"/></a></h2>
 <ul>
   <li><a href="#Introduction">Introduction</a></li>
   <li><a href="#Quick+Start">Quick Start</a></li>
@@ -47,7 +47,7 @@
   <li><a href="#Limitations">Limitations</a></li>
   <li><a href="#Troubleshooting">Troubleshooting</a></li>
   <li><a href="#Export+Controls">Export Controls</a></li>
-</ul><h2><a id="Introduction"></a>Introduction</h2><p>The Apache Knox Gateway is a system that provides a single point of authentication and access for Apache Hadoop services in a cluster. The goal is to simplify Hadoop security for both users (i.e. who access the cluster data and execute jobs) and operators (i.e. who control access and manage the cluster). The gateway runs as a server (or cluster of servers) that provide centralized access to one or more Hadoop clusters. In general the goals of the gateway are as follows:</p>
+</ul><h2><a id="Introduction">Introduction</a> <a href="#Introduction"><img src="markbook-section-link.png"/></a></h2><p>The Apache Knox Gateway is a system that provides a single point of authentication and access for Apache Hadoop services in a cluster. The goal is to simplify Hadoop security for both users (i.e. who access the cluster data and execute jobs) and operators (i.e. who control access and manage the cluster). The gateway runs as a server (or cluster of servers) that provide centralized access to one or more Hadoop clusters. In general the goals of the gateway are as follows:</p>
 <ul>
   <li>Provide perimeter security for Hadoop REST APIs to make Hadoop security easier to setup and use
   <ul>
@@ -60,7 +60,7 @@
     <li>Limit the network endpoints (and therefore firewall holes) required to access a Hadoop cluster</li>
     <li>Hide the internal Hadoop cluster topology from potential attackers</li>
   </ul></li>
-</ul><h2><a id="Quick+Start"></a>Quick Start</h2><p>Here are the steps to have Apache Knox up and running against a Hadoop Cluster:</p>
+</ul><h2><a id="Quick+Start">Quick Start</a> <a href="#Quick+Start"><img src="markbook-section-link.png"/></a></h2><p>Here are the steps to have Apache Knox up and running against a Hadoop Cluster:</p>
 <ol>
   <li>Verify system requirements</li>
   <li>Download a virtual machine (VM) with Hadoop</li>
@@ -70,14 +70,14 @@
   <li>Start the LDAP embedded within Knox</li>
   <li>Start the Knox Gateway</li>
   <li>Do Hadoop with Knox</li>
-</ol><h3><a id="1+-+Requirements"></a>1 - Requirements</h3><h4><a id="Java"></a>Java</h4><p>Java 1.6 or later is required for the Knox Gateway runtime. Use the command below to check the version of Java installed on the system where Knox will be running.</p>
+</ol><h3><a id="1+-+Requirements">1 - Requirements</a> <a href="#1+-+Requirements"><img src="markbook-section-link.png"/></a></h3><h4><a id="Java">Java</a> <a href="#Java"><img src="markbook-section-link.png"/></a></h4><p>Java 1.6 or later is required for the Knox Gateway runtime. Use the command below to check the version of Java installed on the system where Knox will be running.</p>
 <pre><code>java -version
-</code></pre><h4><a id="Hadoop"></a>Hadoop</h4><p>Knox supports Hadoop 1.x or 2.x, the quick start instructions assume a Hadoop 2.x virtual machine based environment. </p><h3><a id="2+-+Download+Hadoop+2.x+VM"></a>2 - Download Hadoop 2.x VM</h3><p>The quick start provides a link to download Hadoop 2.0 based Hortonworks virtual machine <a href="http://hortonworks.com/products/hdp-2/#install">Sandbox</a>. Please note Knox supports other Hadoop distributions and is configurable against a full blown Hadoop cluster. Configuring Knox for Hadoop 1.x/2.x version, or Hadoop deployed in EC2 or a custom Hadoop cluster is documented in advance deployment guide.</p><h3><a id="3+-+Download+Apache+Knox+Gateway"></a>3 - Download Apache Knox Gateway</h3><p>Download one of the distributions below from the <a href="http://www.apache.org/dyn/closer.cgi/knox">Apache mirrors</a>.</p>
+</code></pre><h4><a id="Hadoop">Hadoop</a> <a href="#Hadoop"><img src="markbook-section-link.png"/></a></h4><p>Knox supports Hadoop 1.x or 2.x, the quick start instructions assume a Hadoop 2.x virtual machine based environment. </p><h3><a id="2+-+Download+Hadoop+2.x+VM">2 - Download Hadoop 2.x VM</a> <a href="#2+-+Download+Hadoop+2.x+VM"><img src="markbook-section-link.png"/></a></h3><p>The quick start provides a link to download Hadoop 2.0 based Hortonworks virtual machine <a href="http://hortonworks.com/products/hdp-2/#install">Sandbox</a>. Please note Knox supports other Hadoop distributions and is configurable against a full blown Hadoop cluster. Configuring Knox for Hadoop 1.x/2.x version, or Hadoop deployed in EC2 or a custom Hadoop cluster is documented in advance deployment guide.</p><h3><a id="3+-+Download+Apache+Knox+Gateway">3 - Download Apache Knox Gateway</a> <a href="#3+-+Download+Apache+Knox+Gateway"><img src="markbook-section-link.png"/></a></h3><p>Download one of th
 e distributions below from the <a href="http://www.apache.org/dyn/closer.cgi/knox">Apache mirrors</a>.</p>
 <ul>
   <li>Source archive: <a href="http://www.apache.org/dyn/closer.cgi/incubator/knox/0.3.0-incubating/knox-incubating-0.3.0-src.zip">knox-incubating-0.3.0-src.zip</a> (<a href="http://www.apache.org/dist/incubator/knox/0.3.0-incubating/knox-incubating-0.3.0-src.zip.asc">PGP signature</a>, <a href="http://www.apache.org/dist/incubator/knox/0.3.0-incubating/knox-incubating-0.3.0-src.zip.sha">SHA1 digest</a>, <a href="http://www.apache.org/dist/incubator/knox/0.3.0-incubating/knox-incubating-0.3.0-src.zip.md5">MD5 digest</a>)</li>
   <li>Binary archive: <a href="http://www.apache.org/dyn/closer.cgi/incubator/knox/0.3.0-incubating/knox-incubating-0.3.0.zip">knox-incubating-0.3.0.zip</a> (<a href="http://www.apache.org/dist/incubator/knox/0.3.0-incubating/knox-incubating-0.3.0.zip.asc">PGP signature</a>, <a href="http://www.apache.org/dist/incubator/knox/0.3.0-incubating/knox-incubating-0.3.0.zip.sha">SHA1 digest</a>, <a href="http://www.apache.org/dist/incubator/knox/0.3.0-incubating/knox-incubating-0.3.0.zip.md5">MD5 digest</a>)</li>
   <li>RPM package: <a href="http://www.apache.org/dyn/closer.cgi/incubator/knox/0.3.0-incubating/knox-incubating-0.3.0.rpm">knox-incubating-0.3.0.rpm</a> (<a href="http://www.apache.org/dist/incubator/knox/0.3.0-incubating/knox-incubating-0.3.0.rpm.asc">PGP signature</a>, <a href="http://www.apache.org/dist/incubator/knox/0.3.0-incubating/knox-incubating-0.3.0.rpm.sha">SHA1 digest</a>, <a href="http://www.apache.org/dist/incubator/knox/0.3.0-incubating/knox-incubating-0.3.0.rpm.md5">MD5 digest</a>)</li>
-</ul><p>Apache Knox Gateway releases are available under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. See the NOTICE file contained in each release artifact for applicable copyright attribution notices.</p><h3><a id="Verify"></a>Verify</h3><p>While recommended, verify is an optional step. You can verify the integrity of any downloaded files using the PGP signatures. Please read <a href="http://httpd.apache.org/dev/verification.html">Verifying Apache HTTP Server Releases</a> for more information on why you should verify our releases.</p><p>The PGP signatures can be verified using PGP or GPG. First download the KEYS file as well as the .asc signature files for the relevant release packages. Make sure you get these files from the main distribution directory linked above, rather than from a mirror. Then verify the signatures using one of the methods below.</p>
+</ul><p>Apache Knox Gateway releases are available under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. See the NOTICE file contained in each release artifact for applicable copyright attribution notices.</p><h3><a id="Verify">Verify</a> <a href="#Verify"><img src="markbook-section-link.png"/></a></h3><p>While recommended, verify is an optional step. You can verify the integrity of any downloaded files using the PGP signatures. Please read <a href="http://httpd.apache.org/dev/verification.html">Verifying Apache HTTP Server Releases</a> for more information on why you should verify our releases.</p><p>The PGP signatures can be verified using PGP or GPG. First download the KEYS file as well as the .asc signature files for the relevant release packages. Make sure you get these files from the main distribution directory linked above, rather than from a mirror. Then verify the signatures using one of the methods below.</p>
 <pre><code>% pgpk -a KEYS
 % pgpv knox-incubating-0.3.0.zip.asc
 </code></pre><p>or</p>
@@ -86,19 +86,19 @@
 </code></pre><p>or</p>
 <pre><code>% gpg --import KEYS
 % gpg --verify knox-incubating-0.3.0.zip.asc
-</code></pre><h3><a id="4+-+Start+Hadoop+virtual+machine"></a>4 - Start Hadoop virtual machine</h3><p>Start the Hadoop virtual machine.</p><h3><a id="5+-+Install+Knox"></a>5 - Install Knox</h3><p>The steps required to install the gateway will vary depending upon which distribution format (zip | rpm) was downloaded. In either case you will end up with a directory where the gateway is installed. This directory will be referred to as your <code>{GATEWAY_HOME}</code> throughout this document.</p><h4><a id="ZIP"></a>ZIP</h4><p>If you downloaded the Zip distribution you can simply extract the contents into a directory. The example below provides a command that can be executed to do this. Note the <code>{VERSION}</code> portion of the command must be replaced with an actual Apache Knox Gateway version number. This might be 0.3.0 for example and must patch the value in the file downloaded.</p>
+</code></pre><h3><a id="4+-+Start+Hadoop+virtual+machine">4 - Start Hadoop virtual machine</a> <a href="#4+-+Start+Hadoop+virtual+machine"><img src="markbook-section-link.png"/></a></h3><p>Start the Hadoop virtual machine.</p><h3><a id="5+-+Install+Knox">5 - Install Knox</a> <a href="#5+-+Install+Knox"><img src="markbook-section-link.png"/></a></h3><p>The steps required to install the gateway will vary depending upon which distribution format (zip | rpm) was downloaded. In either case you will end up with a directory where the gateway is installed. This directory will be referred to as your <code>{GATEWAY_HOME}</code> throughout this document.</p><h4><a id="ZIP">ZIP</a> <a href="#ZIP"><img src="markbook-section-link.png"/></a></h4><p>If you downloaded the Zip distribution you can simply extract the contents into a directory. The example below provides a command that can be executed to do this. Note the <code>{VERSION}</code> portion of the command must be replaced with an actual Apa
 che Knox Gateway version number. This might be 0.3.0 for example and must patch the value in the file downloaded.</p>
 <pre><code>jar xf knox-incubating-{VERSION}.zip
-</code></pre><p>This will create a directory <code>knox-incubating-{VERSION}</code> in your current directory. The directory <code>knox-incubating-{VERSION}</code> will considered your <code>{GATEWAY_HOME}</code></p><h4><a id="RPM"></a>RPM</h4><p>If you downloaded the RPM distribution you can install it using normal RPM package tools. It is important that the user that will be running the gateway server is used to install. This is because several directories are created that are owned by this user. These command will install Knox to <code>/usr/lib/knox</code> following the pattern of other Hadoop components. This directory will be considered your <code>{GATEWAY_HOME}</code>.</p>
+</code></pre><p>This will create a directory <code>knox-incubating-{VERSION}</code> in your current directory. The directory <code>knox-incubating-{VERSION}</code> will be considered your <code>{GATEWAY_HOME}</code>.</p><h4><a id="RPM">RPM</a> <a href="#RPM"><img src="markbook-section-link.png"/></a></h4><p>If you downloaded the RPM distribution you can install it using normal RPM package tools. It is important that the installation be performed as the user that will be running the gateway server. This is because several directories are created that are owned by this user. These commands will install Knox to <code>/usr/lib/knox</code> following the pattern of other Hadoop components. This directory will be considered your <code>{GATEWAY_HOME}</code>.</p>
 <pre><code>sudo yum localinstall knox-incubating-{VERSION}.rpm
 </code></pre><p>or</p>
 <pre><code>sudo rpm -ihv knox-incubating-{VERSION}.rpm
-</code></pre><h3><a id="6+-+Start+LDAP+embedded+in+Knox"></a>6 - Start LDAP embedded in Knox</h3><p>Knox comes with an LDAP server for demonstration purposes.</p>
+</code></pre><h3><a id="6+-+Start+LDAP+embedded+in+Knox">6 - Start LDAP embedded in Knox</a> <a href="#6+-+Start+LDAP+embedded+in+Knox"><img src="markbook-section-link.png"/></a></h3><p>Knox comes with an LDAP server for demonstration purposes.</p>
 <pre><code>cd {GATEWAY_HOME}
 java -jar bin/ldap.jar conf &amp;
-</code></pre><h3><a id="7+-+Start+Knox"></a>7 - Start Knox</h3><p>The gateway can be started in one of two ways, as java -jar or with a shell script.</p><h6><a id="Starting+via+Java"></a>Starting via Java</h6><p>This is the simplest way to start the gateway. Starting this way will result in all logging being written directly to standard output.</p>
+</code></pre><h3><a id="7+-+Start+Knox">7 - Start Knox</a> <a href="#7+-+Start+Knox"><img src="markbook-section-link.png"/></a></h3><p>The gateway can be started in one of two ways, as java -jar or with a shell script.</p><h6><a id="Starting+via+Java">Starting via Java</a> <a href="#Starting+via+Java"><img src="markbook-section-link.png"/></a></h6><p>This is the simplest way to start the gateway. Starting this way will result in all logging being written directly to standard output.</p>
 <pre><code>cd {GATEWAY_HOME}
 java -jar bin/gateway.jar
-</code></pre><p>Upon start, Knox server will prompt you for the master secret (i.e. password). This secret is used to secure artifacts used by the gateway server for things like SSL and credential/password aliasing. This secret will have to be entered at startup unless you choose to persist it.</p><h6><a id="Starting+via+script+(*nix+only)"></a>Starting via script (*nix only)</h6><p>Run the setup command with root privileges.</p>
+</code></pre><p>Upon start, Knox server will prompt you for the master secret (i.e. password). This secret is used to secure artifacts used by the gateway server for things like SSL and credential/password aliasing. This secret will have to be entered at startup unless you choose to persist it.</p><h6><a id="Starting+via+script+(*nix+only)">Starting via script (*nix only)</a> <a href="#Starting+via+script+(*nix+only)"><img src="markbook-section-link.png"/></a></h6><p>Run the setup command with root privileges.</p>
 <pre><code>cd {GATEWAY_HOME}
 sudo bin/gateway.sh setup
 </code></pre><p>The server will prompt you for the master secret (i.e. password).</p><p>The server can then be started without root privileges using this command.</p>
@@ -110,7 +110,7 @@ bin/gateway.sh stop
 </code></pre><p>If for some reason the gateway is stopped other than by using the command above you may need to clear the tracking PID.</p>
 <pre><code>cd {GATEWAY_HOME}
 bin/gateway.sh clean
-</code></pre><p><strong>NOTE: This command will also clear any log output in /var/log/knox so use this with caution.</strong></p><h3><a id="8+-+Do+Hadoop+with+Knox"></a>8 - Do Hadoop with Knox</h3><h4><a id="Put+a+file+in+HDFS+via+Knox."></a>Put a file in HDFS via Knox.</h4><h4><a id="CAT+a+file+in+HDFS+via+Knox."></a>CAT a file in HDFS via Knox.</h4><h4><a id="Invoke+the+LISTSATUS+operation+on+WebHDFS+via+the+gateway."></a>Invoke the LISTSATUS operation on WebHDFS via the gateway.</h4><p>This will return a directory listing of the root (i.e. /) directory of HDFS.</p>
+</code></pre><p><strong>NOTE: This command will also clear any log output in /var/log/knox so use this with caution.</strong></p><h3><a id="8+-+Do+Hadoop+with+Knox">8 - Do Hadoop with Knox</a> <a href="#8+-+Do+Hadoop+with+Knox"><img src="markbook-section-link.png"/></a></h3><h4><a id="Put+a+file+in+HDFS+via+Knox.">Put a file in HDFS via Knox.</a> <a href="#Put+a+file+in+HDFS+via+Knox."><img src="markbook-section-link.png"/></a></h4><h4><a id="CAT+a+file+in+HDFS+via+Knox.">CAT a file in HDFS via Knox.</a> <a href="#CAT+a+file+in+HDFS+via+Knox."><img src="markbook-section-link.png"/></a></h4><h4><a id="Invoke+the+LISTSATUS+operation+on+WebHDFS+via+the+gateway.">Invoke the LISTSTATUS operation on WebHDFS via the gateway.</a> <a href="#Invoke+the+LISTSATUS+operation+on+WebHDFS+via+the+gateway."><img src="markbook-section-link.png"/></a></h4><p>This will return a directory listing of the root (i.e. /) directory of HDFS.</p>
 <pre><code>curl -i -k -u guest:guest-password -X GET \
     &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/?op=LISTSTATUS&#39;
 </code></pre><p>The above command should produce something along the lines of the output below. The exact information returned is subject to the content within HDFS in your Hadoop cluster. Successfully executing this command at a minimum proves that the gateway is properly configured to provide access to WebHDFS. It does not necessarily prove that any of the other services are correctly configured to be accessible. To validate that, see the sections for the individual services in <a href="#Service+Details">Service Details</a>.</p>
@@ -125,11 +125,11 @@ Server: Jetty(6.1.26)
 {&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350596040075,&quot;owner&quot;:&quot;hdfs&quot;,&quot;pathSuffix&quot;:&quot;tmp&quot;,&quot;permission&quot;:&quot;777&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;},
 {&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350595857178,&quot;owner&quot;:&quot;hdfs&quot;,&quot;pathSuffix&quot;:&quot;user&quot;,&quot;permission&quot;:&quot;755&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;}
 ]}}
-</code></pre><h4><a id="Submit+a+MR+job+via+Knox."></a>Submit a MR job via Knox.</h4><h4><a id="Get+status+of+a+MR+job+via+Knox."></a>Get status of a MR job via Knox.</h4><h4><a id="Cancel+a+MR+job+via+Knox."></a>Cancel a MR job via Knox.</h4><h3><a id="More+Examples"></a>More Examples</h3><h2><a id="Apache+Knox+Details"></a>Apache Knox Details</h2><p>This section provides everything you need to know to get the Knox gateway up and running against a Hadoop cluster.</p><h4><a id="Hadoop"></a>Hadoop</h4><p>An an existing Hadoop 1.x or 2.x cluster is required for Knox sit in front of and protect. It is possible to use a Hadoop cluster deployed on EC2 but this will require additional configuration not covered here. It is also possible to use a limited set of services in Hadoop cluster secured with Kerberos. This too required additional configuration that is not described here. See <a href="#Supported+Services">Supported Services</a> for details on what is supported for this release.</p><
 p>The Hadoop cluster should be ensured to have at least WebHDFS, WebHCat (i.e. Templeton) and Oozie configured, deployed and running. HBase/Stargate and Hive can also be accessed via the Knox Gateway given the proper versions and configuration.</p><p>The instructions that follow assume a few things:</p>
+</code></pre><h4><a id="Submit+a+MR+job+via+Knox.">Submit a MR job via Knox.</a> <a href="#Submit+a+MR+job+via+Knox."><img src="markbook-section-link.png"/></a></h4><h4><a id="Get+status+of+a+MR+job+via+Knox.">Get status of a MR job via Knox.</a> <a href="#Get+status+of+a+MR+job+via+Knox."><img src="markbook-section-link.png"/></a></h4><h4><a id="Cancel+a+MR+job+via+Knox.">Cancel a MR job via Knox.</a> <a href="#Cancel+a+MR+job+via+Knox."><img src="markbook-section-link.png"/></a></h4><h3><a id="More+Examples">More Examples</a> <a href="#More+Examples"><img src="markbook-section-link.png"/></a></h3><h2><a id="Apache+Knox+Details">Apache Knox Details</a> <a href="#Apache+Knox+Details"><img src="markbook-section-link.png"/></a></h2><p>This section provides everything you need to know to get the Knox gateway up and running against a Hadoop cluster.</p><h4><a id="Hadoop">Hadoop</a> <a href="#Hadoop"><img src="markbook-section-link.png"/></a></h4><p>An an existing Hadoop 1.x or 2.x clust
 er is required for Knox sit in front of and protect. It is possible to use a Hadoop cluster deployed on EC2 but this will require additional configuration not covered here. It is also possible to use a limited set of services in Hadoop cluster secured with Kerberos. This too required additional configuration that is not described here. See <a href="#Supported+Services">Supported Services</a> for details on what is supported for this release.</p><p>The Hadoop cluster should be ensured to have at least WebHDFS, WebHCat (i.e. Templeton) and Oozie configured, deployed and running. HBase/Stargate and Hive can also be accessed via the Knox Gateway given the proper versions and configuration.</p><p>The instructions that follow assume a few things:</p>
 <ol>
   <li>The gateway is <em>not</em> collocated with the Hadoop clusters themselves.</li>
   <li>The host names and IP addresses of the cluster services are accessible by the gateway where ever it happens to be running.</li>
-</ol><p>All of the instructions and samples provided here are tailored and tested to work &ldquo;out of the box&rdquo; against a <a href="http://hortonworks.com/products/hortonworks-sandbox">Hortonworks Sandbox 2.x VM</a>.</p><h4><a id="Apache+Knox+Directory+Layout"></a>Apache Knox Directory Layout</h4><p>Knox can be installed by expanding the zip file or with rpm. With rpm based install the following directories are created in addition to those described in this section.</p>
+</ol><p>All of the instructions and samples provided here are tailored and tested to work &ldquo;out of the box&rdquo; against a <a href="http://hortonworks.com/products/hortonworks-sandbox">Hortonworks Sandbox 2.x VM</a>.</p><h4><a id="Apache+Knox+Directory+Layout">Apache Knox Directory Layout</a> <a href="#Apache+Knox+Directory+Layout"><img src="markbook-section-link.png"/></a></h4><p>Knox can be installed by expanding the zip file or with rpm. With rpm based install the following directories are created in addition to those described in this section.</p>
 <pre><code>/usr/lib/knox
 /var/log/knox
 /var/run/knox
@@ -199,7 +199,7 @@ Server: Jetty(6.1.26)
       <td>Documents that this release is from a project undergoing incubation at Apache. </td>
     </tr>
   </tbody>
-</table><h3><a id="Supported+Services"></a>Supported Services</h3><p>This table enumerates the versions of various Hadoop services that have been tested to work with the Knox Gateway. Only more recent versions of some Hadoop components when secured via Kerberos can be accessed via the Knox Gateway.</p>
+</table><h3><a id="Supported+Services">Supported Services</a> <a href="#Supported+Services"><img src="markbook-section-link.png"/></a></h3><p>This table enumerates the versions of various Hadoop services that have been tested to work with the Knox Gateway. Only more recent versions of some Hadoop components when secured via Kerberos can be accessed via the Knox Gateway.</p>
 <table>
   <thead>
     <tr>
@@ -277,14 +277,14 @@ Server: Jetty(6.1.26)
       <td><img src="error.png"  alt="n"/> </td>
     </tr>
   </tbody>
-</table><h3><a id="More+Examples"></a>More Examples</h3><p>These examples provide more detail about how to access various Apache Hadoop services via the Apache Knox Gateway.</p>
+</table><h3><a id="More+Examples">More Examples</a> <a href="#More+Examples"><img src="markbook-section-link.png"/></a></h3><p>These examples provide more detail about how to access various Apache Hadoop services via the Apache Knox Gateway.</p>
 <ul>
   <li><a href="#WebHDFS+Examples">WebHDFS Examples</a></li>
   <li><a href="#WebHCat+Examples">WebHCat Examples</a></li>
   <li><a href="#Oozie+Examples">Oozie Examples</a></li>
   <li><a href="#HBase+Examples">HBase Examples</a></li>
   <li><a href="#Hive+Examples">Hive Examples</a></li>
-</ul><h2><a id="Gateway+Details"></a>Gateway Details</h2><p>TODO</p><h3><a id="URL+Mapping"></a>URL Mapping</h3><p>The gateway functions much like a reverse proxy. As such it maintains a mapping of URLs that are exposed externally by the gateway to URLs that are provided by the Hadoop cluster. Examples of mappings for the WebHDFS, WebHCat, Oozie and Stargate/HBase are shown below. These mapping are generated from the combination of the gateway configuration file (i.e. <code>{GATEWAY_HOME}/conf/gateway-site.xml</code>) and the cluster topology descriptors (e.g. <code>{GATEWAY_HOME}/deployments/{cluster-name}.xml</code>). The port numbers show for the Cluster URLs represent the default ports for these services. The actual port number may be different for a given cluster.</p>
+</ul><h2><a id="Gateway+Details">Gateway Details</a> <a href="#Gateway+Details"><img src="markbook-section-link.png"/></a></h2><p>TODO</p><h3><a id="URL+Mapping">URL Mapping</a> <a href="#URL+Mapping"><img src="markbook-section-link.png"/></a></h3><p>The gateway functions much like a reverse proxy. As such it maintains a mapping of URLs that are exposed externally by the gateway to URLs that are provided by the Hadoop cluster. Examples of mappings for the WebHDFS, WebHCat, Oozie and Stargate/HBase are shown below. These mapping are generated from the combination of the gateway configuration file (i.e. <code>{GATEWAY_HOME}/conf/gateway-site.xml</code>) and the cluster topology descriptors (e.g. <code>{GATEWAY_HOME}/deployments/{cluster-name}.xml</code>). The port numbers show for the Cluster URLs represent the default ports for these services. The actual port number may be different for a given cluster.</p>
 <ul>
   <li>WebHDFS
   <ul>
@@ -306,7 +306,7 @@ Server: Jetty(6.1.26)
     <li>Gateway: <code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/hbase</code></li>
     <li>Cluster: <code>http://{hbase-host}:60080</code></li>
   </ul></li>
-</ul><p>The values for <code>{gateway-host}</code>, <code>{gateway-port}</code>, <code>{gateway-path}</code> are provided via the gateway configuration file (i.e. <code>{GATEWAY_HOME}/conf/gateway-site.xml</code>).</p><p>The value for <code>{cluster-name}</code> is derived from the file name of the cluster topology descriptor (e.g. <code>{GATEWAY_HOME}/deployments/{cluster-name}.xml</code>).</p><p>The value for <code>{webhdfs-host}</code>, <code>{webhcat-host}</code>, <code>{oozie-host}</code> and <code>{hbase-host}</code> are provided via the cluster topology descriptor (e.g. <code>{GATEWAY_HOME}/deployments/{cluster-name}.xml</code>).</p><p>Note: The ports 50070, 50111, 11000 and 60080 are the defaults for WebHDFS, WebHCat, Oozie and Stargate/HBase respectively. Their values can also be provided via the cluster topology descriptor if your Hadoop cluster uses different ports.</p><h3><a id="Configuration"></a>Configuration</h3><h4><a id="Topology+Descriptors"></a>Topology Descriptor
 s</h4><p>The topology descriptor files provide the gateway with per-cluster configuration information. This includes configuration for both the providers within the gateway and the services within the Hadoop cluster. These files are located in <code>{GATEWAY_HOME}/deployments</code>. The general outline of this document looks like this.</p>
+</ul><p>The values for <code>{gateway-host}</code>, <code>{gateway-port}</code>, <code>{gateway-path}</code> are provided via the gateway configuration file (i.e. <code>{GATEWAY_HOME}/conf/gateway-site.xml</code>).</p><p>The value for <code>{cluster-name}</code> is derived from the file name of the cluster topology descriptor (e.g. <code>{GATEWAY_HOME}/deployments/{cluster-name}.xml</code>).</p><p>The values for <code>{webhdfs-host}</code>, <code>{webhcat-host}</code>, <code>{oozie-host}</code> and <code>{hbase-host}</code> are provided via the cluster topology descriptor (e.g. <code>{GATEWAY_HOME}/deployments/{cluster-name}.xml</code>).</p><p>Note: The ports 50070, 50111, 11000 and 60080 are the defaults for WebHDFS, WebHCat, Oozie and Stargate/HBase respectively. Their values can also be provided via the cluster topology descriptor if your Hadoop cluster uses different ports.</p><h3><a id="Configuration">Configuration</a> <a href="#Configuration"><img src="markbook-section-link.png"/></a></h3><h4><a id="Topology+Descriptors">Topology Descriptors</a> <a href="#Topology+Descriptors"><img src="markbook-section-link.png"/></a></h4><p>The topology descriptor files provide the gateway with per-cluster configuration information. This includes configuration for both the providers within the gateway and the services within the Hadoop cluster. These files are located in <code>{GATEWAY_HOME}/deployments</code>. The general outline of this document looks like this.</p>
 <pre><code>&lt;topology&gt;
     &lt;gateway&gt;
         &lt;provider&gt;
@@ -317,7 +317,7 @@ Server: Jetty(6.1.26)
 &lt;/topology&gt;
 </code></pre><p>There are typically multiple <code>&lt;provider&gt;</code> and <code>&lt;service&gt;</code> elements.</p>
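 <p>For illustration only, here is a minimal sketch of a topology that combines a single provider with two service elements. The host, port and path values are placeholders (the WebHDFS URL matches the example shown later in this section, while the WebHCat URL assumes the common Templeton endpoint on the default port) and should be replaced with values from your own cluster.</p>
 <pre><code>&lt;topology&gt;
     &lt;gateway&gt;
         &lt;provider&gt;
             &lt;role&gt;authentication&lt;/role&gt;
             &lt;name&gt;ShiroProvider&lt;/name&gt;
             &lt;enabled&gt;true&lt;/enabled&gt;
         &lt;/provider&gt;
     &lt;/gateway&gt;
     &lt;service&gt;
         &lt;role&gt;WEBHDFS&lt;/role&gt;
         &lt;url&gt;http://localhost:50070/webhdfs&lt;/url&gt;
     &lt;/service&gt;
     &lt;service&gt;
         &lt;role&gt;WEBHCAT&lt;/role&gt;
         &lt;url&gt;http://localhost:50111/templeton&lt;/url&gt;
     &lt;/service&gt;
 &lt;/topology&gt;
 </code></pre><p>Each of these elements is described individually below.</p>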
 <dl><dt>/topology</dt><dd>Defines the provider configuration and service topology for a single Hadoop cluster.</dd><dt>/topology/gateway</dt><dd>Groups all of the provider elements.</dd><dt>/topology/gateway/provider</dt><dd>Defines the configuration of a specific provider for the cluster.</dd><dt>/topology/service</dt><dd>Defines the location of a specific Hadoop service within the Hadoop cluster.</dd>
-</dl><h5><a id="Provider+Configuration"></a>Provider Configuration</h5><p>Provider configuration is used to customize the behavior of a particular gateway feature. The general outline of a provider element looks like this.</p>
+</dl><h5><a id="Provider+Configuration">Provider Configuration</a> <a href="#Provider+Configuration"><img src="markbook-section-link.png"/></a></h5><p>Provider configuration is used to customize the behavior of a particular gateway feature. The general outline of a provider element looks like this.</p>
 <pre><code>&lt;provider&gt;
     &lt;role&gt;authentication&lt;/role&gt;
     &lt;name&gt;ShiroProvider&lt;/name&gt;
@@ -329,14 +329,14 @@ Server: Jetty(6.1.26)
 &lt;/provider&gt;
 </code></pre>
 <dl><dt>/topology/gateway/provider</dt><dd>Groups information for a specific provider.</dd><dt>/topology/gateway/provider/role</dt><dd>Defines the role of a particular provider. There are a number of pre-defined roles used by out-of-the-box provider plugins for the gateway. These roles are: authentication, identity-assertion, rewrite and hostmap.</dd><dt>/topology/gateway/provider/name</dt><dd>Defines the name of the provider for which this configuration applies. There can be multiple provider implementations for a given role. Specifying the name identifies which particular provider is being configured. Typically each topology descriptor should contain only one provider for each role but there are exceptions.</dd><dt>/topology/gateway/provider/enabled</dt><dd>Allows a particular provider to be enabled or disabled via <code>true</code> or <code>false</code> respectively. When a provider is disabled any filters associated with that provider are excluded from the processing chain.</dd><dt>/topology/gateway/provider/param</dt><dd>These elements are used to supply provider configuration. There can be zero or more of these per provider.</dd><dt>/topology/gateway/provider/param/name</dt><dd>The name of a parameter to pass to the provider.</dd><dt>/topology/gateway/provider/param/value</dt><dd>The value of a parameter to pass to the provider.</dd>
-</dl><h5><a id="Service+Configuration"></a>Service Configuration</h5><p>Service configuration is used to specify the location of services within the Hadoop cluster. The general outline of a service element looks like this.</p>
+</dl><h5><a id="Service+Configuration">Service Configuration</a> <a href="#Service+Configuration"><img src="markbook-section-link.png"/></a></h5><p>Service configuration is used to specify the location of services within the Hadoop cluster. The general outline of a service element looks like this.</p>
 <pre><code>&lt;service&gt;
     &lt;role&gt;WEBHDFS&lt;/role&gt;
     &lt;url&gt;http://localhost:50070/webhdfs&lt;/url&gt;
 &lt;/service&gt;
 </code></pre>
 <dl><dt>/topology/service</dt><dd>Provides information about a particular service within the Hadoop cluster. Not all services are necessarily exposed as gateway endpoints.</dd><dt>/topology/service/role</dt><dd>Identifies the role of this service. Currently supported roles are: WEBHDFS, WEBHCAT, WEBHBASE, OOZIE, HIVE, NAMENODE, JOBTRACKER. Additional service roles can be supported via plugins.</dd><dt>/topology/service/url</dt><dd>The URL identifying the location of a particular service within the Hadoop cluster.</dd>
-</dl><h4><a id="Hostmap+Provider"></a>Hostmap Provider</h4><p>The purpose of the Hostmap provider is to handle situations where host are know by one name within the cluster and another name externally. This frequently occurs when virtual machines are used and in particular using cloud hosting services. Currently the Hostmap provider is configured as part of the topology file. The basic structure is shown below.</p>
+</dl><h4><a id="Hostmap+Provider">Hostmap Provider</a> <a href="#Hostmap+Provider"><img src="markbook-section-link.png"/></a></h4><p>The purpose of the Hostmap provider is to handle situations where host are know by one name within the cluster and another name externally. This frequently occurs when virtual machines are used and in particular using cloud hosting services. Currently the Hostmap provider is configured as part of the topology file. The basic structure is shown below.</p>
 <pre><code>&lt;topology&gt;
     &lt;gateway&gt;
         ...
@@ -350,7 +350,7 @@ Server: Jetty(6.1.26)
     &lt;/gateway&gt;
     ...
 &lt;/topology&gt;
-</code></pre><p>This mapping is required because the Hadoop servies running within the cluster are unaware that they are being accessed from outside the cluster. Therefore URLs returned as part of REST API responses will typically contain internal host names. Since clients outside the cluster will be unable to resolve those host name they must be mapped to external host names.</p><h5><a id="Hostmap+Provider+Example+-+EC2"></a>Hostmap Provider Example - EC2</h5><p>Consider an EC2 example where two VMs have been allocated. Each VM has an external host name by which it can be accessed via the internet. However the EC2 VM is unaware of this external host name and instead is configured with the internal host name.</p>
+</code></pre><p>This mapping is required because the Hadoop services running within the cluster are unaware that they are being accessed from outside the cluster. Therefore URLs returned as part of REST API responses will typically contain internal host names. Since clients outside the cluster will be unable to resolve those host names, they must be mapped to external host names.</p><h5><a id="Hostmap+Provider+Example+-+EC2">Hostmap Provider Example - EC2</a> <a href="#Hostmap+Provider+Example+-+EC2"><img src="markbook-section-link.png"/></a></h5><p>Consider an EC2 example where two VMs have been allocated. Each VM has an external host name by which it can be accessed via the internet. However the EC2 VM is unaware of this external host name and instead is configured with the internal host name.</p>
 <pre><code>External HOSTNAMES:
 ec2-23-22-31-165.compute-1.amazonaws.com
 ec2-23-23-25-10.compute-1.amazonaws.com
@@ -379,7 +379,7 @@ ip-10-39-107-209.ec2.internal
     &lt;/gateway&gt;
     ...
 &lt;/topology&gt;
-</code></pre><h5><a id="Hostmap+Provider+Example+-+Sandbox"></a>Hostmap Provider Example - Sandbox</h5><p>Hortonwork&rsquo;s Sandbox 2.x poses a different challenge for host name mapping. This version of the Sandbox uses port mapping to make the Sandbox VM appear as though it is accessible via localhost. However the Sandbox VM is internally configured to consider sandbox.hortonworks.com as the host name. So from the perspective of a client accessing Sandbox the external host name is localhost. The Hostmap configuration required to allow access to Sandbox from the host operating system is this.</p>
+</code></pre><h5><a id="Hostmap+Provider+Example+-+Sandbox">Hostmap Provider Example - Sandbox</a> <a href="#Hostmap+Provider+Example+-+Sandbox"><img src="markbook-section-link.png"/></a></h5><p>Hortonwork&rsquo;s Sandbox 2.x poses a different challenge for host name mapping. This version of the Sandbox uses port mapping to make the Sandbox VM appear as though it is accessible via localhost. However the Sandbox VM is internally configured to consider sandbox.hortonworks.com as the host name. So from the perspective of a client accessing Sandbox the external host name is localhost. The Hostmap configuration required to allow access to Sandbox from the host operating system is this.</p>
 <pre><code>&lt;topology&gt;
     &lt;gateway&gt;
         ...
@@ -393,9 +393,9 @@ ip-10-39-107-209.ec2.internal
     &lt;/gateway&gt;
     ...
 &lt;/topology&gt;
-</code></pre><h5><a id="Hostmap+Provider+Configuration"></a>Hostmap Provider Configuration</h5><p>Details about each provider configuration element is enumerated below.</p>
+</code></pre><h5><a id="Hostmap+Provider+Configuration">Hostmap Provider Configuration</a> <a href="#Hostmap+Provider+Configuration"><img src="markbook-section-link.png"/></a></h5><p>Details about each provider configuration element is enumerated below.</p>
 <dl><dt>topology/gateway/provider/role</dt><dd>The role for a Hostmap provider must always be <code>hostmap</code>.</dd><dt>topology/gateway/provider/name</dt><dd>The Hostmap provider supplied out-of-the-box is selected via the name <code>static</code>.</dd><dt>topology/gateway/provider/enabled</dt><dd>Host mapping can be enabled or disabled by providing <code>true</code> or <code>false</code>.</dd><dt>topology/gateway/provider/param</dt><dd>Host mapping is configured by providing parameters for each external to internal mapping.</dd><dt>topology/gateway/provider/param/name</dt><dd>The parameter names represent the external host names associated with the internal host names provided by the value element. This can be a comma separated list of host names that all represent the same physical host. When mapping from internal to external host name, the first external host name in the list is used.</dd><dt>topology/gateway/provider/param/value</dt><dd>The parameter values represent the internal host names associated with the external host names provided by the name element. This can be a comma separated list of host names that all represent the same physical host. When mapping from external to internal host names, the first internal host name in the list is used.</dd>
-</dl><h4><a id="Logging"></a>Logging</h4><p>If necessary you can enable additional logging by editing the <code>log4j.properties</code> file in the <code>conf</code> directory. Changing the rootLogger value from <code>ERROR</code> to <code>DEBUG</code> will generate a large amount of debug logging. A number of useful, more fine loggers are also provided in the file.</p><h4><a id="Java+VM+Options"></a>Java VM Options</h4><p>TODO - Java VM options doc.</p><h4><a id="Persisting+the+Master+Secret"></a>Persisting the Master Secret</h4><p>The master secret is required to start the server. This secret is used to access secured artifacts by the gateway instance. Keystore, trust stores and credential stores are all protected with the master secret.</p><p>You may persist the master secret by supplying the <em>-persist-master</em> switch at startup. This will result in a warning indicating that persisting the secret is less secure than providing it at startup. We do make some provisions in ord
 er to protect the persisted password.</p><p>It is encrypted with AES 128 bit encryption and where possible the file permissions are set to only be accessible by the user that the gateway is running as.</p><p>After persisting the secret, ensure that the file at config/security/master has the appropriate permissions set for your environment. This is probably the most important layer of defense for master secret. Do not assume that the encryption if sufficient protection.</p><p>A specific user should be created to run the gateway this will protect a persisted master file.</p><h4><a id="Management+of+Security+Artifacts"></a>Management of Security Artifacts</h4><p>There are a number of artifacts that are used by the gateway in ensuring the security of wire level communications, access to protected resources and the encryption of sensitive data. These artifacts can be managed from outside of the gateway instances or generated and populated by the gateway instance itself.</p><p>The followi
 ng is a description of how this is coordinated with both standalone (development, demo, etc) gateway instances and instances as part of a cluster of gateways in mind.</p><p>Upon start of the gateway server we:</p>
+</dl><h4><a id="Logging">Logging</a> <a href="#Logging"><img src="markbook-section-link.png"/></a></h4><p>If necessary you can enable additional logging by editing the <code>log4j.properties</code> file in the <code>conf</code> directory. Changing the rootLogger value from <code>ERROR</code> to <code>DEBUG</code> will generate a large amount of debug logging. A number of useful, more fine loggers are also provided in the file.</p><h4><a id="Java+VM+Options">Java VM Options</a> <a href="#Java+VM+Options"><img src="markbook-section-link.png"/></a></h4><p>TODO - Java VM options doc.</p><h4><a id="Persisting+the+Master+Secret">Persisting the Master Secret</a> <a href="#Persisting+the+Master+Secret"><img src="markbook-section-link.png"/></a></h4><p>The master secret is required to start the server. This secret is used to access secured artifacts by the gateway instance. Keystore, trust stores and credential stores are all protected with the master secret.</p><p>You may persist the master
  secret by supplying the <em>-persist-master</em> switch at startup. This will result in a warning indicating that persisting the secret is less secure than providing it at startup. We do make some provisions in order to protect the persisted password.</p><p>It is encrypted with AES 128 bit encryption and where possible the file permissions are set to only be accessible by the user that the gateway is running as.</p><p>After persisting the secret, ensure that the file at config/security/master has the appropriate permissions set for your environment. This is probably the most important layer of defense for master secret. Do not assume that the encryption if sufficient protection.</p><p>A specific user should be created to run the gateway this will protect a persisted master file.</p><h4><a id="Management+of+Security+Artifacts">Management of Security Artifacts</a> <a href="#Management+of+Security+Artifacts"><img src="markbook-section-link.png"/></a></h4><p>There are a number of artif
 acts that are used by the gateway in ensuring the security of wire level communications, access to protected resources and the encryption of sensitive data. These artifacts can be managed from outside of the gateway instances or generated and populated by the gateway instance itself.</p><p>The following is a description of how this is coordinated with both standalone (development, demo, etc) gateway instances and instances as part of a cluster of gateways in mind.</p><p>Upon start of the gateway server we:</p>
 <ol>
   <li>Look for an identity store at <code>conf/security/keystores/gateway.jks</code>.  The identity store contains the certificate and private key used to represent the identity of the server for SSL connections and signature creation.
   <ul>
@@ -418,7 +418,7 @@ ip-10-39-107-209.ec2.internal
 <ol>
   <li>Using a single gateway instance as a master instance the artifacts can be generated or placed into the expected location and then replicated across all of the slave instances before startup.</li>
   <li>Using an NFS mount as a central location for the artifacts would provide a single source of truth without the need to replicate them over the network. Of course, NFS mounts have their own challenges.</li>
-</ol><h4><a id="Keystores"></a>Keystores</h4><p>In order to provide your own certificate for use by the gateway, you will need to either import an existing key pair into a Java keystore or generate a self-signed cert using the Java keytool.</p><h5><a id="Importing+a+key+pair+into+a+Java+keystore"></a>Importing a key pair into a Java keystore</h5><h1><a id="----NEEDS+TESTING"></a>&mdash;-NEEDS TESTING</h1><p>One way to accomplish this is to start with a PKCS12 store for your key pair and then convert it to a Java keystore or JKS.</p>
+</ol><h4><a id="Keystores">Keystores</a> <a href="#Keystores"><img src="markbook-section-link.png"/></a></h4><p>In order to provide your own certificate for use by the gateway, you will need to either import an existing key pair into a Java keystore or generate a self-signed cert using the Java keytool.</p><h5><a id="Importing+a+key+pair+into+a+Java+keystore">Importing a key pair into a Java keystore</a> <a href="#Importing+a+key+pair+into+a+Java+keystore"><img src="markbook-section-link.png"/></a></h5><h1><a id="----NEEDS+TESTING">&mdash;-NEEDS TESTING</a> <a href="#----NEEDS+TESTING"><img src="markbook-section-link.png"/></a></h1><p>One way to accomplish this is to start with a PKCS12 store for your key pair and then convert it to a Java keystore or JKS.</p>
 <pre><code>openssl pkcs12 -export -in cert.pem -inkey key.pem &gt; server.p12
 </code></pre><p>The above example uses openssl to create a PKCS12 encoded store from your provided certificate and private key.</p>
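 <p>Optionally, you can sanity-check the PKCS12 store before converting it. This is only a suggested verification step, not something the gateway requires; it assumes the same <code>openssl</code> tooling used above. The command prompts for the password used when the store was created and prints the certificate details so you can confirm the expected key pair is present.</p>
 <pre><code>openssl pkcs12 -info -in server.p12 -nokeys
 </code></pre>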
 <pre><code>keytool -importkeystore -srckeystore {server.p12} -destkeystore gateway.jks -srcstoretype pkcs12
@@ -427,19 +427,19 @@ ip-10-39-107-209.ec2.internal
   <li>the alias MUST be &ldquo;gateway-identity&rdquo;</li>
   <li>the name of the expected identity keystore for the gateway MUST be gateway.jks</li>
   <li>the passwords for the keystore and the imported key MUST both be the master secret for the gateway install</li>
-</ol><p>NOTE: The password for the keystore as well as that of the imported key must be the master secret for the gateway instance.</p><h1><a id="----END+NEEDS+TESTING"></a>&mdash;-END NEEDS TESTING</h1><h5><a id="Generating+a+self-signed+cert+for+use+in+testing+or+development+environments"></a>Generating a self-signed cert for use in testing or development environments</h5>
+</ol><p>NOTE: The password for the keystore as well as that of the imported key must be the master secret for the gateway instance.</p><h1><a id="----END+NEEDS+TESTING">&mdash;-END NEEDS TESTING</a> <a href="#----END+NEEDS+TESTING"><img src="markbook-section-link.png"/></a></h1><h5><a id="Generating+a+self-signed+cert+for+use+in+testing+or+development+environments">Generating a self-signed cert for use in testing or development environments</a> <a href="#Generating+a+self-signed+cert+for+use+in+testing+or+development+environments"><img src="markbook-section-link.png"/></a></h5>
 <pre><code>keytool -genkey -keyalg RSA -alias gateway-identity -keystore gateway.jks \
     -storepass {master-secret} -validity 360 -keysize 2048
-</code></pre><p>Keytool will prompt you for a number of elements used that will comprise this distiniguished name (DN) within your certificate. </p><p><em>NOTE:</em> When it prompts you for your First and Last name be sure to type in the hostname of the machine that your gateway instance will be running on. This is used by clients during hostname verification to ensure that the presented certificate matches the hostname that was used in the URL for the connection - so they need to match.</p><p><em>NOTE:</em> When it prompts for the key password just press enter to ensure that it is the same as the keystore password. Which as was described earlier must match the master secret for the gateway instance.</p><h5><a id="Credential+Store"></a>Credential Store</h5><p>Whenever you provide your own keystore with either a self-signed cert or a real certificate signed by a trusted authority, you will need to create an empty credential store. This is necessary for the current release in order fo
 r the system to utilize the same password for the keystore and the key.</p><p>The credential stores in Knox use the JCEKS keystore type as it allows for the storage of general secrets in addition to certificates.</p>
+</code></pre><p>Keytool will prompt you for a number of elements that will comprise the distinguished name (DN) within your certificate.</p><p><em>NOTE:</em> When it prompts you for your First and Last name be sure to type in the hostname of the machine that your gateway instance will be running on. This is used by clients during hostname verification to ensure that the presented certificate matches the hostname that was used in the URL for the connection - so they need to match.</p><p><em>NOTE:</em> When it prompts for the key password just press enter to ensure that it is the same as the keystore password, which, as described earlier, must match the master secret for the gateway instance.</p><h5><a id="Credential+Store">Credential Store</a> <a href="#Credential+Store"><img src="markbook-section-link.png"/></a></h5><p>Whenever you provide your own keystore with either a self-signed cert or a real certificate signed by a trusted authority, you will need to create an empty credential store. This is necessary for the current release in order for the system to utilize the same password for the keystore and the key.</p><p>The credential stores in Knox use the JCEKS keystore type as it allows for the storage of general secrets in addition to certificates.</p>
 <pre><code>keytool -genkey -alias {anything} -keystore __gateway-credentials.jceks \
     -storepass {master-secret} -validity 360 -keysize 1024 -storetype JCEKS
-</code></pre><p>Follow the prompts again for the DN for the cert of the credential store. This certificate isn&rsquo;t really used for anything at the moment but is required to create the credential store.</p><h5><a id="Provisioning+of+Keystores"></a>Provisioning of Keystores</h5><p>Once you have created these keystores you must move them into place for the gateway to discover them and use them to represent its identity for SSL connections. This is done by copying the keystores to the <code>{GATEWAY_HOME}/conf/security/keystores</code> directory for your gateway install.</p><h4><a id="Summary+of+Secrets+to+be+Managed"></a>Summary of Secrets to be Managed</h4>
+</code></pre><p>Follow the prompts again for the DN for the cert of the credential store. This certificate isn&rsquo;t really used for anything at the moment but is required to create the credential store.</p><h5><a id="Provisioning+of+Keystores">Provisioning of Keystores</a> <a href="#Provisioning+of+Keystores"><img src="markbook-section-link.png"/></a></h5><p>Once you have created these keystores you must move them into place for the gateway to discover them and use them to represent its identity for SSL connections. This is done by copying the keystores to the <code>{GATEWAY_HOME}/conf/security/keystores</code> directory for your gateway install.</p><h4><a id="Summary+of+Secrets+to+be+Managed">Summary of Secrets to be Managed</a> <a href="#Summary+of+Secrets+to+be+Managed"><img src="markbook-section-link.png"/></a></h4>
 <ol>
   <li>Master secret - the same for all gateway instances in a cluster of gateways</li>
   <li>All security related artifacts are protected with the master secret</li>
   <li>Secrets used by the gateway itself are stored within the gateway credential store and are the same across all gateway instances in the cluster of gateways</li>
   <li>Secrets used by providers within cluster topologies are stored in topology specific credential stores and are the same for the same topology across the cluster of gateway instances.  However, they are specific to the topology - so secrets for one hadoop cluster are different from those of another.  This allows for fail-over from one gateway instance to another even when encryption is being used while not allowing the compromise of one encryption key to expose the data for all clusters.</li>
-</ol><p>NOTE: the SSL certificate will need special consideration depending on the type of certificate. Wildcard certs may be able to be shared across all gateway instances in a cluster. When certs are dedicated to specific machines the gateway identity store will not be able to be blindly replicated as host name verification problems will ensue. Obviously, trust-stores will need to be taken into account as well.</p><h3><a id="Authentication"></a>Authentication</h3><p>There are two types of providers supported in Knox for establishing a user&rsquo;s identity:</p>
+</ol><p>NOTE: the SSL certificate will need special consideration depending on the type of certificate. Wildcard certs may be able to be shared across all gateway instances in a cluster. When certs are dedicated to specific machines the gateway identity store will not be able to be blindly replicated as host name verification problems will ensue. Obviously, trust-stores will need to be taken into account as well.</p><h3><a id="Authentication">Authentication</a> <a href="#Authentication"><img src="markbook-section-link.png"/></a></h3><p>There are two types of providers supported in Knox for establishing a user&rsquo;s identity:</p>
 <ol>
   <li>Authentication Providers</li>
   <li>Federation Providers</li>
@@ -449,7 +449,7 @@ ip-10-39-107-209.ec2.internal
   <li>Specific configuration for the bundled BASIC/LDAP configuration</li>
   <li>Some tips into what may need to be customized for your environment</li>
   <li>How to setup the use of LDAP over SSL or LDAPS</li>
-</ol><h4><a id="General+Configuration+for+Shiro+Provider"></a>General Configuration for Shiro Provider</h4><p>As is described in the configuration section of this document, providers have a name-value based configuration - as is the common pattern in the rest of Hadoop.</p><p>The following example shows the format of the configuration for a given provider:</p>
+</ol><h4><a id="General+Configuration+for+Shiro+Provider">General Configuration for Shiro Provider</a> <a href="#General+Configuration+for+Shiro+Provider"><img src="markbook-section-link.png"/></a></h4><p>As is described in the configuration section of this document, providers have a name-value based configuration - as is the common pattern in the rest of Hadoop.</p><p>The following example shows the format of the configuration for a given provider:</p>
 <pre><code>&lt;provider&gt;
     &lt;role&gt;authentication&lt;/role&gt;
     &lt;name&gt;ShiroProvider&lt;/name&gt;
@@ -494,12 +494,12 @@ ldapRealm.userDnTemplate=uid={0},ou=peop
             &lt;value&gt;authcBasic&lt;/value&gt;
         &lt;/param&gt;
     &lt;/provider&gt;
-</code></pre><p>This happens to be the way that we are currently configuring Shiro for BASIC/LDAP authentication. This same config approach may be used to achieve other authentication mechanisms or variations on this one. We however have not tested additional uses for it for this release.</p><h4><a id="LDAP+Configuration"></a>LDAP Configuration</h4><p>This section discusses the LDAP configuration used above for the Shiro Provider. Some of these configuration elements will need to be customized to reflect your deployment environment.</p><p><strong>main.ldapRealm</strong> - this element indicates the fully qualified classname of the Shiro realm to be used in authenticating the user. The classname provided by default in the sample is <code>org.apache.shiro.realm.ldap.JndiLdapRealm</code>; this implementation provides us with the ability to authenticate but by default has authorization disabled. In order to provide authorization - which is seen by Shiro as dependent on an LDAP schema that is specific to each organization - an extension of JndiLdapRealm is generally used to override and implement the doGetAuthorizationInfo method. In this particular release we are providing a simple authorization provider that can be used along with the Shiro authentication provider.</p><p><strong>main.ldapRealm.userDnTemplate</strong> - in order to bind a simple username to an LDAP server that generally requires a full distinguished name (DN), we must provide the template into which the simple username will be inserted. This template allows for the creation of a DN by injecting the simple username into the common name (CN) portion of the DN. <strong>This element will need to be customized to reflect your deployment environment.</strong> The template provided in the sample is only an example and is valid only within the LDAP schema distributed with Knox and is represented by the users.ldif file in the {GATEWAY_HOME}/conf directory.</p><p><strong>main.ldapRealm.contextFactory.url</strong> - this element is the URL that represents the host and port of the LDAP server. It also includes the scheme of the protocol to use. This may be either ldap or ldaps depending on whether you are communicating with the LDAP server over SSL (highly recommended). <strong>This element will need to be customized to reflect your deployment environment.</strong></p><p><strong>main.ldapRealm.contextFactory.authenticationMechanism</strong> - this element indicates the type of authentication that should be performed against the LDAP server. The current default value is <code>simple</code> which indicates a simple bind operation. This element should not need to be modified and no mechanism other than a simple bind has been tested for this particular release.</p><p><strong>urls./</strong>** - this element represents a single URL_Ant_Path_Expression and the value is the Shiro filter chain to apply to it. This particular sample indicates that all paths into the application have the same Shiro filter chain applied. The paths are relative to the application context path. The use of the value <code>authcBasic</code> here indicates that BASIC authentication is expected for every path into the application. Adding an additional Shiro filter to that chain for validating that the request isSecure() and over SSL can be achieved by changing the value to <code>ssl, authcBasic</code>. It is not likely that you need to change this element for your environment.</p><h4><a id="Active+Directory+-+Special+Note"></a>Active Directory - Special Note</h4><p>You would use LDAP configuration as documented above to authenticate against Active Directory as well.</p><p>Some Active Directory specific things to keep in mind:</p><p>Typical AD main.ldapRealm.userDnTemplate value looks slightly different, such as cn={0},cn=users,DC=lab,DC=sample,dc=com</p><p>Please compare this with a typical Apache DS main.ldapRealm.userDnTemplate value and make note of the difference. uid={0},ou=people,dc=hadoop,dc=apache,dc=org</p><p>If your AD is configured to authenticate based on just the cn and password and does not require user DN, you do not have to specify a value for main.ldapRealm.userDnTemplate.</p><h4><a id="LDAP+over+SSL+(LDAPS)+Configuration"></a>LDAP over SSL (LDAPS) Configuration</h4><p>In order to communicate with your LDAP server over SSL (again, highly recommended), you will need to modify the topology file in a couple ways and possibly provision some keying material.</p>
+</code></pre><p>This happens to be the way that we are currently configuring Shiro for BASIC/LDAP authentication. This same config approach may be used to achieve other authentication mechanisms or variations on this one. We have not, however, tested additional uses of it for this release.</p><h4><a id="LDAP+Configuration">LDAP Configuration</a> <a href="#LDAP+Configuration"><img src="markbook-section-link.png"/></a></h4><p>This section discusses the LDAP configuration used above for the Shiro Provider. Some of these configuration elements will need to be customized to reflect your deployment environment.</p>
 <p><strong>main.ldapRealm</strong> - this element indicates the fully qualified classname of the Shiro realm to be used in authenticating the user. The classname provided by default in the sample is <code>org.apache.shiro.realm.ldap.JndiLdapRealm</code>. This implementation provides the ability to authenticate but by default has authorization disabled. In order to provide authorization - which is seen by Shiro as dependent on an LDAP schema that is specific to each organization - an extension of JndiLdapRealm is generally used to override and implement the doGetAuthorizationInfo method. In this particular release we are providing a simple authorization provider that can be used along with the Shiro authentication provider.</p>
 <p><strong>main.ldapRealm.userDnTemplate</strong> - in order to bind a simple username to an LDAP server that generally requires a full distinguished name (DN), we must provide the template into which the simple username will be inserted. This template allows for the creation of a DN by injecting the simple username into the common name (CN) portion of the DN. <strong>This element will need to be customized to reflect your deployment environment.</strong> The template provided in the sample is only an example and is valid only within the LDAP schema distributed with Knox, which is represented by the users.ldif file in the {GATEWAY_HOME}/conf directory.</p>
 <p><strong>main.ldapRealm.contextFactory.url</strong> - this element is the URL that represents the host and port of the LDAP server. It also includes the scheme of the protocol to use. This may be either ldap or ldaps depending on whether you are communicating with the LDAP server over SSL (highly recommended). <strong>This element will need to be customized to reflect your deployment environment.</strong></p>
 <p><strong>main.ldapRealm.contextFactory.authenticationMechanism</strong> - this element indicates the type of authentication that should be performed against the LDAP server. The current default value is <code>simple</code> which indicates a simple bind operation. This element should not need to be modified and no mechanism other than a simple bind has been tested for this particular release.</p>
 <p><strong>urls./**</strong> - this element represents a single URL_Ant_Path_Expression and its value is the Shiro filter chain to apply to it. This particular sample indicates that all paths into the application have the same Shiro filter chain applied. The paths are relative to the application context path. The use of the value <code>authcBasic</code> here indicates that BASIC authentication is expected for every path into the application. Adding an additional Shiro filter to that chain, to validate that the request isSecure() and arrived over SSL, can be achieved by changing the value to <code>ssl, authcBasic</code>. It is not likely that you need to change this element for your environment.</p>
 <h4><a id="Active+Directory+-+Special+Note">Active Directory - Special Note</a> <a href="#Active+Directory+-+Special+Note"><img src="markbook-section-link.png"/></a></h4><p>You would use LDAP configuration as documented above to authenticate against Active Directory as well.</p><p>Some Active Directory specific things to keep in mind:</p><p>A typical AD main.ldapRealm.userDnTemplate value looks slightly different, such as cn={0},cn=users,DC=lab,DC=sample,dc=com</p>
 <p>Please compare this with a typical Apache DS main.ldapRealm.userDnTemplate value and note the difference: uid={0},ou=people,dc=hadoop,dc=apache,dc=org</p><p>If your AD is configured to authenticate based on just the cn and password and does not require the user DN, you do not have to specify a value for main.ldapRealm.userDnTemplate.</p>
 <h4><a id="LDAP+over+SSL+(LDAPS)+Configuration">LDAP over SSL (LDAPS) Configuration</a> <a href="#LDAP+over+SSL+(LDAPS)+Configuration"><img src="markbook-section-link.png"/></a></h4><p>In order to communicate with your LDAP server over SSL (again, highly recommended), you will need to modify the topology file in a couple of ways and possibly provision some keying material; the required changes are listed below, after the sketch.</p>
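+<p>As a minimal sketch (the host name is a placeholder and 636 is simply the conventional LDAPS port; use the values for your own LDAP server), the URL change looks like this within the ShiroProvider configuration:</p>
+<pre><code>&lt;param&gt;
+    &lt;name&gt;main.ldapRealm.contextFactory.url&lt;/name&gt;
+    &lt;!-- placeholder host and port; substitute your LDAP server's SSL listener --&gt;
+    &lt;value&gt;ldaps://your-ldap-host.example.com:636&lt;/value&gt;
+&lt;/param&gt;
+</code></pre>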
 <ol>
   <li><strong>main.ldapRealm.contextFactory.url</strong> must be changed to have the <code>ldaps</code> protocol scheme and the port must be the SSL listener port on your LDAP server.</li>
   <li>Identity certificate (keypair) provisioned to LDAP server - your LDAP server specific documentation should indicate what is requried for providing a cert or keypair to represent the LDAP server identity to connecting clients.</li>
   <li>Trusting the LDAP Server&rsquo;s public key - if the LDAP Server&rsquo;s identity certificate is issued by a well known and trusted certificate authority and is already represented in the JRE&rsquo;s cacerts truststore then you don&rsquo;t need to do anything for trusting the LDAP server&rsquo;s cert. If, however, the cert is selfsigned or issued by an untrusted authority you will need to either add it to the cacerts keystore or to another truststore that you may direct Knox to utilize through a system property.</li>
-</ol><h4><a id="Session+Configuration"></a>Session Configuration</h4><p>Knox maps each cluster topology to a web application and leverages standard JavaEE session management.</p><p>To configure session idle timeout for the topology, please specify value of parameter sessionTimeout for ShiroProvider in your topology file. If you do not specify the value for this parameter, it defaults to 30minutes.</p><p>The definition would look like the following in the topoloogy file:</p>
+</ol><h4><a id="Session+Configuration">Session Configuration</a> <a href="#Session+Configuration"><img src="markbook-section-link.png"/></a></h4><p>Knox maps each cluster topology to a web application and leverages standard JavaEE session management.</p><p>To configure session idle timeout for the topology, please specify value of parameter sessionTimeout for ShiroProvider in your topology file. If you do not specify the value for this parameter, it defaults to 30minutes.</p><p>The definition would look like the following in the topoloogy file:</p>
 <pre><code>...
 &lt;provider&gt;
     &lt;role&gt;authentication&lt;/role&gt;
@@ -517,7 +517,7 @@ ldapRealm.userDnTemplate=uid={0},ou=peop
     &lt;/param&gt;
 &lt;provider&gt;
 ...
-</code></pre><p>At present, ShiroProvider in Knox leverages JavaEE session to maintain authentication state for a user across requests using JSESSIONID cookie. So, a clieent that authenticated with Knox could pass the JSESSIONID cookie with repeated requests as long as the session has not timed out instead of submitting userid/password with every request. Presenting a valid session cookie in place of userid/password would also perform better as additional credential store lookups are avoided.</p><h3><a id="Identity+Assertion"></a>Identity Assertion</h3><p>The identity assertion provider within Knox plays the critical role of communicating the identity principal to be used within the Hadoop cluster to represent the identity that has been authenticated at the gateway.</p><p>The general responsibilities of the identity assertion provider is to interrogate the current Java Subject that has been established by the authentication or federation provider and:</p>
+</code></pre><p>At present, the ShiroProvider in Knox leverages the JavaEE session to maintain authentication state for a user across requests using the JSESSIONID cookie. So, a client that has authenticated with Knox can pass the JSESSIONID cookie with repeated requests, as long as the session has not timed out, instead of submitting userid/password with every request. Presenting a valid session cookie in place of userid/password also performs better, as additional credential store lookups are avoided.</p><h3><a id="Identity+Assertion">Identity Assertion</a> <a href="#Identity+Assertion"><img src="markbook-section-link.png"/></a></h3><p>The identity assertion provider within Knox plays the critical role of communicating the identity principal to be used within the Hadoop cluster to represent the identity that has been authenticated at the gateway.</p>
 <p>The general responsibilities of the identity assertion provider are to interrogate the current Java Subject that has been established by the authentication or federation provider and:</p>
 <ol>
   <li>determine whether it matches any principal mapping rules and apply them appropriately</li>
   <li>determine whether it matches any group principal mapping rules and apply them</li>
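+<p>For reference, an editorial sketch of a basic identity assertion provider with no mapping rules follows; the provider name shown here is assumed and should be verified against the sample topology file for your release. The configuration in the next hunk extends this same provider with principal and group mapping rules.</p>
+<pre><code>&lt;provider&gt;
+    &lt;role&gt;identity-assertion&lt;/role&gt;
+    &lt;!-- provider name assumed; verify against your sample topology --&gt;
+    &lt;name&gt;Pseudo&lt;/name&gt;
+    &lt;enabled&gt;true&lt;/enabled&gt;
+&lt;/provider&gt;
+</code></pre>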
@@ -542,7 +542,7 @@ ldapRealm.userDnTemplate=uid={0},ou=peop
         &lt;value&gt;*=users;hdfs=admin&lt;/value&gt;
     &lt;/param&gt;
 &lt;/provider&gt;
-</code></pre><p>This configuration identifies the same identity assertion provider but does provide principal and group mapping rules. In this case, when a user is authenticated as &ldquo;guest&rdquo; his identity is actually asserted to the Hadoop cluster as &ldquo;hdfs&rdquo;. In addition, since there are group principal mappings defined, he will also be considered as a member of the groups &ldquo;users&rdquo; and &ldquo;admin&rdquo;. In this particular example the wildcard &quot;*&ldquo; is used to indicate that all authenticated users need to be considered members of the &rdquo;users&ldquo; group and that only the user &rdquo;hdfs&ldquo; is mapped to be a member of the &rdquo;admin&quot; group.</p><p><strong>NOTE: These group memberships are currently only meaningful for Service Level Authorization using the AclsAuthorization provider. The groups are not currently asserted to the Hadoop cluster at this time. See the Authorization section within this guide to see how this is used
 .</strong></p><p>The principal mapping aspect of the identity assertion provider is important to understand in order to fully utilize the authorization features of this provider.</p><p>This feature allows us to map the authenticated principal to a runas or impersonated principal to be asserted to the Hadoop services in the backend.</p><p>When a principal mapping is defined that results in an impersonated principal being created the impersonated principal is then the effective principal.</p><p>If there is no mapping to another principal then the authenticated or primary principal is then the effective principal.</p><h4><a id="Principal+Mapping"></a>Principal Mapping</h4>
+</code></pre><p>This configuration identifies the same identity assertion provider but also provides principal and group mapping rules. In this case, when a user is authenticated as &ldquo;guest&rdquo;, his identity is actually asserted to the Hadoop cluster as &ldquo;hdfs&rdquo;. In addition, since there are group principal mappings defined, he will also be considered a member of the groups &ldquo;users&rdquo; and &ldquo;admin&rdquo;. In this particular example the wildcard &ldquo;*&rdquo; is used to indicate that all authenticated users need to be considered members of the &ldquo;users&rdquo; group and that only the user &ldquo;hdfs&rdquo; is mapped to be a member of the &ldquo;admin&rdquo; group.</p><p><strong>NOTE: These group memberships are currently only meaningful for Service Level Authorization using the AclsAuthorization provider. The groups are not asserted to the Hadoop cluster at this time. See the Authorization section of this guide for how this is used.</strong></p>
 <p>The principal mapping aspect of the identity assertion provider is important to understand in order to fully utilize the authorization features of this provider.</p><p>This feature allows us to map the authenticated principal to a runas or impersonated principal to be asserted to the Hadoop services in the backend.</p><p>When a principal mapping is defined that results in an impersonated principal being created, the impersonated principal is then the effective principal.</p><p>If there is no mapping to another principal, the authenticated or primary principal is the effective principal.</p><h4><a id="Principal+Mapping">Principal Mapping</a> <a href="#Principal+Mapping"><img src="markbook-section-link.png"/></a></h4>
 <pre><code>&lt;param&gt;
     &lt;name&gt;principal.mapping&lt;/name&gt;
     &lt;value&gt;{primaryPrincipal}[,...]={impersonatedPrincipal}[;...]&lt;/value&gt;
@@ -557,7 +557,7 @@ ldapRealm.userDnTemplate=uid={0},ou=peop
     &lt;name&gt;principal.mapping&lt;/name&gt;
     &lt;value&gt;guest,alice=hdfs;mary=alice2&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h4><a id="Group+Principal+Mapping"></a>Group Principal Mapping</h4>
+</code></pre><h4><a id="Group+Principal+Mapping">Group Principal Mapping</a> <a href="#Group+Principal+Mapping"><img src="markbook-section-link.png"/></a></h4>
 <pre><code>&lt;param&gt;
     &lt;name&gt;group.principal.mapping&lt;/name&gt;
     &lt;value&gt;{userName[,*|userName...]}={groupName[,groupName...]}[,...]&lt;/value&gt;
@@ -567,22 +567,22 @@ ldapRealm.userDnTemplate=uid={0},ou=peop
     &lt;name&gt;group.principal.mapping&lt;/name&gt;
     &lt;value&gt;*=users;hdfs=admin&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><p>this configuration indicates that all (*) authenticated users are members of the &ldquo;users&rdquo; group and that user &ldquo;hdfs&rdquo; is a member of the admin group. Group principal mapping has been added along with the authorization provider described in this document.</p><h3><a id="Authorization"></a>Authorization</h3><h4><a id="Service+Level+Authorization"></a>Service Level Authorization</h4><p>The Knox Gateway has an out-of-the-box authorization provider that allows administrators to restrict access to the individual services within a Hadoop cluster.</p><p>This provider utilizes a simple and familiar pattern of using ACLs to protect Hadoop resources by specifying users, groups and ip addresses that are permitted access.</p><p>Note: In the examples below {serviceName} represents a real service name (e.g. WEBHDFS) and would be replaced with these values in an actual configuration.</p><h5><a id="Usecases"></a>Usecases</h5><h6><a id="USECASE-1:+Restrict+access+
 to+specific+Hadoop+services+to+specific+Users"></a>USECASE-1: Restrict access to specific Hadoop services to specific Users</h6>
+</code></pre><p>This configuration indicates that all (*) authenticated users are members of the &ldquo;users&rdquo; group and that the user &ldquo;hdfs&rdquo; is a member of the &ldquo;admin&rdquo; group. Group principal mapping has been added along with the authorization provider described in this document.</p><h3><a id="Authorization">Authorization</a> <a href="#Authorization"><img src="markbook-section-link.png"/></a></h3><h4><a id="Service+Level+Authorization">Service Level Authorization</a> <a href="#Service+Level+Authorization"><img src="markbook-section-link.png"/></a></h4><p>The Knox Gateway has an out-of-the-box authorization provider that allows administrators to restrict access to the individual services within a Hadoop cluster.</p><p>This provider utilizes a simple and familiar pattern of using ACLs to protect Hadoop resources by specifying users, groups and IP addresses that are permitted access.</p><p>Note: In the examples below {serviceName} represents a real service name (e.g. WEBHDFS) and would be replaced with the actual value in a real configuration.</p>
 <h5><a id="Usecases">Usecases</a> <a href="#Usecases"><img src="markbook-section-link.png"/></a></h5><h6><a id="USECASE-1:+Restrict+access+to+specific+Hadoop+services+to+specific+Users">USECASE-1: Restrict access to specific Hadoop services to specific Users</a> <a href="#USECASE-1:+Restrict+access+to+specific+Hadoop+services+to+specific+Users"><img src="markbook-section-link.png"/></a></h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;guest;*;*&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6><a id="USECASE-2:+Restrict+access+to+specific+Hadoop+services+to+specific+Groups"></a>USECASE-2: Restrict access to specific Hadoop services to specific Groups</h6>
+</code></pre><h6><a id="USECASE-2:+Restrict+access+to+specific+Hadoop+services+to+specific+Groups">USECASE-2: Restrict access to specific Hadoop services to specific Groups</a> <a href="#USECASE-2:+Restrict+access+to+specific+Hadoop+services+to+specific+Groups"><img src="markbook-section-link.png"/></a></h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;*;admins;*&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6><a id="USECASE-3:+Restrict+access+to+specific+Hadoop+services+to+specific+Remote+IPs"></a>USECASE-3: Restrict access to specific Hadoop services to specific Remote IPs</h6>
+</code></pre><h6><a id="USECASE-3:+Restrict+access+to+specific+Hadoop+services+to+specific+Remote+IPs">USECASE-3: Restrict access to specific Hadoop services to specific Remote IPs</a> <a href="#USECASE-3:+Restrict+access+to+specific+Hadoop+services+to+specific+Remote+IPs"><img src="markbook-section-link.png"/></a></h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;*;*;127.0.0.1&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6><a id="USECASE-4:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+OR+users+within+specific+Groups"></a>USECASE-4: Restrict access to specific Hadoop services to specific Users OR users within specific Groups</h6>
+</code></pre><h6><a id="USECASE-4:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+OR+users+within+specific+Groups">USECASE-4: Restrict access to specific Hadoop services to specific Users OR users within specific Groups</a> <a href="#USECASE-4:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+OR+users+within+specific+Groups"><img src="markbook-section-link.png"/></a></h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl.mode&lt;/name&gt;
     &lt;value&gt;OR&lt;/value&gt;
@@ -591,7 +591,7 @@ ldapRealm.userDnTemplate=uid={0},ou=peop
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;guest;admin;*&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6><a id="USECASE-5:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+OR+users+from+specific+Remote+IPs"></a>USECASE-5: Restrict access to specific Hadoop services to specific Users OR users from specific Remote IPs</h6>
+</code></pre><h6><a id="USECASE-5:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+OR+users+from+specific+Remote+IPs">USECASE-5: Restrict access to specific Hadoop services to specific Users OR users from specific Remote IPs</a> <a href="#USECASE-5:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+OR+users+from+specific+Remote+IPs"><img src="markbook-section-link.png"/></a></h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl.mode&lt;/name&gt;
     &lt;value&gt;OR&lt;/value&gt;
@@ -600,7 +600,7 @@ ldapRealm.userDnTemplate=uid={0},ou=peop
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;guest;*;127.0.0.1&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6><a id="USECASE-6:+Restrict+access+to+specific+Hadoop+services+to+users+within+specific+Groups+OR+from+specific+Remote+IPs"></a>USECASE-6: Restrict access to specific Hadoop services to users within specific Groups OR from specific Remote IPs</h6>
+</code></pre><h6><a id="USECASE-6:+Restrict+access+to+specific+Hadoop+services+to+users+within+specific+Groups+OR+from+specific+Remote+IPs">USECASE-6: Restrict access to specific Hadoop services to users within specific Groups OR from specific Remote IPs</a> <a href="#USECASE-6:+Restrict+access+to+specific+Hadoop+services+to+users+within+specific+Groups+OR+from+specific+Remote+IPs"><img src="markbook-section-link.png"/></a></h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl.mode&lt;/name&gt;
     &lt;value&gt;OR&lt;/value&gt;
@@ -609,7 +609,7 @@ ldapRealm.userDnTemplate=uid={0},ou=peop
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;*;admin;127.0.0.1&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6><a id="USECASE-7:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+OR+users+within+specific+Groups+OR+from+specific+Remote+IPs"></a>USECASE-7: Restrict access to specific Hadoop services to specific Users OR users within specific Groups OR from specific Remote IPs</h6>
+</code></pre><h6><a id="USECASE-7:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+OR+users+within+specific+Groups+OR+from+specific+Remote+IPs">USECASE-7: Restrict access to specific Hadoop services to specific Users OR users within specific Groups OR from specific Remote IPs</a> <a href="#USECASE-7:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+OR+users+within+specific+Groups+OR+from+specific+Remote+IPs"><img src="markbook-section-link.png"/></a></h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl.mode&lt;/name&gt;
     &lt;value&gt;OR&lt;/value&gt;
@@ -618,27 +618,27 @@ ldapRealm.userDnTemplate=uid={0},ou=peop
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;guest;admin;127.0.0.1&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6><a id="USECASE-8:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+AND+users+within+specific+Groups"></a>USECASE-8: Restrict access to specific Hadoop services to specific Users AND users within specific Groups</h6>
+</code></pre><h6><a id="USECASE-8:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+AND+users+within+specific+Groups">USECASE-8: Restrict access to specific Hadoop services to specific Users AND users within specific Groups</a> <a href="#USECASE-8:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+AND+users+within+specific+Groups"><img src="markbook-section-link.png"/></a></h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;guest;admin;*&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6><a id="USECASE-9:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+AND+users+from+specific+Remote+IPs"></a>USECASE-9: Restrict access to specific Hadoop services to specific Users AND users from specific Remote IPs</h6>
+</code></pre><h6><a id="USECASE-9:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+AND+users+from+specific+Remote+IPs">USECASE-9: Restrict access to specific Hadoop services to specific Users AND users from specific Remote IPs</a> <a href="#USECASE-9:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+AND+users+from+specific+Remote+IPs"><img src="markbook-section-link.png"/></a></h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;guest;*;127.0.0.1&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6><a id="USECASE-10:+Restrict+access+to+specific+Hadoop+services+to+users+within+specific+Groups+AND+from+specific+Remote+IPs"></a>USECASE-10: Restrict access to specific Hadoop services to users within specific Groups AND from specific Remote IPs</h6>
+</code></pre><h6><a id="USECASE-10:+Restrict+access+to+specific+Hadoop+services+to+users+within+specific+Groups+AND+from+specific+Remote+IPs">USECASE-10: Restrict access to specific Hadoop services to users within specific Groups AND from specific Remote IPs</a> <a href="#USECASE-10:+Restrict+access+to+specific+Hadoop+services+to+users+within+specific+Groups+AND+from+specific+Remote+IPs"><img src="markbook-section-link.png"/></a></h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;*;admins;127.0.0.1&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6><a id="USECASE-11:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+AND+users+within+specific+Groups+AND+from+specific+Remote+IPs"></a>USECASE-11: Restrict access to specific Hadoop services to specific Users AND users within specific Groups AND from specific Remote IPs</h6>
+</code></pre><h6><a id="USECASE-11:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+AND+users+within+specific+Groups+AND+from+specific+Remote+IPs">USECASE-11: Restrict access to specific Hadoop services to specific Users AND users within specific Groups AND from specific Remote IPs</a> <a href="#USECASE-11:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+AND+users+within+specific+Groups+AND+from+specific+Remote+IPs"><img src="markbook-section-link.png"/></a></h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;guest;admins;127.0.0.1&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h4><a id="Configuration"></a>Configuration</h4><p>ACLs are bound to services within the topology descriptors by introducing the authorization provider with configuration like:</p>
+</code></pre><h4><a id="Configuration">Configuration</a> <a href="#Configuration"><img src="markbook-section-link.png"/></a></h4><p>ACLs are bound to services within the topology descriptors by introducing the authorization provider with configuration like:</p>
 <pre><code>&lt;provider&gt;
     &lt;role&gt;authorization&lt;/role&gt;
     &lt;name&gt;AclsAuthz&lt;/name&gt;
@@ -689,7 +689,7 @@ ldapRealm.userDnTemplate=uid={0},ou=peop
   <li>the user is &ldquo;hdfs&rdquo; or &ldquo;guest&rdquo; OR</li>
   <li>the user is in &ldquo;admin&rdquo; group OR</li>
   <li>the request is coming from 127.0.0.2 or 127.0.0.3</li>
-</ol><h4><a id="Other+Related+Configuration"></a>Other Related Configuration</h4><p>The principal mapping aspect of the identity assertion provider is important to understand in order to fully utilize the authorization features of this provider.</p><p>This feature allows us to map the authenticated principal to a runas or impersonated principal to be asserted to the Hadoop services in the backend. When a principal mapping is defined that results in an impersonated principal being created the impersonated principal is then the effective principal. If there is no mapping to another principal then the authenticated or primary principal is then the effective principal. Principal mapping has actually been available in the identity assertion provider from the beginning of Knox and is documented fully in the Identity Assertion section of this guide.</p>
+</ol><h4><a id="Other+Related+Configuration">Other Related Configuration</a> <a href="#Other+Related+Configuration"><img src="markbook-section-link.png"/></a></h4><p>The principal mapping aspect of the identity assertion provider is important to understand in order to fully utilize the authorization features of this provider.</p><p>This feature allows us to map the authenticated principal to a runas or impersonated principal to be asserted to the Hadoop services in the backend. When a principal mapping is defined that results in an impersonated principal being created the impersonated principal is then the effective principal. If there is no mapping to another principal then the authenticated or primary principal is then the effective principal. Principal mapping has actually been available in the identity assertion provider from the beginning of Knox and is documented fully in the Identity Assertion section of this guide.</p>
 <pre><code>&lt;param&gt;
     &lt;name&gt;principal.mapping&lt;/name&gt;
     &lt;value&gt;{primaryPrincipal}[,...]={impersonatedPrincipal}[;...]&lt;/value&gt;
@@ -812,15 +812,15 @@ ldapRealm.userDnTemplate=uid={0},ou=peop
         &lt;url&gt;http://localhost:10000&lt;/url&gt;
     &lt;/service&gt;
 &lt;/topology&gt;
-</code></pre><h3><a id="Secure+Clusters"></a>Secure Clusters</h3><p>See these documents for setting up a secure Hadoop cluster <a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuration_in_Secure_Mode">http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuration_in_Secure_Mode</a> <a href="http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14.html">http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14.html</a></p><p>Once you have a Hadoop cluster that is using Kerberos for authentication, you have to do the following to configure Knox to work with that cluster.</p><h4><a id="Create+Unix+account+for+Knox+on+Hadoop+master+nodes"></a>Create Unix account for Knox on Hadoop master nodes</h4>

[... 764 lines stripped ...]