Posted to commits@knox.apache.org by km...@apache.org on 2013/09/26 20:21:05 UTC

svn commit: r1526638 [2/3] - in /incubator/knox: site/books/knox-incubating-0-3-0/ trunk/books/0.3.0/ trunk/books/common/ trunk/markbook/src/main/java/org/apache/hadoop/gateway/markbook/

Modified: incubator/knox/site/books/knox-incubating-0-3-0/knox-incubating-0-3-0.html
URL: http://svn.apache.org/viewvc/incubator/knox/site/books/knox-incubating-0-3-0/knox-incubating-0-3-0.html?rev=1526638&r1=1526637&r2=1526638&view=diff
==============================================================================
--- incubator/knox/site/books/knox-incubating-0-3-0/knox-incubating-0-3-0.html (original)
+++ incubator/knox/site/books/knox-incubating-0-3-0/knox-incubating-0-3-0.html Thu Sep 26 18:21:04 2013
@@ -16,20 +16,25 @@
 --><p><link href="book.css" rel="stylesheet"/></p>
 <div id="logo" style="width:100%; text-align:center">
   <img src="knox-logo.gif" alt="Knox"/>
-</div><p><br> <img src="apache-logo.gif"  alt="Apache"/> <img src="apache-incubator-logo.png"  alt="Incubator"/></p><h1>Apache Knox Gateway 0.3.0 (Incubator)</h1><h2>Table Of Contents</h2>
+</div><p><br> <img src="apache-logo.gif"  alt="Apache"/> <img src="apache-incubator-logo.png"  alt="Incubator"/></p><h1><a id="Apache+Knox+Gateway+0.3.0+(Incubator)"></a>Apache Knox Gateway 0.3.0 (Incubator)</h1><h2><a id="Table+Of+Contents"></a>Table Of Contents</h2>
 <ul>
   <li><a href="#Introduction">Introduction</a></li>
-  <li><a href="#Download">Download</a></li>
-  <li><a href="#Installation">Installation</a></li>
-  <li><a href="#Getting+Started">Getting Started</a></li>
-  <li><a href="#Supported+Services">Supported Services</a></li>
-  <li><a href="#Sandbox+Configuration">Sandbox Configuration</a></li>
-  <li><a href="#Usage+Examples">Usage Examples</a></li>
+  <li><a href="#Getting+Started">Getting Started</a>
+  <ul>
+    <li><a href="#Requirements">Requirements</a></li>
+    <li><a href="#Download">Download</a></li>
+    <li><a href="#Verify">Verify</a></li>
+    <li><a href="#Install">Install</a></li>
+    <li><a href="#Supported+Services">Supported Services</a></li>
+    <li><a href="#Basic+Usage">Basic Usage</a></li>
+    <li><a href="#Sandbox+Configuration">Sandbox Configuration</a></li>
+  </ul></li>
   <li><a href="#Gateway+Details">Gateway Details</a>
   <ul>
     <li><a href="#Authentication">Authentication</a></li>
     <li><a href="#Authorization">Authorization</a></li>
     <li><a href="#Configuration">Configuration</a></li>
+    <li><a href="#Secure+Clusters">Secure Clusters</a></li>
   </ul></li>
   <li><a href="#Client+Details">Client Details</a></li>
   <li><a href="#Service+Details">Service Details</a>
@@ -40,74 +45,36 @@
     <li><a href="#HBase">HBase/Starbase</a></li>
     <li><a href="#Hive">Hive</a></li>
   </ul></li>
-  <li><a href="#Secure+Clusters">Secure Clusters</a></li>
   <li><a href="#Trouble+Shooting">Trouble Shooting</a></li>
-  <li><a href="#Release+Verification">Release Verification</a></li>
   <li><a href="#Export+Controls">Export Controls</a></li>
-</ul><h2><a id="Introduction"></a>Introduction</h2><p>TODO</p><h2><a id="Requirements"></a>Requirements</h2><h3>Java</h3><p>Java 1.6 or later is required for the Knox Gateway runtime. Use the command below to check the version of Java installed on the system where Knox will be running.</p>
+</ul><h2><a id="Introduction"></a>Introduction</h2><p>The Apache Knox Gateway is a system that provides a single point of authentication and access for Apache Hadoop services in a cluster. The goal is to simplify Hadoop security for both users (i.e. who access the cluster data and execute jobs) and operators (i.e. who control access and manage the cluster). The gateway runs as a server (or cluster of servers) that provide centralized access to one or more Hadoop clusters. In general the goals of the gateway are as follows:</p>
+<ul>
+  <li>Provide perimeter security for Hadoop REST APIs to make Hadoop security setup easier</li>
+  <li>Support authentication and token verification security scenarios</li>
+  <li>Deliver users a single URL end-point that aggregates capabilities for data and jobs</li>
+  <li>Enable integration with enterprise and cloud identity management environments</li>
+</ul><h2><a id="Getting+Started"></a>Getting Started</h2><p>This section provides everything you need to know to get the gateway up and running against a Sandbox VM Hadoop cluster.</p><h3><a id="Requirements"></a>Requirements</h3><h4><a id="Java"></a>Java</h4><p>Java 1.6 or later is required for the Knox Gateway runtime. Use the command below to check the version of Java installed on the system where Knox will be running.</p>
 <pre><code>java -version
-</code></pre><h3>Hadoop</h3><p>An an existing Hadoop 1.x or 2.x cluster is required for Knox to protect. One of the easiest ways to ensure this it to utilize a HDP Sandbox VM. It is possible to use a Hadoop cluster deployed on EC2 but this will require additional configuration. Currently if this Hadoop cluster is secured with Kerberos only WebHDFS will work and additional configuration is required.</p><p>The Hadoop cluster should be ensured to have at least WebHDFS, WebHCat (i.e. Templeton) and Oozie configured, deployed and running. HBase/Stargate and Hive can also be accessed via the Knox Gateway given the proper versions and configuration.</p><p>The instructions that follow assume that the Gateway is <em>not</em> collocated with the Hadoop clusters themselves and (most importantly) that the hostnames and IP addresses of the cluster services are accessible by the gateway where ever it happens to be running. All of the instructions and samples are tailored to work &ldquo;out of the
  box&rdquo; against a Hortonworks Sandbox 2.x VM.</p><p>This release of the Apache Knox Gateway has been tested against the <a href="http://hortonworks.com/products/hortonworks-sandbox/">Hortonworks Sandbox 2.0</a>.</p><h2><a id="Download"></a>Download</h2><p>Download and extract the knox-{VERSION}.zip}} file into the installation directory that will contain your <a id="\{GATEWAY_HOME\"></a>{GATEWAY_HOME}. You can find the downloads for Knox releases on the [Apache mirrors|http://www.apache.org/dyn/closer.cgi/incubator/knox/].</p>
+</code></pre><h4><a id="Hadoop"></a>Hadoop</h4><p>An an existing Hadoop 1.x or 2.x cluster is required for Knox to protect. One of the easiest ways to ensure this it to utilize a Hortonworks Sandbox VM. It is possible to use a Hadoop cluster deployed on EC2 but this will require additional configuration not covered here. It is also possible to use a limited set of services in Hadoop cluster secured with Kerberos. This too required additional configuration that is not described here.</p><p>The Hadoop cluster should be ensured to have at least WebHDFS, WebHCat (i.e. Templeton) and Oozie configured, deployed and running. HBase/Stargate and Hive can also be accessed via the Knox Gateway given the proper versions and configuration.</p><p>The instructions that follow assume a few things:</p>
+<ol>
+  <li>The gateway is <em>not</em> collocated with the Hadoop clusters themselves</li>
+  <li>The host names and IP addresses of the cluster services are accessible by the gateway wherever it happens to be running.</li>
+</ol><p>All of the instructions and samples provided here are tailored and tested to work &ldquo;out of the box&rdquo; against a <a href="http://hortonworks.com/products/hortonworks-sandbox">Hortonworks Sandbox 2.x VM</a>.</p><h3><a id="Download"></a>Download</h3><p>Download and extract the knox-{VERSION}.zip file into the installation directory. This directory will be referred to as your <code>{GATEWAY_HOME}</code>. You can find the downloads for Knox releases on the <a href="http://www.apache.org/dyn/closer.cgi/incubator/knox">Apache mirrors</a>.</p>
 <ul>
   <li>Source archive: <a href="http://www.apache.org/dyn/closer.cgi/incubator/knox/0.3.0/knox-incubating-0.3.0-src.zip">knox-incubating-0.3.0-src.zip</a> (<a href="http://www.apache.org/dist/incubator/knox/0.3.0/knox-0.3.0-incubating-src.zip.asc">PGP signature</a>, <a href="http://www.apache.org/dist/incubator/knox/0.3.0/knox-incubating-0.3.0-src.zip.sha">SHA1 digest</a>, <a href="http://www.apache.org/dist/incubator/knox/0.3.0/knox-incubating-0.3.0-src.zip.md5">MD5 digest</a>)</li>
   <li>Binary archive: <a href="http://www.apache.org/dyn/closer.cgi/incubator/knox/0.3.0/knox-incubating-0.3.0.zip">knox-incubating-0.3.0.zip</a> (<a href="http://www.apache.org/dist/incubator/knox/0.3.0/knox-incubating-0.3.0.zip.asc">PGP signature</a>, <a href="http://www.apache.org/dist/incubator/knox/0.3.0/knox-incubating-0.3.0.zip.sha">SHA1 digest</a>, <a href="http://www.apache.org/dist/incubator/knox/0.3.0/knox-incubating-0.3.0.zip.md5">MD5 digest</a>)</li>
-</ul>
-<table>
-  <thead>
-    <tr>
-      <th><img src="bulb.png"  alt="$"/> Important </th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td>Please ensure that you validate the integrity of any downloaded files as described <a href="#Release+Verification">below</a>. </td>
-    </tr>
-  </tbody>
-</table><p>Apache Knox Gateway releases are available under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. See the NOTICE file contained in each release artifact for applicable copyright attribution notices.</p><h2><a id="Installation"></a>Installation</h2><h3>ZIP</h3><p>Download and extract the <code>knox-{VERSION}.zip</code> file into the installation directory that will contain your <code>{GATEWAY_HOME}</code>. You can find the downloads for Knox releases on the <a href="http://www.apache.org/dyn/closer.cgi/incubator/knox/">Apache mirrors</a>.</p>
+</ul><p>Apache Knox Gateway releases are available under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. See the NOTICE file contained in each release artifact for applicable copyright attribution notices.</p><h3><a id="Verify"></a>Verify</h3><p>It is essential that you verify the integrity of the downloaded files using the PGP signatures. Please read Verifying Apache HTTP Server Releases for more information on why you should verify our releases.</p><p>The PGP signatures can be verified using PGP or GPG. First download the KEYS file as well as the .asc signature files for the relevant release packages. Make sure you get these files from the main distribution directory, rather than from a mirror. Then verify the signatures using one of the methods below.</p>
+<pre><code>% pgpk -a KEYS
+% pgpv knox-incubating-0.3.0.zip.asc
+</code></pre><p>or</p>
+<pre><code>% pgp -ka KEYS
+% pgp knox-incubating-0.3.0.zip.asc
+</code></pre><p>or</p>
+<pre><code>% gpg --import KEYS
+% gpg --verify knox-incubating-0.3.0.zip.asc
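+# Optionally, assuming a SHA1/MD5 utility such as sha1sum or md5sum is available on your
+# system, you can also compare the archive against the published .sha and .md5 digest files:
+% sha1sum knox-incubating-0.3.0.zip
+% md5sum knox-incubating-0.3.0.zip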
+</code></pre><h3><a id="Install"></a>Install</h3><h4><a id="ZIP"></a>ZIP</h4><p>Download and extract the <code>knox-{VERSION}.zip</code> file into the installation directory that will contain your <code>{GATEWAY_HOME}</code>. You can find the downloads for Knox releases on the <a href="http://www.apache.org/dyn/closer.cgi/incubator/knox">Apache mirrors</a>.</p>
 <pre><code>jar xf knox-{VERSION}.zip
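+# alternatively, assuming the unzip utility is available on your system: unzip knox-{VERSION}.zip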
-</code></pre><p>This will create a directory <code>knox-{VERSION}</code> in your current directory.</p><h3>RPM</h3><p>TODO</p><h3>Layout</h3><p>TODO - Describe the purpose of all of the directories</p><h2><a id="Getting+Started"></a>Getting Started</h2><h3>2. Enter the <code>{GATEWAY_HOME}</code> directory</h3>
-<pre><code>cd knox-{VERSION}
-</code></pre><p>The fully qualified name of this directory will be referenced as <a id="\{GATEWAY_HOME\"></a>{GATEWAY_HOME} throughout the remainder of this document.</p><h3>3. Start the demo LDAP server (ApacheDS)</h3><p>First, understand that the LDAP server provided here is for demonstration purposes. You may configure the LDAP specifics within the topology descriptor for the cluster as described in step 5 below, in order to customize what LDAP instance to use. The assumption is that most users will leverage the demo LDAP server while evaluating this release and should therefore continue with the instructions here in step 3.</p><p>Edit <a id="\{GATEWAY_HOME\}/conf/users.ldif"></a>{GATEWAY_HOME}/conf/users.ldif if required and add your users and groups to the file. A sample end user &ldquo;bob&rdquo; has been already included. Note that the passwords in this file are &ldquo;fictitious&rdquo; and have nothing to do with the actual accounts on the Hadoop cluster you are using. There
  is also a copy of this file in the templates directory that you can use to start over if necessary.</p><p>Start the LDAP server - pointing it to the config dir where it will find the users.ldif file in the conf directory.</p>
-<pre><code>java -jar bin/ldap.jar conf &amp;
-</code></pre><p>There are a number of log messages of the form <a id="Created+null."></a>Created null. that can safely be ignored. Take note of the port on which it was started as this needs to match later configuration.</p><h3>4. Start the Gateway server</h3>
-<pre><code>java -jar bin/server.jar
-</code></pre><p>Take note of the port identified in the logging output as you will need this for accessing the gateway.</p><p>The server will prompt you for the master secret (password). This secret is used to secure artifacts used to secure artifacts used by the gateway server for things like SSL, credential/password aliasing. This secret will have to be entered at startup unless you choose to persist it. Remember this secret and keep it safe. It represents the keys to the kingdom. See the Persisting the Master section for more information.</p><h3>5. Configure the Gateway with the topology of your Hadoop cluster</h3><p>Edit the file <a id="\{GATEWAY_HOME\}/deployments/sample.xml"></a>{GATEWAY_HOME}/deployments/sample.xml</p><p>Change the host and port in the urls of the <a id="<service>"></a><service> elements for NAMENODE, TEMPLETON and OOZIE services to match your Hadoop cluster deployment.</p><p>The default configuration contains the LDAP URL for a LDAP server. By default that f
 ile is configured to access the demo ApacheDS based LDAP server and its default configuration. By default, this server listens on port 33389. Optionally, you can change the LDAP URL for the LDAP server to be used for authentication. This is set via the main.ldapRealm.contextFactory.url property in the <a id="<gateway><provider><authentication>"></a><gateway><provider><authentication> section.</p><p>Save the file. The directory <a id="\{GATEWAY_HOME\}/deployments"></a>{GATEWAY_HOME}/deployments is monitored by the Gateway server and reacts to the discovery of a new or changed cluster topology descriptor by provisioning the endpoints and required filter chains to serve the needs of each cluster as described by the topology file. Note that the name of the file excluding the extension is also used as the path for that cluster in the URL. So for example the sample.xml file will result in Gateway URLs of the form <a id="\[http://\]"></a>[<a href="http://\]<a">http://\]<a</a> id=&ldquo;{}{
 gateway-host}:{gateway-port}/gateway/sample/namenode/api/v1&rdquo;&gt;</a>{}{gateway-host}:{gateway-port}/gateway/sample/namenode/api/v1</p><h3>6. Test the installation and configuration of your Gateway</h3><p>Invoke the LISTSATUS operation on HDFS represented by your configured NAMENODE by using your web browser or curl:</p>
-<pre><code>curl -i -k -u bob:bob-password -X GET \
-    &#39;https://localhost:8443/gateway/sample/namenode/api/v1/?op=LISTSTATUS&#39;
-</code></pre><p>The results of the above command should result in something to along the lines of the output below. The exact information returned is subject to the content within HDFS in your Hadoop cluster.</p>
-<pre><code>HTTP/1.1 200 OK
-Content-Type: application/json
-Content-Length: 760
-Server: Jetty(6.1.26)
-
-{&quot;FileStatuses&quot;:{&quot;FileStatus&quot;:[
-{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350595859762,&quot;owner&quot;:&quot;hdfs&quot;,&quot;pathSuffix&quot;:&quot;apps&quot;,&quot;permission&quot;:&quot;755&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;},
-{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;mapred&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350595874024,&quot;owner&quot;:&quot;mapred&quot;,&quot;pathSuffix&quot;:&quot;mapred&quot;,&quot;permission&quot;:&quot;755&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;},
-{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350596040075,&quot;owner&quot;:&quot;hdfs&quot;,&quot;pathSuffix&quot;:&quot;tmp&quot;,&quot;permission&quot;:&quot;777&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;},
-{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350595857178,&quot;owner&quot;:&quot;hdfs&quot;,&quot;pathSuffix&quot;:&quot;user&quot;,&quot;permission&quot;:&quot;755&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;}
-]}}
-</code></pre><p>For additional information on WebHDFS, Templeton/WebHCat and Oozie REST APIs, see the following URLs respectively:</p>
-<ul>
-  <li>WebHDFS - [http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html]</li>
-  <li>Templeton/WebHCat - [http://people.apache.org/~thejas/templeton_doc_v1/]</li>
-  <li>Oozie - [http://oozie.apache.org/docs/3.3.1/WebServicesAPI.html]</li>
-</ul><h3>Examples</h3><p>More examples can be found [here|Examples].</p><h3>. Persisting the Master Secret</h3><p>The master secret is required to start the server. This secret is used to access secured artifacts by the gateway instance. Keystore, trust stores and credential stores are all protected with the master secret.</p><p>You may persist the master secret by supplying the <em>-persist-master</em> switch at startup. This will result in a warning indicating that persisting the secret is less secure than providing it at startup. We do make some provisions in order to protect the persisted password.</p><p>It is encrypted with AES 128 bit encryption and where possible the file permissions are set to only be accessible by the user that the gateway is running as.</p><p>After persisting the secret, ensure that the file at config/security/master has the appropriate permissions set for your environment. This is probably the most important layer of defense for master secret. Do not assu
 me that the encryption if sufficient protection.</p><p>A specific user should be created to run the gateway this will protect a persisted master file.</p><h3>Mapping Gateway URLs to Hadoop cluster URLs</h3><p>The Gateway functions much like a reverse proxy. As such it maintains a mapping of URLs that are exposed externally by the Gateway to URLs that are provided by the Hadoop cluster. Examples of mappings for the NameNode and Templeton are shown below. These mapping are generated from the combination of the Gateway configuration file (i.e. <a id="\{GATEWAY_HOME\}/conf/gateway-site.xml"></a>{GATEWAY_HOME}/conf/gateway-site.xml) and the cluster topology descriptors (e.g. <a id="\{GATEWAY_HOME\}/deployments/\{cluster-name\}.xml"></a>{GATEWAY_HOME}/deployments/{cluster-name}.xml).</p>
-<ul>
-  <li>HDFS (NameNode)
-  <ul>
-    <li>Gateway: {nolink:http://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/namenode/api/v1}</li>
-    <li>Cluster: {nolink:http://{namenode-host}:50070/webhdfs/v1}</li>
-  </ul></li>
-  <li>WebHCat (Templeton)
-  <ul>
-    <li>Gateway: {nolink:http://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/templeton/api/v1}</li>
-    <li>Cluster: {nolink:http://{templeton-host}:50111/templeton/v1}</li>
-  </ul></li>
-  <li>Oozie
-  <ul>
-    <li>Gateway: {nolink:http://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/oozie/api/v1}</li>
-    <li>Cluster: {nolink:http://{templeton-host}:11000/oozie/v1}</li>
-  </ul></li>
-</ul><p>The values for <a id="\{gateway-host\"></a>{gateway-host}, <a id="\{gateway-port\"></a>{gateway-port}, <a id="\{gateway-path\"></a>{gateway-path} are provided via the Gateway configuration file (i.e. <code>{GATEWAY_HOME\}/conf/gateway-site.xml</code>).</p><p>The value for <a id="\{cluster-name\"></a>{cluster-name} is derived from the name of the cluster topology descriptor (e.g. <a id="\{GATEWAY_HOME\}/deployments/\{cluster-name\}.xml"></a>{GATEWAY_HOME}/deployments/{cluster-name}.xml).</p><p>The value for <a id="\{namenode-host\"></a>{namenode-host} and <a id="\{templeton-host\"></a>{templeton-host} is provided via the cluster topology descriptor (e.g. <a id="\{GATEWAY_HOME\}/deployments/\{cluster-name\}.xml"></a>{GATEWAY_HOME}/deployments/{cluster-name}.xml).</p><p>Note: The ports 50070, 50111 and 11000 are the defaults for NameNode, Templeton and Oozie respectively. Their values can also be provided via the cluster topology descriptor if your Hadoop cluster uses different
  ports.</p><h2><a id="Supported+Services"></a>Supported Services</h2><p>This table enumerates the versions of various Hadoop services that have been tested to work with the Knox Gateway. Only more recent versions of some Hadoop components when secured via Kerberos can be accessed via the Knox Gateway.</p>
+</code></pre><p>This will create a directory <code>knox-{VERSION}</code> in your current directory.</p><h4><a id="RPM"></a>RPM</h4><p>TODO</p><h4><a id="Layout"></a>Layout</h4><p>TODO - Describe the purpose of all of the directories</p><h3><a id="Supported+Services"></a>Supported Services</h3><p>This table enumerates the versions of various Hadoop services that have been tested to work with the Knox Gateway. When secured via Kerberos, only more recent versions of some Hadoop components can be accessed via the Knox Gateway.</p>
 <table>
   <thead>
     <tr>
@@ -161,50 +128,70 @@ Server: Jetty(6.1.26)
       <td><img src="question.png"  alt="?"/> </td>
     </tr>
   </tbody>
-</table><p>ProxyUser feature of WebHDFS, WebHCat and Oozie required for secure cluster support seem to work fine. Knox code seems to be broken for support of secure cluster at this time for WebHDFS, WebHCat and Oozie.</p><h2><a id="Sandbox+Configuration"></a>Sandbox Configuration</h2><p>This version of the Apache Knox Gateway is tested against [Hortonworks Sandbox 1.2|http://hortonworks.com/products/hortonworks-sandbox/]</p><p>Currently there is an issue with Sandbox that prevents it from being easily used with the gateway. In order to correct the issue, you can use the commands below to login to the Sandbox VM and modify the configuration. This assumes that the name sandbox is setup to resolve to the Sandbox VM. It may be necessary to use the IP address of the Sandbox VM instead. <em>This is frequently but not always</em> <a id="{*}192.168.56.101{*"></a>{*}192.168.56.101{*}*.*</p>
-<pre><code>ssh root@sandbox
-cp /usr/lib/hadoop/conf/hdfs-site.xml /usr/lib/hadoop/conf/hdfs-site.xml.orig
-sed -e s/localhost/sandbox/ /usr/lib/hadoop/conf/hdfs-site.xml.orig &gt; /usr/lib/hadoop/conf/hdfs-site.xml
-shutdown -r now
-</code></pre><p>In addition to make it very easy to follow along with the samples for the gateway you can configure your local system to resolve the address of the Sandbox by the names <a id="vm"></a>vm and <a id="sandbox"></a>sandbox. The IP address that is shown below should be that of the Sandbox VM as it is known on your system. This will likely, but not always, be <a id="192.168.56.101"></a>192.168.56.101.</p><p>On Linux or Macintosh systems add a line like this to the end of the file&nbsp;<a id="/etc/hosts"></a>/etc/hosts&nbsp;on your local machine, <em>not the Sandbox VM</em>. <em>Note: The character between the <a id="{_}192.168.56.101{_"></a>{</em>}192.168.56.101{_} and <a id="{_}vm{_"></a>{_}vm{_} below is a <em>{_}tab{_}</em> character._</p>
-<pre><code>192.168.56.101  vm sandbox
-</code></pre><p>On Windows systems a similar but different mechanism can be used. On recent versions of windows the file that should be modified is <a id="%systemroot%\system32\drivers\etc\hosts"></a>%systemroot%\system32\drivers\etc\hosts</p><h2><a id="Usage+Examples"></a>Usage Examples</h2><p>These examples provide more detail about how to access various Apache Hadoop services via the Apache Knox Gateway.</p>
+</table><p>The ProxyUser feature of WebHDFS, WebHCat and Oozie that is required for secure cluster support appears to work correctly. However, Knox support for secure clusters is currently broken for WebHDFS, WebHCat and Oozie.</p><h3><a id="Basic+Usage"></a>Basic Usage</h3><h4><a id="Starting+Servers"></a>Starting Servers</h4><h5><a id="1.+Enter+the+`{GATEWAY_HOME}`+directory"></a>1. Enter the <code>{GATEWAY_HOME}</code> directory</h5>
+<pre><code>cd knox-{VERSION}
+</code></pre><p>The fully qualified name of this directory will be referenced as <code>{GATEWAY_HOME}</code> throughout the remainder of this document.</p><h5><a id="2.+Start+the+demo+LDAP+server+(ApacheDS)"></a>2. Start the demo LDAP server (ApacheDS)</h5><p>First, understand that the LDAP server provided here is for demonstration purposes. You may configure the LDAP specifics within the topology descriptor for the cluster as described in step 4 below, in order to customize what LDAP instance to use. The assumption is that most users will leverage the demo LDAP server while evaluating this release and should therefore continue with the instructions here in step 2.</p><p>Edit <code>{GATEWAY_HOME}/conf/users.ldif</code> if required and add your users and groups to the file. A sample end user &ldquo;bob&rdquo; has already been included. Note that the passwords in this file are &ldquo;fictitious&rdquo; and have nothing to do with the actual accounts on the Hadoop cluster you are using. There is also a copy of this file in the templates directory that you can use to start over if necessary.</p><p>Start the LDAP server, pointing it at the <code>conf</code> directory, where it will find the <code>users.ldif</code> file.</p>
+<pre><code>java -jar bin/ldap.jar conf &amp;
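+# note: the demo LDAP server listens on port 33389 by default (see step 4 below)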
+</code></pre><p>There are a number of log messages of the form <code>Created null.</code> that can safely be ignored. Take note of the port on which it was started as this needs to match later configuration.</p><h5><a id="3.+Start+the+gateway+server"></a>3. Start the gateway server</h5>
+<pre><code>java -jar bin/server.jar
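+# the server will prompt for the master secret; optionally add the -persist-master switch
+# to persist it (see the Persisting the Master Secret section)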
+</code></pre><p>Take note of the port identified in the logging output as you will need this for accessing the gateway.</p><p>The server will prompt you for the master secret (password). This secret is used to secure artifacts used by the gateway server for things like SSL and credential/password aliasing. This secret will have to be entered at startup unless you choose to persist it. Remember this secret and keep it safe. It represents the keys to the kingdom. See the Persisting the Master Secret section for more information.</p><h5><a id="4.+Configure+the+Gateway+with+the+topology+of+your+Hadoop+cluster"></a>4. Configure the Gateway with the topology of your Hadoop cluster</h5><p>Edit the file <code>{GATEWAY_HOME}/deployments/sandbox.xml</code>.</p><p>Change the host and port in the URLs of the <code>&lt;service&gt;</code> elements for the WEBHDFS, WEBHCAT, OOZIE, WEBHBASE and HIVE services to match your Hadoop cluster deployment.</p><p>The default configuration also contains the URL of an LDAP server. Out of the box it is configured to access the demo ApacheDS based LDAP server using its default configuration, in which the server listens on port 33389. Optionally, you can change the LDAP URL for the LDAP server to be used for authentication. This is set via the main.ldapRealm.contextFactory.url property in the <code>&lt;gateway&gt;&lt;provider&gt;&lt;authentication&gt;</code> section.</p><p>Save the file. The directory <code>{GATEWAY_HOME}/deployments</code> is monitored by the gateway server. When a new or changed cluster topology descriptor is detected, it will provision the endpoints for the services described in the topology descriptor. Note that the name of the file excluding the extension is also used as the path for that cluster in the URL. For example the <code>sandbox.xml</code> file will result in gateway URLs of the form <code>http://{gateway-host}:{gateway-port}/gateway/sandbox/webhdfs</code>.</p><h5><a id="5.+Test+the+installation+and+configuration+of+your+Gateway"></a>5. Test the installation and configuration of your Gateway</h5><p>Invoke the LISTSTATUS operation on WebHDFS via your configured WEBHDFS service by using your web browser or curl:</p>
+<pre><code>curl -i -k -u bob:bob-password -X GET \
+    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/?op=LISTSTATUS&#39;
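+# -k is used because the gateway generates a self-signed certificate in standalone/demo mode
+# (see Management of Security Artifacts); bob:bob-password is the sample end user defined in
+# {GATEWAY_HOME}/conf/users.ldif (see step 2)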
+</code></pre><p>The above command should produce output along the lines of the example below. The exact information returned depends on the content of HDFS in your Hadoop cluster.</p>
+<pre><code>HTTP/1.1 200 OK
+Content-Type: application/json
+Content-Length: 760
+Server: Jetty(6.1.26)
+
+{&quot;FileStatuses&quot;:{&quot;FileStatus&quot;:[
+{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350595859762,&quot;owner&quot;:&quot;hdfs&quot;,&quot;pathSuffix&quot;:&quot;apps&quot;,&quot;permission&quot;:&quot;755&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;},
+{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;mapred&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350595874024,&quot;owner&quot;:&quot;mapred&quot;,&quot;pathSuffix&quot;:&quot;mapred&quot;,&quot;permission&quot;:&quot;755&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;},
+{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350596040075,&quot;owner&quot;:&quot;hdfs&quot;,&quot;pathSuffix&quot;:&quot;tmp&quot;,&quot;permission&quot;:&quot;777&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;},
+{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350595857178,&quot;owner&quot;:&quot;hdfs&quot;,&quot;pathSuffix&quot;:&quot;user&quot;,&quot;permission&quot;:&quot;755&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;}
+]}}
+</code></pre><p>For additional information on WebHDFS, Templeton/WebHCat and Oozie REST APIs, see the following URLs respectively:</p>
+<ul>
+  <li>WebHDFS - <a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html">http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html</a></li>
+  <li>WebHCat (Templeton) - <a href="http://people.apache.org/~thejas/templeton_doc_v1">http://people.apache.org/~thejas/templeton_doc_v1</a></li>
+  <li>Oozie - <a href="http://oozie.apache.org/docs/3.3.1/WebServicesAPI.html">http://oozie.apache.org/docs/3.3.1/WebServicesAPI.html</a></li>
+  <li>Stargate (HBase) - <a href="http://wiki.apache.org/hadoop/Hbase/Stargate">http://wiki.apache.org/hadoop/Hbase/Stargate</a></li>
+</ul><h3><a id="More+Examples"></a>More Examples</h3><p>These examples provide more detail about how to access various Apache Hadoop services via the Apache Knox Gateway.</p>
 <ul>
   <li><a href="#WebHDFS+Examples">WebHDFS</a></li>
   <li><a href="#WebHCat+Examples">WebHCat/Templeton</a></li>
   <li><a href="#Oozie+Examples">Oozie</a></li>
   <li><a href="#HBase+Examples">HBase</a></li>
   <li><a href="#Hive+Examples">Hive</a></li>
-</ul><h2><a id="Configuration"></a>Configuration</h2><h3>Host Mapping</h3><p>TODO</p><p>That really depends upon how you have your VM configured. If you can hit <a href="http://c6401.ambari.apache.org:1022/">http://c6401.ambari.apache.org:1022/</a> directly from your client and knox host then you probably don&rsquo;t need the hostmap at all. The host map only exists for situations where a host in the hadoop cluster is known by one name externally and another internally. For example running hostname -q on sandbox returns sandbox.hortonworks.com but externally Sandbox is setup to be accesses using localhost via portmapping. The way the hostmap config works is that the <name/> element is what the hadoop cluster host is known as externally and the <value/> is how the hadoop cluster host identifies itself internally. <param><name>localhost</name><value>c6401,c6401.ambari.apache.org</value></param> You SHOULD be able to simply change <enabled>true</enabled> to false but I have a suspicion
  that that might not actually work. Please try it and file a jira if that doesn&rsquo;t work. If so, simply either remove the full provider config for hostmap or remove the <param/> that defines the mapping.</p><h3>Logging</h3><p>If necessary you can enable additional logging by editing the <code>log4j.properties</code> file in the <code>conf</code> directory. Changing the rootLogger value from <code>ERROR</code> to <code>DEBUG</code> will generate a large amount of debug logging. A number of useful, more fine loggers are also provided in the file.</p><h3>Java VM Options</h3><p>TODO</p><h3>Management of Security Artifacts</h3><p>There are a number of artifacts that are used by the gateway in ensuring the security of wire level communications, access to protected resources and the encryption of sensitive data. These artifacts can be managed from outside of the gateway instances or generated and populated by the gateway instance itself.</p><p>The following is a description of how this
  is coordinated with both standalone (development, demo, etc) gateway instances and instances as part of a cluster of gateways in mind.</p><p>Upon start of the gateway server we:</p>
-<ol>
-  <li>Look for an identity store at <code>conf/security/keystores/gateway.jks</code>.  The identity store contains the certificate and private key used to represent the identity of the server for SSL connections and signature creation.
+</ul><h3><a id="Sandbox+Configuration"></a>Sandbox Configuration</h3><p>This version of the Apache Knox Gateway is tested against the <a href="http://hortonworks.com/products/hortonworks-sandbox">Hortonworks Sandbox 2.x</a>.</p><p>Currently there is an issue with the Sandbox that prevents it from being easily used with the gateway. In order to correct the issue, you can use the commands below to log in to the Sandbox VM and modify the configuration. This assumes that the name sandbox is set up to resolve to the Sandbox VM. It may be necessary to use the IP address of the Sandbox VM instead. <em>This is frequently but not always <code>192.168.56.101</code>.</em></p>
+<pre><code>ssh root@sandbox
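+# the commands below back up hdfs-site.xml to hdfs-site.xml.orig, regenerate hdfs-site.xml
+# with localhost replaced by sandbox, and reboot the VM so the change takes effect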
+cp /usr/lib/hadoop/conf/hdfs-site.xml /usr/lib/hadoop/conf/hdfs-site.xml.orig
+sed -e s/localhost/sandbox/ /usr/lib/hadoop/conf/hdfs-site.xml.orig &gt; /usr/lib/hadoop/conf/hdfs-site.xml
+shutdown -r now
+</code></pre><p>In addition, to make it easy to follow along with the samples for the gateway, you can configure your local system to resolve the address of the Sandbox by the names <code>vm</code> and <code>sandbox</code>. The IP address that is shown below should be that of the Sandbox VM as it is known on your system. <em>This will likely, but not always, be <code>192.168.56.101</code>.</em></p><p>On Linux or Macintosh systems add a line like this to the end of the file <code>/etc/hosts</code> on your local machine, <em>not the Sandbox VM</em>. <em>Note: The character between <code>192.168.56.101</code> and <code>vm</code> below is a tab character.</em></p>
+<pre><code>192.168.56.101  vm sandbox
+</code></pre><p>On Windows systems a similar but different mechanism can be used. On recent versions of Windows the file that should be modified is <code>%systemroot%\system32\drivers\etc\hosts</code>.</p><h2><a id="Gateway+Details"></a>Gateway Details</h2><p>TODO</p><h3><a id="Mapping+Gateway+URLs+to+Hadoop+cluster+URLs"></a>Mapping Gateway URLs to Hadoop cluster URLs</h3><p>The Gateway functions much like a reverse proxy. As such it maintains a mapping of URLs that are exposed externally by the gateway to URLs that are provided by the Hadoop cluster. Examples of the mappings for WebHDFS, WebHCat, Oozie and Stargate/HBase are shown below. These mappings are generated from the combination of the gateway configuration file (i.e. <code>{GATEWAY_HOME}/conf/gateway-site.xml</code>) and the cluster topology descriptors (e.g. <code>{GATEWAY_HOME}/deployments/{cluster-name}.xml</code>).</p>
+<ul>
+  <li>WebHDFS
   <ul>
-    <li>If there is no identity store we create one and generate a self-signed certificate for use in standalone/demo mode.  The certificate is stored with an alias of gateway-identity.</li>
-    <li>If there is an identity store found than we ensure that it can be loaded using the provided master secret and that there is an alias with called gateway-identity.</li>
+    <li>Gateway: <code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs</code></li>
+    <li>Cluster: <code>http://{webhdfs-host}:50070/webhdfs</code></li>
   </ul></li>
-  <li>Look for a credential store at <code>conf/security/keystores/__gateway-credentials.jceks</code>.  This credential store is used to store secrets/passwords that are used by the gateway.  For instance, this is where the pass-phrase for accessing the gateway-identity certificate is kept.
+  <li>WebHCat (Templeton)
   <ul>
-    <li>If there is no credential store found then we create one and populate it with a generated pass-phrase for the alias <code>gateway-identity-passphrase</code>.  This is coordinated with the population of the self-signed cert into the identity-store.</li>
-    <li>If a credential store is found then we ensure that it can be loaded using the provided master secret and that the expected aliases have been populated with secrets.</li>
+    <li>Gateway: <code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/templeton</code></li>
+    <li>Cluster: <code>http://{webhcat-host}:50111/templeton</code></li>
   </ul></li>
-</ol><p>Upon deployment of a Hadoop cluster topology within the gateway we:</p>
-<ol>
-  <li>Look for a credential store for the topology. For instance, we have a sample topology that gets deployed out of the box. We look for <code>conf/security/keystores/sample-credentials.jceks</code>. This topology specific credential store is used for storing secrets/passwords that are used for encrypting sensitive data with topology specific keys.
+  <li>Oozie
   <ul>
-    <li>If no credential store is found for the topology being deployed then one is created for it.  Population of the aliases is delegated to the configured providers within the system that will require the use of a secret for a particular task.  They may programmatic set the value of the secret or choose to have the value for the specified alias generated through the AliasService.</li>
-    <li>If a credential store is found then we ensure that it can be loaded with the provided master secret and the configured providers have the opportunity to ensure that the aliases are populated and if not to populate them.</li>
+    <li>Gateway: <code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/oozie</code></li>
+    <li>Cluster: <code>http://{oozie-host}:11000/oozie</code></li>
   </ul></li>
-</ol><p>By leveraging the algorithm described above we can provide a window of opportunity for management of these artifacts in a number of ways.</p>
-<ol>
-  <li>Using a single gateway instance as a master instance the artifacts can be generated or placed into the expected location and then replicated across all of the slave instances before startup.</li>
-  <li>Using an NFS mount as a central location for the artifacts would provide a single source of truth without the need to replicate them over the network. Of course, NFS mounts have their own challenges.</li>
-</ol><p>Summary of Secrets to be Managed:</p>
-<ol>
-  <li>Master secret - the same for all gateway instances in a cluster of gateways</li>
-  <li>All security related artifacts are protected with the master secret</li>
-  <li>Secrets used by the gateway itself are stored within the gateway credential store and are the same across all gateway instances in the cluster of gateways</li>
-  <li>Secrets used by providers within cluster topologies are stored in topology specific credential stores and are the same for the same topology across the cluster of gateway instances.  However, they are specific to the topology - so secrets for one hadoop cluster are different from those of another.  This allows for fail-over from one gateway instance to another even when encryption is being used while not allowing the compromise of one encryption key to expose the data for all clusters.</li>
-</ol><p>NOTE: the SSL certificate will need special consideration depending on the type of certificate. Wildcard certs may be able to be shared across all gateway instances in a cluster. When certs are dedicated to specific machines the gateway identity store will not be able to be blindly replicated as hostname verification problems will ensue. Obviously, trust-stores will need to be taken into account as well.</p><h2><a id="Gateway+Details"></a>Gateway Details</h2><p>TODO</p><h2><a id="Configuration"></a>Configuration</h2><h3>Host Mapping</h3><p>TODO</p><p>That really depends upon how you have your VM configured. If you can hit <a href="http://c6401.ambari.apache.org:1022/">http://c6401.ambari.apache.org:1022/</a> directly from your client and knox host then you probably don&rsquo;t need the hostmap at all. The host map only exists for situations where a host in the hadoop cluster is known by one name externally and another internally. For example running hostname -q on sandbox re
 turns sandbox.hortonworks.com but externally Sandbox is setup to be accesses using localhost via portmapping. The way the hostmap config works is that the <name/> element is what the hadoop cluster host is known as externally and the <value/> is how the hadoop cluster host identifies itself internally. <param><name>localhost</name><value>c6401,c6401.ambari.apache.org</value></param> You SHOULD be able to simply change <enabled>true</enabled> to false but I have a suspicion that that might not actually work. Please try it and file a jira if that doesn&rsquo;t work. If so, simply either remove the full provider config for hostmap or remove the <param/> that defines the mapping.</p><h3>Logging</h3><p>If necessary you can enable additional logging by editing the <code>log4j.properties</code> file in the <code>conf</code> directory. Changing the rootLogger value from <code>ERROR</code> to <code>DEBUG</code> will generate a large amount of debug logging. A number of useful, more fine logg
 ers are also provided in the file.</p><h3>Java VM Options</h3><p>TODO</p><h3>Management of Security Artifacts</h3><p>There are a number of artifacts that are used by the gateway in ensuring the security of wire level communications, access to protected resources and the encryption of sensitive data. These artifacts can be managed from outside of the gateway instances or generated and populated by the gateway instance itself.</p><p>The following is a description of how this is coordinated with both standalone (development, demo, etc) gateway instances and instances as part of a cluster of gateways in mind.</p><p>Upon start of the gateway server we:</p>
+  <li>Stargate (HBase)
+  <ul>
+    <li>Gateway: <code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/hbase</code></li>
+    <li>Cluster: <code>http://{hbase-host}:60080</code></li>
+  </ul></li>
</ul><p>The values for <code>{gateway-host}</code>, <code>{gateway-port}</code> and <code>{gateway-path}</code> are provided via the Gateway configuration file (i.e. <code>{GATEWAY_HOME}/conf/gateway-site.xml</code>).</p><p>The value for <code>{cluster-name}</code> is derived from the name of the cluster topology descriptor (e.g. <code>{GATEWAY_HOME}/deployments/{cluster-name}.xml</code>).</p><p>The values for <code>{webhdfs-host}</code> and <code>{webhcat-host}</code> are provided via the cluster topology descriptor (e.g. <code>{GATEWAY_HOME}/deployments/{cluster-name}.xml</code>).</p><p>Note: The ports 50070, 50111, 11000 and 60080 are the defaults for WebHDFS, WebHCat, Oozie and Stargate/HBase respectively. Their values can also be provided via the cluster topology descriptor if your Hadoop cluster uses different ports.</p><h2><a id="Configuration"></a>Configuration</h2><h3><a id="Host+Mapping"></a>Host Mapping</h3><p>TODO</p><p>Whether the host mapping provider is needed at all depends upon how your VM or cluster is configured. If you can reach a cluster host such as <a href="http://c6401.ambari.apache.org:1022/">http://c6401.ambari.apache.org:1022/</a> directly from both your client and the Knox host, then you probably do not need the hostmap at all. The host map exists only for situations where a host in the Hadoop cluster is known by one name externally and by another internally. For example, running <code>hostname -q</code> on the Sandbox returns sandbox.hortonworks.com, but externally the Sandbox is set up to be accessed as localhost via port mapping. In the hostmap configuration the <code>&lt;name/&gt;</code> element is the name by which the Hadoop cluster host is known externally and the <code>&lt;value/&gt;</code> element is how the Hadoop cluster host identifies itself internally, for example <code>&lt;param&gt;&lt;name&gt;localhost&lt;/name&gt;&lt;value&gt;c6401,c6401.ambari.apache.org&lt;/value&gt;&lt;/param&gt;</code>. Changing <code>&lt;enabled&gt;true&lt;/enabled&gt;</code> to false should disable the mapping, but if it does not, please file a JIRA; in that case either remove the entire hostmap provider configuration or remove the <code>&lt;param/&gt;</code> that defines the mapping.</p><h3><a id="Logging"></a>Logging</h3><p>If necessary you can enable additional logging by editing the <code>log4j.properties</code> file in the <code>conf</code> directory. Changing the rootLogger value from <code>ERROR</code> to <code>DEBUG</code> will generate a large amount of debug logging. A number of useful, finer-grained loggers are also provided in the file.</p><h3><a id="Java+VM+Options"></a>Java VM Options</h3><p>TODO</p><h3><a id="Persisting+the+Master+Secret"></a>Persisting the Master Secret</h3><p>The master secret is required to start the server. This secret is used by the gateway instance to access secured artifacts. Keystores, trust stores and credential stores are all protected with the master secret.</p><p>You may persist the master secret by supplying the <em>-persist-master</em> switch at startup. This will result in a warning indicating that persisting the secret is less secure than providing it at startup. We do make some provisions in order to protect the persisted password.</p><p>It is encrypted with AES 128 bit encryption and, where possible, the file permissions are set so that it is only accessible by the user that the gateway is running as.</p><p>After persisting the secret, ensure that the file at config/security/master has the appropriate permissions set for your environment. This is probably the most important layer of defense for the master secret. Do not assume that the encryption alone is sufficient protection.</p><p>A specific user should be created to run the gateway; this will further protect a persisted master file.</p><h3><a id="Management+of+Security+Artifacts"></a>Management of Security Artifacts</h3><p>There are a number of artifacts that are used by the gateway in ensuring the security of wire level communications, access to protected resources and the encryption of sensitive data. These artifacts can be managed from outside of the gateway instances or generated and populated by the gateway instance itself.</p><p>The following describes how this is coordinated for both standalone gateway instances (development, demo, etc.) and for instances that are part of a cluster of gateways.</p><p>Upon start of the gateway server we:</p>
 <ol>
   <li>Look for an identity store at <code>conf/security/keystores/gateway.jks</code>.  The identity store contains the certificate and private key used to represent the identity of the server for SSL connections and signature creation.
   <ul>
@@ -233,22 +220,22 @@ shutdown -r now
   <li>All security related artifacts are protected with the master secret</li>
   <li>Secrets used by the gateway itself are stored within the gateway credential store and are the same across all gateway instances in the cluster of gateways</li>
   <li>Secrets used by providers within cluster topologies are stored in topology specific credential stores and are the same for the same topology across the cluster of gateway instances.  However, they are specific to the topology - so secrets for one hadoop cluster are different from those of another.  This allows for fail-over from one gateway instance to another even when encryption is being used while not allowing the compromise of one encryption key to expose the data for all clusters.</li>
-</ol><p>NOTE: the SSL certificate will need special consideration depending on the type of certificate. Wildcard certs may be able to be shared across all gateway instances in a cluster. When certs are dedicated to specific machines the gateway identity store will not be able to be blindly replicated as hostname verification problems will ensue. Obviously, trust-stores will need to be taken into account as well.</p><h3><a id="Authentication"></a>Authentication</h3><h4>LDAP Configuration</h4><h4>Session Configuration</h4><h3><a id="Authorization"></a>Authorization</h3><h4>Service Level Authorization</h4><p>The Knox Gateway has an out-of-the-box authorization provider that allows administrators to restrict access to the individual services within a Hadoop cluster.</p><p>This provider utilizes a simple and familiar pattern of using ACLs to protect Hadoop resources by specifying users, groups and ip addresses that are permitted access.</p><p>Note: In the examples below {serviceName} rep
 resents a real service name (e.g. WEBHDFS) and would be replaced with these values in an actual configuration.</p><h5>Usecases</h5><h6>USECASE-1: Restrict access to specific Hadoop services to specific Users</h6>
+</ol><p>NOTE: the SSL certificate will need special consideration depending on the type of certificate. Wildcard certs may be able to be shared across all gateway instances in a cluster. When certs are dedicated to specific machines the gateway identity store will not be able to be blindly replicated as hostname verification problems will ensue. Obviously, trust-stores will need to be taken into account as well.</p><h3><a id="Authentication"></a>Authentication</h3><h4><a id="LDAP+Configuration"></a>LDAP Configuration</h4><h4><a id="Session+Configuration"></a>Session Configuration</h4><h3><a id="Authorization"></a>Authorization</h3><h4><a id="Service+Level+Authorization"></a>Service Level Authorization</h4><p>The Knox Gateway has an out-of-the-box authorization provider that allows administrators to restrict access to the individual services within a Hadoop cluster.</p><p>This provider utilizes a simple and familiar pattern of using ACLs to protect Hadoop resources by specifying user
 s, groups and ip addresses that are permitted access.</p><p>Note: In the examples below {serviceName} represents a real service name (e.g. WEBHDFS) and would be replaced with these values in an actual configuration.</p><h5><a id="Usecases"></a>Usecases</h5><h6><a id="USECASE-1:+Restrict+access+to+specific+Hadoop+services+to+specific+Users"></a>USECASE-1: Restrict access to specific Hadoop services to specific Users</h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;bob;*;*&lt;/value&gt;
 &lt;/param&gt;
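+&lt;!-- As the use cases below illustrate, the acl value appears to consist of three
+     semicolon separated fields (users, groups and remote IP addresses), where *
+     acts as a wildcard for a field. --&gt;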
-</code></pre><h6>USECASE-2: Restrict access to specific Hadoop services to specific Groups</h6>
+</code></pre><h6><a id="USECASE-2:+Restrict+access+to+specific+Hadoop+services+to+specific+Groups"></a>USECASE-2: Restrict access to specific Hadoop services to specific Groups</h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acls&lt;/name&gt;
     &lt;value&gt;*;admins;*&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6>USECASE-3: Restrict access to specific Hadoop services to specific Remote IPs</h6>
+</code></pre><h6><a id="USECASE-3:+Restrict+access+to+specific+Hadoop+services+to+specific+Remote+IPs"></a>USECASE-3: Restrict access to specific Hadoop services to specific Remote IPs</h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;*;*;127.0.0.1&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6>USECASE-4: Restrict access to specific Hadoop services to specific Users OR users within specific Groups</h6>
+</code></pre><h6><a id="USECASE-4:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+OR+users+within+specific+Groups"></a>USECASE-4: Restrict access to specific Hadoop services to specific Users OR users within specific Groups</h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl.mode&lt;/name&gt;
     &lt;value&gt;OR&lt;/value&gt;
@@ -257,7 +244,7 @@ shutdown -r now
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;bob;admin;*&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6>USECASE-5: Restrict access to specific Hadoop services to specific Users OR users from specific Remote IPs</h6>
+</code></pre><h6><a id="USECASE-5:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+OR+users+from+specific+Remote+IPs"></a>USECASE-5: Restrict access to specific Hadoop services to specific Users OR users from specific Remote IPs</h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl.mode&lt;/name&gt;
     &lt;value&gt;OR&lt;/value&gt;
@@ -266,7 +253,7 @@ shutdown -r now
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;bob;*;127.0.0.1&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6>USECASE-6: Restrict access to specific Hadoop services to users within specific Groups OR from specific Remote IPs</h6>
+</code></pre><h6><a id="USECASE-6:+Restrict+access+to+specific+Hadoop+services+to+users+within+specific+Groups+OR+from+specific+Remote+IPs"></a>USECASE-6: Restrict access to specific Hadoop services to users within specific Groups OR from specific Remote IPs</h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl.mode&lt;/name&gt;
     &lt;value&gt;OR&lt;/value&gt;
@@ -275,7 +262,7 @@ shutdown -r now
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;*;admin;127.0.0.1&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6>USECASE-7: Restrict access to specific Hadoop services to specific Users OR users within specific Groups OR from specific Remote IPs</h6>
+</code></pre><h6><a id="USECASE-7:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+OR+users+within+specific+Groups+OR+from+specific+Remote+IPs"></a>USECASE-7: Restrict access to specific Hadoop services to specific Users OR users within specific Groups OR from specific Remote IPs</h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl.mode&lt;/name&gt;
     &lt;value&gt;OR&lt;/value&gt;
@@ -284,27 +271,27 @@ shutdown -r now
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;bob;admin;127.0.0.1&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6>USECASE-8: Restrict access to specific Hadoop services to specific Users AND users within specific Groups</h6>
+</code></pre><h6><a id="USECASE-8:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+AND+users+within+specific+Groups"></a>USECASE-8: Restrict access to specific Hadoop services to specific Users AND users within specific Groups</h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;bob;admin;*&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6>USECASE-9: Restrict access to specific Hadoop services to specific Users AND users from specific Remote IPs</h6>
+</code></pre><h6><a id="USECASE-9:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+AND+users+from+specific+Remote+IPs"></a>USECASE-9: Restrict access to specific Hadoop services to specific Users AND users from specific Remote IPs</h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;bob;*;127.0.0.1&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6>USECASE-10: Restrict access to specific Hadoop services to users within specific Groups AND from specific Remote IPs</h6>
+</code></pre><h6><a id="USECASE-10:+Restrict+access+to+specific+Hadoop+services+to+users+within+specific+Groups+AND+from+specific+Remote+IPs"></a>USECASE-10: Restrict access to specific Hadoop services to users within specific Groups AND from specific Remote IPs</h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;*;admins;127.0.0.1&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h6>USECASE-11: Restrict access to specific Hadoop services to specific Users AND users within specific Groups AND from specific Remote IPs</h6>
+</code></pre><h6><a id="USECASE-11:+Restrict+access+to+specific+Hadoop+services+to+specific+Users+AND+users+within+specific+Groups+AND+from+specific+Remote+IPs"></a>USECASE-11: Restrict access to specific Hadoop services to specific Users AND users within specific Groups AND from specific Remote IPs</h6>
 <pre><code>&lt;param&gt;
     &lt;name&gt;{serviceName}.acl&lt;/name&gt;
     &lt;value&gt;bob;admins;127.0.0.1&lt;/value&gt;
 &lt;/param&gt;
-</code></pre><h4>Configuration</h4><p>ACLs are bound to services within the topology descriptors by introducing the authorization provider with configuration like:</p>
+</code></pre><h4><a id="Configuration"></a>Configuration</h4><p>ACLs are bound to services within the topology descriptors by introducing the authorization provider with configuration like:</p>
 <pre><code>&lt;provider&gt;
     &lt;role&gt;authorization&lt;/role&gt;
     &lt;name&gt;AclsAuthz&lt;/name&gt;
@@ -350,7 +337,7 @@ shutdown -r now
   <li>the user is &ldquo;hdfs&rdquo; or &ldquo;bob&rdquo; OR</li>
   <li>the user is in &ldquo;admin&rdquo; group OR</li>
   <li>the request is coming from 127.0.0.2 or 127.0.0.3</li>
-</ol><h4>Other Related Configuration</h4><p>The principal mapping aspect of the identity assertion provider is important to understand in order to fully utilize the authorization features of this provider.</p><p>This feature allows us to map the authenticated principal to a runas or impersonated principal to be asserted to the Hadoop services in the backend. When a principal mapping is defined that results in an impersonated principal being created the impersonated principal is then the effective principal. If there is no mapping to another principal then the authenticated or primary principal is then the effective principal. Principal mapping has actually been available in the identity assertion provider from the beginning of Knox. Although hasn’t been adequately documented as of yet.</p>
+</ol><h4><a id="Other+Related+Configuration"></a>Other Related Configuration</h4><p>The principal mapping aspect of the identity assertion provider is important to understand in order to fully utilize the authorization features of this provider.</p><p>This feature allows us to map the authenticated principal to a runas or impersonated principal to be asserted to the Hadoop services in the backend. When a principal mapping is defined that results in an impersonated principal being created the impersonated principal is then the effective principal. If there is no mapping to another principal then the authenticated or primary principal is then the effective principal. Principal mapping has actually been available in the identity assertion provider from the beginning of Knox. Although hasn’t been adequately documented as of yet.</p>
 <pre><code>&lt;param&gt;
     &lt;name&gt;principal.mapping&lt;/name&gt;
     &lt;value&gt;{primaryPrincipal}[,…]={impersonatedPrincipal}[;…]&lt;/value&gt;
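     &lt;!-- A hypothetical example: map the authenticated users guest and alice to the effective principal hdfs --&gt;
     &lt;!-- &lt;value&gt;guest,alice=hdfs&lt;/value&gt; --&gt;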
@@ -484,7 +471,36 @@ shutdown -r now
         &lt;url&gt;http://localhost:10000/&lt;/url&gt;
     &lt;/service&gt;
 &lt;/topology&gt;
-</code></pre><h2><a id="Client+Details"></a>Client Details</h2><p>Hadoop requires a client that can be used to interact remotely with the services provided by Hadoop cluster. This will also be true when using the Apache Knox Gateway to provide perimeter security and centralized access for these services. The two primary existing clients for Hadoop are the CLI (i.e. Command Line Interface, hadoop) and HUE (i.e. Hadoop User Environment). For several reasons however, neither of these clients can <em>currently</em> be used to access Hadoop services via the Apache Knox Gateway.</p><p>This led to thinking about a very simple client that could help people use and evaluate the gateway. The list below outlines the general requirements for such a client.</p>
+</code></pre><h2><a id="Secure+Clusters"></a>Secure Clusters</h2><p>If your Hadoop cluster is secured with Kerberos authentication, you must do the following on the Knox side.</p><h3><a id="Secure+the+Hadoop+Cluster"></a>Secure the Hadoop Cluster</h3><p>Please secure the Hadoop services with Kerberos authentication.</p><p>Please see the instructions at <a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuration_in_Secure_Mode">http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuration_in_Secure_Mode</a> and <a href="http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14.html">http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.1/bk_installing_manually_book/content/rpm-chap14.html</a></p><h3><a id="Create+Unix+account+for+Knox+on+Hadoop+master+nodes"></a>Create Unix account for Knox on Hadoop master nodes</h3>
+<pre><code>useradd -g hadoop knox
+</code></pre><h3><a id="Create+Kerberos+principal,+keytab+for+Knox"></a>Create Kerberos principal, keytab for Knox</h3><p>One way of doing this, assuming your KDC realm is EXAMPLE.COM</p><p>ssh into your host running KDC</p>
+<pre><code>kadmin.local
+add_principal -randkey knox/knox@EXAMPLE.COM
+ktadd -norandkey -k /etc/security/keytabs/knox.service.keytab knox/knox@EXAMPLE.COM
+</code></pre><h3><a id="Grant+Proxy+privileges+for+Knox+user+in+`core-site.xml`+on+Hadoop+master+nodes"></a>Grant Proxy privileges for Knox user in <code>core-site.xml</code> on Hadoop master nodes</h3><p>Update <code>core-site.xml</code> and add the following lines towards the end of the file.</p><p>Please replace FQDN_OF_KNOX_HOST with right value in your cluster. You could use * for local developer testing if Knox host does not have static IP.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;hadoop.proxyuser.knox.groups&lt;/name&gt;
+    &lt;value&gt;users&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+    &lt;name&gt;hadoop.proxyuser.knox.hosts&lt;/name&gt;
+    &lt;value&gt;FQDN_OF_KNOX_HOST&lt;/value&gt;
+&lt;/property&gt;
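+&lt;!-- Note: the services that read core-site.xml (e.g. the NameNode) typically need to be restarted, or have their proxyuser configuration refreshed, before this change takes effect --&gt;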
+</code></pre><h3><a id="Grant+proxy+privilege+for+Knox+in+`oozie-stie.xml`+on+Oozie+host"></a>Grant proxy privilege for Knox in <code>oozie-stie.xml</code> on Oozie host</h3><p>Update <code>oozie-site.xml</code> and add the following lines towards the end of the file.</p><p>Please replace FQDN_OF_KNOX_HOST with right value in your cluster. You could use * for local developer testing if Knox host does not have static IP.</p>
+<pre><code>&lt;property&gt;
+   &lt;name&gt;oozie.service.ProxyUserService.proxyuser.knox.groups&lt;/name&gt;
+   &lt;value&gt;users&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+   &lt;name&gt;oozie.service.ProxyUserService.proxyuser.knox.hosts&lt;/name&gt;
+   &lt;value&gt;FQDN_OF_KNOX_HOST&lt;/value&gt;
+&lt;/property&gt;
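+&lt;!-- Note: Oozie typically needs to be restarted before this change takes effect --&gt;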
+</code></pre><h3><a id="Copy+knox+keytab+to+Knox+host"></a>Copy knox keytab to Knox host</h3><p>Please add unix account for knox on Knox host</p>
+<pre><code>useradd -g hadoop knox
+</code></pre><p>Please copy the knox.service.keytab created on the KDC host to /etc/knox/conf/knox.service.keytab on your Knox host and restrict its ownership and permissions:</p>
+<pre><code>chown knox knox.service.keytab
+chmod 400 knox.service.keytab
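+# Optionally verify the keytab contents, for example: klist -kt /etc/knox/conf/knox.service.keytab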
+</code></pre><h3><a id="Update+krb5.conf+at+/etc/knox/conf/krb5.conf+on+Knox+host"></a>Update krb5.conf at /etc/knox/conf/krb5.conf on Knox host</h3><p>You could copy the <code>templates/krb5.conf</code> file provided in the Knox binary download and customize it to suit your cluster.</p><h3><a id="Update+`krb5JAASLogin.conf`+at+`/etc/knox/conf/krb5JAASLogin.conf`+on+Knox+host"></a>Update <code>krb5JAASLogin.conf</code> at <code>/etc/knox/conf/krb5JAASLogin.conf</code> on Knox host</h3><p>You could copy the <code>templates/krb5JAASLogin.conf</code> file provided in the Knox binary download and customize it to suit your cluster.</p><h3><a id="Update+`gateway-site.xml`+on+Knox+host+on+Knox+host"></a>Update <code>gateway-site.xml</code> on Knox host on Knox host</h3><p>Update <code>conf/gateway-site.xml</code> in your Knox installation and set the value of <code>gateway.hadoop.kerberos.secured</code> to true.</p><h3><a id="Restart+Knox"></a>Restart Knox</h3><p>After you do the above con
+<h3><a id="Restart+Knox"></a>Restart Knox</h3><p>After you complete the above configuration and restart Knox, Knox will use SPNEGO to authenticate with the Hadoop services and Oozie. There is no change in the way you make calls to Knox, whether you use curl or the Knox DSL.</p><h2><a id="Client+Details"></a>Client Details</h2><p>Hadoop requires a client that can be used to interact remotely with the services provided by a Hadoop cluster. This will also be true when using the Apache Knox Gateway to provide perimeter security and centralized access for these services. The two primary existing clients for Hadoop are the CLI (i.e. Command Line Interface, hadoop) and HUE (i.e. Hadoop User Environment). For several reasons however, neither of these clients can <em>currently</em> be used to access Hadoop services via the Apache Knox Gateway.</p><p>This led to thinking about a very simple client that could help people use and evaluate the gateway. The list below outlines the general requirements for such a client.</p>
 <ul>
   <li>Promote the evaluation and adoption of the Apache Knox Gateway</li>
   <li>Simple to deploy and use on data worker desktops to access to remote Hadoop clusters</li>
@@ -496,20 +512,20 @@ shutdown -r now
   <li>Aligned with the Apache Knox Gateway&rsquo;s overall goals for security</li>
 </ul><p>The result is a very simple DSL (<a href="http://en.wikipedia.org/wiki/Domain-specific_language">Domain Specific Language</a>) of sorts that is used via <a href="http://groovy.codehaus.org">Groovy</a> scripts. Here is an example of a command that copies a file from the local file system to HDFS.</p><p><em>Note: The variables session, localFile and remoteFile are assumed to be defined.</em></p>
 <pre><code>Hdfs.put( session ).file( localFile ).to( remoteFile ).now()
-</code></pre><p><em>This work is very early in development but is also very useful in its current state.</em> <em>We are very interested in receiving feedback about how to improve this feature and the DSL in particular.</em></p><p>A note of thanks to <a href="https://code.google.com/p/rest-assured/">REST-assured</a> which provides a <a href="http://en.wikipedia.org/wiki/Fluent_interface">Fluent interface</a> style DSL for testing REST services. It served as the initial inspiration for the creation of this DSL.</p><h3>Assumptions</h3><p>This document assumes a few things about your environment in order to simplify the examples.</p>
+</code></pre><p><em>This work is very early in development but is also very useful in its current state.</em> <em>We are very interested in receiving feedback about how to improve this feature and the DSL in particular.</em></p><p>A note of thanks to <a href="https://code.google.com/p/rest-assured/">REST-assured</a> which provides a <a href="http://en.wikipedia.org/wiki/Fluent_interface">Fluent interface</a> style DSL for testing REST services. It served as the initial inspiration for the creation of this DSL.</p><h3><a id="Assumptions"></a>Assumptions</h3><p>This document assumes a few things about your environment in order to simplify the examples.</p>
 <ul>
   <li>The JVM is executable as simply java.</li>
   <li>The Apache Knox Gateway is installed and functional.</li>
   <li>The example commands are executed within the context of the GATEWAY_HOME current directory. The GATEWAY_HOME directory is the directory within the Apache Knox Gateway installation that contains the README file and the bin, conf and deployments directories.</li>
   <li>A few examples require the use of commands from a standard Groovy installation. These examples are optional but to try them you will need Groovy <a href="http://groovy.codehaus.org/Installing+Groovy">installed</a>.</li>
-</ul><h3>Assumptions</h3><p>The DSL requires a shell to interpret the Groovy script. The shell can either be used interactively or to execute a script file. To simplify use, the distribution contains an embedded version of the Groovy shell.</p><p>The shell can be run interactively. Use the command <code>exit</code> to exit.</p>
+</ul><h3><a id="Assumptions"></a>Assumptions</h3><p>The DSL requires a shell to interpret the Groovy script. The shell can either be used interactively or to execute a script file. To simplify use, the distribution contains an embedded version of the Groovy shell.</p><p>The shell can be run interactively. Use the command <code>exit</code> to exit.</p>
 <pre><code>java -jar bin/shell.jar
 </code></pre><p>When running interactively it may be helpful to reduce some of the output generated by the shell console. Use the following commands in the interactive shell to reduce that output. This only needs to be done once, as these preferences are persisted.</p>
 <pre><code>set verbosity QUIET
 set show-last-result false
 </code></pre><p>Also when running interactively, use the <code>exit</code> command to terminate the shell. Using <code>^C</code> to exit can sometimes leave the parent shell in a problematic state.</p><p>The shell can also be used to execute a script by passing a single filename argument.</p>
 <pre><code>java -jar bin/shell.jar samples/ExamplePutFile.groovy
-</code></pre><h3>Examples</h3><p>Once the shell can be launched the DSL can be used to interact with the gateway and Hadoop. Below is a very simple example of an interactive shell session to upload a file to HDFS.</p>
+</code></pre><h3><a id="Examples"></a>Examples</h3><p>Once the shell can be launched the DSL can be used to interact with the gateway and Hadoop. Below is a very simple example of an interactive shell session to upload a file to HDFS.</p>
 <pre><code>java -jar bin/shell.jar
 knox:000&gt; hadoop = Hadoop.login( &quot;https://localhost:8443/gateway/sample&quot;, &quot;bob&quot;, &quot;bob-password&quot; )
 knox:000&gt; Hdfs.put( hadoop ).file( &quot;README&quot; ).to( &quot;/tmp/example/README&quot; ).now()
@@ -552,7 +568,7 @@ json = (new JsonSlurper()).parseText( te
 println json.FileStatuses.FileStatus.pathSuffix
 hadoop.shutdown()
 exit
-</code></pre><p>Notice the <code>Hdfs.rm</code> command. This is included simply to ensure that the script can be rerun. Without this an error would result the second time it is run.</p><h3>Futures</h3><p>The DSL supports the ability to invoke commands asynchronously via the later() invocation method. The object returned from the later() method is a java.util.concurrent.Future parametrized with the response type of the command. This is an example of how to asynchronously put a file to HDFS.</p>
+</code></pre><p>Notice the <code>Hdfs.rm</code> command. This is included simply to ensure that the script can be rerun. Without this an error would result the second time it is run.</p><h3><a id="Futures"></a>Futures</h3><p>The DSL supports the ability to invoke commands asynchronously via the later() invocation method. The object returned from the later() method is a java.util.concurrent.Future parametrized with the response type of the command. This is an example of how to asynchronously put a file to HDFS.</p>
 <pre><code>future = Hdfs.put(hadoop).file(&quot;README&quot;).to(&quot;tmp/example/README&quot;).later()
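 // get() blocks until the asynchronous put command has completed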
 println future.get().statusCode
 </code></pre><p>The future.get() method will block until the asynchronous command is complete. To illustrate the usefulness of this, however, multiple concurrent commands are required.</p>
@@ -561,13 +577,13 @@ licenseFuture = Hdfs.put(hadoop).file(&q
 hadoop.waitFor( readmeFuture, licenseFuture )
 println readmeFuture.get().statusCode
 println licenseFuture.get().statusCode
-</code></pre><p>The hadoop.waitFor() method will wait for one or more asynchronous commands to complete.</p><h3>Closures</h3><p>Futures alone only provide asynchronous invocation of the command. What if some processing should also occur asynchronously once the command is complete. Support for this is provided by closures. Closures are blocks of code that are passed into the later() invocation method. In Groovy these are contained within {} immediately after a method. These blocks of code are executed once the asynchronous command is complete.</p>
+</code></pre><p>The hadoop.waitFor() method will wait for one or more asynchronous commands to complete.</p><h3><a id="Closures"></a>Closures</h3><p>Futures alone only provide asynchronous invocation of the command. What if some processing should also occur asynchronously once the command is complete? Support for this is provided by closures. Closures are blocks of code that are passed into the later() invocation method. In Groovy these are contained within {} immediately after a method. These blocks of code are executed once the asynchronous command is complete.</p>
 <pre><code>Hdfs.put(hadoop).file(&quot;README&quot;).to(&quot;tmp/example/README&quot;).later(){ println it.statusCode }
 </code></pre><p>In this example the put() command is executed on a separate thread and once complete the <code>println it.statusCode</code> block is executed on that thread. The it variable is automatically populated by Groovy and is a reference to the result that is returned from the future or now() method. The future example above can be rewritten to illustrate the use of closures.</p>
 <pre><code>readmeFuture = Hdfs.put(hadoop).file(&quot;README&quot;).to(&quot;tmp/example/README&quot;).later() { println it.statusCode }
 licenseFuture = Hdfs.put(hadoop).file(&quot;LICENSE&quot;).to(&quot;tmp/example/LICENSE&quot;).later() { println it.statusCode }
 hadoop.waitFor( readmeFuture, licenseFuture )
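 // waitFor() blocks until both asynchronous puts have completed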
-</code></pre><p>Again, the hadoop.waitFor() method will wait for one or more asynchronous commands to complete.</p><h3>Constructs</h3><p>In order to understand the DSL there are three primary constructs that need to be understood.</p><h3>Hadoop</h3><p>This construct encapsulates the client side session state that will be shared between all command invocations. In particular it will simplify the management of any tokens that need to be presented with each command invocation. It also manages a thread pool that is used by all asynchronous commands which is why it is important to call one of the shutdown methods.</p><p>The syntax associated with this is expected to change we expect that credentials will not need to be provided to the gateway. Rather it is expected that some form of access token will be used to initialize the session.</p><h3>Services</h3><p>Services are the primary extension point for adding new suites of commands. The built in examples are: Hdfs, Job and Workflow. The d
 esire for extensibility is the reason for the slightly awkward Hdfs.ls(hadoop) syntax. Certainly something more like <code>hadoop.hdfs().ls()</code> would have been preferred but this would prevent adding new commands easily. At a minimum it would result in extension commands with a different syntax from the &ldquo;built-in&rdquo; commands.</p><p>The service objects essentially function as a factory for a suite of commands.</p><h3>Commands</h3><p>Commands provide the behavior of the DSL. They typically follow a Fluent interface style in order to allow for single line commands. There are really three parts to each command: Request, Invocation, Response</p><h3>Request</h3><p>The request is populated by all of the methods following the &ldquo;verb&rdquo; method and the &ldquo;invoke&rdquo; method. For example in <code>Hdfs.rm(hadoop).ls(dir).now()</code> the request is populated between the &ldquo;verb&rdquo; method <code>rm()</code> and the &ldquo;invoke&rdquo; method <code>now()</cod
 e>.</p><h3>Invocation</h3><p>The invocation method controls how the request is invoked. Currently supported synchronous and asynchronous invocation. The now() method executes the request and returns the result immediately. The later() method submits the request to be executed later and returns a future from which the result can be retrieved. In addition later() invocation method can optionally be provided a closure to execute when the request is complete. See the Futures and Closures sections below for additional detail and examples.</p><h3>Response</h3><p>The response contains the results of the invocation of the request. In most cases the response is a thin wrapper over the HTTP response. In fact many commands will share a single BasicResponse type that only provides a few simple methods.</p>
+</code></pre><p>Again, the hadoop.waitFor() method will wait for one or more asynchronous commands to complete.</p><h3><a id="Constructs"></a>Constructs</h3><p>In order to understand the DSL, there are three primary constructs that need to be understood.</p><h3><a id="Hadoop"></a>Hadoop</h3><p>This construct encapsulates the client side session state that will be shared between all command invocations. In particular it will simplify the management of any tokens that need to be presented with each command invocation. It also manages a thread pool that is used by all asynchronous commands, which is why it is important to call one of the shutdown methods.</p><p>The syntax associated with this is expected to change. We expect that credentials will not need to be provided to the gateway. Rather it is expected that some form of access token will be used to initialize the session.</p><h3><a id="Services"></a>Services</h3><p>Services are the primary extension point for adding new suites of commands. The built in examples are: Hdfs, Job and Workflow. The desire for extensibility is the reason for the slightly awkward Hdfs.ls(hadoop) syntax. Certainly something more like <code>hadoop.hdfs().ls()</code> would have been preferred but this would prevent adding new commands easily. At a minimum it would result in extension commands with a different syntax from the &ldquo;built-in&rdquo; commands.</p><p>The service objects essentially function as a factory for a suite of commands.</p><h3><a id="Commands"></a>Commands</h3><p>Commands provide the behavior of the DSL. They typically follow a Fluent interface style in order to allow for single line commands. There are really three parts to each command: Request, Invocation and Response.</p><h3><a id="Request"></a>Request</h3><p>The request is populated by all of the methods between the &ldquo;verb&rdquo; method and the &ldquo;invoke&rdquo; method. For example in <code>Hdfs.rm(hadoop).file(dir).now()</code> the request is populated between the &ldquo;verb&rdquo; method <code>rm()</code> and the &ldquo;invoke&rdquo; method <code>now()</code>.</p><h3><a id="Invocation"></a>Invocation</h3><p>The invocation method controls how the request is invoked. Synchronous and asynchronous invocation are currently supported. The now() method executes the request and returns the result immediately. The later() method submits the request to be executed later and returns a future from which the result can be retrieved. In addition, the later() invocation method can optionally be provided a closure to execute when the request is complete. See the Futures and Closures sections above for additional detail and examples.</p><h3><a id="Response"></a>Response</h3><p>The response contains the results of the invocation of the request. In most cases the response is a thin wrapper over the HTTP response. In fact many commands will share a single BasicResponse type that only provides a few simple methods.</p>
 <pre><code>public int getStatusCode()
 public long getContentLength()
 public String getContentType()
@@ -578,7 +594,7 @@ public byte[] getBytes()
 public void close();
 </code></pre><p>Thanks to Groovy these methods can be accessed as attributes. In some of the examples above, the statusCode attribute was retrieved, for example:</p>
 <pre><code>println Hdfs.rm(hadoop).file(dir).now().statusCode
-</code></pre><p>Groovy will invoke the getStatusCode method to retrieve the statusCode attribute.</p><p>The three methods getStream(), getBytes() and getString deserve special attention. Care must be taken that the HTTP body is read only once. Therefore one of these methods (and only one) must be called once and only once. Calling one of these more than once will cause an error. Failing to call one of these methods once will result in lingering open HTTP connections. The close() method may be used if the caller is not interested in reading the result body. Most commands that do not expect a response body will call close implicitly. If the body is retrieved via getBytes() or getString(), the close() method need not be called. When using getStream(), care must be taken to consume the entire body otherwise lingering open HTTP connections will result. The close() method may be called after reading the body partially to discard the remainder of the body.</p><h3>Services</h3><p>There are 
 three basic DSL services and commands bundled with the shell.</p><h4>HDFS</h4><p>Provides basic HDFS commands. <em>Using these DSL commands requires that WebHDFS be running in the Hadoop cluster.</em></p><h4>Jobs (Templeton/WebHCat)</h4><p>Provides basic job submission and status commands. <em>Using these DSL commands requires that Templeton/WebHCat be running in the Hadoop cluster.</em></p><h4>Workflow (Oozie)</h4><p>Provides basic workflow submission and status commands. <em>Using these DSL commands requires that Oozie be running in the Hadoop cluster.</em></p><h3>HDFS Commands (WebHDFS)</h3><h4>ls() - List the contents of a HDFS directory.</h4>
+</code></pre><p>Groovy will invoke the getStatusCode method to retrieve the statusCode attribute.</p><p>The three methods getStream(), getBytes() and getString() deserve special attention. Care must be taken that the HTTP body is read only once. Therefore one of these methods (and only one) must be called once and only once. Calling one of these more than once will cause an error. Failing to call one of these methods will result in lingering open HTTP connections. The close() method may be used if the caller is not interested in reading the result body. Most commands that do not expect a response body will call close() implicitly. If the body is retrieved via getBytes() or getString(), the close() method need not be called. When using getStream(), care must be taken to consume the entire body, otherwise lingering open HTTP connections will result. The close() method may be called after reading the body partially to discard the remainder of the body.</p><h3><a id="Services"></a>Servi
 ces</h3><p>There are three basic DSL services and commands bundled with the shell.</p><h4><a id="HDFS"></a>HDFS</h4><p>Provides basic HDFS commands. <em>Using these DSL commands requires that WebHDFS be running in the Hadoop cluster.</em></p><h4><a id="Jobs+(Templeton/WebHCat)"></a>Jobs (Templeton/WebHCat)</h4><p>Provides basic job submission and status commands. <em>Using these DSL commands requires that Templeton/WebHCat be running in the Hadoop cluster.</em></p><h4><a id="Workflow+(Oozie)"></a>Workflow (Oozie)</h4><p>Provides basic workflow submission and status commands. <em>Using these DSL commands requires that Oozie be running in the Hadoop cluster.</em></p><h3><a id="HDFS+Commands+(WebHDFS)"></a>HDFS Commands (WebHDFS)</h3><h4><a id="ls()+-+List+the+contents+of+a+HDFS+directory."></a>ls() - List the contents of a HDFS directory.</h4>
 <ul>
   <li>Request
   <ul>
@@ -592,7 +608,7 @@ public void close();
   <ul>
     <li><code>Hdfs.ls(hadoop).dir(&quot;/&quot;).now()</code></li>
   </ul></li>
-</ul><h4>rm() - Remove a HDFS file or directory.</h4>
+</ul><h4><a id="rm()+-+Remove+a+HDFS+file+or+directory."></a>rm() - Remove a HDFS file or directory.</h4>
 <ul>
   <li>Request
   <ul>
@@ -607,7 +623,7 @@ public void close();
   <ul>
     <li><code>Hdfs.rm(hadoop).file(&quot;/tmp/example&quot;).recursive().now()</code></li>
   </ul></li>
-</ul><h4>put() - Copy a file from the local file system to HDFS.</h4>
+</ul><h4><a id="put()+-+Copy+a+file+from+the+local+file+system+to+HDFS."></a>put() - Copy a file from the local file system to HDFS.</h4>
 <ul>
   <li>Request
   <ul>
@@ -623,7 +639,7 @@ public void close();
   <ul>
     <li><code>Hdfs.put(hadoop).file(&quot;localFile&quot;).to(&quot;/tmp/example/remoteFile&quot;).now()</code></li>
   </ul></li>
-</ul><h4>get() - Copy a file from HDFS to the local file system.</h4>
+</ul><h4><a id="get()+-+Copy+a+file+from+HDFS+to+the+local+file+system."></a>get() - Copy a file from HDFS to the local file system.</h4>
 <ul>
   <li>Request
   <ul>
@@ -638,7 +654,7 @@ public void close();
   <ul>
     <li><code>Hdfs.get(hadoop).file(&quot;localFile&quot;).from(&quot;/tmp/example/remoteFile&quot;).now()</code></li>
   </ul></li>
-</ul><h4>mkdir() - Create a directory in HDFS.</h4>
+</ul><h4><a id="mkdir()+-+Create+a+directory+in+HDFS."></a>mkdir() - Create a directory in HDFS.</h4>
 <ul>
   <li>Request
   <ul>
@@ -653,7 +669,7 @@ public void close();
   <ul>
     <li><code>Hdfs.mkdir(hadoop).dir(&quot;/tmp/example&quot;).perm(&quot;777&quot;).now()</code></li>
   </ul></li>
-</ul><h3>Job Commands (WebHCat/Templeton)</h3><h4>submitJava() - Submit a Java MapReduce job.</h4>
+</ul><h3><a id="Job+Commands+(WebHCat/Templeton)"></a>Job Commands (WebHCat/Templeton)</h3><h4><a id="submitJava()+-+Submit+a+Java+MapReduce+job."></a>submitJava() - Submit a Java MapReduce job.</h4>
 <ul>
   <li>Request
   <ul>
@@ -670,7 +686,7 @@ public void close();
   <ul>
     <li><code>Job.submitJava(hadoop).jar(remoteJarName).app(appName).input(remoteInputDir).output(remoteOutputDir).now().jobId</code></li>
   </ul></li>
-</ul><h4>submitPig() - Submit a Pig job.</h4>
+</ul><h4><a id="submitPig()+-+Submit+a+Pig+job."></a>submitPig() - Submit a Pig job.</h4>
 <ul>
   <li>Request
   <ul>
@@ -686,7 +702,7 @@ public void close();
   <ul>
     <li><code>Job.submitPig(hadoop).file(remotePigFileName).arg(&quot;-v&quot;).statusDir(remoteStatusDir).now()</code></li>
   </ul></li>
-</ul><h4>submitHive() - Submit a Hive job.</h4>
+</ul><h4><a id="submitHive()+-+Submit+a+Hive+job."></a>submitHive() - Submit a Hive job.</h4>
 <ul>
   <li>Request
   <ul>
@@ -702,7 +718,7 @@ public void close();
   <ul>
     <li><code>Job.submitHive(hadoop).file(remoteHiveFileName).arg(&quot;-v&quot;).statusDir(remoteStatusDir).now()</code></li>
   </ul></li>
-</ul><h4>queryQueue() - Return a list of all job IDs registered to the user.</h4>
+</ul><h4><a id="queryQueue()+-+Return+a+list+of+all+job+IDs+registered+to+the+user."></a>queryQueue() - Return a list of all job IDs registered to the user.</h4>
 <ul>
   <li>Request
   <ul>
@@ -716,7 +732,7 @@ public void close();
   <ul>
     <li><code>Job.queryQueue(hadoop).now().string</code></li>
   </ul></li>
-</ul><h4>queryStatus() - Check the status of a job and get related job information given its job ID.</h4>
+</ul><h4><a id="queryStatus()+-+Check+the+status+of+a+job+and+get+related+job+information+given+its+job+ID."></a>queryStatus() - Check the status of a job and get related job information given its job ID.</h4>
 <ul>
   <li>Request
   <ul>
@@ -730,7 +746,7 @@ public void close();
   <ul>
     <li><code>Job.queryStatus(hadoop).jobId(jobId).now().string</code></li>
   </ul></li>
-</ul><h3>Workflow Commands (Oozie)</h3><h4>submit() - Submit a workflow job.</h4>
+</ul><h3><a id="Workflow+Commands+(Oozie)"></a>Workflow Commands (Oozie)</h3><h4><a id="submit()+-+Submit+a+workflow+job."></a>submit() - Submit a workflow job.</h4>
 <ul>
   <li>Request
   <ul>
@@ -746,7 +762,7 @@ public void close();
   <ul>
     <li><code>Workflow.submit(hadoop).file(localFile).action(&quot;start&quot;).now()</code></li>
   </ul></li>
-</ul><h4>status() - Query the status of a workflow job.</h4>
+</ul><h4><a id="status()+-+Query+the+status+of+a+workflow+job."></a>status() - Query the status of a workflow job.</h4>
 <ul>
   <li>Request
   <ul>
@@ -760,7 +776,7 @@ public void close();
   <ul>
     <li><code>Workflow.status(hadoop).jobId(jobId).now().string</code></li>
   </ul></li>
-</ul><h3>Extension</h3><p>Extensibility is a key design goal of the KnoxShell and DSL. There are two ways to provide extended functionality for use with the shell. The first is to simply create Groovy scripts that use the DSL to perform a useful task. The second is to add new services and commands. In order to add new service and commands new classes must be written in either Groovy or Java and added to the classpath of the shell. Fortunately there is a very simple way to add classes and JARs to the shell classpath. The first time the shell is executed it will create a configuration file in the same directory as the JAR with the same base name and a <code>.cfg</code> extension.</p>
+</ul><h3><a id="Extension"></a>Extension</h3><p>Extensibility is a key design goal of the KnoxShell and DSL. There are two ways to provide extended functionality for use with the shell. The first is to simply create Groovy scripts that use the DSL to perform a useful task. The second is to add new services and commands. In order to add new service and commands new classes must be written in either Groovy or Java and added to the classpath of the shell. Fortunately there is a very simple way to add classes and JARs to the shell classpath. The first time the shell is executed it will create a configuration file in the same directory as the JAR with the same base name and a <code>.cfg</code> extension.</p>
 <pre><code>bin/shell.jar
 bin/shell.cfg
 </code></pre><p>That file contains both the main class for the shell as well as a definition of the classpath. Currently that file will by default contain the following.</p>
@@ -768,7 +784,7 @@ bin/shell.cfg
 class.path=../lib; ../lib/*.jar; ../ext; ../ext/*.jar

[... 534 lines stripped ...]