Posted to commits@knox.apache.org by km...@apache.org on 2014/09/17 19:06:34 UTC

svn commit: r1625685 - in /knox: site/books/knox-0-4-0/ site/books/knox-0-5-0/ trunk/books/0.5.0/

Author: kminder
Date: Wed Sep 17 17:06:34 2014
New Revision: 1625685

URL: http://svn.apache.org/r1625685
Log:
Updates for HDFS HA support.

Modified:
    knox/site/books/knox-0-4-0/deployment-overview.png
    knox/site/books/knox-0-4-0/deployment-provider.png
    knox/site/books/knox-0-4-0/deployment-service.png
    knox/site/books/knox-0-4-0/runtime-overview.png
    knox/site/books/knox-0-4-0/runtime-request-processing.png
    knox/site/books/knox-0-5-0/knox-0-5-0.html
    knox/trunk/books/0.5.0/service_webhdfs.md

Modified: knox/site/books/knox-0-4-0/deployment-overview.png
URL: http://svn.apache.org/viewvc/knox/site/books/knox-0-4-0/deployment-overview.png?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
Binary files - no diff available.

Modified: knox/site/books/knox-0-4-0/deployment-provider.png
URL: http://svn.apache.org/viewvc/knox/site/books/knox-0-4-0/deployment-provider.png?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
Binary files - no diff available.

Modified: knox/site/books/knox-0-4-0/deployment-service.png
URL: http://svn.apache.org/viewvc/knox/site/books/knox-0-4-0/deployment-service.png?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
Binary files - no diff available.

Modified: knox/site/books/knox-0-4-0/runtime-overview.png
URL: http://svn.apache.org/viewvc/knox/site/books/knox-0-4-0/runtime-overview.png?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
Binary files - no diff available.

Modified: knox/site/books/knox-0-4-0/runtime-request-processing.png
URL: http://svn.apache.org/viewvc/knox/site/books/knox-0-4-0/runtime-request-processing.png?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
Binary files - no diff available.

Modified: knox/site/books/knox-0-5-0/knox-0-5-0.html
URL: http://svn.apache.org/viewvc/knox/site/books/knox-0-5-0/knox-0-5-0.html?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
--- knox/site/books/knox-0-5-0/knox-0-5-0.html (original)
+++ knox/site/books/knox-0-5-0/knox-0-5-0.html Wed Sep 17 17:06:34 2014
@@ -1625,7 +1625,7 @@ dep/commons-codec-1.7.jar
   </tbody>
 </table><p>However, there is a subtle difference in the URLs that are returned by WebHDFS in the Location header of many requests. Direct WebHDFS requests may return Location headers that contain the address of a particular Data Node. The gateway will rewrite these URLs to ensure subsequent requests come back through the gateway and internal cluster details are protected.</p><p>A WebHDFS request to the Name Node to retrieve a file will return a URL of the form below in the Location header.</p>
 <pre><code>http://{datanode-host}:{data-node-port}/webhdfs/v1/{path}?...
-</code></pre><p>Note that this URL contains the newtwork location of a Data Node. The gateway will rewrite this URL to look like the URL below.</p>
+</code></pre><p>Note that this URL contains the network location of a Data Node. The gateway will rewrite this URL to look like the URL below.</p>
 <pre><code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs/data/v1/{path}?_={encrypted-query-parameters}
 </code></pre><p>The <code>{encrypted-query-parameters}</code> will contain the <code>{datanode-host}</code> and <code>{datanode-port}</code> information. This information, along with the original query parameters, is encrypted so that the internal Hadoop details are protected.</p><h4><a id="WebHDFS+Examples"></a>WebHDFS Examples</h4><p>The examples below upload a file, download the file and list the contents of the directory.</p><h5><a id="WebHDFS+via+client+DSL"></a>WebHDFS via client DSL</h5><p>You can use the Groovy example scripts and interpreter provided with the distribution.</p>
 <pre><code>java -jar bin/shell.jar samples/ExampleWebHdfsPutGet.groovy
@@ -1763,7 +1763,30 @@ session.shutdown()
   <ul>
     <li><code>Hdfs.rm( session ).file( &quot;/user/guest/example&quot; ).recursive().now()</code></li>
   </ul></li>
-</ul><h3><a id="WebHCat"></a>WebHCat</h3><p>WebHCat is a related but separate service from Hive. As such it is installed and configured independently. The <a href="https://cwiki.apache.org/confluence/display/Hive/WebHCat">WebHCat wiki pages</a> describe this processes. In sandbox this configuration file for WebHCat is located at /etc/hadoop/hcatalog/webhcat-site.xml. Note the properties shown below as they are related to configuration required by the gateway.</p>
+</ul><h3><a id="WebHDFS+HA"></a>WebHDFS HA</h3><p>Knox provides basic failover and retry functionality for REST API calls made to WebHDFS when HDFS HA has been configured and enabled.</p><p>To enable HA functionality for WebHDFS in Knox the following configuration has to be added to the topology file.</p>
+<pre><code>&lt;provider&gt;
+   &lt;role&gt;ha&lt;/role&gt;
+   &lt;name&gt;HaProvider&lt;/name&gt;
+   &lt;enabled&gt;true&lt;/enabled&gt;
+   &lt;param&gt;
+       &lt;name&gt;WEBHDFS&lt;/name&gt;
+       &lt;value&gt;maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true&lt;/value&gt;
+   &lt;/param&gt;
+&lt;/provider&gt;
+</code></pre><p>The role and name of the provider above must be as shown. The name in the &lsquo;param&rsquo; section must match the role name of the service being configured for HA, and the value is the HA configuration for that particular service. In this case the name is &lsquo;WEBHDFS&rsquo;.</p><p>The various configuration parameters are described below:</p>
+<ul>
+  <li><p>maxFailoverAttempts - This is the maximum number of times a failover will be attempted. The failover strategy at this time is very simplistic in that the next URL in the list of URLs provided for the service is used and the one that failed is put at the bottom of the list. If the list is exhausted and the maximum number of attempts is not reached then the first URL that failed will be tried again (the list will start again from the original top entry).</p></li>
+  <li><p>failoverSleep - The amount of time in milliseconds that the process will wait or sleep before attempting to fail over.</p></li>
+  <li><p>maxRetryAttempts - This is the maximum number of times that a retry request will be attempted. Unlike failover, the retry is done on the same URL that failed. This covers the special case in HDFS where a node is in safe mode. The expectation is that the node will come out of safe mode, so a retry is desirable here as opposed to a failover.</p></li>
+  <li><p>retrySleep - The amount of time in milliseconds that the process will wait or sleep before a retry is issued.</p></li>
+  <li><p>enabled - Flag to turn the particular service on or off for HA.</p></li>
+</ul><p>For the service configuration itself, the additional URLs of the standby nodes should be added to the list. The active URL (at the time of configuration) should ideally be added to the top of the list.</p>
+<pre><code>&lt;service&gt;
+    &lt;role&gt;WEBHDFS&lt;/role&gt;
+    &lt;url&gt;http://{host1}:50070/webhdfs&lt;/url&gt;
+    &lt;url&gt;http://{host2}:50070/webhdfs&lt;/url&gt;
+&lt;/service&gt;
+</code></pre><h3><a id="WebHCat"></a>WebHCat</h3><p>WebHCat is a related but separate service from Hive. As such it is installed and configured independently. The <a href="https://cwiki.apache.org/confluence/display/Hive/WebHCat">WebHCat wiki pages</a> describe this process. In the sandbox, the configuration file for WebHCat is located at /etc/hadoop/hcatalog/webhcat-site.xml. Note the properties shown below as they are related to configuration required by the gateway.</p>
 <pre><code>&lt;property&gt;
     &lt;name&gt;templeton.port&lt;/name&gt;
     &lt;value&gt;50111&lt;/value&gt;

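To make the Location rewriting described above concrete, the sketch below shows what a client might see when creating a file through the gateway. It assumes the sandbox topology defaults (gateway at https://localhost:8443/gateway/sandbox, guest:guest-password demo credentials) and an abbreviated encrypted query string; none of these values come from this commit.

    # Step 1: ask the gateway (not a Name Node) to create the file. The response is a
    # redirect whose Location header already points back at the gateway, for example:
    #   Location: https://localhost:8443/gateway/sandbox/webhdfs/data/v1/user/guest/example/README?_={encrypted-query-parameters}
    curl -i -k -u guest:guest-password -X PUT \
        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/README?op=CREATE'

    # Step 2: send the file content to that Location; the client never sees a
    # Data Node address, only the gateway's rewritten URL.
    curl -i -k -u guest:guest-password -T README -X PUT \
        '{location-header-from-step-1}'
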
Modified: knox/trunk/books/0.5.0/service_webhdfs.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/0.5.0/service_webhdfs.md?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
--- knox/trunk/books/0.5.0/service_webhdfs.md (original)
+++ knox/trunk/books/0.5.0/service_webhdfs.md Wed Sep 17 17:06:34 2014
@@ -77,7 +77,7 @@ A WebHDFS request to the Node Node to re
 
     http://{datanode-host}:{data-node-port}/webhdfs/v1/{path}?...
 
-Note that this URL contains the newtwork location of a Data Node.
+Note that this URL contains the network location of a Data Node.
 The gateway will rewrite this URL to look like the URL below.
 
     https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs/data/v1/{path}?_={encrypted-query-parameters}
@@ -234,6 +234,61 @@ Use can use cURL to directly invoke the 
     * `Hdfs.rm( session ).file( "/user/guest/example" ).recursive().now()`
 
 
+### WebHDFS HA ###
+
+Knox provides basic failover and retry functionality for REST API calls made to WebHDFS when HDFS HA has been 
+configured and enabled.
+
+To enable HA functionality for WebHDFS in Knox the following configuration has to be added to the topology file.
+
+    <provider>
+       <role>ha</role>
+       <name>HaProvider</name>
+       <enabled>true</enabled>
+       <param>
+           <name>WEBHDFS</name>
+           <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
+       </param>
+    </provider>
+    
+The role and name of the provider above must be as shown. The name in the 'param' section must match the role name of
+the service being configured for HA, and the value is the HA configuration for that particular service. In this case
+the name is 'WEBHDFS'.
+
+The various configuration parameters are described below:
+     
+* maxFailoverAttempts - 
+This is the maximum number of times a failover will be attempted. The failover strategy at this time is very simplistic
+in that the next URL in the list of URLs provided for the service is used and the one that failed is put at the bottom 
+of the list. If the list is exhausted and the maximum number of attempts is not reached then the first URL that failed 
+will be tried again (the list will start again from the original top entry).
+
+* failoverSleep - 
+The amount of time in milliseconds that the process will wait or sleep before attempting to fail over.
+
+* maxRetryAttempts - 
+This is the maximum number of times that a retry request will be attempted. Unlike failover, the retry is done on the
+same URL that failed. This covers the special case in HDFS where a node is in safe mode. The expectation is that the node
+will come out of safe mode, so a retry is desirable here as opposed to a failover.
+
+* retrySleep - 
+The amount of time in milliseconds that the process will wait or sleep before a retry is issued.
+
+* enabled - 
+Flag to turn the particular service on or off for HA.
+
+For the service configuration itself, the additional URLs of the standby nodes should be added to the list. The active
+URL (at the time of configuration) should ideally be added to the top of the list.
+
+
+    <service>
+        <role>WEBHDFS</role>
+        <url>http://{host1}:50070/webhdfs</url>
+        <url>http://{host2}:50070/webhdfs</url>
+    </service>
+    
+
+
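
For orientation, the following sketch shows how the HaProvider and the multi-URL WEBHDFS service described above might sit together in a complete topology file. The surrounding <topology> and <gateway> structure and the comment about other providers reflect a typical deployment and are assumptions, not part of this change.

    <topology>
        <gateway>
            <!-- authentication, identity-assertion and other providers omitted -->
            <provider>
                <role>ha</role>
                <name>HaProvider</name>
                <enabled>true</enabled>
                <param>
                    <name>WEBHDFS</name>
                    <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
                </param>
            </provider>
        </gateway>
        <!-- List the currently active Name Node first, then the standby. -->
        <service>
            <role>WEBHDFS</role>
            <url>http://{host1}:50070/webhdfs</url>
            <url>http://{host2}:50070/webhdfs</url>
        </service>
    </topology>

Clients keep pointing at the single gateway URL; failover between {host1} and {host2} happens inside the gateway. With the values shown, a request would be retried for at most maxRetryAttempts * retrySleep = 300 * 1000 ms (about five minutes) while a node is in safe mode, and failed over at most maxFailoverAttempts = 3 times with a one second pause between attempts.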