Posted to commits@knox.apache.org by km...@apache.org on 2013/09/27 15:27:04 UTC

svn commit: r1526895 - in /incubator/knox: site/books/knox-incubating-0-3-0/ trunk/books/0.3.0/ trunk/markbook/src/main/java/org/apache/hadoop/gateway/markbook/

Author: kminder
Date: Fri Sep 27 13:27:03 2013
New Revision: 1526895

URL: http://svn.apache.org/r1526895
Log:
Added topology descriptor docs.

Modified:
    incubator/knox/site/books/knox-incubating-0-3-0/knox-incubating-0-3-0.html
    incubator/knox/trunk/books/0.3.0/book.md
    incubator/knox/trunk/books/0.3.0/config.md
    incubator/knox/trunk/books/0.3.0/service_hive.md
    incubator/knox/trunk/markbook/src/main/java/org/apache/hadoop/gateway/markbook/MarkBook.java

Modified: incubator/knox/site/books/knox-incubating-0-3-0/knox-incubating-0-3-0.html
URL: http://svn.apache.org/viewvc/incubator/knox/site/books/knox-incubating-0-3-0/knox-incubating-0-3-0.html?rev=1526895&r1=1526894&r2=1526895&view=diff
==============================================================================
--- incubator/knox/site/books/knox-incubating-0-3-0/knox-incubating-0-3-0.html (original)
+++ incubator/knox/site/books/knox-incubating-0-3-0/knox-incubating-0-3-0.html Fri Sep 27 13:27:03 2013
@@ -31,6 +31,7 @@
   </ul></li>
   <li><a href="#Gateway+Details">Gateway Details</a>
   <ul>
+    <li><a href="#Configuration">Configuration</a></li>
     <li><a href="#Authentication">Authentication</a></li>
     <li><a href="#Authorization">Authorization</a></li>
     <li><a href="#Configuration">Configuration</a></li>
@@ -274,7 +275,37 @@ Server: Jetty(6.1.26)
     <li>Gateway: <code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/hbase</code></li>
     <li>Cluster: <code>http://{hbase-host}:60080</code></li>
   </ul></li>
-</ul><p>The values for <code>{gateway-host}</code>, <code>{gateway-port}</code>, <code>{gateway-path}</code> are provided via the gateway configuration file (i.e. <code>{GATEWAY_HOME}/conf/gateway-site.xml</code>).</p><p>The value for <code>{cluster-name}</code> is derived from the file name of the cluster topology descriptor (e.g. <code>{GATEWAY_HOME}/deployments/{cluster-name}.xml</code>).</p><p>The value for <code>{webhdfs-host}</code>, <code>{webhcat-host}</code>, <code>{oozie-host}</code> and <code>{hbase-host}</code> are provided via the cluster topology descriptor (e.g. <code>{GATEWAY_HOME}/deployments/{cluster-name}.xml</code>).</p><p>Note: The ports 50070, 50111, 11000 and 60080 are the defaults for WebHDFS, WebHCat, Oozie and Stargate/HBase respectively. Their values can also be provided via the cluster topology descriptor if your Hadoop cluster uses different ports.</p><h3><a id="Configuration"></a>Configuration</h3><h4><a id="Topology+Descriptors"></a>Topology Descriptor
 s</h4><p>The topology descriptor files provide the gateway per cluster configuration information. This includes configuration for both the providers within the gateway and the services within the Hadoop cluster.</p><h4><a id="Host+Mapping"></a>Host Mapping</h4><p>TODO - Complete Host Mapping docs.</p><p>That really depends upon how you have your VM configured. If you can hit <a href="http://c6401.ambari.apache.org:1022/">http://c6401.ambari.apache.org:1022/</a> directly from your client and knox host then you probably don&rsquo;t need the hostmap at all. The host map only exists for situations where a host in the hadoop cluster is known by one name externally and another internally. For example running hostname -q on sandbox returns sandbox.hortonworks.com but externally Sandbox is setup to be accesses using localhost via portmapping. The way the hostmap config works is that the <name/> element is what the hadoop cluster host is known as externally and the <value/> is how the hadoop
  cluster host identifies itself internally. <param><name>localhost</name><value>c6401,c6401.ambari.apache.org</value></param> You SHOULD be able to simply change <enabled>true</enabled> to false but I have a suspicion that that might not actually work. Please try it and file a jira if that doesn&rsquo;t work. If so, simply either remove the full provider config for hostmap or remove the <param/> that defines the mapping.</p><h4><a id="Logging"></a>Logging</h4><p>If necessary you can enable additional logging by editing the <code>log4j.properties</code> file in the <code>conf</code> directory. Changing the rootLogger value from <code>ERROR</code> to <code>DEBUG</code> will generate a large amount of debug logging. A number of useful, more fine loggers are also provided in the file.</p><h4><a id="Java+VM+Options"></a>Java VM Options</h4><p>TODO - Java VM options doc.</p><h4><a id="Persisting+the+Master+Secret"></a>Persisting the Master Secret</h4><p>The master secret is required to st
 art the server. This secret is used to access secured artifacts by the gateway instance. Keystore, trust stores and credential stores are all protected with the master secret.</p><p>You may persist the master secret by supplying the <em>-persist-master</em> switch at startup. This will result in a warning indicating that persisting the secret is less secure than providing it at startup. We do make some provisions in order to protect the persisted password.</p><p>It is encrypted with AES 128 bit encryption and where possible the file permissions are set to only be accessible by the user that the gateway is running as.</p><p>After persisting the secret, ensure that the file at config/security/master has the appropriate permissions set for your environment. This is probably the most important layer of defense for master secret. Do not assume that the encryption if sufficient protection.</p><p>A specific user should be created to run the gateway this will protect a persisted master file
 .</p><h4><a id="Management+of+Security+Artifacts"></a>Management of Security Artifacts</h4><p>There are a number of artifacts that are used by the gateway in ensuring the security of wire level communications, access to protected resources and the encryption of sensitive data. These artifacts can be managed from outside of the gateway instances or generated and populated by the gateway instance itself.</p><p>The following is a description of how this is coordinated with both standalone (development, demo, etc) gateway instances and instances as part of a cluster of gateways in mind.</p><p>Upon start of the gateway server we:</p>
+</ul><p>The values for <code>{gateway-host}</code>, <code>{gateway-port}</code>, <code>{gateway-path}</code> are provided via the gateway configuration file (i.e. <code>{GATEWAY_HOME}/conf/gateway-site.xml</code>).</p><p>The value for <code>{cluster-name}</code> is derived from the file name of the cluster topology descriptor (e.g. <code>{GATEWAY_HOME}/deployments/{cluster-name}.xml</code>).</p><p>The values for <code>{webhdfs-host}</code>, <code>{webhcat-host}</code>, <code>{oozie-host}</code> and <code>{hbase-host}</code> are provided via the cluster topology descriptor (e.g. <code>{GATEWAY_HOME}/deployments/{cluster-name}.xml</code>).</p><p>Note: The ports 50070, 50111, 11000 and 60080 are the defaults for WebHDFS, WebHCat, Oozie and Stargate/HBase respectively. Their values can also be provided via the cluster topology descriptor if your Hadoop cluster uses different ports.</p><h3><a id="Configuration"></a>Configuration</h3><h4><a id="Topology+Descriptors"></a>Topology Descriptors</h4><p>The topology descriptor files provide the gateway with per-cluster configuration information. This includes configuration for both the providers within the gateway and the services within the Hadoop cluster. These files are located in <code>{GATEWAY_HOME}/deployments</code>. The general outline of a topology descriptor file looks like this.</p>
+<pre><code>&lt;topology&gt;
+    &lt;gateway&gt;
+        &lt;provider&gt;
+        &lt;/provider&gt;
+    &lt;/gateway&gt;
+    &lt;service&gt;
+    &lt;/service&gt;
+&lt;/topology&gt;
+</code></pre><p>There are typically multiple <code>&lt;provider&gt;</code> and <code>&lt;service&gt;</code> elements.</p>
+<dl><dt>/topology</dt><dd>Defines the provider configuration and service topology for a single Hadoop cluster.</dd><dt>/topology/gateway</dt><dd>Groups all of the provider elements.</dd><dt>/topology/gateway/provider</dt><dd>Defines the configuration of a specific provider for the cluster.</dd><dt>/topology/service</dt><dd>Defines the location of a specific Hadoop service within the Hadoop cluster.</dd>
+</dl><h5><a id="Provider+Configuration"></a>Provider Configuration</h5><p>Provider configuration is used to customize the behavior of a particular gateway feature. The general outline of a provider element looks like this.</p>
+<pre><code>&lt;provider&gt;
+    &lt;role&gt;authentication&lt;/role&gt;
+    &lt;name&gt;ShiroProvider&lt;/name&gt;
+    &lt;enabled&gt;true&lt;/enabled&gt;
+    &lt;param&gt;
+        &lt;name&gt;&lt;/name&gt;
+        &lt;value&gt;&lt;/value&gt;
+    &lt;/param&gt;
+&lt;/provider&gt;
+</code></pre>
+<dl><dt>/topology/gateway/provider</dt><dd>Groups information for a specific provider.</dd><dt>/topology/gateway/provider/role</dt><dd>Defines the role of a particular provider. There are a number of pre-defined roles used by out-of-the-box provider plugins for the gateway. These roles are: authentication, identity-assertion, authorization, rewrite and hostmap.</dd><dt>/topology/gateway/provider/name</dt><dd>Defines the name of the provider to which this configuration applies. There can be multiple provider implementations for a given role. Specifying the name is used to identify which particular provider is being configured. Typically each topology descriptor should contain only one provider for each role but there are exceptions.</dd><dt>/topology/gateway/provider/enabled</dt><dd>Allows a particular provider to be enabled or disabled via <code>true</code> or <code>false</code> respectively. When a provider is disabled, any filters associated with that provider are excluded from the processing chain.</dd><dt>/topology/gateway/provider/param</dt><dd>These elements are used to supply provider configuration. There can be zero or more of these per provider.</dd><dt>/topology/gateway/provider/param/name</dt><dd>The name of a parameter to pass to the provider.</dd><dt>/topology/gateway/provider/param/value</dt><dd>The value of a parameter to pass to the provider.</dd>
+</dl><h5><a id="Service+Configuration"></a>Service Configuration</h5><p>Service configuration is used to specify the location of services within the Hadoop cluster. The general outline of a service element looks like this.</p>
+<pre><code>&lt;service&gt;
+    &lt;role&gt;WEBHDFS&lt;/role&gt;
+    &lt;url&gt;http://localhost:50070/webhdfs&lt;/url&gt;
+&lt;/service&gt;
+</code></pre>
+<dl><dt>/topology/service</dt><dd>Provides information about a particular service within the Hadoop cluster. Not all services are necessarily exposed as gateway endpoints.</dd><dt>/topology/service/role</dt><dd>Identifies the role of this service. Currently supported roles are: WEBHDFS, WEBHCAT, WEBHBASE, OOZIE, HIVE, NAMENODE, JOBTRACKER. Additional service roles can be supported via plugins.</dd><dt>/topology/service/url</dt><dd>The URL identifying the location of a particular service within the Hadoop cluster.</dd>
+</dl><h4><a id="Host+Mapping"></a>Host Mapping</h4><p>TODO - Complete Host Mapping docs.</p><p>That really depends upon how you have your VM configured. If you can hit <a href="http://c6401.ambari.apache.org:1022/">http://c6401.ambari.apache.org:1022/</a> directly from your client and knox host then you probably don&rsquo;t need the hostmap at all. The host map only exists for situations where a host in the hadoop cluster is known by one name externally and another internally. For example running hostname -q on sandbox returns sandbox.hortonworks.com but externally Sandbox is setup to be accesses using localhost via portmapping. The way the hostmap config works is that the <name/> element is what the hadoop cluster host is known as externally and the <value/> is how the hadoop cluster host identifies itself internally. <param><name>localhost</name><value>c6401,c6401.ambari.apache.org</value></param> You SHOULD be able to simply change <enabled>true</enabled> to false but I have a su
 spicion that that might not actually work. Please try it and file a jira if that doesn&rsquo;t work. If so, simply either remove the full provider config for hostmap or remove the <param/> that defines the mapping.</p><h4><a id="Logging"></a>Logging</h4><p>If necessary you can enable additional logging by editing the <code>log4j.properties</code> file in the <code>conf</code> directory. Changing the rootLogger value from <code>ERROR</code> to <code>DEBUG</code> will generate a large amount of debug logging. A number of useful, more fine loggers are also provided in the file.</p><h4><a id="Java+VM+Options"></a>Java VM Options</h4><p>TODO - Java VM options doc.</p><h4><a id="Persisting+the+Master+Secret"></a>Persisting the Master Secret</h4><p>The master secret is required to start the server. This secret is used to access secured artifacts by the gateway instance. Keystore, trust stores and credential stores are all protected with the master secret.</p><p>You may persist the master s
 ecret by supplying the <em>-persist-master</em> switch at startup. This will result in a warning indicating that persisting the secret is less secure than providing it at startup. We do make some provisions in order to protect the persisted password.</p><p>It is encrypted with AES 128 bit encryption and where possible the file permissions are set to only be accessible by the user that the gateway is running as.</p><p>After persisting the secret, ensure that the file at config/security/master has the appropriate permissions set for your environment. This is probably the most important layer of defense for master secret. Do not assume that the encryption if sufficient protection.</p><p>A specific user should be created to run the gateway this will protect a persisted master file.</p><h4><a id="Management+of+Security+Artifacts"></a>Management of Security Artifacts</h4><p>There are a number of artifacts that are used by the gateway in ensuring the security of wire level communications, 
 access to protected resources and the encryption of sensitive data. These artifacts can be managed from outside of the gateway instances or generated and populated by the gateway instance itself.</p><p>The following is a description of how this is coordinated with both standalone (development, demo, etc) gateway instances and instances as part of a cluster of gateways in mind.</p><p>Upon start of the gateway server we:</p>
 <ol>
   <li>Look for an identity store at <code>conf/security/keystores/gateway.jks</code>.  The identity store contains the certificate and private key used to represent the identity of the server for SSL connections and signature creation.
   <ul>
@@ -1899,7 +1930,7 @@ session.shutdown(10, SECONDS)
   <li>Client side (JDBC):
   <ol>
     <li>Hive JDBC in HTTP mode depends on following libraries to run successfully(must be in the classpath):  Hive Thrift artifacts classes, commons-codec.jar, commons-configuration.jar, commons-lang.jar, commons-logging.jar, hadoop-core.jar, hive-cli.jar, hive-common.jar, hive-jdbc.jar, hive-service.jar, hive-shims.jar, httpclient.jar, httpcore.jar, slf4j-api.jar;</li>
-    <li>Import gateway certificate into the default JRE truststore.  It is located in the <code>/lib/security/cacerts</code>  <code>keytool -import -alias hadoop.gateway -file hadoop.gateway.cer -keystore &lt;java-home&gt;/lib/security/cacerts</code>  Alternatively you can run your sample with additional parameters:  <code>-Djavax.net.ssl.trustStoreType=JKS -Djavax.net.ssl.trustStore=&lt;path-to-trust-store&gt; -Djavax.net.ssl.trustStorePassword=&lt;trust-store-password&gt;</code>  <code>keytool -import -alias hadoop.gateway -file hadoop.gateway.cer -keystore &lt;java-home&gt;/lib/security/cacerts</code></li>
+    <li>Import the gateway certificate into the default JRE truststore. It is located at <code>&lt;java-home&gt;/lib/security/cacerts</code>:  <code>keytool -import -alias hadoop.gateway -file hadoop.gateway.cer -keystore &lt;java-home&gt;/lib/security/cacerts</code>  Alternatively you can run your sample with additional parameters:  <code>-Djavax.net.ssl.trustStoreType=JKS -Djavax.net.ssl.trustStore=&lt;path-to-trust-store&gt; -Djavax.net.ssl.trustStorePassword=&lt;trust-store-password&gt;</code></li>
     <li>Connection URL has to be following:  <code>jdbc:hive2://&lt;gateway-host&gt;:&lt;gateway-port&gt;/?hive.server2.servermode=https;hive.server2.http.path=&lt;gateway-path&gt;/&lt;cluster-name&gt;/hive</code></li>
     <li>Look at <a href="https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DDLOperations">https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DDLOperations</a> for examples.  Hint: For testing it would be better to execute &ldquo;set hive.security.authorization.enabled=false&rdquo; as the first statement.  Hint: Good examples of Hive DDL/DML can be found here <a href="http://gettingstarted.hadooponazure.com/hw/hive.html">http://gettingstarted.hadooponazure.com/hw/hive.html</a></li>
   </ol></li>

Modified: incubator/knox/trunk/books/0.3.0/book.md
URL: http://svn.apache.org/viewvc/incubator/knox/trunk/books/0.3.0/book.md?rev=1526895&r1=1526894&r2=1526895&view=diff
==============================================================================
--- incubator/knox/trunk/books/0.3.0/book.md (original)
+++ incubator/knox/trunk/books/0.3.0/book.md Fri Sep 27 13:27:03 2013
@@ -38,6 +38,7 @@
     * #[Basic Usage]
     * #[Sandbox Configuration]
 * #[Gateway Details]
+    * #[Configuration]
     * #[Authentication]
     * #[Authorization]
     * #[Configuration]

Modified: incubator/knox/trunk/books/0.3.0/config.md
URL: http://svn.apache.org/viewvc/incubator/knox/trunk/books/0.3.0/config.md?rev=1526895&r1=1526894&r2=1526895&view=diff
==============================================================================
--- incubator/knox/trunk/books/0.3.0/config.md (original)
+++ incubator/knox/trunk/books/0.3.0/config.md Fri Sep 27 13:27:03 2013
@@ -19,8 +19,98 @@
 
 #### Topology Descriptors ####
 
-The topology descriptor files provide the gateway per cluster configuration information.
+The topology descriptor files provide the gateway with per-cluster configuration information.
 This includes configuration for both the providers within the gateway and the services within the Hadoop cluster.
+These files are located in `{GATEWAY_HOME}/deployments`.
+The general outline of a topology descriptor file looks like this.
+
+    <topology>
+        <gateway>
+            <provider>
+            </provider>
+        </gateway>
+        <service>
+        </service>
+    </topology>
+
+There are typically multiple `<provider>` and `<service>` elements.
+
+/topology
+: Defines the provider configuration and service topology for a single Hadoop cluster.
+
+/topology/gateway
+: Groups all of the provider elements.
+
+/topology/gateway/provider
+: Defines the configuration of a specific provider for the cluster.
+
+/topology/service
+: Defines the location of a specific Hadoop service within the Hadoop cluster.
+
+##### Provider Configuration #####
+
+Provider configuration is used to customize the behavior of a particular gateway feature.
+The general outline of a provider element looks like this.
+
+    <provider>
+        <role>authentication</role>
+        <name>ShiroProvider</name>
+        <enabled>true</enabled>
+        <param>
+            <name></name>
+            <value></value>
+        </param>
+    </provider>
+
+/topology/gateway/provider
+: Groups information for a specific provider.
+
+/topology/gateway/provider/role
+: Defines the role of a particular provider.
+There are a number of pre-defined roles used by out-of-the-box provider plugins for the gateway.
+These roles are: authentication, identity-assertion, authorization, rewrite and hostmap.
+
+/topology/gateway/provider/name
+: Defines the name of the provider to which this configuration applies.
+There can be multiple provider implementations for a given role.
+Specifying the name is used to identify which particular provider is being configured.
+Typically each topology descriptor should contain only one provider for each role but there are exceptions.
+
+/topology/gateway/provider/enabled
+: Allows a particular provider to be enabled or disabled via `true` or `false` respectively.
+When a provider is disabled, any filters associated with that provider are excluded from the processing chain.
+
+/topology/gateway/provider/param
+: These elements are used to supply provider configuration.
+There can be zero or more of these per provider.
+
+/topology/gateway/provider/param/name
+: The name of a parameter to pass to the provider.
+
+/topology/gateway/provider/param/value
+: The value of a parameter to pass to the provider.
+
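+As a concrete illustration, here is a sketch of a populated authentication provider.
+The Shiro LDAP parameter names and the values shown are assumptions for illustration; adapt them to your environment.
+
+    <provider>
+        <role>authentication</role>
+        <name>ShiroProvider</name>
+        <enabled>true</enabled>
+        <!-- Assumed Shiro LDAP realm parameters; adjust for your directory. -->
+        <param>
+            <name>main.ldapRealm</name>
+            <value>org.apache.shiro.realm.ldap.JndiLdapRealm</value>
+        </param>
+        <param>
+            <name>main.ldapRealm.userDnTemplate</name>
+            <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
+        </param>
+        <param>
+            <name>main.ldapRealm.contextFactory.url</name>
+            <value>ldap://localhost:33389</value>
+        </param>
+        <param>
+            <name>urls./**</name>
+            <value>authcBasic</value>
+        </param>
+    </provider>
+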
+##### Service Configuration #####
+
+Service configuration is used to specify the location of services within the Hadoop cluster.
+The general outline of a service element looks like this.
+
+    <service>
+        <role>WEBHDFS</role>
+        <url>http://localhost:50070/webhdfs</url>
+    </service>
+
+/topology/service
+: Provides information about a particular service within the Hadoop cluster.
+Not all services are necessarily exposed as gateway endpoints.
+
+/topology/service/role
+: Identifies the role of this service.
+Currently supported roles are: WEBHDFS, WEBHCAT, WEBHBASE, OOZIE, HIVE, NAMENODE, JOBTRACKER.
+Additional service roles can be supported via plugins.
+
+/topology/service/url
+: The URL identifying the location of a particular service within the Hadoop cluster.
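+
+Putting these pieces together, a minimal complete topology descriptor might look like the following sketch.
+It simply combines the provider and service outlines shown above; the URL is a placeholder for your cluster.
+
+    <topology>
+        <gateway>
+            <provider>
+                <role>authentication</role>
+                <name>ShiroProvider</name>
+                <enabled>true</enabled>
+            </provider>
+        </gateway>
+        <service>
+            <role>WEBHDFS</role>
+            <url>http://localhost:50070/webhdfs</url>
+        </service>
+    </topology>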
 
 #### Host Mapping ####
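+
+The prose for this section is still to be written, but as an interim sketch a hostmap provider entry looks like the following.
+The provider name `static` is an assumption here; the param `name` is how a Hadoop cluster host is known externally and the `value` lists how that host identifies itself internally (the Sandbox values below come from the discussion in the generated book text).
+
+    <provider>
+        <role>hostmap</role>
+        <name>static</name>
+        <enabled>true</enabled>
+        <param>
+            <name>localhost</name>
+            <value>c6401,c6401.ambari.apache.org</value>
+        </param>
+    </provider>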
 

Modified: incubator/knox/trunk/books/0.3.0/service_hive.md
URL: http://svn.apache.org/viewvc/incubator/knox/trunk/books/0.3.0/service_hive.md?rev=1526895&r1=1526894&r2=1526895&view=diff
==============================================================================
--- incubator/knox/trunk/books/0.3.0/service_hive.md (original)
+++ incubator/knox/trunk/books/0.3.0/service_hive.md Fri Sep 27 13:27:03 2013
@@ -52,7 +52,6 @@ This document assumes a few things about
           `keytool -import -alias hadoop.gateway -file hadoop.gateway.cer -keystore <java-home>/lib/security/cacerts`
        Alternatively you can run your sample with additional parameters:
           `-Djavax.net.ssl.trustStoreType=JKS -Djavax.net.ssl.trustStore=<path-to-trust-store> -Djavax.net.ssl.trustStorePassword=<trust-store-password>`
-       `keytool -import -alias hadoop.gateway -file hadoop.gateway.cer -keystore <java-home>/lib/security/cacerts`
     3. Connection URL has to be following:
        `jdbc:hive2://<gateway-host>:<gateway-port>/?hive.server2.servermode=https;hive.server2.http.path=<gateway-path>/<cluster-name>/hive`
     4. Look at https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DDLOperations for examples.
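
To make the client-side steps above concrete, the following is a minimal sketch of opening a Hive JDBC connection through the gateway in HTTP mode. The gateway host, port, path, cluster name and credentials are placeholders, and it assumes the standard Hive JDBC driver class plus the classpath and truststore setup from steps 1 and 2.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class GatewayHiveExample {
      public static void main( String[] args ) throws Exception {
        // Standard Hive JDBC driver; requires the libraries listed in step 1 on the classpath.
        Class.forName( "org.apache.hive.jdbc.HiveDriver" );

        // Placeholder gateway coordinates; substitute your own host, port, path and cluster name.
        String url = "jdbc:hive2://localhost:8443/?hive.server2.servermode=https;"
            + "hive.server2.http.path=gateway/sandbox/hive";

        Connection connection = DriverManager.getConnection( url, "guest", "guest-password" );
        try {
          Statement statement = connection.createStatement();
          ResultSet result = statement.executeQuery( "show tables" );
          while( result.next() ) {
            System.out.println( result.getString( 1 ) );
          }
          result.close();
          statement.close();
        } finally {
          connection.close();
        }
      }
    }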

Modified: incubator/knox/trunk/markbook/src/main/java/org/apache/hadoop/gateway/markbook/MarkBook.java
URL: http://svn.apache.org/viewvc/incubator/knox/trunk/markbook/src/main/java/org/apache/hadoop/gateway/markbook/MarkBook.java?rev=1526895&r1=1526894&r2=1526895&view=diff
==============================================================================
--- incubator/knox/trunk/markbook/src/main/java/org/apache/hadoop/gateway/markbook/MarkBook.java (original)
+++ incubator/knox/trunk/markbook/src/main/java/org/apache/hadoop/gateway/markbook/MarkBook.java Fri Sep 27 13:27:03 2013
@@ -73,7 +73,7 @@ public class MarkBook {
   private static void storeHtml( CommandLine command, String markdown ) throws IOException {
     PegDownProcessor processor = new PegDownProcessor(
         Extensions.AUTOLINKS | Extensions.FENCED_CODE_BLOCKS | Extensions.QUOTES +
-        Extensions.SMARTS | Extensions.TABLES | Extensions.WIKILINKS );
+        Extensions.SMARTS | Extensions.TABLES | Extensions.DEFINITIONS );
     log( "Converting markdown (" + markdown.length() + " bytes) to HTML" );
     String html = processor.markdownToHtml( markdown.toString() );
     File outputFile = new File( command.getOptionValue( "o" ) );
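
Note that swapping WIKILINKS for DEFINITIONS above is what lets PegDown render the new definition lists added to config.md in this commit. For example, markdown such as:

    /topology
    : Defines the provider configuration and service topology for a single Hadoop cluster.

is converted into HTML such as:

    <dl><dt>/topology</dt><dd>Defines the provider configuration and service topology for a single Hadoop cluster.</dd></dl>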