Posted to commits@knox.apache.org by am...@apache.org on 2022/03/30 15:22:58 UTC

svn commit: r1899392 [10/11] - in /knox/trunk: ./ books/2.0.0/ books/2.0.0/dev-guide/ books/2.0.0/img/ books/2.0.0/img/adminui/ books/2.0.0/knoxshell-guide/

Added: knox/trunk/books/2.0.0/service_hive.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/2.0.0/service_hive.md?rev=1899392&view=auto
==============================================================================
--- knox/trunk/books/2.0.0/service_hive.md (added)
+++ knox/trunk/books/2.0.0/service_hive.md Wed Mar 30 15:22:57 2022
@@ -0,0 +1,329 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       https://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+### Hive ###
+
+The [Hive wiki pages](https://cwiki.apache.org/confluence/display/Hive/Home) describe Hive installation and configuration processes.
+In the Sandbox, the configuration file for Hive is located at `/etc/hive/hive-site.xml`.
+Hive Server has to be started in HTTP mode.
+Note the properties shown below as they are related to configuration required by the gateway.
+
+    <property>
+        <name>hive.server2.thrift.http.port</name>
+        <value>10001</value>
+        <description>Port number when in HTTP mode.</description>
+    </property>
+
+    <property>
+        <name>hive.server2.thrift.http.path</name>
+        <value>cliservice</value>
+        <description>Path component of URL endpoint when in HTTP mode.</description>
+    </property>
+
+    <property>
+        <name>hive.server2.transport.mode</name>
+        <value>http</value>
+        <description>Server transport mode. "binary" or "http".</description>
+    </property>
+
+    <property>
+        <name>hive.server2.allow.user.substitution</name>
+        <value>true</value>
+    </property>
+
+The gateway by default includes a sample topology descriptor file `{GATEWAY_HOME}/deployments/sandbox.xml`.
+The values in this sample are configured to work with an installed Sandbox VM.
+
+    <service>
+        <role>HIVE</role>
+        <url>http://localhost:10001/cliservice</url>
+        <param>
+            <name>replayBufferSize</name>
+            <value>8</value>
+        </param>
+    </service>
+
+By default the gateway is configured to use the binary transport mode for Hive in the Sandbox.
+
+A default replayBufferSize of 8KB is shown in the sample topology file above.  This may need to be increased if your query size is larger.
+
+#### Hive JDBC URL Mapping ####
+
+| ------- | ------------------------------------------------------------------------------- |
+| Gateway | `jdbc:hive2://{gateway-host}:{gateway-port}/;ssl=true;sslTrustStore={gateway-trust-store-path};trustStorePassword={gateway-trust-store-password};transportMode=http;httpPath={gateway-path}/{cluster-name}/hive` |
+| Cluster | `http://{hive-host}:{hive-port}/{hive-path}` |
+
+#### Hive Examples ####
+
+This guide provides detailed examples for how to do some basic interactions with Hive via the Apache Knox Gateway.
+
+##### Hive Setup #####
+
+1. Make sure you are running the correct version of Hive to ensure JDBC/Thrift/HTTP support.
+2. Make sure Hive Server is running on the correct port.
+3. Make sure Hive Server is running in HTTP mode.
+4. Client side (JDBC):
+     1. Hive JDBC in HTTP mode depends on the following minimal set of libraries, which must be on the classpath:
+         * hive-jdbc-0.14.0-standalone.jar
+         * commons-logging-1.1.3.jar
+     2. The connection URL has to be the following: `jdbc:hive2://{gateway-host}:{gateway-port}/;ssl=true;sslTrustStore={gateway-trust-store-path};trustStorePassword={gateway-trust-store-password};transportMode=http;httpPath={gateway-path}/{cluster-name}/hive`
+     3. See https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DDLOperations for examples.
+       Hint: For testing, it is best to execute `set hive.security.authorization.enabled=false` as the first statement.
+       Hint: Good examples of Hive DDL/DML can be found at http://gettingstarted.hadooponazure.com/hw/hive.html
+
+##### Customization #####
+
+This example may need to be tailored to the execution environment.
+In particular, the host name, host port, user name, user password and context path may need to be changed to match your environment.
+There is one example file in the distribution that may need to be customized.
+Take a moment to review this file.
+All of the values that may need to be customized can be found together at the top of the file.
+
+* samples/hive/java/jdbc/sandbox/HiveJDBCSample.java
+
+##### Client JDBC Example #####
+
+The following sample creates a new table, loads data into it from the file system local to the Hive server, and queries data from that table.
+
+###### Java ######
+
+    import java.sql.Connection;
+    import java.sql.DriverManager;
+    import java.sql.ResultSet;
+    import java.sql.SQLException;
+    import java.sql.Statement;
+
+    import java.util.logging.Level;
+    import java.util.logging.Logger;
+
+    public class HiveJDBCSample {
+
+      public static void main( String[] args ) {
+        Connection connection = null;
+        Statement statement = null;
+        ResultSet resultSet = null;
+
+        try {
+          String user = "guest";
+          String password = user + "-password";
+          String gatewayHost = "localhost";
+          int gatewayPort = 8443;
+          String trustStore = "/usr/lib/knox/data/security/keystores/gateway.jks";
+          String trustStorePassword = "knoxsecret";
+          String contextPath = "gateway/sandbox/hive";
+          String connectionString = String.format( "jdbc:hive2://%s:%d/;ssl=true;sslTrustStore=%s;trustStorePassword=%s?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/%s", gatewayHost, gatewayPort, trustStore, trustStorePassword, contextPath );
+
+          // load Hive JDBC Driver
+          Class.forName( "org.apache.hive.jdbc.HiveDriver" );
+
+          // configure JDBC connection
+          connection = DriverManager.getConnection( connectionString, user, password );
+
+          statement = connection.createStatement();
+
+          // disable Hive authorization - it could be omitted if Hive authorization
+          // was configured properly
+          statement.execute( "set hive.security.authorization.enabled=false" );
+
+          // create sample table
+          statement.execute( "CREATE TABLE logs(column1 string, column2 string, column3 string, column4 string, column5 string, column6 string, column7 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '" );
+
+          // load data into Hive from file /tmp/log.txt which is placed on the local file system
+          statement.execute( "LOAD DATA LOCAL INPATH '/tmp/log.txt' OVERWRITE INTO TABLE logs" );
+
+          resultSet = statement.executeQuery( "SELECT * FROM logs" );
+
+          while ( resultSet.next() ) {
+            System.out.println( resultSet.getString( 1 ) + " --- " + resultSet.getString( 2 ) + " --- " + resultSet.getString( 3 ) + " --- " + resultSet.getString( 4 ) );
+          }
+        } catch ( ClassNotFoundException ex ) {
+          Logger.getLogger( HiveJDBCSample.class.getName() ).log( Level.SEVERE, null, ex );
+        } catch ( SQLException ex ) {
+          Logger.getLogger( HiveJDBCSample.class.getName() ).log( Level.SEVERE, null, ex );
+        } finally {
+          if ( resultSet != null ) {
+            try {
+              resultSet.close();
+            } catch ( SQLException ex ) {
+              Logger.getLogger( HiveJDBCSample.class.getName() ).log( Level.SEVERE, null, ex );
+            }
+          }
+          if ( statement != null ) {
+            try {
+              statement.close();
+            } catch ( SQLException ex ) {
+              Logger.getLogger( HiveJDBCSample.class.getName() ).log( Level.SEVERE, null, ex );
+            }
+          }
+          if ( connection != null ) {
+            try {
+              connection.close();
+            } catch ( SQLException ex ) {
+              Logger.getLogger( HiveJDBCSample.class.getName() ).log( Level.SEVERE, null, ex );
+            }
+          }
+        }
+      }
+    }
+
+###### Groovy ######
+
+Make sure that the `{GATEWAY_HOME}/ext` directory contains the following libraries for successful execution:
+
+- hive-jdbc-0.14.0-standalone.jar
+- commons-logging-1.1.3.jar
+
+There are several ways to execute this sample depending upon your preference.
+
+You can use the Groovy interpreter provided with the distribution.
+
+    java -jar bin/shell.jar samples/hive/groovy/jdbc/sandbox/HiveJDBCSample.groovy
+
+You can manually type in the KnoxShell DSL script into the interactive Groovy interpreter provided with the distribution.
+
+    java -jar bin/shell.jar
+
+Each line from the file below will need to be typed or copied into the interactive shell.
+
+    import java.sql.DriverManager
+
+    user = "guest";
+    password = user + "-password";
+    gatewayHost = "localhost";
+    gatewayPort = 8443;
+    trustStore = "/usr/lib/knox/data/security/keystores/gateway.jks";
+    trustStorePassword = "knoxsecret";
+    contextPath = "gateway/sandbox/hive";
+    connectionString = String.format( "jdbc:hive2://%s:%d/;ssl=true;sslTrustStore=%s;trustStorePassword=%s?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/%s", gatewayHost, gatewayPort, trustStore, trustStorePassword, contextPath );
+
+    // Load Hive JDBC Driver
+    Class.forName( "org.apache.hive.jdbc.HiveDriver" );
+
+    // Configure JDBC connection
+    connection = DriverManager.getConnection( connectionString, user, password );
+
+    statement = connection.createStatement();
+
+    // Disable Hive authorization - This can be omitted if Hive authorization is configured properly
+    statement.execute( "set hive.security.authorization.enabled=false" );
+
+    // Create sample table
+    statement.execute( "CREATE TABLE logs(column1 string, column2 string, column3 string, column4 string, column5 string, column6 string, column7 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '" );
+
+    // Load data into Hive from file /tmp/log.txt which is placed on the local file system
+    statement.execute( "LOAD DATA LOCAL INPATH '/tmp/log.txt' OVERWRITE INTO TABLE logs" );
+
+    resultSet = statement.executeQuery( "SELECT * FROM logs" );
+
+    while ( resultSet.next() ) {
+      System.out.println( resultSet.getString( 1 ) + " --- " + resultSet.getString( 2 ) + " --- " + resultSet.getString( 3 ) + " --- " + resultSet.getString( 4 ) );
+    }
+
+    resultSet.close();
+    statement.close();
+    connection.close();
+
+The examples use the file `/tmp/log.txt` with the following content:
+
+    2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 577725851
+    2012-02-03 18:35:34 SampleClass4 [FATAL] system problem at id 1991281254
+    2012-02-03 18:35:34 SampleClass3 [DEBUG] detail for id 1304807656
+    2012-02-03 18:35:34 SampleClass3 [WARN] missing id 423340895
+    2012-02-03 18:35:34 SampleClass5 [TRACE] verbose detail for id 2082654978
+    2012-02-03 18:35:34 SampleClass0 [ERROR] incorrect id  1886438513
+    2012-02-03 18:35:34 SampleClass9 [TRACE] verbose detail for id 438634209
+    2012-02-03 18:35:34 SampleClass8 [DEBUG] detail for id 2074121310
+    2012-02-03 18:35:34 SampleClass0 [TRACE] verbose detail for id 1505582508
+    2012-02-03 18:35:34 SampleClass0 [TRACE] verbose detail for id 1903854437
+    2012-02-03 18:35:34 SampleClass7 [DEBUG] detail for id 915853141
+    2012-02-03 18:35:34 SampleClass3 [TRACE] verbose detail for id 303132401
+    2012-02-03 18:35:34 SampleClass6 [TRACE] verbose detail for id 151914369
+    2012-02-03 18:35:34 SampleClass2 [DEBUG] detail for id 146527742
+    ...
+
+Expected output:
+
+    2012-02-03 --- 18:35:34 --- SampleClass6 --- [INFO]
+    2012-02-03 --- 18:35:34 --- SampleClass4 --- [FATAL]
+    2012-02-03 --- 18:35:34 --- SampleClass3 --- [DEBUG]
+    2012-02-03 --- 18:35:34 --- SampleClass3 --- [WARN]
+    2012-02-03 --- 18:35:34 --- SampleClass5 --- [TRACE]
+    2012-02-03 --- 18:35:34 --- SampleClass0 --- [ERROR]
+    2012-02-03 --- 18:35:34 --- SampleClass9 --- [TRACE]
+    2012-02-03 --- 18:35:34 --- SampleClass8 --- [DEBUG]
+    2012-02-03 --- 18:35:34 --- SampleClass0 --- [TRACE]
+    2012-02-03 --- 18:35:34 --- SampleClass0 --- [TRACE]
+    2012-02-03 --- 18:35:34 --- SampleClass7 --- [DEBUG]
+    2012-02-03 --- 18:35:34 --- SampleClass3 --- [TRACE]
+    2012-02-03 --- 18:35:34 --- SampleClass6 --- [TRACE]
+    2012-02-03 --- 18:35:34 --- SampleClass2 --- [DEBUG]
+    ...
+
+### HiveServer2 HA ###
+
+Knox provides basic failover functionality for calls made to Hive Server when more than one HiveServer2 instance is
+installed in the cluster and registered with the same ZooKeeper ensemble. The HA functionality in this case fetches the
+HiveServer2 URL information from a ZooKeeper ensemble, so the user need only supply the necessary ZooKeeper
+configuration and not the Hive connection URLs.
+
+To enable HA functionality for Hive in Knox the following configuration has to be added to the topology file.
+
+    <provider>
+        <role>ha</role>
+        <name>HaProvider</name>
+        <enabled>true</enabled>
+        <param>
+            <name>HIVE</name>
+            <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true;zookeeperEnsemble=machine1:2181,machine2:2181,machine3:2181;zookeeperNamespace=hiveserver2</value>
+        </param>
+    </provider>
+
+The role and name of the provider above must be as shown. The name in the 'param' section must match that of the service
+role name that is being configured for HA and the value in the 'param' section is the configuration for that particular
+service in HA mode. In this case the name is 'HIVE'.
+
+The various configuration parameters are described below:
+
+* maxFailoverAttempts -
+This is the maximum number of times a failover will be attempted. The failover strategy at this time is very simplistic
+in that the next URL in the list of URLs provided for the service is used and the one that failed is put at the bottom
+of the list. If the list is exhausted and the maximum number of attempts is not reached then the first URL will be tried
+again after the list is fetched again from ZooKeeper (a refresh of the list is done at this point).
+
+* failoverSleep -
+The amount of time in milliseconds that the process will wait or sleep before attempting to fail over.
+
+* enabled -
+Flag to turn the particular service on or off for HA.
+
+* zookeeperEnsemble -
+A comma-separated list of host names (or IP addresses) of the ZooKeeper hosts that make up the ensemble that the Hive
+servers register their information with. This value can be obtained from Hive's config file hive-site.xml as the value
+for the parameter 'hive.zookeeper.quorum'.
+
+* zookeeperNamespace -
+This is the namespace under which HiveServer2 information is registered in the ZooKeeper ensemble. This value can be
+obtained from Hive's config file hive-site.xml as the value for the parameter 'hive.server2.zookeeper.namespace'.
+
+
+For the service configuration itself, the URLs need not be added. For example:
+
+    <service>
+        <role>HIVE</role>
+    </service>
+
+Please note that there is no `<url>` tag specified here as the URLs for the Hive servers are obtained from ZooKeeper.
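
The failover strategy described for `maxFailoverAttempts` in service_hive.md above can be sketched in a few lines of Java. This is only an illustration of the rotation rule (use the next URL, move the failed one to the bottom of the list), not Knox's actual HA provider code. The class name `FailoverRotationSketch`, its methods and the example URLs are hypothetical, and the refresh of the list from ZooKeeper is deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the rotation rule; not part of Knox.
class FailoverRotationSketch {
    private final Deque<String> urls;
    private final int maxFailoverAttempts;
    private int attempts = 0;

    FailoverRotationSketch(List<String> urls, int maxFailoverAttempts) {
        this.urls = new ArrayDeque<>(urls);
        this.maxFailoverAttempts = maxFailoverAttempts;
    }

    // The URL at the head of the list is the one currently in use.
    String activeUrl() {
        return urls.peekFirst();
    }

    // Moves the failed (active) URL to the bottom of the list.
    // Returns false once the maximum number of attempts has been used up.
    boolean markFailed() {
        if (attempts >= maxFailoverAttempts) {
            return false;
        }
        attempts++;
        urls.addLast(urls.removeFirst());
        return true;
    }

    public static void main(String[] args) {
        // Example URLs are assumptions for illustration only.
        FailoverRotationSketch sketch = new FailoverRotationSketch(
            List.of("http://hive1:10001/cliservice", "http://hive2:10001/cliservice"), 3);
        System.out.println("active: " + sketch.activeUrl());
        while (sketch.markFailed()) {
            System.out.println("failed over to: " + sketch.activeUrl());
        }
    }
}
```

A real deployment would also sleep for `failoverSleep` milliseconds between attempts and re-read the URL list from ZooKeeper when the list is exhausted.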

Added: knox/trunk/books/2.0.0/service_kafka.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/2.0.0/service_kafka.md?rev=1899392&view=auto
==============================================================================
--- knox/trunk/books/2.0.0/service_kafka.md (added)
+++ knox/trunk/books/2.0.0/service_kafka.md Wed Mar 30 15:22:57 2022
@@ -0,0 +1,108 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       https://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### Kafka ###
+
+Knox provides gateway functionality to Kafka when used with the Confluent Kafka REST Proxy. The Kafka REST APIs allow the user to view the status 
+of the cluster, perform administrative actions and produce messages.
+
+<p>Note: Consumption of messages via Knox at this time is not supported.</p>  
+
+The docs for the Confluent Kafka REST Proxy can be found here:
+http://docs.confluent.io/current/kafka-rest/docs/index.html
+
+To enable this functionality, a topology file needs to have the following configuration:
+
+    <service>
+        <role>KAFKA</role>
+        <url>http://{kafka-rest-host}:{kafka-rest-port}</url>
+    </service>
+
+The default Kafka REST Proxy port is 8082. If it is configured to some other port, that configuration can be found in 
+`kafka-rest.properties` under the property `listeners`.
+
+#### Kafka URL Mapping ####
+
+For Kafka URLs, the mapping of Knox Gateway accessible URLs to direct Kafka URLs is the following.
+
+| ------- | ------------------------------------------------------------------------------------- |
+| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/kafka` |
+| Cluster | `http://{kafka-rest-host}:{kafka-rest-port}`                                |
+
+
+#### Kafka Examples via cURL ####
+
+Some of the various calls that can be made and examples using curl are listed below.
+
+    # 0. Getting topic info
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/kafka/topics'
+
+    # 1. Publish message to topic
+    
+    curl -ikv -u guest:guest-password -X POST 'https://localhost:8443/gateway/sandbox/kafka/topics/TOPIC1' -H 'Content-Type: application/vnd.kafka.json.v2+json' -H 'Accept: application/vnd.kafka.v2+json' --data '{"records":[{"value":{"foo":"bar"}}]}'
+
+### Kafka HA ###
+
+Knox provides basic failover functionality for calls made to Kafka. Since the Confluent Kafka REST Proxy does not register
+itself with ZooKeeper, the HA component looks in ZooKeeper for instances of Kafka and then performs a lightweight ping for
+the presence of the REST Proxy on the same hosts. As such, the Kafka REST Proxy must be installed on the same host as Kafka.
+The user should not supply URLs in the service definition.
+
+Note: Users of Ambari must manually start up the Confluent Kafka REST Proxy.
+
+To enable HA functionality for Kafka in Knox the following configuration has to be added to the topology file.
+
+    <provider>
+        <role>ha</role>
+        <name>HaProvider</name>
+        <enabled>true</enabled>
+        <param>
+            <name>KAFKA</name>
+            <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true;zookeeperEnsemble=machine1:2181,machine2:2181,machine3:2181</value>
+        </param>
+    </provider>
+
+The role and name of the provider above must be as shown. The name in the 'param' section must match that of the service
+role name that is being configured for HA and the value in the 'param' section is the configuration for that particular
+service in HA mode. In this case the name is 'KAFKA'.
+
+The various configuration parameters are described below:
+
+* maxFailoverAttempts -
+This is the maximum number of times a failover will be attempted. The failover strategy at this time is very simplistic
+in that the next URL in the list of URLs provided for the service is used and the one that failed is put at the bottom
+of the list. If the list is exhausted and the maximum number of attempts is not reached then the first URL will be tried
+again after the list is fetched again from ZooKeeper (a refresh of the list is done at this point).
+
+* failoverSleep -
+The amount of time in milliseconds that the process will wait or sleep before attempting to fail over.
+
+* enabled -
+Flag to turn the particular service on or off for HA.
+
+* zookeeperEnsemble -
+A comma-separated list of host names (or IP addresses) of the ZooKeeper hosts that make up the ensemble that the Kafka
+servers register their information with.
+
+And for the service configuration itself the URLs need NOT be added to the list. For example:
+
+    <service>
+        <role>KAFKA</role>
+    </service>
+
+Please note that there is no `<url>` tag specified here as the URLs for the Kafka servers are obtained from ZooKeeper.
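
The `value` string of the HaProvider `param` shown in service_kafka.md above packs several settings into one semicolon-separated list of `key=value` pairs. The following Java sketch shows one way such a string could be split into its settings. It is an illustration of the format only, not the parser Knox uses; the class name `HaParamSketch` and its `parse` method are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of reading the HaProvider param value; not part of Knox.
class HaParamSketch {

    // Splits "k1=v1;k2=v2;..." into an ordered map, splitting each pair at its first '='.
    static Map<String, String> parse(String value) {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String pair : value.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                settings.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        Map<String, String> settings = parse(
            "maxFailoverAttempts=3;failoverSleep=1000;enabled=true;"
            + "zookeeperEnsemble=machine1:2181,machine2:2181,machine3:2181");
        // The ensemble itself is a comma-separated list of host:port entries.
        for (String host : settings.get("zookeeperEnsemble").split(",")) {
            System.out.println("ZooKeeper host: " + host);
        }
    }
}
```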

Added: knox/trunk/books/2.0.0/service_livy.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/2.0.0/service_livy.md?rev=1899392&view=auto
==============================================================================
--- knox/trunk/books/2.0.0/service_livy.md (added)
+++ knox/trunk/books/2.0.0/service_livy.md Wed Mar 30 15:22:57 2022
@@ -0,0 +1,52 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       https://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### Livy Server ###
+
+Knox provides proxied access to Livy server for submitting Spark jobs.
+The gateway can be used to provide authentication and encryption for clients to
+servers like Livy.
+
+#### Gateway configuration ####
+
+The Gateway can be configured for Livy by modifying the topology XML file
+and providing a new service XML file.
+
+In the topology XML file, add the following with the correct hostname:
+
+    <service>
+      <role>LIVYSERVER</role>
+      <url>http://{livy-server}:8998</url>
+    </service>
+
+Livy server uses `proxyUser` to run the Spark session. To prevent a user from
+supplying an arbitrary user here (e.g. a more privileged one), Knox needs to rewrite the
+JSON body, replacing whatever value `proxyUser` has with the username of
+the authenticated user.
+
+    {  
+      "driverMemory":"2G",
+      "executorCores":4,
+      "executorMemory":"8G",
+      "proxyUser":"bernhard",
+      "conf":{  
+        "spark.master":"yarn-cluster",
+        "spark.jars.packages":"com.databricks:spark-csv_2.10:1.5.0"
+      }
+    } 
+
+The above is an example request body to be used to create a Spark session via Livy server and illustrates the "proxyUser" that requires rewrite.
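
The effect of the `proxyUser` rewrite described in service_livy.md above can be illustrated with a small Java sketch. This is not how Knox implements the rewrite; it is a simplified stand-in that uses a regular expression on the raw JSON text, assumes the `proxyUser` value is a plain string without escaped quotes, and does nothing when the field is absent. The class name `ProxyUserRewriteSketch` and the user name `guest` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proxyUser replacement; not part of Knox.
class ProxyUserRewriteSketch {

    // Captures the key and colon, then matches the quoted value to be replaced.
    private static final Pattern PROXY_USER =
        Pattern.compile("(\"proxyUser\"\\s*:\\s*)\"[^\"]*\"");

    // Replaces whatever proxyUser value the client sent with the authenticated user.
    static String rewrite(String jsonBody, String authenticatedUser) {
        String quotedUser = Matcher.quoteReplacement("\"" + authenticatedUser + "\"");
        return PROXY_USER.matcher(jsonBody).replaceAll("$1" + quotedUser);
    }

    public static void main(String[] args) {
        String body = "{\"driverMemory\":\"2G\",\"proxyUser\":\"bernhard\"}";
        System.out.println(rewrite(body, "guest"));
    }
}
```

A production-quality rewrite would parse the JSON rather than pattern-match on text, so that escaped characters and nested objects are handled correctly.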

Added: knox/trunk/books/2.0.0/service_oozie.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/2.0.0/service_oozie.md?rev=1899392&view=auto
==============================================================================
--- knox/trunk/books/2.0.0/service_oozie.md (added)
+++ knox/trunk/books/2.0.0/service_oozie.md Wed Mar 30 15:22:57 2022
@@ -0,0 +1,200 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       https://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### Oozie ###
+
+
+Oozie is a Hadoop component that allows complex job workflows to be submitted and managed.
+Please refer to the latest [Oozie documentation](http://oozie.apache.org/docs/4.2.0/) for details.
+
+In order to make Oozie accessible via the gateway there are several important Hadoop configuration settings.
+These all relate to the network endpoint exposed by various Hadoop services.
+
+The HTTP endpoint at which Oozie is running can be found via the `oozie.base.url` property in the `oozie-site.xml` file.
+In a Sandbox installation this can typically be found in `/etc/oozie/conf/oozie-site.xml`.
+
+    <property>
+        <name>oozie.base.url</name>
+        <value>http://sandbox.hortonworks.com:11000/oozie</value>
+    </property>
+
+The RPC address at which the Resource Manager exposes the JOBTRACKER endpoint can be found via the `yarn.resourcemanager.address` property in the `yarn-site.xml` file.
+In a Sandbox installation this can typically be found in `/etc/hadoop/conf/yarn-site.xml`.
+
+    <property>
+        <name>yarn.resourcemanager.address</name>
+        <value>sandbox.hortonworks.com:8050</value>
+    </property>
+
+The RPC address at which the Name Node exposes its RPC endpoint can be found via the `dfs.namenode.rpc-address` property in the `hdfs-site.xml` file.
+In a Sandbox installation this can typically be found in `/etc/hadoop/conf/hdfs-site.xml`.
+
+    <property>
+        <name>dfs.namenode.rpc-address</name>
+        <value>sandbox.hortonworks.com:8020</value>
+    </property>
+
+If HDFS has been configured to be in High Availability mode (HA), then instead of the RPC address mentioned above for the Name Node, look up and use the logical name of the service found via `dfs.nameservices` in `hdfs-site.xml`. For example,
+
+    <property>
+        <name>dfs.nameservices</name>
+        <value>ha-service</value>
+    </property>
+
+Please note, only one of the URLs, either the RPC endpoint or the HA service name should be used as the NAMENODE HDFS URL in the gateway topology file.
+
+The information above must be provided to the gateway via a topology descriptor file.
+These topology descriptor files are placed in `{GATEWAY_HOME}/deployments`.
+An example that is set up for the default configuration of the Sandbox is `{GATEWAY_HOME}/deployments/sandbox.xml`.
+These values will need to be changed for a non-default Sandbox or other Hadoop cluster configuration.
+
+    <service>
+        <role>NAMENODE</role>
+        <url>hdfs://localhost:8020</url>
+    </service>
+    <service>
+        <role>JOBTRACKER</role>
+        <url>rpc://localhost:8050</url>
+    </service>
+    <service>
+        <role>OOZIE</role>
+        <url>http://localhost:11000/oozie</url>
+        <param>
+            <name>replayBufferSize</name>
+            <value>8</value>
+        </param>
+    </service>
+
+A default replayBufferSize of 8KB is shown in the sample topology file above.  This may need to be increased if your request size is larger.
+
+#### Oozie URL Mapping ####
+
+For Oozie URLs, the mapping of Knox Gateway accessible URLs to direct Oozie URLs is simple.
+
+| ------- | --------------------------------------------------------------------------- |
+| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/oozie` |
+| Cluster | `http://{oozie-host}:{oozie-port}/oozie`                                    |
+
+
+#### Oozie Request Changes ####
+
+TODO - In some cases the Oozie requests need to be slightly different when made through the gateway.
+These changes are required in order to protect the client from knowing the internal structure of the Hadoop cluster.
+
+
+#### Oozie Example via Client DSL ####
+
+This example will also submit the familiar WordCount Java MapReduce job to the Hadoop cluster via the gateway using the KnoxShell DSL.
+However, in this case the job will be submitted via an Oozie workflow.
+There are several ways to do this depending upon your preference.
+
+You can use the "embedded" Groovy interpreter provided with the distribution.
+
+    java -jar bin/shell.jar samples/ExampleOozieWorkflow.groovy
+
+You can manually type in the KnoxShell DSL script into the "embedded" Groovy interpreter provided with the distribution.
+
+    java -jar bin/shell.jar
+
+Each line from the file `samples/ExampleOozieWorkflow.groovy` will need to be typed or copied into the interactive shell.
+
+#### Oozie Example via cURL ####
+
+The example below illustrates the sequence of curl commands that could be used to run a "word count" map reduce job via an Oozie workflow.
+
+It utilizes the hadoop-examples.jar from a Hadoop install for running a simple word count job.
+A copy of that jar has been included in the samples directory for convenience.
+
+In addition, a workflow definition file and a workflow configuration file are required.
+These have not been included but are available for download.
+Download [workflow-definition.xml](workflow-definition.xml) and [workflow-configuration.xml](workflow-configuration.xml) and store them in the `{GATEWAY_HOME}` directory.
+Review the contents of workflow-configuration.xml to ensure that it matches your environment.
+
+Take care to follow the instructions below where replacement values are required.
+These replacement values are identified with `{ }` markup.
+
+    # 0. Optionally cleanup the test directory in case a previous example was run without cleaning up.
+    curl -i -k -u guest:guest-password -X DELETE \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example?op=DELETE&recursive=true'
+
+    # 1. Create the inode for workflow definition file in /user/guest/example
+    curl -i -k -u guest:guest-password -X PUT \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/workflow.xml?op=CREATE'
+
+    # 2. Upload the workflow definition file.  This file can be found in {GATEWAY_HOME}/templates
+    curl -i -k -u guest:guest-password -T workflow-definition.xml -X PUT \
+        '{Value of Location header from command above}'
+
+    # 3. Create the inode for hadoop-examples.jar in /user/guest/example/lib
+    curl -i -k -u guest:guest-password -X PUT \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/lib/hadoop-examples.jar?op=CREATE'
+
+    # 4. Upload hadoop-examples.jar to /user/guest/example/lib.  Use a hadoop-examples.jar from a Hadoop install.
+    curl -i -k -u guest:guest-password -T samples/hadoop-examples.jar -X PUT \
+        '{Value of Location header from command above}'
+
+    # 5. Create the inode for a sample input file README in /user/guest/example/input.
+    curl -i -k -u guest:guest-password -X PUT \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/input/README?op=CREATE'
+
+    # 6. Upload README to /user/guest/example/input.
+    # The sample below uses the README file found in {GATEWAY_HOME}.
+    curl -i -k -u guest:guest-password -T README -X PUT \
+        '{Value of Location header from command above}'
+
+    # 7. Submit the job via Oozie
+    # Take note of the Job ID in the JSON response as this will be used in the next step.
+    curl -i -k -u guest:guest-password -H Content-Type:application/xml -T workflow-configuration.xml \
+        -X POST 'https://localhost:8443/gateway/sandbox/oozie/v1/jobs?action=start'
+
+    # 8. Query the job status via Oozie.
+    curl -i -k -u guest:guest-password -X GET \
+        'https://localhost:8443/gateway/sandbox/oozie/v1/job/{Job ID from JSON body}'
+
+    # 9. List the contents of the output directory /user/guest/example/output
+    curl -i -k -u guest:guest-password -X GET \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/output?op=LISTSTATUS'
+
+    # 10. Optionally clean up the test directory
+    curl -i -k -u guest:guest-password -X DELETE \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example?op=DELETE&recursive=true'
+
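Steps 1 through 6 above each depend on copying the `Location` header from the first `PUT` response into the following upload command. The sketch below shows one way that hand-off might be scripted. The helper name `extract_location` is hypothetical (it is not part of Knox), and the sketch assumes the redirect response carries a single `Location` header, as printed by `curl -i`.

```shell
# Hypothetical helper: print the Location header value from HTTP response
# headers read on stdin (for example, the output of `curl -s -i`).
extract_location() {
    # Strip carriage returns, then match the header name case-insensitively.
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Usage sketch (needs a running gateway, so it is left commented out):
# location=$(curl -s -i -k -u guest:guest-password -X PUT \
#     'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/workflow.xml?op=CREATE' \
#     | extract_location)
# curl -i -k -u guest:guest-password -T workflow-definition.xml -X PUT "$location"
```

Capturing the header this way avoids pasting the redirect URL by hand between the create and upload commands.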
+### Oozie Client DSL ###
+
+#### submit() - Submit a workflow job.
+
+* Request
+    * text (String) - XML formatted workflow configuration string.
+    * file (String) - A filename containing XML formatted workflow configuration.
+    * action (String) - The initial action to take on the job.  Optional: Default is "start".
+* Response
+    * BasicResponse
+* Example
+    * `Workflow.submit(session).file(localFile).action("start").now()`
+
+#### status() - Query the status of a workflow job.
+
+* Request
+    * jobId (String) - The job ID to check. This is the ID received when the job was created.
+* Response
+    * BasicResponse
+* Example
+    * `Workflow.status(session).jobId(jobId).now().string`
+
+### Oozie HA ###
+
+Please look at #[Default Service HA support]

Added: knox/trunk/books/2.0.0/service_service_test.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/2.0.0/service_service_test.md?rev=1899392&view=auto
==============================================================================
--- knox/trunk/books/2.0.0/service_service_test.md (added)
+++ knox/trunk/books/2.0.0/service_service_test.md Wed Mar 30 15:22:57 2022
@@ -0,0 +1,228 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       https://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+
+### Service Test API
+
+The gateway supports a Service Test API that can be used to test Knox's ability to connect to each of the different Hadoop services via a simple HTTP GET request. To access this API, add the following lines to the topology for which you want to run the service test.
+
+    <service>
+      <role>SERVICE-TEST</role>
+    </service>
+
+After adding the above to a topology, you can make a cURL request with the following structure:
+
+    curl -i -k "https://{gateway-hostname}:{gateway-port}/gateway/{topology-name}/service-test?username=guest&password=guest-password"
+
+An alternate method of providing credentials:
+
+    curl -i -k -u guest:guest-password https://{gateway-hostname}:{gateway-port}/gateway/{topology-name}/service-test
+
+Below is an example response. The gateway is also capable of returning XML if specified in the request's "Accept" HTTP header.
+
+    {
+        "serviceTestWrapper": {
+         "Tests": {
+          "ServiceTest": [
+           {
+            "serviceName": "WEBHDFS",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/webhdfs/v1/?op=LISTSTATUS",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "WEBHCAT",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/templeton/v1/status",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "WEBHCAT",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/templeton/v1/version",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "WEBHCAT",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/templeton/v1/version/hive",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "WEBHCAT",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/templeton/v1/version/hadoop",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "OOZIE",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/oozie/v1/admin/build-version",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "OOZIE",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/oozie/v1/admin/status",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "OOZIE",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/oozie/versions",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "WEBHBASE",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/hbase/version",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "WEBHBASE",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/hbase/version/cluster",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "WEBHBASE",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/hbase/status/cluster",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "WEBHBASE",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/hbase",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "RESOURCEMANAGER",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/resourcemanager/v1/{topology-name}/info",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "RESOURCEMANAGER",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/resourcemanager/v1/{topology-name}/metrics",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "RESOURCEMANAGER",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/resourcemanager/v1/{topology-name}/apps",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "FALCON",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/falcon/api/admin/stack",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "FALCON",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/falcon/api/admin/version",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "FALCON",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/falcon/api/metadata/lineage/serialize",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "FALCON",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/falcon/api/metadata/lineage/vertices/all",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "FALCON",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/falcon/api/metadata/lineage/edges/all",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "STORM",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/storm/api/v1/cluster/configuration",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "STORM",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/storm/api/v1/cluster/summary",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "STORM",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/storm/api/v1/supervisor/summary",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           },
+           {
+            "serviceName": "STORM",
+            "requestURL": "http://{gateway-host}:{gateway-port}/gateway/{topology-name}/storm/api/v1/topology/summary",
+            "responseContent": "Content-Length:0,Content-Type: application/json;charset=utf-8",
+            "httpCode": 200,
+            "message": "Request sucessful."
+           }
+          ]
+         },
+         "messages": {
+          "message": [
+
+          ]
+         }
+        }
+    }
+
+
+We can see that this service-test makes HTTP requests to each of the services through Knox using the specified topology. The test will only make calls to those services that have entries within the topology file.
+
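When a topology contains many services, it can help to list only the entries that did not return 200. The following is a minimal sketch, not a Knox tool: the function name `failed_tests` is hypothetical, and it assumes the pretty-printed JSON layout shown above (one key per line, with `requestURL` before `httpCode` in each entry).

```shell
# Hypothetical helper: read a service-test JSON response on stdin and print
# "<httpCode> <requestURL>" for every entry whose code is not 200.
# Assumes one JSON key per line, as in the example response.
failed_tests() {
    awk -F'"' '
        /"requestURL"/ { url = $4 }
        /"httpCode"/   { code = $3; gsub(/[^0-9]/, "", code)
                         if (code + 0 != 200) print code, url }
    '
}

# Usage sketch (needs a running gateway, so it is left commented out):
# curl -s -k -u guest:guest-password \
#     'https://localhost:8443/gateway/sandbox/service-test' | failed_tests
```

A JSON-aware tool would be more robust if the response is not pretty-printed; the line-based approach is used here only to stay within standard shell utilities.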
+##### Adding and Changing test URLs
+
+URLs for each service are stored in `{GATEWAY_HOME}/data/services/{service-name}/{service-version}/service.xml`. Each `<testURL>` element represents a service resource that will be tested if the service is set up in the topology. You can add or remove these from the `service.xml` file. Note that if you add URLs, there is no guarantee of the order in which they will be tested. All default URLs have been tested and work on various clusters. If a newly added URL does not respond as expected, it is up to the user to determine whether the URL is correct.
+
+##### Some important things to note:
+ - In the first cURL request, the quotes around the URL are necessary; otherwise the shell treats `&` as a command separator and the `password` query parameter is not included in the request.
+ - This API call does not require any credentials to receive a response from Knox, but expect to receive 401 responses from each of the services if none are provided.

Added: knox/trunk/books/2.0.0/service_solr.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/2.0.0/service_solr.md?rev=1899392&view=auto
==============================================================================
--- knox/trunk/books/2.0.0/service_solr.md (added)
+++ knox/trunk/books/2.0.0/service_solr.md Wed Mar 30 15:22:57 2022
@@ -0,0 +1,119 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       https://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### Solr ###
+
+Knox provides gateway functionality to Solr with support for versions 5.5+ and 6+. The Solr REST APIs allow the user to view the status 
+of the collections, perform administrative actions and query collections.
+
+See the Solr Quickstart (http://lucene.apache.org/solr/quickstart.html) section of the Solr documentation for examples of the Solr REST API.
+
+Since Knox provides an abstraction over Solr and ZooKeeper, the use of the SolrJ CloudSolrClient is no longer supported.  You should replace 
+instances of CloudSolrClient with HttpSolrClient.
+
+<p>Note: Updates to Solr via Knox require a POST operation, which in turn requires preemptive authentication; this is not directly supported by the 
+SolrJ API at this time.</p>  
+
+To enable this functionality, a topology file needs to have the following configuration:
+
+    <service>
+        <role>SOLR</role>
+        <version>6.0.0</version>
+        <url>http://<solr-host>:<solr-port></url>
+    </service>
+
+The default Solr port is 8983. Adjust the version specified to either '5.5.0' or '6.0.0'.
+
+For Solr 5.5.0 you also need to change the role name to `SOLRAPI` like this:
+
+    <service>
+        <role>SOLRAPI</role>
+        <version>5.5.0</version>
+        <url>http://<solr-host>:<solr-port></url>
+    </service>
+
+
+#### Solr URL Mapping ####
+
+For Solr URLs, the mapping of Knox Gateway accessible URLs to direct Solr URLs is the following.
+
+| ------- | ------------------------------------------------------------------------------------- |
+| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/solr` |
+| Cluster | `http://{solr-host}:{solr-port}/solr`                               |
+
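The mapping in the table above can be illustrated with a small rewrite helper. This is only a sketch: the function name `solr_to_gateway_url` is hypothetical, and the gateway base URL passed to it (host, port, gateway path and cluster name) is an assumption supplied by the caller.

```shell
# Hypothetical helper: rewrite a direct Solr URL into its gateway form.
#   $1 = gateway base, e.g. https://localhost:8443/gateway/sandbox
#   $2 = direct Solr URL, e.g. http://solr-host:8983/solr/select?q=*:*
solr_to_gateway_url() {
    # Replace the scheme, host and port in front of /solr with the gateway base.
    printf '%s\n' "$2" | sed -E "s#^https?://[^/]+/solr#$1/solr#"
}

solr_to_gateway_url 'https://localhost:8443/gateway/sandbox' \
    'http://solr-host:8983/solr/admin/collections?action=CLUSTERSTATUS'
```

Everything after `/solr`, including the query string, is left unchanged by the rewrite.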
+
+#### Solr Examples via cURL
+
+Some of the various calls that can be made and examples using curl are listed below.
+
+    # 0. Query collection
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/solr/select?q=*:*&wt=json'
+
+    # 1. Query cluster status
+    
+    curl -ikv -u guest:guest-password -X POST 'https://localhost:8443/gateway/sandbox/solr/admin/collections?action=CLUSTERSTATUS' 
+
+### Solr HA ###
+
+Knox provides basic failover functionality for calls made to Solr Cloud when more than one Solr instance is
+installed in the cluster and registered with the same ZooKeeper ensemble. The HA functionality in this case fetches the
+Solr URL information from a ZooKeeper ensemble, so the user need only supply the necessary ZooKeeper
+configuration and not the Solr connection URLs.
+
+To enable HA functionality for Solr Cloud in Knox the following configuration has to be added to the topology file.
+
+    <provider>
+        <role>ha</role>
+        <name>HaProvider</name>
+        <enabled>true</enabled>
+        <param>
+            <name>SOLR</name>
+            <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true;zookeeperEnsemble=machine1:2181,machine2:2181,machine3:2181</value>
+       </param>
+    </provider>
+
+The role and name of the provider above must be as shown. The name in the 'param' section must match that of the service
+role name that is being configured for HA and the value in the 'param' section is the configuration for that particular
+service in HA mode. In this case the name is 'SOLR'.
+
+The various configuration parameters are described below:
+
+* maxFailoverAttempts -
+This is the maximum number of times a failover will be attempted. The failover strategy at this time is very simplistic
+in that the next URL in the list of URLs provided for the service is used and the one that failed is put at the bottom
+of the list. If the list is exhausted and the maximum number of attempts is not reached then the first URL will be tried
+again after the list is fetched again from ZooKeeper (a refresh of the list is done at this point).
+
+* failoverSleep -
+The amount of time in milliseconds that the process will wait or sleep before attempting to failover.
+
+* enabled -
+Flag to turn the particular service on or off for HA.
+
+* zookeeperEnsemble -
+A comma-separated list of host names (or IP addresses) of the ZooKeeper hosts that make up the ensemble that the Solr
+servers register their information with.
+
+For the service configuration itself, the URLs need not be specified. For example:
+
+    <service>
+        <role>SOLR</role>
+        <version>6.0.0</version>
+    </service>
+
+Please note that there is no `<url>` tag specified here as the URLs for the Solr servers are obtained from ZooKeeper.
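The failover order described for `maxFailoverAttempts` (use the next URL, move the failed one to the bottom of the list) can be pictured with the following sketch. It illustrates the rotation only and is not Knox's implementation; the function name `rotate_urls` and the example URLs are hypothetical.

```shell
# Hypothetical illustration of the rotation: the first argument is the URL
# that just failed; it is moved to the end and the remaining order is kept.
rotate_urls() {
    failed=$1
    shift
    if [ "$#" -eq 0 ]; then
        # Only one URL is known, so it stays the sole candidate.
        printf '%s\n' "$failed"
    else
        printf '%s\n' "$* $failed"
    fi
}

rotate_urls http://solr1:8983/solr http://solr2:8983/solr http://solr3:8983/solr
# prints: http://solr2:8983/solr http://solr3:8983/solr http://solr1:8983/solr
```

After the rotation, the first URL in the printed list is the one that would be tried next.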

Added: knox/trunk/books/2.0.0/service_ssl_certificate_trust.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/2.0.0/service_ssl_certificate_trust.md?rev=1899392&view=auto
==============================================================================
--- knox/trunk/books/2.0.0/service_ssl_certificate_trust.md (added)
+++ knox/trunk/books/2.0.0/service_ssl_certificate_trust.md Wed Mar 30 15:22:57 2022
@@ -0,0 +1,171 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       https://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### TLS/SSL Certificate Trust ###
+
+When the Gateway dispatches requests to a configured service using TLS/SSL, that service's certificate 
+must be trusted in order for the connection to succeed.  To do this, the Gateway checks 
+a configured trust store for the service's certificate or the certificate of the CA that issued that 
+certificate. 
+
+If not explicitly set, the Gateway will use its configured identity keystore as the trust store.
+By default, this keystore is located at `{GATEWAY_HOME}/data/security/keystores/gateway.jks`; however, 
+a custom identity keystore may be set in the gateway-site.xml file. See `gateway.tls.keystore.password.alias`, `gateway.tls.keystore.path`, 
+and `gateway.tls.keystore.type`. 
+   
+The trust store is configured at the Gateway level.  There is no support to set a different trust store
+per service. To use a specific trust store, the following configuration elements may be set in the 
+gateway-site.xml file:
+
+| Configuration Element                          | Description                                               |
+| -----------------------------------------------|-----------------------------------------------------------|
+| gateway.httpclient.truststore.path             | Fully qualified path to the trust store to use. Default is the keystore used to hold the Gateway's identity.  See `gateway.tls.keystore.path`.|
+| gateway.httpclient.truststore.type             | Keystore type of the trust store. Default is JKS.         |
+| gateway.httpclient.truststore.password.alias   | Alias for the password to the trust store.|
+
+
+If `gateway.httpclient.truststore.path` is not set, the keystore used to hold the Gateway's identity 
+will be used as the trust store. 
+
+However, if `gateway.httpclient.truststore.path` is set, it is expected that 
+`gateway.httpclient.truststore.type` and `gateway.httpclient.truststore.password.alias` are set
+appropriately. If `gateway.httpclient.truststore.type` is not set, the Gateway will assume the trust 
+store is a JKS file. If `gateway.httpclient.truststore.password.alias` is not set, the Gateway will
+assume the alias name is "gateway-httpclient-truststore-password".  In any case, if the 
+trust store password is different from the Gateway's master secret then it can be set using
+
+    knoxcli.sh create-alias {password-alias} --value {pwd} 
+  
+If a password is not found using the provided (or default) alias name, then the Gateway's master secret 
+will be used.
+
+All topologies deployed within the Gateway instance will use the configured trust store to verify a 
+service's identity.  

Added: knox/trunk/books/2.0.0/service_storm.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/2.0.0/service_storm.md?rev=1899392&view=auto
==============================================================================
--- knox/trunk/books/2.0.0/service_storm.md (added)
+++ knox/trunk/books/2.0.0/service_storm.md Wed Mar 30 15:22:57 2022
@@ -0,0 +1,112 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       https://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### Storm ###
+
+Storm is a distributed realtime computation system. Storm exposes REST APIs for UI functionality that can be used for
+retrieving metrics data and configuration information as well as management operations such as starting or stopping topologies.
+
+The documentation for this REST API can be found at:
+
+https://github.com/apache/storm/blob/master/docs/STORM-UI-REST-API.md
+
+To enable this functionality, a topology file needs to have the following configuration:
+
+    <service>
+        <role>STORM</role>
+        <url>http://{storm-host}:{storm-port}</url>
+    </service>
+
+The default UI daemon port is 8744. If it is configured to some other port, that configuration can be
+found in `storm.yaml` as the value for the property `ui.port`.
+
+In addition to the STORM service configuration above, a STORM-LOGVIEWER service must be configured if the
+log files are to be retrieved through Knox. The logviewer port is the value of the `logviewer.port` property,
+which is also set in `storm.yaml`.
+
+    <service>
+        <role>STORM-LOGVIEWER</role>
+        <url>http://{storm-logviewer-host}:{storm-logviewer-port}</url>
+    </service>
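+
+As an illustration only, assume that the Storm UI and logviewer both run on a host named `storm-host.example.com`
+(a hypothetical name) and that `storm.yaml` contains the port settings below. The value 8000 for `logviewer.port`
+is an assumed example, not a value taken from this guide.
+
+    ui.port: 8744
+    logviewer.port: 8000
+
+Under those assumptions, the two service entries in the topology file might be filled in as follows:
+
+    <service>
+        <role>STORM</role>
+        <url>http://storm-host.example.com:8744</url>
+    </service>
+    <service>
+        <role>STORM-LOGVIEWER</role>
+        <url>http://storm-host.example.com:8000</url>
+    </service>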
+
+
+#### Storm URL Mapping ####
+
+For Storm URLs, the mapping of Knox Gateway accessible URLs to direct Storm URLs is the following.
+
+| ------- | --------------------------------------------------------------------------- |
+| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/storm` |
+| Cluster | `http://{storm-host}:{storm-port}`                                          |
+
+For the log viewer, the mapping is as follows.
+
+| ------- | ------------------------------------------------------------------------------------- |
+| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/storm/logviewer` |
+| Cluster | `http://{storm-logviewer-host}:{storm-logviewer-port}`                                |
+
+
+#### Storm Examples ####
+
+Some of the calls that can be made through the gateway are shown below as curl examples.
+
+    # 0. Getting cluster configuration
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/cluster/configuration'
+    
+    # 1. Getting cluster summary information
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/cluster/summary'
+
+    # 2. Getting supervisor summary information
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/supervisor/summary'
+    
+    # 3. Getting topology summary information
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/topology/summary'
+    
+    # 4. Getting specific topology information. Substitute {id} with the topology id.
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/topology/{id}'
+
+    # 5. Getting component-level information. Substitute {id} with the topology id and {component} with the component id, e.g. 'spout'.
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/topology/{id}/component/{component}'
+
+
+The following POST operations all require an 'x-csrf-token' header along with session information that can be stored in a cookie file,
+in particular the 'ring-session' and 'JSESSIONID' values.
+
+    # 6. To activate a topology. Substitute {id} with the topology id and {token-value} with the x-csrf-token value.
+
+    curl -ik -b ~/cookiejar.txt -c ~/cookiejar.txt -u guest:guest-password -H 'x-csrf-token:{token-value}' -X POST \
+     'https://localhost:8443/gateway/sandbox/storm/api/v1/topology/{id}/activate'
+
+    # 7. To deactivate a topology. Substitute {id} with the topology id and {token-value} with the x-csrf-token value.
+
+    curl -ik -b ~/cookiejar.txt -c ~/cookiejar.txt -u guest:guest-password -H 'x-csrf-token:{token-value}' -X POST \
+     'https://localhost:8443/gateway/sandbox/storm/api/v1/topology/{id}/deactivate'
+
+    # 8. To rebalance a topology. Substitute {id} with the topology id and {token-value} with the x-csrf-token value.
+    #    The trailing 0 is the wait time in seconds.
+
+    curl -ik -b ~/cookiejar.txt -c ~/cookiejar.txt -u guest:guest-password -H 'x-csrf-token:{token-value}' -X POST \
+     'https://localhost:8443/gateway/sandbox/storm/api/v1/topology/{id}/rebalance/0'
+
+    # 9. To kill a topology. Substitute {id} with the topology id and {token-value} with the x-csrf-token value.
+    #    The trailing 0 is the wait time in seconds.
+
+    curl -ik -b ~/cookiejar.txt -c ~/cookiejar.txt -u guest:guest-password -H 'x-csrf-token:{token-value}' -X POST \
+     'https://localhost:8443/gateway/sandbox/storm/api/v1/topology/{id}/kill/0'

Added: knox/trunk/books/2.0.0/service_webhcat.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/2.0.0/service_webhcat.md?rev=1899392&view=auto
==============================================================================
--- knox/trunk/books/2.0.0/service_webhcat.md (added)
+++ knox/trunk/books/2.0.0/service_webhcat.md Wed Mar 30 15:22:57 2022
@@ -0,0 +1,181 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       https://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### WebHCat ###
+
+WebHCat (also called _Templeton_) is a related but separate service from HiveServer2.
+As such it is installed and configured independently.
+The [WebHCat wiki pages](https://cwiki.apache.org/confluence/display/Hive/WebHCat) describe these processes.
+In the Sandbox, the configuration file for WebHCat is located at `/etc/hadoop/hcatalog/webhcat-site.xml`.
+Note the properties shown below as they are related to configuration required by the gateway.
+
+    <property>
+        <name>templeton.port</name>
+        <value>50111</value>
+    </property>
+
+Also important is the configuration of the JOBTRACKER RPC endpoint.
+For Hadoop 2 this can be found in the `yarn-site.xml` file.
+In the Sandbox this file can be found at `/etc/hadoop/conf/yarn-site.xml`.
+The property `yarn.resourcemanager.address` within that file is relevant for the gateway's configuration.
+
+    <property>
+        <name>yarn.resourcemanager.address</name>
+        <value>sandbox.hortonworks.com:8050</value>
+    </property>
+
+See #[WebHDFS] for details about locating the Hadoop configuration for the NAMENODE endpoint.
+
+The gateway by default includes a sample topology descriptor file `{GATEWAY_HOME}/conf/topologies/sandbox.xml`.
+The values in this sample are configured to work with an installed Sandbox VM.
+
+    <service>
+        <role>NAMENODE</role>
+        <url>hdfs://localhost:8020</url>
+    </service>
+    <service>
+        <role>JOBTRACKER</role>
+        <url>rpc://localhost:8050</url>
+    </service>
+    <service>
+        <role>WEBHCAT</role>
+        <url>http://localhost:50111/templeton</url>
+    </service>
+
+The URLs provided for the NAMENODE and JOBTRACKER roles do not result in an endpoint being exposed by the gateway.
+This information is required only so that other URLs that reference the RPC addresses of Hadoop services can be rewritten.
+This prevents clients from needing to be aware of the internal cluster details.
+Note that for Hadoop 2 the JOBTRACKER RPC endpoint is provided by the Resource Manager component.
+
+By default the gateway is configured to use the HTTP endpoint for WebHCat in the Sandbox.
+This could alternatively be configured to use the HTTPS endpoint by providing the correct address.
+
+#### WebHCat URL Mapping ####
+
+For WebHCat URLs, the mapping of Knox Gateway accessible URLs to direct WebHCat URLs is simple.
+
+| ------- | ------------------------------------------------------------------------------- |
+| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/templeton` |
+| Cluster | `http://{webhcat-host}:{webhcat-port}/templeton`                                |
+
+
+#### WebHCat via cURL ####
+
+Users can use cURL to directly invoke the REST APIs via the gateway. For the full list of available REST calls, see the WebHCat documentation. The following curl command is a simple way to test the connection:
+
+    curl -i -k -u guest:guest-password 'https://localhost:8443/gateway/sandbox/templeton/v1/status'
+
+
+#### WebHCat Example ####
+
+This example will submit the familiar WordCount Java MapReduce job to the Hadoop cluster via the gateway using the KnoxShell DSL.
+There are several ways to do this depending upon your preference.
+
+You can run the sample script with the "embedded" Groovy interpreter provided with the distribution.
+
+    java -jar bin/shell.jar samples/ExampleWebHCatJob.groovy
+
+Alternatively, you can start the "embedded" Groovy interpreter as an interactive shell and enter the KnoxShell DSL script manually.
+
+    java -jar bin/shell.jar
+
+Each line from the file `samples/ExampleWebHCatJob.groovy` would then need to be typed or copied into the interactive shell.
+
+#### WebHCat Client DSL ####
+
+##### submitJava() - Submit a Java MapReduce job.
+
+* Request
+    * jar (String) - The remote file name of the JAR containing the app to execute.
+    * app (String) - The app name to execute. For example, this is _wordcount_, not the class name.
+    * input (String) - The remote directory name to use as input for the job.
+    * output (String) - The remote directory name to store output from the job.
+* Response
+    * jobId : String - The job ID of the submitted job.  Consumes body.
+* Example
+
+
+    Job.submitJava(session)
+        .jar(remoteJarName)
+        .app(appName)
+        .input(remoteInputDir)
+        .output(remoteOutputDir)
+        .now()
+        .jobId
+
+##### submitPig() - Submit a Pig job.
+
+* Request
+    * file (String) - The remote file name of the pig script.
+    * arg (String) - An argument to pass to the script.
+    * statusDir (String) - The remote directory to store status output.
+* Response
+    * jobId : String - The job ID of the submitted job.  Consumes body.
+* Example
+    * `Job.submitPig(session).file(remotePigFileName).arg("-v").statusDir(remoteStatusDir).now()`
+
+##### submitHive() - Submit a Hive job.
+
+* Request
+    * file (String) - The remote file name of the hive script.
+    * arg (String) - An argument to pass to the script.
+    * statusDir (String) - The remote directory to store status output.
+* Response
+    * jobId : String - The job ID of the submitted job.  Consumes body.
+* Example
+    * `Job.submitHive(session).file(remoteHiveFileName).arg("-v").statusDir(remoteStatusDir).now()`
+
+##### submitSqoop() - Submit a Sqoop job.
+
+Using the Knox DSL, you can now easily submit and monitor [Apache Sqoop](https://sqoop.apache.org) jobs. The WebHCat Job class now supports the `submitSqoop` command.
+
+    Job.submitSqoop(session)
+        .command("import --connect jdbc:mysql://hostname:3306/dbname ... ")
+        .statusDir(remoteStatusDir)
+        .now().jobId
+
+The `submitSqoop` command supports the following arguments:
+
+* command (String) - The Sqoop command string to execute.
+* files (String) - Comma-separated list of files to be copied to the Templeton controller job.
+* optionsfile (String) - The remote file that contains the Sqoop command to run.
+* libdir (String) - The remote directory containing the JDBC JAR to include with the Sqoop libraries.
+* statusDir (String) - The remote directory to store status output.
+
+A complete example is available here: https://cwiki.apache.org/confluence/display/KNOX/2016/11/08/Running+SQOOP+job+via+KNOX+Shell+DSL
+
+
+##### queryQueue() - Return a list of all job IDs registered to the user.
+
+* Request
+    * No request parameters.
+* Response
+    * BasicResponse
+* Example
+    * `Job.queryQueue(session).now().string`
+
+##### queryStatus() - Check the status of a job and get related job information given its job ID.
+
+* Request
+    * jobId (String) - The job ID to check. This is the ID received when the job was created.
+* Response
+    * BasicResponse
+* Example
+    * `Job.queryStatus(session).jobId(jobId).now().string`
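+
+As a rough sketch of how these calls fit together, the Groovy script below logs in, submits a Java MapReduce job, and
+polls its status until it reports completion. The gateway URL, credentials, and remote paths are placeholder
+assumptions, and the `$.status.jobComplete` JSON path is assumed from the shape of the WebHCat status response.
+Compare it with `samples/ExampleWebHCatJob.groovy` before relying on it.
+
+    import com.jayway.jsonpath.JsonPath
+    import org.apache.knox.gateway.shell.KnoxSession
+    import org.apache.knox.gateway.shell.job.Job
+
+    // Assumed values; adjust them for your environment.
+    gateway = "https://localhost:8443/gateway/sandbox"
+    session = KnoxSession.login(gateway, "guest", "guest-password")
+
+    // Submit the job and keep the returned job ID.
+    jobId = Job.submitJava(session)
+        .jar("/user/guest/example/lib/hadoop-examples.jar")
+        .app("wordcount")
+        .input("/user/guest/example/input")
+        .output("/user/guest/example/output")
+        .now()
+        .jobId
+
+    // Poll once per second, for at most 60 attempts.
+    done = false
+    count = 0
+    while (!done && count++ < 60) {
+        sleep(1000)
+        json = Job.queryStatus(session).jobId(jobId).now().string
+        done = JsonPath.read(json, "\$.status.jobComplete")
+    }
+    println "Job " + jobId + " complete: " + done
+
+    session.shutdown()
+
+The script can be run in the same way as the sample above, by passing its file name to `java -jar bin/shell.jar`.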
+
+### WebHCat HA ###
+
+See #[Default Service HA support] for details.