Posted to commits@oozie.apache.org by an...@apache.org on 2018/09/14 14:38:57 UTC

[14/21] oozie git commit: OOZIE-2734 amend [docs] Switch from TWiki to Markdown (asalamon74 via andras.piros, pbacsko, gezapeti)

http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/markdown/DG_SparkActionExtension.md
----------------------------------------------------------------------
diff --git a/docs/src/site/markdown/DG_SparkActionExtension.md b/docs/src/site/markdown/DG_SparkActionExtension.md
new file mode 100644
index 0000000..5a56cca
--- /dev/null
+++ b/docs/src/site/markdown/DG_SparkActionExtension.md
@@ -0,0 +1,436 @@
+
+
+[::Go back to Oozie Documentation Index::](index.html)
+
+-----
+
+# Oozie Spark Action Extension
+
+<!-- MACRO{toc|fromDepth=1|toDepth=4} -->
+
+## Spark Action
+
+The `spark` action runs a Spark job.
+
+The workflow job will wait until the Spark job completes before
+continuing to the next action.
+
+To run the Spark job, you have to configure the `spark` action with
+the `resource-manager`, `name-node` and Spark `master` elements, as
+well as the necessary elements, arguments and configuration.
+
+Spark options can be specified in an element called `spark-opts`.
+
+A `spark` action can be configured to create or delete HDFS directories
+before starting the Spark job.
+
+Oozie EL expressions can be used in the inline configuration. Property
+values specified in the `configuration` element override values specified
+in the `job-xml` file.
+
+**Syntax:**
+
+
+```
+<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:1.0">
+    ...
+    <action name="[NODE-NAME]">
+        <spark xmlns="uri:oozie:spark-action:1.0">
+            <resource-manager>[RESOURCE-MANAGER]</resource-manager>
+            <name-node>[NAME-NODE]</name-node>
+            <prepare>
+               <delete path="[PATH]"/>
+               ...
+               <mkdir path="[PATH]"/>
+               ...
+            </prepare>
+            <job-xml>[SPARK SETTINGS FILE]</job-xml>
+            <configuration>
+                <property>
+                    <name>[PROPERTY-NAME]</name>
+                    <value>[PROPERTY-VALUE]</value>
+                </property>
+                ...
+            </configuration>
+            <master>[SPARK MASTER URL]</master>
+            <mode>[SPARK MODE]</mode>
+            <name>[SPARK JOB NAME]</name>
+            <class>[SPARK MAIN CLASS]</class>
+            <jar>[SPARK DEPENDENCIES JAR / PYTHON FILE]</jar>
+            <spark-opts>[SPARK-OPTIONS]</spark-opts>
+            <arg>[ARG-VALUE]</arg>
+                ...
+            <arg>[ARG-VALUE]</arg>
+            ...
+        </spark>
+        <ok to="[NODE-NAME]"/>
+        <error to="[NODE-NAME]"/>
+    </action>
+    ...
+</workflow-app>
+```
+
+The `prepare` element, if present, indicates a list of paths to delete
+or create before starting the job. Specified paths must start with `hdfs://HOST:PORT`.
+
+The `job-xml` element, if present, specifies a file containing configuration
+for the Spark job. Multiple `job-xml` elements are allowed in order to
+specify multiple `job.xml` files.
+
+The `configuration` element, if present, contains configuration
+properties that are passed to the Spark job.
+
+The `master` element indicates the URL of the Spark Master. Ex: `spark://host:port`, `mesos://host:port`, `yarn-cluster`, `yarn-client`,
+or `local`.
+
+The `mode` element, if present, indicates the Spark deployment mode, i.e. where the Spark driver program runs. Ex: `client`, `cluster`. This is typically
+not required because it can be specified as part of `master` (i.e. `master=yarn, mode=client` is equivalent to `master=yarn-client`).
+A local `master` always runs in client mode.
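+
+For example, with the Spark action schema 1.0 the equivalence mentioned above can be written either way (a minimal sketch showing only the relevant elements):
+
+
+```
+    <!-- separate master and mode elements -->
+    <master>yarn</master>
+    <mode>client</mode>
+
+    <!-- equivalent single master element -->
+    <master>yarn-client</master>
+```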
+
+Depending on the `master` (and `mode`) entered, the Spark job will run differently as follows:
+
+   * local mode: everything (driver and executors) runs in the Oozie Launcher job.
+   * yarn-client mode: the driver runs in the Oozie Launcher job and the executors run in YARN.
+   * yarn-cluster mode: the driver and the executors run in YARN.
+
+The `name` element indicates the name of the Spark application.
+
+The `class` element, if present, indicates the Spark application's main class.
+
+The `jar` element indicates a comma-separated list of JARs or Python files.
+
+The `spark-opts` element, if present, contains a list of Spark options that can be passed to Spark. Spark configuration
+options can be passed by specifying '--conf key=value' or other Spark CLI options.
+Values containing whitespace can be enclosed in double quotes.
+
+Some examples of the `spark-opts` element:
+
+   * '--conf key=value'
+   * '--conf key1=value1 value2'
+   * '--conf key1="value1 value2"'
+   * '--conf key1=value1 key2="value2 value3"'
+   * '--conf key=value --verbose --properties-file user.properties'
+
+There are several ways to define properties that will be passed to Spark. They are processed in the following order:
+
+   * propagated from `oozie.service.SparkConfigurationService.spark.configurations`
+   * read from a localized `spark-defaults.conf` file
+   * read from a file defined in `spark-opts` via the `--properties-file`
+   * properties defined in `spark-opts` element
+
+(Later entries in this list take precedence over earlier ones.)
+The server propagated properties, the `spark-defaults.conf` and the user-defined properties file are merged together into a
+single properties file as Spark handles only one file in its `--properties-file` option.
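+
+As a hypothetical illustration of the last two sources (here `user.properties` is a placeholder name, assumed to have been made available to the action, for example via a `file` element):
+
+
+```
+    <spark-opts>--properties-file user.properties --conf spark.executor.memory=4g</spark-opts>
+    <file>user.properties</file>
+```
+
+If `spark.executor.memory` is also defined in `user.properties`, the value given directly in `spark-opts` wins, following the ordering above.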
+
+The `arg` element, if present, contains arguments that are passed to the Spark application.
+
+In case some property values are present both in `spark-defaults.conf` and as property key/value pairs generated by Oozie, the
+user-configured values from `spark-defaults.conf` are prepended to the ones generated by Oozie, as part of the Spark arguments list.
+
+The following properties are prepended to the Spark arguments:
+
+   * `spark.executor.extraClassPath`
+   * `spark.driver.extraClassPath`
+   * `spark.executor.extraJavaOptions`
+   * `spark.driver.extraJavaOptions`
+
+All the above elements can be parameterized (templatized) using EL
+expressions.
+
+**Example:**
+
+
+```
+<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0">
+    ...
+    <action name="myfirstsparkjob">
+        <spark xmlns="uri:oozie:spark-action:1.0">
+            <resource-manager>foo:8032</resource-manager>
+            <name-node>bar:8020</name-node>
+            <prepare>
+                <delete path="${jobOutput}"/>
+            </prepare>
+            <configuration>
+                <property>
+                    <name>mapred.compress.map.output</name>
+                    <value>true</value>
+                </property>
+            </configuration>
+            <master>local[*]</master>
+            <mode>client</mode>
+            <name>Spark Example</name>
+            <class>org.apache.spark.examples.mllib.JavaALS</class>
+            <jar>/lib/spark-examples_2.10-1.1.0.jar</jar>
+            <spark-opts>--executor-memory 20G --num-executors 50
+             --conf spark.executor.extraJavaOptions="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"</spark-opts>
+            <arg>inputpath=hdfs://localhost/input/file.txt</arg>
+            <arg>value=2</arg>
+        </spark>
+        <ok to="myotherjob"/>
+        <error to="errorcleanup"/>
+    </action>
+    ...
+</workflow-app>
+```
+
+### Spark Action Logging
+
+Spark action logs are redirected to the STDOUT/STDERR of the Oozie Launcher map-reduce job task that runs Spark.
+
+From the Oozie web console, the 'Console URL' link in the Spark action pop-up can be used
+to navigate to the Oozie Launcher map-reduce job task logs via the Hadoop JobTracker web console.
+
+### Spark on YARN
+
+To ensure that your Spark job shows up in the Spark History Server, make sure to specify these three Spark configuration properties
+either in `spark-opts` with `--conf` or from `oozie.service.SparkConfigurationService.spark.configurations` in oozie-site.xml.
+
+1. spark.yarn.historyServer.address=SPH-HOST:18088
+
+2. spark.eventLog.dir=`hdfs://NN:8020/user/spark/applicationHistory`
+
+3. spark.eventLog.enabled=true
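+
+A minimal `spark-opts` sketch setting all three (the history server host and the name node address are the same placeholders as above):
+
+
+```
+    <spark-opts>--conf spark.yarn.historyServer.address=SPH-HOST:18088
+        --conf spark.eventLog.dir=hdfs://NN:8020/user/spark/applicationHistory
+        --conf spark.eventLog.enabled=true</spark-opts>
+```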
+
+### PySpark with Spark Action
+
+To submit PySpark scripts with the Spark action, PySpark dependencies must be available in the sharelib or in the workflow's lib/ directory.
+For more information, please refer to the [installation document](AG_Install.html#Oozie_Share_Lib).
+
+**Example:**
+
+
+```
+<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0">
+    ....
+    <action name="myfirstpysparkjob">
+        <spark xmlns="uri:oozie:spark-action:1.0">
+            <resource-manager>foo:8032</resource-manager>
+            <name-node>bar:8020</name-node>
+            <prepare>
+                <delete path="${jobOutput}"/>
+            </prepare>
+            <configuration>
+                <property>
+                    <name>mapred.compress.map.output</name>
+                    <value>true</value>
+                </property>
+            </configuration>
+            <master>yarn-cluster</master>
+            <name>Spark Example</name>
+            <jar>pi.py</jar>
+            <spark-opts>--executor-memory 20G --num-executors 50
+            --conf spark.executor.extraJavaOptions="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"</spark-opts>
+            <arg>100</arg>
+        </spark>
+        <ok to="myotherjob"/>
+        <error to="errorcleanup"/>
+    </action>
+    ...
+</workflow-app>
+```
+
+The `jar` element indicates the Python file. Refer to the file by its localized name, because only local files are allowed
+in PySpark. The .py file should be in the lib/ folder next to the workflow.xml, or added using the `file` element so that
+it is localized to the working directory with just its name.
+
+### Using Symlink in \<jar\>
+
+A symlink must be specified using the [file](WorkflowFunctionalSpec.html#a3.2.2.1_Adding_Files_and_Archives_for_the_Job) element. Then, you can use
+the symlink name in the `jar` element.
+
+**Example:**
+
+Specifying relative path for symlink:
+
+Make sure that the file is within the application directory, i.e. `oozie.wf.application.path`.
+
+```
+        <spark xmlns="uri:oozie:spark-action:1.0">
+        ...
+            <jar>py-spark-example-symlink.py</jar>
+            ...
+            ...
+            <file>py-spark.py#py-spark-example-symlink.py</file>
+        ...
+        </spark>
+```
+
+Specifying full path for symlink:
+
+```
+        <spark xmlns="uri:oozie:spark-action:1.0">
+        ...
+            <jar>spark-example-symlink.jar</jar>
+            ...
+            ...
+            <file>hdfs://localhost:8020/user/testjars/all-oozie-examples.jar#spark-example-symlink.jar</file>
+        ...
+        </spark>
+```
+
+
+
+## Appendix, Spark XML-Schema
+
+### AE.A Appendix A, Spark XML-Schema
+
+#### Spark Action Schema Version 1.0
+
+```
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:spark="uri:oozie:spark-action:1.0" elementFormDefault="qualified"
+           targetNamespace="uri:oozie:spark-action:1.0">
+
+    <xs:include schemaLocation="oozie-common-1.0.xsd"/>
+
+    <xs:element name="spark" type="spark:ACTION"/>
+
+    <xs:complexType name="ACTION">
+        <xs:sequence>
+            <xs:choice>
+                <xs:element name="job-tracker" type="xs:string" minOccurs="0" maxOccurs="1"/>
+                <xs:element name="resource-manager" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            </xs:choice>
+            <xs:element name="name-node" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="prepare" type="spark:PREPARE" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="launcher" type="spark:LAUNCHER" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="configuration" type="spark:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="master" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="mode" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="name" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="class" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="jar" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="spark-opts" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="arg" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+        </xs:sequence>
+    </xs:complexType>
+
+</xs:schema>
+```
+
+#### Spark Action Schema Version 0.2
+
+```
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:spark="uri:oozie:spark-action:0.2" elementFormDefault="qualified"
+           targetNamespace="uri:oozie:spark-action:0.2">
+
+    <xs:element name="spark" type="spark:ACTION"/>
+
+    <xs:complexType name="ACTION">
+        <xs:sequence>
+            <xs:element name="job-tracker" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="name-node" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="prepare" type="spark:PREPARE" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="configuration" type="spark:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="master" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="mode" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="name" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="class" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="jar" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="spark-opts" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="arg" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+        </xs:sequence>
+    </xs:complexType>
+
+    <xs:complexType name="CONFIGURATION">
+        <xs:sequence>
+            <xs:element name="property" minOccurs="1" maxOccurs="unbounded">
+                <xs:complexType>
+                    <xs:sequence>
+                        <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
+                        <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
+                        <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
+                    </xs:sequence>
+                </xs:complexType>
+            </xs:element>
+        </xs:sequence>
+    </xs:complexType>
+
+    <xs:complexType name="PREPARE">
+        <xs:sequence>
+            <xs:element name="delete" type="spark:DELETE" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="mkdir" type="spark:MKDIR" minOccurs="0" maxOccurs="unbounded"/>
+        </xs:sequence>
+    </xs:complexType>
+
+    <xs:complexType name="DELETE">
+        <xs:attribute name="path" type="xs:string" use="required"/>
+    </xs:complexType>
+
+    <xs:complexType name="MKDIR">
+        <xs:attribute name="path" type="xs:string" use="required"/>
+    </xs:complexType>
+
+</xs:schema>
+```
+
+#### Spark Action Schema Version 0.1
+
+```
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:spark="uri:oozie:spark-action:0.1" elementFormDefault="qualified"
+           targetNamespace="uri:oozie:spark-action:0.1">
+
+    <xs:element name="spark" type="spark:ACTION"/>
+
+    <xs:complexType name="ACTION">
+        <xs:sequence>
+            <xs:element name="job-tracker" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="name-node" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="prepare" type="spark:PREPARE" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="configuration" type="spark:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="master" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="mode" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="name" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="class" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="jar" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="spark-opts" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="arg" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+        </xs:sequence>
+    </xs:complexType>
+
+    <xs:complexType name="CONFIGURATION">
+        <xs:sequence>
+            <xs:element name="property" minOccurs="1" maxOccurs="unbounded">
+                <xs:complexType>
+                    <xs:sequence>
+                        <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
+                        <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
+                        <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
+                    </xs:sequence>
+                </xs:complexType>
+            </xs:element>
+        </xs:sequence>
+    </xs:complexType>
+
+    <xs:complexType name="PREPARE">
+        <xs:sequence>
+            <xs:element name="delete" type="spark:DELETE" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="mkdir" type="spark:MKDIR" minOccurs="0" maxOccurs="unbounded"/>
+        </xs:sequence>
+    </xs:complexType>
+
+    <xs:complexType name="DELETE">
+        <xs:attribute name="path" type="xs:string" use="required"/>
+    </xs:complexType>
+
+    <xs:complexType name="MKDIR">
+        <xs:attribute name="path" type="xs:string" use="required"/>
+    </xs:complexType>
+
+</xs:schema>
+```
+[::Go back to Oozie Documentation Index::](index.html)
+
+
+
+
+

http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/markdown/DG_SqoopActionExtension.md
----------------------------------------------------------------------
diff --git a/docs/src/site/markdown/DG_SqoopActionExtension.md b/docs/src/site/markdown/DG_SqoopActionExtension.md
new file mode 100644
index 0000000..b186c5a
--- /dev/null
+++ b/docs/src/site/markdown/DG_SqoopActionExtension.md
@@ -0,0 +1,348 @@
+
+
+[::Go back to Oozie Documentation Index::](index.html)
+
+-----
+
+# Oozie Sqoop Action Extension
+
+<!-- MACRO{toc|fromDepth=1|toDepth=4} -->
+
+## Sqoop Action
+
+**IMPORTANT:** The Sqoop action requires Apache Hadoop 1.x or 2.x.
+
+The `sqoop` action runs a Sqoop job.
+
+The workflow job will wait until the Sqoop job completes before
+continuing to the next action.
+
+To run the Sqoop job, you have to configure the `sqoop` action with the `resource-manager`, `name-node` and Sqoop `command`
+or `arg` elements as well as configuration.
+
+A `sqoop` action can be configured to create or delete HDFS directories
+before starting the Sqoop job.
+
+Sqoop configuration can be specified with a file, using the `job-xml`
+element, and inline, using the `configuration` element.
+
+Oozie EL expressions can be used in the inline configuration. Property
+values specified in the `configuration` element override values specified
+in the `job-xml` file.
+
+Note that YARN `yarn.resourcemanager.address` / `resource-manager` and HDFS `fs.default.name` / `name-node` properties must not
+be present in the inline configuration.
+
+As with Hadoop `map-reduce` jobs, it is possible to add files and
+archives in order to make them available to the Sqoop job. Refer to the
+[Adding Files and Archives for the Job](WorkflowFunctionalSpec.html#a3.2.2.1_Adding_Files_and_Archives_for_the_Job)
+section for more information about this feature.
+
+**Syntax:**
+
+
+```
+<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:1.0">
+    ...
+    <action name="[NODE-NAME]">
+        <sqoop xmlns="uri:oozie:sqoop-action:1.0">
+            <resource-manager>[RESOURCE-MANAGER]</resource-manager>
+            <name-node>[NAME-NODE]</name-node>
+            <prepare>
+               <delete path="[PATH]"/>
+               ...
+               <mkdir path="[PATH]"/>
+               ...
+            </prepare>
+            <configuration>
+                <property>
+                    <name>[PROPERTY-NAME]</name>
+                    <value>[PROPERTY-VALUE]</value>
+                </property>
+                ...
+            </configuration>
+            <command>[SQOOP-COMMAND]</command>
+            <arg>[SQOOP-ARGUMENT]</arg>
+            ...
+            <file>[FILE-PATH]</file>
+            ...
+            <archive>[FILE-PATH]</archive>
+            ...
+        </sqoop>
+        <ok to="[NODE-NAME]"/>
+        <error to="[NODE-NAME]"/>
+    </action>
+    ...
+</workflow-app>
+```
+
+The `prepare` element, if present, indicates a list of paths to delete
+or create before starting the job. Specified paths must start with `hdfs://HOST:PORT`.
+
+The `job-xml` element, if present, specifies a file containing configuration
+for the Sqoop job. As of schema 0.3, multiple `job-xml` elements are allowed in order to
+specify multiple `job.xml` files.
+
+The `configuration` element, if present, contains configuration
+properties that are passed to the Sqoop job.
+
+**Sqoop command**
+
+The Sqoop command can be specified either using the `command` element or multiple `arg`
+elements.
+
+When using the `command` element, Oozie will split the command on every space
+into multiple arguments.
+
+When using the `arg` elements, Oozie will pass each argument value as an argument to Sqoop.
+
+The `arg` variant should be used when there are spaces within a single argument.
+
+Consult the Sqoop documentation for a complete list of valid Sqoop commands.
+
+All the above elements can be parameterized (templatized) using EL
+expressions.
+
+**Examples:**
+
+Using the `command` element:
+
+
+```
+<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0">
+    ...
+    <action name="myfirstsqoopjob">
+        <sqoop xmlns="uri:oozie:sqoop-action:1.0">
+            <resource-manager>foo:8032</resource-manager>
+            <name-node>bar:8020</name-node>
+            <prepare>
+                <delete path="${jobOutput}"/>
+            </prepare>
+            <configuration>
+                <property>
+                    <name>mapred.compress.map.output</name>
+                    <value>true</value>
+                </property>
+            </configuration>
+            <command>import  --connect jdbc:hsqldb:file:db.hsqldb --table TT --target-dir hdfs://localhost:8020/user/tucu/foo -m 1</command>
+        </sqoop>
+        <ok to="myotherjob"/>
+        <error to="errorcleanup"/>
+    </action>
+    ...
+</workflow-app>
+```
+
+The same Sqoop action using `arg` elements:
+
+
+```
+<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0">
+    ...
+    <action name="myfirstsqoopjob">
+        <sqoop xmlns="uri:oozie:sqoop-action:1.0">
+            <resource-manager>foo:8032</resource-manager>
+            <name-node>bar:8020</name-node>
+            <prepare>
+                <delete path="${jobOutput}"/>
+            </prepare>
+            <configuration>
+                <property>
+                    <name>mapred.compress.map.output</name>
+                    <value>true</value>
+                </property>
+            </configuration>
+            <arg>import</arg>
+            <arg>--connect</arg>
+            <arg>jdbc:hsqldb:file:db.hsqldb</arg>
+            <arg>--table</arg>
+            <arg>TT</arg>
+            <arg>--target-dir</arg>
+            <arg>hdfs://localhost:8020/user/tucu/foo</arg>
+            <arg>-m</arg>
+            <arg>1</arg>
+        </sqoop>
+        <ok to="myotherjob"/>
+        <error to="errorcleanup"/>
+    </action>
+    ...
+</workflow-app>
+```
+
+NOTE: The `arg` element syntax, while more verbose, allows spaces in a single argument, which is useful when
+using free-form queries.
+
+### Sqoop Action Counters
+
+The counters of the map-reduce job run by the Sqoop action are available to be used in the workflow via the
+[hadoop:counters() EL function](WorkflowFunctionalSpec.html#HadoopCountersEL).
+
+If the Sqoop action runs an import all command, the `hadoop:counters()` EL function will return the aggregated counters
+of all map-reduce jobs run by the Sqoop import all command.
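+
+For example, a decision node could branch on one of these counters (a sketch only: the node names are illustrative, `myfirstsqoopjob` refers to the example action above, and the counter group and counter names are assumptions that depend on the Hadoop version in use):
+
+
+```
+    <decision name="check-import">
+        <switch>
+            <case to="process-data">
+                ${hadoop:counters('myfirstsqoopjob')['org.apache.hadoop.mapreduce.TaskCounter']['MAP_OUTPUT_RECORDS'] gt 0}
+            </case>
+            <default to="errorcleanup"/>
+        </switch>
+    </decision>
+```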
+
+### Sqoop Action Logging
+
+Sqoop action logs are redirected to the STDOUT/STDERR of the Oozie Launcher map-reduce job task that runs Sqoop.
+
+From the Oozie web console, the 'Console URL' link in the Sqoop action pop-up can be used
+to navigate to the Oozie Launcher map-reduce job task logs via the Hadoop JobTracker web console.
+
+The logging level of the Sqoop action can be set in the Sqoop action configuration using the
+property `oozie.sqoop.log.level`. The default value is `INFO`.
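+
+For example, to raise the logging level to `DEBUG` through the action's inline configuration (a minimal sketch):
+
+
+```
+    <configuration>
+        <property>
+            <name>oozie.sqoop.log.level</name>
+            <value>DEBUG</value>
+        </property>
+    </configuration>
+```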
+
+## Appendix, Sqoop XML-Schema
+
+### AE.A Appendix A, Sqoop XML-Schema
+
+#### Sqoop Action Schema Version 1.0
+
+```
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:sqoop="uri:oozie:sqoop-action:1.0"
+           elementFormDefault="qualified"
+           targetNamespace="uri:oozie:sqoop-action:1.0">
+
+    <xs:include schemaLocation="oozie-common-1.0.xsd"/>
+
+    <xs:element name="sqoop" type="sqoop:ACTION"/>
+
+    <xs:complexType name="ACTION">
+        <xs:sequence>
+            <xs:choice>
+                <xs:element name="job-tracker" type="xs:string" minOccurs="0" maxOccurs="1"/>
+                <xs:element name="resource-manager" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            </xs:choice>
+            <xs:element name="name-node" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="prepare" type="sqoop:PREPARE" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="launcher" type="sqoop:LAUNCHER" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="configuration" type="sqoop:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
+            <xs:choice>
+                <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/>
+                <xs:element name="arg" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
+            </xs:choice>
+            <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+        </xs:sequence>
+    </xs:complexType>
+
+</xs:schema>
+```
+
+#### Sqoop Action Schema Version 0.3
+
+```
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:sqoop="uri:oozie:sqoop-action:0.3" elementFormDefault="qualified"
+           targetNamespace="uri:oozie:sqoop-action:0.3">
+
+    <xs:element name="sqoop" type="sqoop:ACTION"/>
+
+    <xs:complexType name="ACTION">
+        <xs:sequence>
+            <xs:element name="job-tracker" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="name-node" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="prepare" type="sqoop:PREPARE" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="configuration" type="sqoop:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
+            <xs:choice>
+                <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/>
+                <xs:element name="arg" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
+            </xs:choice>
+            <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+        </xs:sequence>
+    </xs:complexType>
+
+    <xs:complexType name="CONFIGURATION">
+        <xs:sequence>
+            <xs:element name="property" minOccurs="1" maxOccurs="unbounded">
+                <xs:complexType>
+                    <xs:sequence>
+                        <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
+                        <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
+                        <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
+                    </xs:sequence>
+                </xs:complexType>
+            </xs:element>
+        </xs:sequence>
+    </xs:complexType>
+
+    <xs:complexType name="PREPARE">
+        <xs:sequence>
+            <xs:element name="delete" type="sqoop:DELETE" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="mkdir" type="sqoop:MKDIR" minOccurs="0" maxOccurs="unbounded"/>
+        </xs:sequence>
+    </xs:complexType>
+
+    <xs:complexType name="DELETE">
+        <xs:attribute name="path" type="xs:string" use="required"/>
+    </xs:complexType>
+
+    <xs:complexType name="MKDIR">
+        <xs:attribute name="path" type="xs:string" use="required"/>
+    </xs:complexType>
+
+</xs:schema>
+```
+
+#### Sqoop Action Schema Version 0.2
+
+```
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:sqoop="uri:oozie:sqoop-action:0.2" elementFormDefault="qualified"
+           targetNamespace="uri:oozie:sqoop-action:0.2">
+
+    <xs:element name="sqoop" type="sqoop:ACTION"/>
+
+    <xs:complexType name="ACTION">
+        <xs:sequence>
+            <xs:element name="job-tracker" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="name-node" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="prepare" type="sqoop:PREPARE" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="1"/>
+            <xs:element name="configuration" type="sqoop:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
+            <xs:choice>
+                <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/>
+                <xs:element name="arg" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
+            </xs:choice>
+            <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+        </xs:sequence>
+    </xs:complexType>
+
+    <xs:complexType name="CONFIGURATION">
+        <xs:sequence>
+            <xs:element name="property" minOccurs="1" maxOccurs="unbounded">
+                <xs:complexType>
+                    <xs:sequence>
+                        <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
+                        <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
+                        <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
+                    </xs:sequence>
+                </xs:complexType>
+            </xs:element>
+        </xs:sequence>
+    </xs:complexType>
+
+    <xs:complexType name="PREPARE">
+        <xs:sequence>
+            <xs:element name="delete" type="sqoop:DELETE" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="mkdir" type="sqoop:MKDIR" minOccurs="0" maxOccurs="unbounded"/>
+        </xs:sequence>
+    </xs:complexType>
+
+    <xs:complexType name="DELETE">
+        <xs:attribute name="path" type="xs:string" use="required"/>
+    </xs:complexType>
+
+    <xs:complexType name="MKDIR">
+        <xs:attribute name="path" type="xs:string" use="required"/>
+    </xs:complexType>
+
+</xs:schema>
+```
+
+[::Go back to Oozie Documentation Index::](index.html)
+
+

http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/markdown/DG_SshActionExtension.md
----------------------------------------------------------------------
diff --git a/docs/src/site/markdown/DG_SshActionExtension.md b/docs/src/site/markdown/DG_SshActionExtension.md
new file mode 100644
index 0000000..e53e1c3
--- /dev/null
+++ b/docs/src/site/markdown/DG_SshActionExtension.md
@@ -0,0 +1,161 @@
+
+
+[::Go back to Oozie Documentation Index::](index.html)
+
+-----
+
+# Oozie Ssh Action Extension
+
+<!-- MACRO{toc|fromDepth=1|toDepth=4} -->
+
+## Ssh Action
+
+The `ssh` action starts a shell command on a remote machine as a remote secure shell in the background. The workflow job
+will wait until the remote shell command completes before continuing to the next action.
+
+The shell command must be present on the remote machine and it must be available for execution via the command path.
+
+The shell command is executed in the home directory of the specified user on the remote host.
+
+The output (STDOUT) of the ssh job can be made available to the workflow job after the ssh job ends. This information
+could be used from within decision nodes. If the output of the ssh job is made available to the workflow job, the shell
+command must meet the following requirements:
+
+   * The format of the output must be a valid Java Properties file.
+   * The size of the output must not exceed 2KB.
+
+Note: The ssh action will fail if any output is written to standard error / output upon login (e.g. the `.bashrc` of the remote
+user contains `ls -a`).
+
+Note: The ssh action will fail if Oozie fails to connect over ssh to the host for the action status check
+(e.g., the host is under heavy load, or the network is bad) after a configurable number of retries (3 by default).
+The first retry will wait a configurable period of time (3 seconds by default) before the check.
+Each subsequent retry will wait twice the previous wait time.
+
+**Syntax:**
+
+
+```
+<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:1.0">
+    ...
+    <action name="[NODE-NAME]">
+        <ssh xmlns="uri:oozie:ssh-action:0.1">
+            <host>[USER]@[HOST]</host>
+            <command>[SHELL]</command>
+            <args>[ARGUMENTS]</args>
+            ...
+            <capture-output/>
+        </ssh>
+        <ok to="[NODE-NAME]"/>
+        <error to="[NODE-NAME]"/>
+    </action>
+    ...
+</workflow-app>
+```
+
+The `host` element indicates the user and host where the shell will be executed.
+
+**IMPORTANT:** The `oozie.action.ssh.allow.user.at.host` property, in the `oozie-site.xml` configuration, indicates if
+an alternate user than the one submitting the job can be used for the ssh invocation. By default this property is set
+to `true`.
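+
+For example, to restrict ssh actions to the user who submitted the workflow, the property can be set in `oozie-site.xml` (a minimal sketch):
+
+
+```
+    <property>
+        <name>oozie.action.ssh.allow.user.at.host</name>
+        <value>false</value>
+    </property>
+```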
+
+The `command` element indicates the shell command to execute.
+
+The `args` element, if present, contains parameters to be passed to the shell command. If more than one `args` element
+is present they are concatenated in order. When an `args` element contains a space, even when quoted, it will be considered as
+separate arguments (i.e. "Hello World" becomes "Hello" and "World").  Starting with ssh schema 0.2, you can use the `arg` element
+(note that this is different than the `args` element) to specify arguments that have a space in them (i.e. "Hello World" is
+preserved as "Hello World").  You can use either `args` elements, `arg` elements, or neither; but not both in the same action.
+
+If the `capture-output` element is present, it indicates to Oozie that the STDOUT of the ssh command
+execution should be captured. The ssh command output must be in Java Properties file format and it must not exceed 2KB. From within the
+workflow definition, the output of an ssh action node is accessible via the `String action:output(String node,
+String key)` function (Refer to section '4.2.6 Action EL Functions').
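+
+A sketch combining the schema 0.2 `arg` element with `capture-output` (the host and the command name `generate-report.sh` are placeholders; the command is expected to print `key=value` lines on STDOUT):
+
+
+```
+    <ssh xmlns="uri:oozie:ssh-action:0.2">
+        <host>foo@bar.com</host>
+        <command>generate-report.sh</command>
+        <arg>Hello World</arg>
+        <capture-output/>
+    </ssh>
+```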
+
+The configuration of the `ssh` action can be parameterized (templatized) using EL expressions.
+
+**Example:**
+
+
+```
+<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:1.0">
+    ...
+    <action name="mysshjob">
+        <ssh xmlns="uri:oozie:ssh-action:0.1">
+            <host>foo@bar.com</host>
+            <command>uploaddata</command>
+            <args>jdbc:derby://bar.com:1527/myDB</args>
+            <args>hdfs://foobar.com:8020/usr/tucu/myData</args>
+        </ssh>
+        <ok to="myotherjob"/>
+        <error to="errorcleanup"/>
+    </action>
+    ...
+</workflow-app>
+```
+
+In the above example, the `uploaddata` shell command is executed with two arguments, `jdbc:derby://bar.com:1527/myDB`
+and `hdfs://foobar.com:8020/usr/tucu/myData`.
+
+The `uploaddata` shell command must be available on the remote host and available in the command path.
+
+The output of the command will be ignored because the `capture-output` element is not present.
+
+## Appendix, Ssh XML-Schema
+
+### AE.A Appendix A, Ssh XML-Schema
+
+#### Ssh Action Schema Version 0.2
+
+
+```
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:ssh="uri:oozie:ssh-action:0.2" elementFormDefault="qualified"
+           targetNamespace="uri:oozie:ssh-action:0.2">
+
+    <xs:element name="ssh" type="ssh:ACTION"/>
+
+    <xs:complexType name="ACTION">
+        <xs:sequence>
+            <xs:element name="host" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:choice>
+              <xs:element name="args" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+              <xs:element name="arg" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+            </xs:choice>
+            <xs:element name="capture-output" type="ssh:FLAG" minOccurs="0" maxOccurs="1"/>
+        </xs:sequence>
+    </xs:complexType>
+
+    <xs:complexType name="FLAG"/>
+
+</xs:schema>
+```
+
+#### Ssh Action Schema Version 0.1
+
+
+```
+<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
+           xmlns:ssh="uri:oozie:ssh-action:0.1" elementFormDefault="qualified"
+           targetNamespace="uri:oozie:ssh-action:0.1">
+
+    <xs:element name="ssh" type="ssh:ACTION"/>
+
+    <xs:complexType name="ACTION">
+        <xs:sequence>
+            <xs:element name="host" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/>
+            <xs:element name="args" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
+            <xs:element name="capture-output" type="ssh:FLAG" minOccurs="0" maxOccurs="1"/>
+        </xs:sequence>
+    </xs:complexType>
+
+    <xs:complexType name="FLAG"/>
+
+</xs:schema>
+```
+
+[::Go back to Oozie Documentation Index::](index.html)
+
+

http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/markdown/DG_WorkflowReRun.md
----------------------------------------------------------------------
diff --git a/docs/src/site/markdown/DG_WorkflowReRun.md b/docs/src/site/markdown/DG_WorkflowReRun.md
new file mode 100644
index 0000000..c128681
--- /dev/null
+++ b/docs/src/site/markdown/DG_WorkflowReRun.md
@@ -0,0 +1,42 @@
+
+
+[::Go back to Oozie Documentation Index::](index.html)
+
+# Workflow ReRun
+
+<!-- MACRO{toc|fromDepth=1|toDepth=4} -->
+## Configs
+
+   * `oozie.wf.application.path`
+   * Only one of the following two configurations is mandatory. Both must not be defined at the same time:
+      * `oozie.wf.rerun.skip.nodes`
+      * `oozie.wf.rerun.failnodes`
+   * Skip nodes are specified as a comma-separated list of action names. They can be any action nodes, including decision nodes.
+   * The valid values of `oozie.wf.rerun.failnodes` are `true` and `false`.
+   * If a secured Hadoop version is used, the following two properties need to be specified as well:
+      * `mapreduce.jobtracker.kerberos.principal`
+      * `dfs.namenode.kerberos.principal`
+   * Configurations can be passed as `-D` parameters.
+
+```
+$ oozie job -oozie http://localhost:11000/oozie -rerun 14-20090525161321-oozie-joe -Doozie.wf.rerun.skip.nodes=<>
+```
+
+## Pre-Conditions
+
+   * A workflow with id wfId should exist.
+   * The workflow with id wfId should be in SUCCEEDED/KILLED/FAILED state.
+   * If specified, the nodes in the config `oozie.wf.rerun.skip.nodes` must have completed successfully.
+
+## ReRun
+
+   * Reloads the configs.
+   * If no configuration is passed, the existing coordinator/workflow configuration will be used. If a configuration is passed, it will be merged with the existing workflow configuration, with the input configuration taking precedence.
+   * Currently there is no way to remove an existing configuration property; it can only be overridden by passing a different value in the input configuration.
+   * Creates a new workflow instance with the same wfId.
+   * Deletes from the DB the actions that are not skipped, and copies data from the old workflow instance to the new one for the skipped actions.
+   * The action handler will skip the nodes given in the config, with the same exit transition as before.
+
+[::Go back to Oozie Documentation Index::](index.html)
+
+

http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/markdown/ENG_MiniOozie.md
----------------------------------------------------------------------
diff --git a/docs/src/site/markdown/ENG_MiniOozie.md b/docs/src/site/markdown/ENG_MiniOozie.md
new file mode 100644
index 0000000..e793676
--- /dev/null
+++ b/docs/src/site/markdown/ENG_MiniOozie.md
@@ -0,0 +1,83 @@
+
+
+[::Go back to Oozie Documentation Index::](index.html)
+
+# Running MiniOozie Tests
+
+<!-- MACRO{toc|fromDepth=1|toDepth=4} -->
+
+## System Requirements
+
+   * Unix box (tested on Mac OS X and Linux)
+   * Java JDK 1.8+
+   * Eclipse (tested on 3.5 and 3.6)
+   * [Maven 3.0.1+](http://maven.apache.org/)
+
+The Maven command (mvn) must be in the command path.
+
+## Installing Oozie Jars To Maven Cache
+
+The Oozie source tree is available from Apache SVN or Apache Git. The MiniOozie sample project is under the Oozie source tree.
+
+The following command downloads Oozie trunk to a local directory:
+
+
+```
+$ svn co https://svn.apache.org/repos/asf/incubator/oozie/trunk
+```
+
+OR
+
+
+```
+$ git clone git://github.com/apache/oozie.git
+```
+
+To run MiniOozie tests, the required JARs such as oozie-core, oozie-client and oozie-core-tests need to be
+available in remote Maven repositories or in the local Maven repository. The local Maven cache for the above
+JARs can be created and installed using the command:
+
+
+```
+$ mvn clean install -DskipTests -DtestJarSimple
+```
+
+The following properties should be specified to install the correct JARs for MiniOozie:
+
+   * -DskipTests       : skip executing the Oozie unit tests
+   * -DtestJarSimple=  : build only the required test classes into oozie-core-tests
+
+MiniOozie is a folder named `minitest` under the Oozie source tree. Two sample tests are included in the project.
+The following commands execute the tests under MiniOozie:
+
+
+```
+$ cd minitest
+$ mvn clean test
+```
+
+## Create Tests Using MiniOozie
+
+MiniOozie is a JUnit test class to test Oozie applications such as workflows and coordinators. A test case
+needs to extend MiniOozieTestCase and, like the example class 'WorkflowTest.java', create the Oozie
+workflow application properties and workflow XML. The example file is under the Oozie source tree:
+
+   * `minitest/src/test/java/org/apache/oozie/test/WorkflowTest.java`
+
+## IDE Setup
+
+Eclipse and IntelliJ can directly use the MiniOozie Maven project files. The MiniOozie project can be imported into
+Eclipse and IntelliJ as an independent project.
+
+The test directories under MiniOozie are:
+
+   * `minitest/src/test/java` : as test-source directory
+   * `minitest/src/test/resources` : as test-resource directory
+
+
+Asynchronous actions like the FS action can also be used / tested using the `LocalOozie` / `OozieClient` API.
+Please see the `fs-decision.xml` workflow example.
+
+[::Go back to Oozie Documentation Index::](index.html)
+
+