Posted to commits@oozie.apache.org by as...@apache.org on 2019/12/06 08:59:58 UTC

svn commit: r1870914 [14/19] - in /oozie/site/trunk: ./ content/ content/resources/docs/5.2.0/ content/resources/docs/5.2.0/css/ content/resources/docs/5.2.0/fonts/ content/resources/docs/5.2.0/images/ content/resources/docs/5.2.0/images/logos/ content...

Added: oozie/site/trunk/content/resources/docs/5.2.0/WorkflowFunctionalSpec.html
URL: http://svn.apache.org/viewvc/oozie/site/trunk/content/resources/docs/5.2.0/WorkflowFunctionalSpec.html?rev=1870914&view=auto
==============================================================================
--- oozie/site/trunk/content/resources/docs/5.2.0/WorkflowFunctionalSpec.html (added)
+++ oozie/site/trunk/content/resources/docs/5.2.0/WorkflowFunctionalSpec.html Fri Dec  6 08:59:57 2019
@@ -0,0 +1,4960 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2019-12-05 
+ | Rendered using Apache Maven Fluido Skin 1.4
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta name="Date-Revision-yyyymmdd" content="20191205" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>Oozie &#x2013; </title>
+    <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" />
+    <link rel="stylesheet" href="./css/site.css" />
+    <link rel="stylesheet" href="./css/print.css" media="print" />
+
+      
+    <script type="text/javascript" src="./js/apache-maven-fluido-1.4.min.js"></script>
+
+    
+                  </head>
+        <body class="topBarDisabled">
+          
+        
+    
+        <div class="container-fluid">
+          <div id="banner">
+        <div class="pull-left">
+                                    <a href="https://oozie.apache.org/" id="bannerLeft">
+                                                                                        <img src="https://oozie.apache.org/images/oozie_200x.png"  alt="Oozie"/>
+                </a>
+                      </div>
+        <div class="pull-right">  </div>
+        <div class="clear"><hr/></div>
+      </div>
+
+      <div id="breadcrumbs">
+        <ul class="breadcrumb">
+                
+                    
+                              <li class="">
+                    <a href="http://www.apache.org/" class="externalLink" title="Apache">
+        Apache</a>
+                    <span class="divider">/</span>
+      </li>
+            <li class="">
+                    <a href="../../" title="Oozie">
+        Oozie</a>
+                    <span class="divider">/</span>
+      </li>
+            <li class="">
+                    <a href="../" title="docs">
+        docs</a>
+                    <span class="divider">/</span>
+      </li>
+                <li class="">
+                    <a href="./" title="5.2.0">
+        5.2.0</a>
+                    <span class="divider">/</span>
+      </li>
+        <li class="active "></li>
+        
+                
+                    
+                  <li id="publishDate" class="pull-right"><span class="divider">|</span> Last Published: 2019-12-05</li>
+              <li id="projectVersion" class="pull-right">
+                    Version: 5.2.0
+        </li>
+            
+                            </ul>
+      </div>
+
+            
+      <div class="row-fluid">
+        <div id="leftColumn" class="span2">
+          <div class="well sidebar-nav">
+                
+                    
+                <ul class="nav nav-list">
+  </ul>
+                
+                    
+                
+          <hr />
+
+           <div id="poweredBy">
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                             <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
+        <img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" />
+      </a>
+                  </div>
+          </div>
+        </div>
+        
+                
+        <div id="bodyColumn"  class="span10" >
+                                  
+            <p><a href="index.html">::Go back to Oozie Documentation Index::</a></p><hr />
+<h1>Oozie Specification, a Hadoop Workflow System</h1>
+<p>The goal of this document is to define a workflow engine system specialized in coordinating the execution of Hadoop Map/Reduce and Pig jobs.</p>
+<ul>
+<li><a href="#Changelog">Changelog</a></li>
+<li><a href="#a0_Definitions">0 Definitions</a></li>
+<li><a href="#a1_Specification_Highlights">1 Specification Highlights</a></li>
+<li><a href="#a2_Workflow_Definition">2 Workflow Definition</a>
+<ul>
+<li><a href="#a2.1_Cycles_in_Workflow_Definitions">2.1 Cycles in Workflow Definitions</a></li></ul></li>
+<li><a href="#a3_Workflow_Nodes">3 Workflow Nodes</a>
+<ul>
+<li><a href="#a3.1_Control_Flow_Nodes">3.1 Control Flow Nodes</a>
+<ul>
+<li><a href="#a3.1.1_Start_Control_Node">3.1.1 Start Control Node</a></li>
+<li><a href="#a3.1.2_End_Control_Node">3.1.2 End Control Node</a></li>
+<li><a href="#a3.1.3_Kill_Control_Node">3.1.3 Kill Control Node</a></li>
+<li><a href="#a3.1.4_Decision_Control_Node">3.1.4 Decision Control Node</a></li>
+<li><a href="#a3.1.5_Fork_and_Join_Control_Nodes">3.1.5 Fork and Join Control Nodes</a></li></ul></li>
+<li><a href="#a3.2_Workflow_Action_Nodes">3.2 Workflow Action Nodes</a>
+<ul>
+<li><a href="#a3.2.1_Action_Basis">3.2.1 Action Basis</a>
+<ul>
+<li><a href="#a3.2.1.1_Action_ComputationProcessing_Is_Always_Remote">3.2.1.1 Action Computation/Processing Is Always Remote</a></li>
+<li><a href="#a3.2.1.2_Actions_Are_Asynchronous">3.2.1.2 Actions Are Asynchronous</a></li>
+<li><a href="#a3.2.1.3_Actions_Have_2_Transitions_ok_and_error">3.2.1.3 Actions Have 2 Transitions, ok and error</a></li>
+<li><a href="#a3.2.1.4_Action_Recovery">3.2.1.4 Action Recovery</a></li></ul></li>
+<li><a href="#a3.2.2_Map-Reduce_Action">3.2.2 Map-Reduce Action</a>
+<ul>
+<li><a href="#a3.2.2.1_Adding_Files_and_Archives_for_the_Job">3.2.2.1 Adding Files and Archives for the Job</a></li>
+<li><a href="#a3.2.2.2_Configuring_the_MapReduce_action_with_Java_code">3.2.2.2 Configuring the MapReduce action with Java code</a></li>
+<li><a href="#a3.2.2.3_Streaming">3.2.2.3 Streaming</a></li>
+<li><a href="#a3.2.2.4_Pipes">3.2.2.4 Pipes</a></li>
+<li><a href="#a3.2.2.5_Syntax">3.2.2.5 Syntax</a></li></ul></li>
+<li><a href="#a3.2.3_Pig_Action">3.2.3 Pig Action</a></li>
+<li><a href="#a3.2.4_Fs_HDFS_action">3.2.4 Fs (HDFS) action</a></li>
+<li><a href="#a3.2.5_Sub-workflow_Action">3.2.5 Sub-workflow Action</a></li>
+<li><a href="#a3.2.6_Java_Action">3.2.6 Java Action</a>
+<ul>
+<li><a href="#a3.2.6.1_Overriding_an_actions_Main_class">3.2.6.1 Overriding an action&#x2019;s Main class</a></li></ul></li></ul></li></ul></li>
+<li><a href="#a4_Parameterization_of_Workflows">4 Parameterization of Workflows</a>
+<ul>
+<li><a href="#a4.1_Workflow_Job_Properties_or_Parameters">4.1 Workflow Job Properties (or Parameters)</a></li>
+<li><a href="#a4.2_Expression_Language_Functions">4.2 Expression Language Functions</a>
+<ul>
+<li><a href="#a4.2.1_Basic_EL_Constants">4.2.1 Basic EL Constants</a></li>
+<li><a href="#a4.2.2_Basic_EL_Functions">4.2.2 Basic EL Functions</a></li>
+<li><a href="#a4.2.3_Workflow_EL_Functions">4.2.3 Workflow EL Functions</a></li>
+<li><a href="#a4.2.4_Hadoop_EL_Constants">4.2.4 Hadoop EL Constants</a></li>
+<li><a href="#a4.2.5_Hadoop_EL_Functions">4.2.5 Hadoop EL Functions</a></li>
+<li><a href="#a4.2.6_Hadoop_Jobs_EL_Function">4.2.6 Hadoop Jobs EL Function</a></li>
+<li><a href="#a4.2.7_HDFS_EL_Functions">4.2.7 HDFS EL Functions</a></li>
+<li><a href="#a4.2.8_HCatalog_EL_Functions">4.2.8 HCatalog EL Functions</a></li></ul></li></ul></li>
+<li><a href="#a5_Workflow_Notifications">5 Workflow Notifications</a>
+<ul>
+<li><a href="#a5.1_Workflow_Job_Status_Notification">5.1 Workflow Job Status Notification</a></li>
+<li><a href="#a5.2_Node_Start_and_End_Notifications">5.2 Node Start and End Notifications</a></li></ul></li>
+<li><a href="#a6_User_Propagation">6 User Propagation</a></li>
+<li><a href="#a7_Workflow_Application_Deployment">7 Workflow Application Deployment</a></li>
+<li><a href="#a8_External_Data_Assumptions">8 External Data Assumptions</a></li>
+<li><a href="#a9_Workflow_Jobs_Lifecycle">9 Workflow Jobs Lifecycle</a>
+<ul>
+<li><a href="#a9.1_Workflow_Job_Lifecycle">9.1 Workflow Job Lifecycle</a></li>
+<li><a href="#a9.2_Workflow_Action_Lifecycle">9.2 Workflow Action Lifecycle</a></li></ul></li>
+<li><a href="#a10_Workflow_Jobs_Recovery_re-run">10 Workflow Jobs Recovery (re-run)</a></li>
+<li><a href="#a11_Oozie_Web_Services_API">11 Oozie Web Services API</a></li>
+<li><a href="#a12_Client_API">12 Client API</a></li>
+<li><a href="#a13_Command_Line_Tools">13 Command Line Tools</a></li>
+<li><a href="#a14_Web_UI_Console">14 Web UI Console</a></li>
+<li><a href="#a15_Customizing_Oozie_with_Extensions">15 Customizing Oozie with Extensions</a></li>
+<li><a href="#a16_Workflow_Jobs_Priority">16 Workflow Jobs Priority</a></li>
+<li><a href="#a17_HDFS_Share_Libraries_for_Workflow_Applications_since_Oozie_2.3">17 HDFS Share Libraries for Workflow Applications (since Oozie 2.3)</a>
+<ul>
+<li><a href="#a17.1_Action_Share_Library_Override_since_Oozie_3.3">17.1 Action Share Library Override (since Oozie 3.3)</a></li>
+<li><a href="#a17.2_Action_Share_Library_Exclude_since_Oozie_5.2">17.2 Action Share Library Exclude (since Oozie 5.2)</a></li></ul></li>
+<li><a href="#a18_User-Retry_for_Workflow_Actions_since_Oozie_3.1">18 User-Retry for Workflow Actions (since Oozie 3.1)</a></li>
+<li><a href="#a19_Global_Configurations">19 Global Configurations</a></li>
+<li><a href="#a20_Suspend_On_Nodes">20 Suspend On Nodes</a></li>
+<li><a href="#Appendixes">Appendixes</a>
+<ul>
+<li><a href="#Appendix_A_Oozie_Workflow_and_Common_XML_Schemas">Appendix A, Oozie Workflow and Common XML Schemas</a>
+<ul>
+<li><a href="#Oozie_Workflow_Schema_Version_1.0">Oozie Workflow Schema Version 1.0</a></li>
+<li><a href="#Oozie_Common_Schema_Version_1.0">Oozie Common Schema Version 1.0</a></li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.5">Oozie Workflow Schema Version 0.5</a></li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.4.5">Oozie Workflow Schema Version 0.4.5</a></li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.4">Oozie Workflow Schema Version 0.4</a></li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.3">Oozie Workflow Schema Version 0.3</a></li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.2.5">Oozie Workflow Schema Version 0.2.5</a></li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.2">Oozie Workflow Schema Version 0.2</a></li>
+<li><a href="#Oozie_SLA_Version_0.2">Oozie SLA Version 0.2</a></li>
+<li><a href="#Oozie_SLA_Version_0.1">Oozie SLA Version 0.1</a></li>
+<li><a href="#Oozie_Workflow_Schema_Version_0.1">Oozie Workflow Schema Version 0.1</a></li></ul></li>
+<li><a href="#Appendix_B_Workflow_Examples">Appendix B, Workflow Examples</a>
+<ul>
+<li><a href="#Fork_and_Join_Example">Fork and Join Example</a></li></ul></li></ul></li></ul>
+
+<div class="section">
+<h2><a name="Changelog"></a>Changelog</h2>
+<p><b>2016FEB19</b></p>
+<ul>
+
+<li>3.2.7 Updated notes on System.exit(int n) behavior</li>
+</ul>
+<p><b>2015APR29</b></p>
+<ul>
+
+<li>3.2.1.4 Added notes about Java action retries</li>
+<li>3.2.7 Added notes about Java action retries</li>
+</ul>
+<p><b>2014MAY08</b></p>
+<ul>
+
+<li>3.2.2.4 Added support for fully qualified job-xml path</li>
+</ul>
+<p><b>2013JUL03</b></p>
+<ul>
+
+<li>Appendix A, Added new workflow schema 0.5 and SLA schema 0.2</li>
+</ul>
+<p><b>2012AUG30</b></p>
+<ul>
+
+<li>4.2.2 Added two EL functions (replaceAll and appendAll)</li>
+</ul>
+<p><b>2012JUL26</b></p>
+<ul>
+
+<li>Appendix A, updated XML schema 0.4 to include <tt>parameters</tt> element</li>
+<li>4.1 Updated to mention about <tt>parameters</tt> element as of schema 0.4</li>
+</ul>
+<p><b>2012JUL23</b></p>
+<ul>
+
+<li>Appendix A, updated XML schema 0.4 (Fs action)</li>
+<li>3.2.4 Updated to mention that a <tt>name-node</tt>, a <tt>job-xml</tt>, and a <tt>configuration</tt> element are allowed in the Fs action as of schema 0.4</li>
+</ul>
+<p><b>2012JUN19</b></p>
+<ul>
+
+<li>Appendix A, added XML schema 0.4</li>
+<li>3.2.2.4 Updated to mention that multiple <tt>job-xml</tt> elements are allowed as of schema 0.4</li>
+<li>3.2.3 Updated to mention that multiple <tt>job-xml</tt> elements are allowed as of schema 0.4</li>
+</ul>
+<p><b>2011AUG17</b></p>
+<ul>
+
+<li>3.2.4 fs &#x2018;chmod&#x2019; xml closing element typo in Example corrected</li>
+</ul>
+<p><b>2011AUG12</b></p>
+<ul>
+
+<li>3.2.4 fs &#x2018;move&#x2019; action characteristics updated, to allow for consistent source and target paths and existing target path only if directory</li>
+<li>18, Update the doc for user-retry of workflow action.</li>
+</ul>
+<p><b>2011FEB19</b></p>
+<ul>
+
+<li>10, Update the doc to rerun from the failed node.</li>
+</ul>
+<p><b>2010OCT31</b></p>
+<ul>
+
+<li>17, Added new section on Shared Libraries</li>
+</ul>
+<p><b>2010APR27</b></p>
+<ul>
+
+<li>3.2.3 Added new &#x201c;arguments&#x201d; tag to PIG actions</li>
+<li>3.2.5 SSH actions are deprecated in Oozie schema 0.1 and removed in Oozie schema 0.2</li>
+<li>Appendix A, Added schema version 0.2</li>
+</ul>
+<p><b>2009OCT20</b></p>
+<ul>
+
+<li>Appendix A, updated XML schema</li>
+</ul>
+<p><b>2009SEP15</b></p>
+<ul>
+
+<li>3.2.6 Removing support for sub-workflow in a different Oozie instance (removing the &#x2018;oozie&#x2019; element)</li>
+</ul>
+<p><b>2009SEP07</b></p>
+<ul>
+
+<li>3.2.2.3 Added Map Reduce Pipes specifications.</li>
+<li>3.2.2.4 Map-Reduce Examples. Previously was 3.2.2.3.</li>
+</ul>
+<p><b>2009SEP02</b></p>
+<ul>
+
+<li>10 Added missing skip nodes property name.</li>
+<li>3.2.1.4 Reworded action recovery explanation.</li>
+</ul>
+<p><b>2009AUG26</b></p>
+<ul>
+
+<li>3.2.9 Added <tt>java</tt> action type</li>
+<li>3.1.4 Example uses EL constant to refer to counter group/name</li>
+</ul>
+<p><b>2009JUN09</b></p>
+<ul>
+
+<li>12.2.4 Added build version resource to admin end-point</li>
+<li>3.2.6 Added flag to propagate workflow configuration to sub-workflows</li>
+<li>10 Added behavior for workflow job parameters given in the rerun</li>
+<li>11.3.4 workflows info returns pagination information</li>
+</ul>
+<p><b>2009MAY18</b></p>
+<ul>
+
+<li>3.1.4 decision node, &#x2018;default&#x2019; element, &#x2018;name&#x2019; attribute changed to &#x2018;to&#x2019;</li>
+<li>3.1.5 fork node, &#x2018;transition&#x2019; element changed to &#x2018;start&#x2019;, &#x2018;to&#x2019; attribute change to &#x2018;path&#x2019;</li>
+<li>3.1.5 join node, &#x2018;transition&#x2019; element remove, added &#x2018;to&#x2019; attribute to &#x2018;join&#x2019; element</li>
+<li>3.2.1.4 Rewording on action recovery section</li>
+<li>3.2.2 map-reduce action, added &#x2018;job-tracker&#x2019;, &#x2018;name-node&#x2019; actions, &#x2018;file&#x2019;, &#x2018;file&#x2019; and &#x2018;archive&#x2019; elements</li>
+<li>3.2.2.1 map-reduce action, remove from &#x2018;streaming&#x2019; element &#x2018;file&#x2019;, &#x2018;file&#x2019; and &#x2018;archive&#x2019; elements</li>
+<li>3.2.2.2 map-reduce action, reorganized streaming section</li>
+<li>3.2.3 pig action, removed information about implementation (SSH), changed elements names</li>
+<li>3.2.4 fs action, removed &#x2018;fs-uri&#x2019; and &#x2018;user-name&#x2019; elements, file system URI is now specified in path, user is propagated</li>
+<li>3.2.6 sub-workflow action, renamed elements &#x2018;oozie-url&#x2019; to &#x2018;oozie&#x2019; and &#x2018;workflow-app&#x2019; to &#x2018;app-path&#x2019;</li>
+<li>4 Properties that are valid Java identifiers can be used as ${NAME}</li>
+<li>4.1 Renamed default properties file from &#x2018;configuration.xml&#x2019; to &#x2018;default-configuration.xml&#x2019;</li>
+<li>4.2 Changes in EL Constants and Functions</li>
+<li>5 Updated notification behavior and tokens</li>
+<li>6 Changed user propagation behavior</li>
+<li>7 Changed application packaging from ZIP to HDFS directory</li>
+<li>Removed application lifecycle and self containment model sections</li>
+<li>10 Changed workflow job recovery, simplified recovery behavior</li>
+<li>11 Detailed Web Services API</li>
+<li>12 Updated  Client API section</li>
+<li>15 Updated  Action Executor API section</li>
+<li>Appendix A XML namespace updated to &#x2018;uri:oozie:workflow:0.1&#x2019;</li>
+<li>Appendix A Updated XML schema to changes in map-reduce/pig/fs/ssh actions</li>
+<li>Appendix B Updated workflow example to schema changes</li>
+</ul>
+<p><b>2009MAR25</b></p>
+<ul>
+
+<li>Changing all references of HWS to Oozie (project name)</li>
+<li>Typos, XML Formatting</li>
+<li>XML Schema URI correction</li>
+</ul>
+<p><b>2009MAR09</b></p>
+<ul>
+
+<li>Changed <tt>CREATED</tt> job state to <tt>PREP</tt> to have same states as Hadoop</li>
+<li>Renamed &#x2018;hadoop-workflow&#x2019; element to &#x2018;workflow-app&#x2019;</li>
+<li>Decision syntax changed to be &#x2018;switch/case&#x2019; with no transition indirection</li>
+<li>Action nodes common root element &#x2018;action&#x2019;, with the action type as sub-element (using a single built-in XML schema)</li>
+<li>Action nodes have 2 explicit transitions &#x2018;ok to&#x2019; and &#x2018;error to&#x2019; enforced by XML schema</li>
+<li>Renamed &#x2018;fail&#x2019; action element to &#x2018;kill&#x2019;</li>
+<li>Renamed &#x2018;hadoop&#x2019; action element to &#x2018;map-reduce&#x2019;</li>
+<li>Renamed &#x2018;hdfs&#x2019; action element to &#x2018;fs&#x2019;</li>
+<li>Updated all XML snippets and examples</li>
+<li>Made user propagation simpler and consistent</li>
+<li>Added Oozie XML schema to Appendix A</li>
+<li>Added workflow example to Appendix B</li>
+</ul>
+<p><b>2009FEB22</b></p>
+<ul>
+
+<li>Opened <a class="externalLink" href="https://issues.apache.org/jira/browse/HADOOP-5303">JIRA HADOOP-5303</a></li>
+</ul>
+<p><b>27/DEC/2012:</b></p>
+<ul>
+
+<li>Added information on dropping hcatalog table partitions in prepare block</li>
+<li>Added hcatalog EL functions section</li>
+</ul></div>
+<div class="section">
+<h2><a name="a0_Definitions"></a>0 Definitions</h2>
+<p><b>Action:</b> An execution/computation task (Map-Reduce job, Pig job, a shell command). It can also be referred to as a task or &#x2018;action node&#x2019;.</p>
+<p><b>Workflow:</b> A collection of actions arranged in a control dependency DAG (Directed Acyclic Graph). &#x201c;control dependency&#x201d; from one action to another means that the second action can&#x2019;t run until the first action has completed.</p>
+<p><b>Workflow Definition:</b> A programmatic description of a workflow that can be executed.</p>
+<p><b>Workflow Definition Language:</b> The language used to define a Workflow Definition.</p>
+<p><b>Workflow Job:</b> An executable instance of a workflow definition.</p>
+<p><b>Workflow Engine:</b> A system that executes workflow jobs. It can also be referred to as a DAG engine.</p></div>
+<div class="section">
+<h2><a name="a1_Specification_Highlights"></a>1 Specification Highlights</h2>
+<p>A Workflow application is a DAG that coordinates the following types of actions: Hadoop, Pig, and sub-workflows.</p>
+<p>Flow control operations within a workflow application can be done using decision, fork and join nodes. Cycles in workflows are not supported.</p>
+<p>Actions and decisions can be parameterized with job properties, action outputs (e.g. Hadoop counters) and file information (file exists, file size, etc.). Formal parameters are expressed in the workflow definition as <tt>${VAR}</tt> variables.</p>
+<p>A Workflow application is a ZIP file that contains the workflow definition (an XML file) and all the necessary files to run its actions: JAR files for Map/Reduce jobs, shells for streaming Map/Reduce jobs, native libraries, Pig scripts, and other resource files.</p>
+<p>Before running a workflow job, the corresponding workflow application must be deployed in Oozie.</p>
+<p>Deploying workflow application and running workflow jobs can be done via command line tools, a WS API and a Java API.</p>
+<p>Monitoring the system and workflow jobs can be done via a web console, command line tools, a WS API and a Java API.</p>
+<p>When submitting a workflow job, a set of properties resolving all the formal parameters in the workflow definitions must be provided. This set of properties is a Hadoop configuration.</p>
+<p>Possible states for a workflow job are: <tt>PREP</tt>, <tt>RUNNING</tt>, <tt>SUSPENDED</tt>, <tt>SUCCEEDED</tt>, <tt>KILLED</tt> and <tt>FAILED</tt>.</p>
+<p>In the case of an action start failure in a workflow job, depending on the type of failure, Oozie will attempt automatic retries, request a manual retry or fail the workflow job.</p>
+<p>Oozie can make HTTP callback notifications on action start/end/failure events and workflow end/failure events.</p>
+<p>In the case of workflow job failure, the workflow job can be resubmitted skipping previously completed actions. Before doing a resubmission the workflow application could be updated with a patch to fix a problem in the workflow application code.</p>
+<p><a name="WorkflowDefinition"></a></p></div>
+<div class="section">
+<h2><a name="a2_Workflow_Definition"></a>2 Workflow Definition</h2>
+<p>A workflow definition is a DAG with control flow nodes (start, end, decision, fork, join, kill) and action nodes (map-reduce, pig, etc.); nodes are connected by transition arrows.</p>
+<p>The workflow definition language is XML based and it is called hPDL (Hadoop Process Definition Language).</p>
+<p>Refer to Appendix A for the <a href="WorkflowFunctionalSpec.html#OozieWFSchema">Oozie Workflow Definition XML Schema</a>. Appendix B has <a href="WorkflowFunctionalSpec.html#OozieWFExamples">Workflow Definition Examples</a>.</p>
+<div class="section">
+<h3><a name="a2.1_Cycles_in_Workflow_Definitions"></a>2.1 Cycles in Workflow Definitions</h3>
+<p>Oozie does not support cycles in workflow definitions; a workflow definition must be a strict DAG.</p>
+<p>At workflow application deployment time, if Oozie detects a cycle in the workflow definition it must fail the deployment.</p></div></div>
+<div class="section">
+<h2><a name="a3_Workflow_Nodes"></a>3 Workflow Nodes</h2>
+<p>Workflow nodes are classified into control flow nodes and action nodes:</p>
+<ul>
+
+<li><b>Control flow nodes:</b> nodes that control the start and end of the workflow and workflow job execution path.</li>
+<li><b>Action nodes:</b> nodes that trigger the execution of a computation/processing task.</li>
+</ul>
+<p>Node names and transitions must conform to the following pattern <tt>[a-zA-Z][\-_a-zA-Z0-9]*</tt>, of up to 20 characters long.</p>
+<div class="section">
+<h3><a name="a3.1_Control_Flow_Nodes"></a>3.1 Control Flow Nodes</h3>
+<p>Control flow nodes define the beginning and the end of a workflow (the <tt>start</tt>, <tt>end</tt> and <tt>kill</tt> nodes) and provide a mechanism to control the workflow execution path (the <tt>decision</tt>, <tt>fork</tt> and <tt>join</tt> nodes).</p>
+<p><a name="StartNode"></a></p>
+<div class="section">
+<h4><a name="a3.1.1_Start_Control_Node"></a>3.1.1 Start Control Node</h4>
+<p>The <tt>start</tt> node is the entry point for a workflow job; it indicates the first workflow node the workflow job must transition to.</p>
+<p>When a workflow is started, it automatically transitions to the node specified in the <tt>to</tt> attribute of the <tt>start</tt> node.</p>
+<p>A workflow definition must have one <tt>start</tt> node.</p>
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+  ...
+  &lt;start to=&quot;[NODE-NAME]&quot;/&gt;
+  ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p>The <tt>to</tt> attribute is the name of the first workflow node to execute.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;foo-wf&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;start to=&quot;firstHadoopJob&quot;/&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p><a name="EndNode"></a></p></div>
+<div class="section">
+<h4><a name="a3.1.2_End_Control_Node"></a>3.1.2 End Control Node</h4>
+<p>The <tt>end</tt> node is the end of a workflow job; it indicates that the workflow job has completed successfully.</p>
+<p>When a workflow job reaches the <tt>end</tt> node it finishes successfully (SUCCEEDED).</p>
+<p>If one or more actions started by the workflow job are executing when the <tt>end</tt> node is reached, the actions will be killed. In this scenario the workflow job is still considered to have run successfully.</p>
+<p>A workflow definition must have one <tt>end</tt> node.</p>
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;end name=&quot;[NODE-NAME]&quot;/&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p>The <tt>name</tt> attribute is the name of the transition to take to end the workflow job.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;foo-wf&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;end name=&quot;end&quot;/&gt;
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p><a name="KillNode"></a></p></div>
+<div class="section">
+<h4><a name="a3.1.3_Kill_Control_Node"></a>3.1.3 Kill Control Node</h4>
+<p>The <tt>kill</tt> node allows a workflow job to kill itself.</p>
+<p>When a workflow job reaches the <tt>kill</tt> node it finishes in error (KILLED).</p>
+<p>If one or more actions started by the workflow job are executing when the <tt>kill</tt> node is reached, the actions will be killed.</p>
+<p>A workflow definition may have zero or more <tt>kill</tt> nodes.</p>
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;kill name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;message&gt;[MESSAGE-TO-LOG]&lt;/message&gt;
+    &lt;/kill&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p>The <tt>name</tt> attribute in the <tt>kill</tt> node is the name of the Kill action node.</p>
+<p>The content of the <tt>message</tt> element will be logged as the kill reason for the workflow job.</p>
+<p>A <tt>kill</tt> node does not have transition elements because it ends the workflow job, as <tt>KILLED</tt>.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;foo-wf&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;kill name=&quot;killBecauseNoInput&quot;&gt;
+        &lt;message&gt;Input unavailable&lt;/message&gt;
+    &lt;/kill&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p><a name="DecisionNode"></a></p></div>
+<div class="section">
+<h4><a name="a3.1.4_Decision_Control_Node"></a>3.1.4 Decision Control Node</h4>
+<p>A <tt>decision</tt> node enables a workflow to make a selection on the execution path to follow.</p>
+<p>The behavior of a <tt>decision</tt> node can be seen as a switch-case statement.</p>
+<p>A <tt>decision</tt> node consists of a list of predicate-transition pairs plus a default transition. Predicates are evaluated in order of appearance until one of them evaluates to <tt>true</tt> and the corresponding transition is taken. If none of the predicates evaluates to <tt>true</tt> the <tt>default</tt> transition is taken.</p>
+<p>Predicates are JSP Expression Language (EL) expressions (refer to section 4.2 of this document) that resolve into a boolean value, <tt>true</tt> or <tt>false</tt>. For example:</p>
+
+<div>
+<div>
+<pre class="source">    ${fs:fileSize('/usr/foo/myinputdir') gt 10 * GB}
+</pre></div></div>
+
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;decision name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;switch&gt;
+            &lt;case to=&quot;[NODE_NAME]&quot;&gt;[PREDICATE]&lt;/case&gt;
+            ...
+            &lt;case to=&quot;[NODE_NAME]&quot;&gt;[PREDICATE]&lt;/case&gt;
+            &lt;default to=&quot;[NODE_NAME]&quot;/&gt;
+        &lt;/switch&gt;
+    &lt;/decision&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p>The <tt>name</tt> attribute in the <tt>decision</tt> node is the name of the decision node.</p>
+<p>Each <tt>case</tt> element contains a predicate and a transition name. The predicate ELs are evaluated in order until one returns <tt>true</tt> and the corresponding transition is taken.</p>
+<p>The <tt>default</tt> element indicates the transition to take if none of the predicates evaluates to <tt>true</tt>.</p>
+<p>All decision nodes must have a <tt>default</tt> element to avoid bringing the workflow into an error state if none of the predicates evaluates to true.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;foo-wf&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;decision name=&quot;mydecision&quot;&gt;
+        &lt;switch&gt;
+            &lt;case to=&quot;reconsolidatejob&quot;&gt;
+              ${fs:fileSize(secondjobOutputDir) gt 10 * GB}
+            &lt;/case&gt;
+            &lt;case to=&quot;rexpandjob&quot;&gt;
+              ${fs:fileSize(secondjobOutputDir) lt 100 * MB}
+            &lt;/case&gt;
+            &lt;case to=&quot;recomputejob&quot;&gt;
+              ${ hadoop:counters('secondjob')[RECORDS][REDUCE_OUT] lt 1000000 }
+            &lt;/case&gt;
+            &lt;default to=&quot;end&quot;/&gt;
+        &lt;/switch&gt;
+    &lt;/decision&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p><a name="ForkJoinNodes"></a></p></div>
+<div class="section">
+<h4><a name="a3.1.5_Fork_and_Join_Control_Nodes"></a>3.1.5 Fork and Join Control Nodes</h4>
+<p>A <tt>fork</tt> node splits one path of execution into multiple concurrent paths of execution.</p>
+<p>A <tt>join</tt> node waits until every concurrent execution path of a previous <tt>fork</tt> node arrives at it.</p>
+<p>The <tt>fork</tt> and <tt>join</tt> nodes must be used in pairs. The <tt>join</tt> node assumes concurrent execution paths are children of the same <tt>fork</tt> node.</p>
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;fork name=&quot;[FORK-NODE-NAME]&quot;&gt;
+        &lt;path start=&quot;[NODE-NAME]&quot; /&gt;
+        ...
+        &lt;path start=&quot;[NODE-NAME]&quot; /&gt;
+    &lt;/fork&gt;
+    ...
+    &lt;join name=&quot;[JOIN-NODE-NAME]&quot; to=&quot;[NODE-NAME]&quot; /&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p>The <tt>name</tt> attribute in the <tt>fork</tt> node is the name of the workflow fork node. The <tt>start</tt> attribute in the <tt>path</tt> elements in the <tt>fork</tt> node indicates the name of the workflow node that will be part of the concurrent execution paths.</p>
+<p>The <tt>name</tt> attribute in the <tt>join</tt> node is the name of the workflow join node. The <tt>to</tt> attribute in the <tt>join</tt> node indicates the name of the workflow node that will be executed after all concurrent execution paths of the corresponding fork arrive at the join node.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;sample-wf&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;fork name=&quot;forking&quot;&gt;
+        &lt;path start=&quot;firstparalleljob&quot;/&gt;
+        &lt;path start=&quot;secondparalleljob&quot;/&gt;
+    &lt;/fork&gt;
+    &lt;action name=&quot;firstparalleljob&quot;&gt;
+        &lt;map-reduce&gt;
+            &lt;resource-manager&gt;foo:8032&lt;/resource-manager&gt;
+            &lt;name-node&gt;bar:8020&lt;/name-node&gt;
+            &lt;job-xml&gt;job1.xml&lt;/job-xml&gt;
+        &lt;/map-reduce&gt;
+        &lt;ok to=&quot;joining&quot;/&gt;
+        &lt;error to=&quot;kill&quot;/&gt;
+    &lt;/action&gt;
+    &lt;action name=&quot;secondparalleljob&quot;&gt;
+        &lt;map-reduce&gt;
+            &lt;resource-manager&gt;foo:8032&lt;/resource-manager&gt;
+            &lt;name-node&gt;bar:8020&lt;/name-node&gt;
+            &lt;job-xml&gt;job2.xml&lt;/job-xml&gt;
+        &lt;/map-reduce&gt;
+        &lt;ok to=&quot;joining&quot;/&gt;
+        &lt;error to=&quot;kill&quot;/&gt;
+    &lt;/action&gt;
+    &lt;join name=&quot;joining&quot; to=&quot;nextaction&quot;/&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p>By default, Oozie performs some validation that any forking in a workflow is valid and won&#x2019;t lead to any incorrect behavior or instability.  However, if Oozie is preventing a workflow from being submitted and you are very certain that it should work, you can disable fork/join validation so that Oozie will accept the workflow.  To disable this validation just for a specific workflow, set <tt>oozie.wf.validate.ForkJoin</tt> to <tt>false</tt> in the job.properties file.  To disable this validation for all workflows, set <tt>oozie.validate.ForkJoin</tt> to <tt>false</tt> in the oozie-site.xml file, as shown below.  Validation is enabled only when both properties are <tt>true</tt> (or not specified); setting either one to <tt>false</tt> disables it.</p>
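+<p>For example, a minimal sketch of disabling the validation for all workflows in oozie-site.xml (the per-workflow equivalent is the single job.properties line <tt>oozie.wf.validate.ForkJoin=false</tt>):</p>
+
+<div>
+<div>
+<pre class="source">&lt;!-- oozie-site.xml: disables fork/join validation server-wide --&gt;
+&lt;property&gt;
+    &lt;name&gt;oozie.validate.ForkJoin&lt;/name&gt;
+    &lt;value&gt;false&lt;/value&gt;
+&lt;/property&gt;
+</pre></div></div>
+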
+<p><a name="ActionNodes"></a></p></div></div>
+<div class="section">
+<h3><a name="a3.2_Workflow_Action_Nodes"></a>3.2 Workflow Action Nodes</h3>
+<p>Action nodes are the mechanism by which a workflow triggers the execution of a computation/processing task.</p>
+<div class="section">
+<h4><a name="a3.2.1_Action_Basis"></a>3.2.1 Action Basis</h4>
+<p>The following sub-sections define common behavior and capabilities for all action types.</p>
+<div class="section">
+<h5><a name="a3.2.1.1_Action_ComputationProcessing_Is_Always_Remote"></a>3.2.1.1 Action Computation/Processing Is Always Remote</h5>
+<p>All computation/processing tasks triggered by an action node are remote to Oozie. No workflow application specific computation/processing task is executed within Oozie.</p></div>
+<div class="section">
+<h5><a name="a3.2.1.2_Actions_Are_Asynchronous"></a>3.2.1.2 Actions Are Asynchronous</h5>
+<p>All computation/processing tasks triggered by an action node are executed asynchronously by Oozie. For most types of computation/processing tasks triggered by a workflow action, the workflow job has to wait until the computation/processing task completes before transitioning to the following node in the workflow.</p>
+<p>The exception is the <tt>fs</tt> action that is handled as a synchronous action.</p>
+<p>Oozie can detect completion of computation/processing tasks by two different means, callbacks and polling.</p>
+<p>When a computation/processing task is started by Oozie, Oozie provides a unique callback URL to the task; the task should invoke the given URL to notify its completion.</p>
+<p>For cases where the task fails to invoke the callback URL for any reason (e.g. a transient network failure) or when the type of task cannot invoke the callback URL upon completion, Oozie has a mechanism to poll computation/processing tasks for completion.</p>
+<div class="section">
+<h5><a name="a3.2.1.3_Actions_Have_2_Transitions_ok_and_error"></a>3.2.1.3 Actions Have 2 Transitions, <tt>ok</tt> and <tt>error</tt></h5>
+<p>If a computation/processing task -triggered by a workflow- completes successfully, it transitions to <tt>ok</tt>.</p>
+<p>If a computation/processing task -triggered by a workflow- fails to complete successfully, it transitions to <tt>error</tt>.</p>
+<p>If a computation/processing task exits in error, the computation/processing task must provide <tt>error-code</tt> and <tt>error-message</tt> information to Oozie. This information can be used from <tt>decision</tt> nodes to implement fine-grained error handling at the workflow application level, as sketched below.</p>
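+<p>For example, a sketch of routing an <tt>error</tt> transition through a <tt>decision</tt> node; it assumes the <tt>wf:errorCode(String node)</tt> workflow EL function (section 4.2.3), and the node names and the error code checked are hypothetical:</p>
+
+<div>
+<div>
+<pre class="source">&lt;action name=&quot;myaction&quot;&gt;
+    ...
+    &lt;ok to=&quot;nextaction&quot;/&gt;
+    &lt;error to=&quot;checkerror&quot;/&gt;
+&lt;/action&gt;
+&lt;decision name=&quot;checkerror&quot;&gt;
+    &lt;switch&gt;
+        &lt;!-- take a cleanup path for a hypothetical transient error code --&gt;
+        &lt;case to=&quot;cleanupandretry&quot;&gt;${wf:errorCode('myaction') eq 'OUTPUT-EXISTS'}&lt;/case&gt;
+        &lt;default to=&quot;fail&quot;/&gt;
+    &lt;/switch&gt;
+&lt;/decision&gt;
+</pre></div></div>
+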
+<p>Each action type must clearly define all the error codes it can produce.</p></div>
+<div class="section">
+<h5><a name="a3.2.1.4_Action_Recovery"></a>3.2.1.4 Action Recovery</h5>
+<p>Oozie provides recovery capabilities when starting or ending actions.</p>
+<p>Once an action starts successfully Oozie will not retry starting the action if the action fails during its execution. The assumption is that the external system (e.g. Hadoop) executing the action has enough resilience to recover jobs once they have started (e.g. Hadoop task retries).</p>
+<p>Java actions are a special case with regard to retries.  Although Oozie itself does not retry Java actions should they fail after they have successfully started, Hadoop itself can cause the action to be restarted due to a map task retry on the map task running the Java application.  See the Java Action section below for more detail.</p>
+<p>For failures that occur prior to the start of the job, Oozie will have different recovery strategies depending on the nature of the failure.</p>
+<p>If the failure is of transient nature, Oozie will perform retries after a pre-defined time interval. The number of retries and the retry interval for a type of action must be pre-configured at Oozie level. Workflow jobs can override such configuration.</p>
+<p>Examples of transient failures are network problems or a remote system being temporarily unavailable.</p>
+<p>If the failure is of non-transient nature, Oozie will suspend the workflow job until a manual or programmatic intervention resumes the workflow job and the action start or end is retried. It is the responsibility of an administrator or an external managing system to perform any necessary cleanup before resuming the workflow job.</p>
+<p>If the failure is an error and a retry will not resolve the problem, Oozie will perform the error transition for the action.</p>
+<p><a name="MapReduceAction"></a></p></div></div>
+<div class="section">
+<h4><a name="a3.2.2_Map-Reduce_Action"></a>3.2.2 Map-Reduce Action</h4>
+<p>The <tt>map-reduce</tt> action starts a Hadoop map/reduce job from a workflow. Hadoop jobs can be Java Map/Reduce jobs or streaming jobs.</p>
+<p>A <tt>map-reduce</tt> action can be configured to perform file system cleanup and directory creation before starting the map reduce job. This capability enables Oozie to retry a Hadoop job in the situation of a transient failure (Hadoop checks the non-existence of the job output directory and then creates it when the Hadoop job is starting, thus a retry without cleanup of the job output directory would fail).</p>
+<p>The workflow job will wait until the Hadoop map/reduce job completes before continuing to the next action in the workflow execution path.</p>
+<p>The counters of the Hadoop job and the job exit status (<tt>FAILED</tt>, <tt>KILLED</tt> or <tt>SUCCEEDED</tt>) must be available to the workflow job after the Hadoop job ends. This information can be used from within decision nodes and other action configurations.</p>
+<p>The <tt>map-reduce</tt> action has to be configured with all the necessary Hadoop JobConf properties to run the Hadoop map/reduce job.</p>
+<p>Hadoop JobConf properties can be specified as part of</p>
+<ul>
+
+<li>the <tt>config-default.xml</tt> or</li>
+<li>JobConf XML file bundled with the workflow application or</li>
+<li>&lt;global&gt; tag in workflow definition or</li>
+<li>Inline <tt>map-reduce</tt> action configuration or</li>
+<li>An implementation of OozieActionConfigurator specified by the &lt;config-class&gt; tag in workflow definition.</li>
+</ul>
+<p>The configuration properties are loaded in the following order: <tt>streaming</tt>, <tt>job-xml</tt>, <tt>configuration</tt>, and <tt>config-class</tt>; later values override earlier values (see the sketch below).</p>
+<p>Streaming and inline property values can be parameterized (templatized) using EL expressions.</p>
+<p>The Hadoop <tt>mapred.job.tracker</tt> and <tt>fs.default.name</tt> properties must not be present in the job-xml and inline configuration.</p>
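+<p>For example, a sketch of the precedence: a property set in a bundled <tt>job.xml</tt> file is overridden by the same property in the inline <tt>configuration</tt> element, because the inline configuration is loaded later:</p>
+
+<div>
+<div>
+<pre class="source">&lt;!-- job.xml (bundled, loaded first) sets mapred.reduce.tasks to 1 --&gt;
+&lt;job-xml&gt;job.xml&lt;/job-xml&gt;
+&lt;!-- the inline configuration (loaded later) wins: --&gt;
+&lt;configuration&gt;
+    &lt;property&gt;
+        &lt;name&gt;mapred.reduce.tasks&lt;/name&gt;
+        &lt;value&gt;5&lt;/value&gt;
+    &lt;/property&gt;
+&lt;/configuration&gt;
+</pre></div></div>
+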
+<p><a name="FilesArchives"></a></p>
+<div class="section">
+<h5><a name="a3.2.2.1_Adding_Files_and_Archives_for_the_Job"></a>3.2.2.1 Adding Files and Archives for the Job</h5>
+<p>The <tt>file</tt> and <tt>archive</tt> elements make files and archives available to map-reduce jobs. If the specified path is relative, the file or archive is assumed to be within the application directory, in the corresponding sub-path. If the path is absolute, the file or archive is expected at the given absolute path.</p>
+<p>Files specified with the <tt>file</tt> element will be symbolic links in the home directory of the task.</p>
+<p>If a file is a native library (an &#x2018;.so&#x2019; or a &#x2018;.so.#&#x2019; file), it will be symlinked as an &#x2018;.so&#x2019; file in the task running directory, thus available to the task JVM.</p>
+<p>To force a symlink for a file on the task running directory, use a &#x2018;#&#x2019; followed by the symlink name. For example &#x2018;mycat.sh#cat&#x2019;, as sketched below.</p>
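+<p>For example, a sketch of <tt>file</tt> elements using the &#x2018;#&#x2019; symlink syntax (the paths are illustrative):</p>
+
+<div>
+<div>
+<pre class="source">&lt;!-- mycat.sh is symlinked as 'cat' in the task running directory --&gt;
+&lt;file&gt;/users/foo/mycat.sh#cat&lt;/file&gt;
+&lt;!-- relative path, resolved within the workflow application directory --&gt;
+&lt;file&gt;scripts/cleanup.sh&lt;/file&gt;
+</pre></div></div>
+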
+<p>Refer to the Hadoop distributed cache documentation for more details on files and archives.</p></div>
+<div class="section">
+<h5><a name="a3.2.2.2_Configuring_the_MapReduce_action_with_Java_code"></a>3.2.2.2 Configuring the MapReduce action with Java code</h5>
+<p>Java code can be used to further configure the MapReduce action.  This can be useful if you already have &#x201c;driver&#x201d; code for your MapReduce action, if you&#x2019;re more familiar with MapReduce&#x2019;s Java API, if there&#x2019;s some configuration that requires logic, or some configuration that&#x2019;s difficult to do in straight XML (e.g. Avro).</p>
+<p>Create a class that implements the org.apache.oozie.action.hadoop.OozieActionConfigurator interface from the &#x201c;oozie-sharelib-oozie&#x201d; artifact.  It contains a single method that receives a <tt>JobConf</tt> as an argument.  Any configuration properties set on this <tt>JobConf</tt> will be used by the MapReduce action.</p>
+<p>The OozieActionConfigurator has this signature:</p>
+
+<div>
+<div>
+<pre class="source">public interface OozieActionConfigurator {
+    public void configure(JobConf actionConf) throws OozieActionConfiguratorException;
+}
+</pre></div></div>
+
+<p>where <tt>actionConf</tt> is the <tt>JobConf</tt> you can update.  If you need to throw an Exception, you can wrap it in an <tt>OozieActionConfiguratorException</tt>, also in the &#x201c;oozie-sharelib-oozie&#x201d; artifact.</p>
+<p>For example:</p>
+
+<div>
+<div>
+<pre class="source">package com.example;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.FileInputFormat;
+import org.apache.hadoop.mapred.FileOutputFormat;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.oozie.action.hadoop.OozieActionConfigurator;
+import org.apache.oozie.action.hadoop.OozieActionConfiguratorException;
+import org.apache.oozie.example.SampleMapper;
+import org.apache.oozie.example.SampleReducer;
+
+public class MyConfigClass implements OozieActionConfigurator {
+
+    @Override
+    public void configure(JobConf actionConf) throws OozieActionConfiguratorException {
+        if (actionConf.getUser() == null) {
+            throw new OozieActionConfiguratorException(&quot;No user set&quot;);
+        }
+        actionConf.setMapperClass(SampleMapper.class);
+        actionConf.setReducerClass(SampleReducer.class);
+        FileInputFormat.setInputPaths(actionConf, new Path(&quot;/user/&quot; + actionConf.getUser() + &quot;/input-data&quot;));
+        FileOutputFormat.setOutputPath(actionConf, new Path(&quot;/user/&quot; + actionConf.getUser() + &quot;/output&quot;));
+        ...
+    }
+}
+</pre></div></div>
+
+<p>To use your config class in your MapReduce action, simply compile it into a jar, make the jar available to your action, and specify the class name in the <tt>config-class</tt> element (this requires at least schema 0.5):</p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;map-reduce&gt;
+            ...
+            &lt;job-xml&gt;[JOB-XML-FILE]&lt;/job-xml&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;[PROPERTY-NAME]&lt;/name&gt;
+                    &lt;value&gt;[PROPERTY-VALUE]&lt;/value&gt;
+                &lt;/property&gt;
+                ...
+            &lt;/configuration&gt;
+            &lt;config-class&gt;com.example.MyConfigClass&lt;/config-class&gt;
+            ...
+        &lt;/map-reduce&gt;
+        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p>Another example of this can be found in the &#x201c;map-reduce&#x201d; example that comes with Oozie.</p>
+<p>A useful tip: The initial <tt>JobConf</tt> passed to the <tt>configure</tt> method includes all of the properties listed in the <tt>configuration</tt> section of the MR action in a workflow.  If you need to pass any information to your OozieActionConfigurator, you can simply put it there, as sketched below.</p>
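+<p>For example, a sketch of passing a hypothetical property <tt>my.example.threshold</tt> to the configurator through the action&#x2019;s <tt>configuration</tt> section; inside <tt>configure</tt> it can be read back with <tt>actionConf.get(&quot;my.example.threshold&quot;)</tt>:</p>
+
+<div>
+<div>
+<pre class="source">&lt;configuration&gt;
+    &lt;property&gt;
+        &lt;!-- hypothetical property, read in configure() via actionConf.get() --&gt;
+        &lt;name&gt;my.example.threshold&lt;/name&gt;
+        &lt;value&gt;100&lt;/value&gt;
+    &lt;/property&gt;
+&lt;/configuration&gt;
+&lt;config-class&gt;com.example.MyConfigClass&lt;/config-class&gt;
+</pre></div></div>
+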
+<p><a name="StreamingMapReduceAction"></a></p></div>
+<div class="section">
+<h5><a name="a3.2.2.3_Streaming"></a>3.2.2.3 Streaming</h5>
+<p>Streaming information can be specified in the <tt>streaming</tt> element.</p>
+<p>The <tt>mapper</tt> and <tt>reducer</tt> elements are used to specify the executable/script to be used as mapper and reducer.</p>
+<p>User defined scripts must be bundled with the workflow application and they must be declared in the <tt>files</tt> element of the streaming configuration. If they are not declared in the <tt>files</tt> element of the configuration it is assumed they will be available (and in the command PATH) on the Hadoop slave machines.</p>
+<p>Some streaming jobs require files found on HDFS to be available to the mapper/reducer scripts. This is done using the <tt>file</tt> and <tt>archive</tt> elements described in the previous section.</p>
+<p>The Mapper/Reducer can be overridden by the <tt>mapred.mapper.class</tt> or <tt>mapred.reducer.class</tt> properties in the <tt>job-xml</tt> file or <tt>configuration</tt> elements, as sketched below.</p>
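+<p>For example, a sketch of overriding the streaming mapper with a Java mapper class (the class name is illustrative):</p>
+
+<div>
+<div>
+<pre class="source">&lt;streaming&gt;
+    &lt;mapper&gt;/bin/bash mapper.sh&lt;/mapper&gt;
+    &lt;reducer&gt;/bin/bash reducer.sh&lt;/reducer&gt;
+&lt;/streaming&gt;
+&lt;configuration&gt;
+    &lt;property&gt;
+        &lt;!-- takes precedence over the mapper element of the streaming section --&gt;
+        &lt;name&gt;mapred.mapper.class&lt;/name&gt;
+        &lt;value&gt;com.example.MyMapper&lt;/value&gt;
+    &lt;/property&gt;
+&lt;/configuration&gt;
+</pre></div></div>
+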
+<p><a name="PipesMapReduceAction"></a></p></div>
+<div class="section">
+<h5><a name="a3.2.2.4_Pipes"></a>3.2.2.4 Pipes</h5>
+<p>Pipes information can be specified in the <tt>pipes</tt> element.</p>
+<p>A subset of the command line options available to the Hadoop Pipes Submitter can be specified via the <tt>map</tt>, <tt>reduce</tt>, <tt>inputformat</tt>, <tt>partitioner</tt>, <tt>writer</tt> and <tt>program</tt> elements.</p>
+<p>The <tt>program</tt> element is used to specify the executable/script to be used.</p>
+<p>A user-defined program must be bundled with the workflow application.</p>
+<p>Some pipes jobs require files found on HDFS to be available to the mapper/reducer scripts. This is done using the <tt>file</tt> and <tt>archive</tt> elements described in the previous section.</p>
+<p>Pipe properties can be overridden by specifying them in the <tt>job-xml</tt> file or <tt>configuration</tt> element.</p></div>
+<div class="section">
+<h5><a name="a3.2.2.5_Syntax"></a>3.2.2.5 Syntax</h5>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;map-reduce&gt;
+            &lt;resource-manager&gt;[RESOURCE-MANAGER]&lt;/resource-manager&gt;
+            &lt;name-node&gt;[NAME-NODE]&lt;/name-node&gt;
+            &lt;prepare&gt;
+                &lt;delete path=&quot;[PATH]&quot;/&gt;
+                ...
+                &lt;mkdir path=&quot;[PATH]&quot;/&gt;
+                ...
+            &lt;/prepare&gt;
+            &lt;streaming&gt;
+                &lt;mapper&gt;[MAPPER-PROCESS]&lt;/mapper&gt;
+                &lt;reducer&gt;[REDUCER-PROCESS]&lt;/reducer&gt;
+                &lt;record-reader&gt;[RECORD-READER-CLASS]&lt;/record-reader&gt;
+                &lt;record-reader-mapping&gt;[NAME=VALUE]&lt;/record-reader-mapping&gt;
+                ...
+                &lt;env&gt;[NAME=VALUE]&lt;/env&gt;
+                ...
+            &lt;/streaming&gt;
+			&lt;!-- Either streaming or pipes can be specified for an action, not both --&gt;
+            &lt;pipes&gt;
+                &lt;map&gt;[MAPPER]&lt;/map&gt;
+                &lt;reduce&gt;[REDUCER]&lt;/reduce&gt;
+                &lt;inputformat&gt;[INPUTFORMAT]&lt;/inputformat&gt;
+                &lt;partitioner&gt;[PARTITIONER]&lt;/partitioner&gt;
+                &lt;writer&gt;[OUTPUTFORMAT]&lt;/writer&gt;
+                &lt;program&gt;[EXECUTABLE]&lt;/program&gt;
+            &lt;/pipes&gt;
+            &lt;job-xml&gt;[JOB-XML-FILE]&lt;/job-xml&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;[PROPERTY-NAME]&lt;/name&gt;
+                    &lt;value&gt;[PROPERTY-VALUE]&lt;/value&gt;
+                &lt;/property&gt;
+                ...
+            &lt;/configuration&gt;
+            &lt;config-class&gt;com.example.MyConfigClass&lt;/config-class&gt;
+            &lt;file&gt;[FILE-PATH]&lt;/file&gt;
+            ...
+            &lt;archive&gt;[FILE-PATH]&lt;/archive&gt;
+            ...
+        &lt;/map-reduce&gt;
+
+        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p>The <tt>prepare</tt> element, if present, indicates a list of paths to delete before starting the job. This should be used exclusively for directory cleanup or dropping of hcatalog tables or table partitions for the job to be executed. The delete operation will be performed in the <tt>fs.default.name</tt> filesystem for hdfs URIs. The format for specifying a hcatalog table URI is <tt>hcat://[metastore server]:[port]/[database name]/[table name]</tt> and the format for specifying a hcatalog table partition URI is <tt>hcat://[metastore server]:[port]/[database name]/[table name]/[partkey1]=[value];[partkey2]=[value]</tt>. In case of a hcatalog URI, the hive-site.xml needs to be shipped using the <tt>file</tt> element and the hcatalog and hive jars need to be placed in the workflow lib directory or specified using the <tt>archive</tt> element. A sketch follows.</p>
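+<p>For example, a sketch of a <tt>prepare</tt> block that cleans up an output directory and drops a hcatalog table partition before the job runs (the metastore host, port, database, table and partition are illustrative):</p>
+
+<div>
+<div>
+<pre class="source">&lt;prepare&gt;
+    &lt;delete path=&quot;hdfs://foo:8020/usr/tucu/output-data&quot;/&gt;
+    &lt;!-- drops the dt=20191205 partition of mydb.mytable --&gt;
+    &lt;delete path=&quot;hcat://metastore.example.com:9083/mydb/mytable/dt=20191205&quot;/&gt;
+&lt;/prepare&gt;
+</pre></div></div>
+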
+<p>The <tt>job-xml</tt> element, if present, must refer to a Hadoop JobConf <tt>job.xml</tt> file bundled in the workflow application. By default the <tt>job.xml</tt> file is taken from the workflow application namenode, regardless of the namenode specified for the action. To specify a <tt>job.xml</tt> on another namenode use a fully qualified file path. The <tt>job-xml</tt> element is optional and, as of schema 0.4, multiple <tt>job-xml</tt> elements are allowed in order to specify multiple Hadoop JobConf <tt>job.xml</tt> files, as sketched below.</p>
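+<p>For example, a sketch with multiple <tt>job-xml</tt> elements (schema 0.4 or later), one of them taken from another namenode via a fully qualified path (the hosts and paths are illustrative):</p>
+
+<div>
+<div>
+<pre class="source">&lt;job-xml&gt;job1.xml&lt;/job-xml&gt;
+&lt;!-- fully qualified path, taken from another namenode --&gt;
+&lt;job-xml&gt;hdfs://bar2:8020/conf/job2.xml&lt;/job-xml&gt;
+</pre></div></div>
+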
+<p>The <tt>configuration</tt> element, if present, contains JobConf properties for the Hadoop job.</p>
+<p>Properties specified in the <tt>configuration</tt> element override properties specified in the file specified in the <tt>job-xml</tt> element.</p>
+<p>As of schema 0.5, the <tt>config-class</tt> element, if present, contains a class that implements OozieActionConfigurator that can be used to further configure the MapReduce job.</p>
+<p>Properties specified in the <tt>config-class</tt> class override properties specified in <tt>configuration</tt> element.</p>
+<p>External Stats can be turned on/off by specifying the property <i>oozie.action.external.stats.write</i> as <i>true</i> or <i>false</i> in the configuration element of workflow.xml. The default value for this property is <i>false</i>.</p>
+<p>The <tt>file</tt> element, if present, must specify the target symbolic link for binaries by separating the original file and target with a # (file#target-sym-link). This is not required for libraries.</p>
+<p>The <tt>mapper</tt> and <tt>reducer</tt> processes for streaming jobs should specify the executable command with URL encoding, e.g. &#x2018;%&#x2019; should be replaced by &#x2018;%25&#x2019;.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;foo-wf&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;myfirstHadoopJob&quot;&gt;
+        &lt;map-reduce&gt;
+            &lt;resource-manager&gt;foo:8032&lt;/resource-manager&gt;
+            &lt;name-node&gt;bar:8020&lt;/name-node&gt;
+            &lt;prepare&gt;
+                &lt;delete path=&quot;hdfs://foo:8020/usr/tucu/output-data&quot;/&gt;
+            &lt;/prepare&gt;
+            &lt;job-xml&gt;/myfirstjob.xml&lt;/job-xml&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.input.dir&lt;/name&gt;
+                    &lt;value&gt;/usr/tucu/input-data&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.output.dir&lt;/name&gt;
+                    &lt;value&gt;/usr/tucu/output-data&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.reduce.tasks&lt;/name&gt;
+                    &lt;value&gt;${firstJobReducers}&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;oozie.action.external.stats.write&lt;/name&gt;
+                    &lt;value&gt;true&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+        &lt;/map-reduce&gt;
+        &lt;ok to=&quot;myNextAction&quot;/&gt;
+        &lt;error to=&quot;errorCleanup&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p>In the above example, the number of Reducers to be used by the Map/Reduce job has to be specified as a parameter of the workflow job configuration when creating the workflow job.</p>
+<p><b>Streaming Example:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;sample-wf&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;firstjob&quot;&gt;
+        &lt;map-reduce&gt;
+            &lt;resource-manager&gt;foo:8032&lt;/resource-manager&gt;
+            &lt;name-node&gt;bar:8020&lt;/name-node&gt;
+            &lt;prepare&gt;
+                &lt;delete path=&quot;${output}&quot;/&gt;
+            &lt;/prepare&gt;
+            &lt;streaming&gt;
+                &lt;mapper&gt;/bin/bash testarchive/bin/mapper.sh testfile&lt;/mapper&gt;
+                &lt;reducer&gt;/bin/bash testarchive/bin/reducer.sh&lt;/reducer&gt;
+            &lt;/streaming&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.input.dir&lt;/name&gt;
+                    &lt;value&gt;${input}&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.output.dir&lt;/name&gt;
+                    &lt;value&gt;${output}&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;stream.num.map.output.key.fields&lt;/name&gt;
+                    &lt;value&gt;3&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+            &lt;file&gt;/users/blabla/testfile.sh#testfile&lt;/file&gt;
+            &lt;archive&gt;/users/blabla/testarchive.jar#testarchive&lt;/archive&gt;
+        &lt;/map-reduce&gt;
+        &lt;ok to=&quot;end&quot;/&gt;
+        &lt;error to=&quot;kill&quot;/&gt;
+    &lt;/action&gt;
+  ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p><b>Pipes Example:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;sample-wf&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;firstjob&quot;&gt;
+        &lt;map-reduce&gt;
+            &lt;resource-manager&gt;foo:8032&lt;/resource-manager&gt;
+            &lt;name-node&gt;bar:8020&lt;/name-node&gt;
+            &lt;prepare&gt;
+                &lt;delete path=&quot;${output}&quot;/&gt;
+            &lt;/prepare&gt;
+            &lt;pipes&gt;
+                &lt;program&gt;bin/wordcount-simple#wordcount-simple&lt;/program&gt;
+            &lt;/pipes&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.input.dir&lt;/name&gt;
+                    &lt;value&gt;${input}&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.output.dir&lt;/name&gt;
+                    &lt;value&gt;${output}&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+            &lt;archive&gt;/users/blabla/testarchive.jar#testarchive&lt;/archive&gt;
+        &lt;/map-reduce&gt;
+        &lt;ok to=&quot;end&quot;/&gt;
+        &lt;error to=&quot;kill&quot;/&gt;
+    &lt;/action&gt;
+  ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p><a name="PigAction"></a></p></div></div>
+<div class="section">
+<h4><a name="a3.2.3_Pig_Action"></a>3.2.3 Pig Action</h4>
+<p>The <tt>pig</tt> action starts a Pig job.</p>
+<p>The workflow job will wait until the pig job completes before continuing to the next action.</p>
+<p>The <tt>pig</tt> action has to be configured with the resource-manager, name-node, pig script and the necessary parameters and configuration to run the Pig job.</p>
+<p>A <tt>pig</tt> action can be configured to perform HDFS files/directories cleanup or HCatalog partitions cleanup before starting the Pig job. This capability enables Oozie to retry a Pig job in the situation of a transient failure (Pig creates temporary directories for intermediate data, thus a retry without cleanup would fail).</p>
+<p>Hadoop JobConf properties can be specified as part of</p>
+<ul>
+
+<li>the <tt>config-default.xml</tt> or</li>
+<li>JobConf XML file bundled with the workflow application or</li>
+<li>&lt;global&gt; tag in workflow definition or</li>
+<li>Inline <tt>pig</tt> action configuration.</li>
+</ul>
+<p>The configuration properties are loaded in the order listed above, i.e. <tt>job-xml</tt> first and inline <tt>configuration</tt> last, with later values overriding earlier values.</p>
+<p>Inline property values can be parameterized (templatized) using EL expressions.</p>
+<p>The YARN <tt>yarn.resourcemanager.address</tt> and HDFS <tt>fs.default.name</tt> properties must not be present in the <tt>job-xml</tt> or the inline configuration.</p>
+<p>As with Hadoop map-reduce jobs, it is possible to add files and archives to be available to the Pig job. Refer to the section <a href="WorkflowFunctionalSpec.html#FilesArchives">Adding Files and Archives for the Job</a>.</p>
+<p><b>Syntax for Pig actions in Oozie schema 1.0:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;pig&gt;
+            &lt;resource-manager&gt;[RESOURCE-MANAGER]&lt;/resource-manager&gt;
+            &lt;name-node&gt;[NAME-NODE]&lt;/name-node&gt;
+            &lt;prepare&gt;
+               &lt;delete path=&quot;[PATH]&quot;/&gt;
+               ...
+               &lt;mkdir path=&quot;[PATH]&quot;/&gt;
+               ...
+            &lt;/prepare&gt;
+            &lt;job-xml&gt;[JOB-XML-FILE]&lt;/job-xml&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;[PROPERTY-NAME]&lt;/name&gt;
+                    &lt;value&gt;[PROPERTY-VALUE]&lt;/value&gt;
+                &lt;/property&gt;
+                ...
+            &lt;/configuration&gt;
+            &lt;script&gt;[PIG-SCRIPT]&lt;/script&gt;
+            &lt;param&gt;[PARAM-VALUE]&lt;/param&gt;
+                ...
+            &lt;param&gt;[PARAM-VALUE]&lt;/param&gt;
+            &lt;argument&gt;[ARGUMENT-VALUE]&lt;/argument&gt;
+                ...
+            &lt;argument&gt;[ARGUMENT-VALUE]&lt;/argument&gt;
+            &lt;file&gt;[FILE-PATH]&lt;/file&gt;
+            ...
+            &lt;archive&gt;[FILE-PATH]&lt;/archive&gt;
+            ...
+        &lt;/pig&gt;
+        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p><b>Syntax for Pig actions in Oozie schema 0.2:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; xmlns=&quot;uri:oozie:workflow:0.2&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;pig&gt;
+            &lt;job-tracker&gt;[JOB-TRACKER]&lt;/job-tracker&gt;
+            &lt;name-node&gt;[NAME-NODE]&lt;/name-node&gt;
+            &lt;prepare&gt;
+               &lt;delete path=&quot;[PATH]&quot;/&gt;
+               ...
+               &lt;mkdir path=&quot;[PATH]&quot;/&gt;
+               ...
+            &lt;/prepare&gt;
+            &lt;job-xml&gt;[JOB-XML-FILE]&lt;/job-xml&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;[PROPERTY-NAME]&lt;/name&gt;
+                    &lt;value&gt;[PROPERTY-VALUE]&lt;/value&gt;
+                &lt;/property&gt;
+                ...
+            &lt;/configuration&gt;
+            &lt;script&gt;[PIG-SCRIPT]&lt;/script&gt;
+            &lt;param&gt;[PARAM-VALUE]&lt;/param&gt;
+                ...
+            &lt;param&gt;[PARAM-VALUE]&lt;/param&gt;
+            &lt;argument&gt;[ARGUMENT-VALUE]&lt;/argument&gt;
+                ...
+            &lt;argument&gt;[ARGUMENT-VALUE]&lt;/argument&gt;
+            &lt;file&gt;[FILE-PATH]&lt;/file&gt;
+            ...
+            &lt;archive&gt;[FILE-PATH]&lt;/archive&gt;
+            ...
+        &lt;/pig&gt;
+        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p><b>Syntax for Pig actions in Oozie schema 0.1:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; xmlns=&quot;uri:oozie:workflow:0.1&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;pig&gt;
+            &lt;job-tracker&gt;[JOB-TRACKER]&lt;/job-tracker&gt;
+            &lt;name-node&gt;[NAME-NODE]&lt;/name-node&gt;
+            &lt;prepare&gt;
+               &lt;delete path=&quot;[PATH]&quot;/&gt;
+               ...
+               &lt;mkdir path=&quot;[PATH]&quot;/&gt;
+               ...
+            &lt;/prepare&gt;
+            &lt;job-xml&gt;[JOB-XML-FILE]&lt;/job-xml&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;[PROPERTY-NAME]&lt;/name&gt;
+                    &lt;value&gt;[PROPERTY-VALUE]&lt;/value&gt;
+                &lt;/property&gt;
+                ...
+            &lt;/configuration&gt;
+            &lt;script&gt;[PIG-SCRIPT]&lt;/script&gt;
+            &lt;param&gt;[PARAM-VALUE]&lt;/param&gt;
+                ...
+            &lt;param&gt;[PARAM-VALUE]&lt;/param&gt;
+            &lt;file&gt;[FILE-PATH]&lt;/file&gt;
+            ...
+            &lt;archive&gt;[FILE-PATH]&lt;/archive&gt;
+            ...
+        &lt;/pig&gt;
+        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p>The <tt>prepare</tt> element, if present, indicates a list of paths to delete before starting the job. This should be used exclusively for directory cleanup, or for dropping HCatalog tables or table partitions, before the job is executed. For HDFS URIs, the delete operation is performed in the <tt>fs.default.name</tt> filesystem. The format for specifying an HCatalog table URI is <tt>hcat://[metastore server]:[port]/[database name]/[table name]</tt>, and the format for an HCatalog table partition URI is <tt>hcat://[metastore server]:[port]/[database name]/[table name]/[partkey1]=[value];[partkey2]=[value]</tt>. In case of an HCatalog URI, the hive-site.xml needs to be shipped using the <tt>file</tt> tag, and the HCatalog and Hive jars need to be placed in the workflow lib directory or specified using the <tt>archive</tt> tag.</p>
+<p>The <tt>job-xml</tt> element, if present, must refer to a Hadoop JobConf <tt>job.xml</tt> file bundled in the workflow application. The <tt>job-xml</tt> element is optional and as of schema 0.4, multiple <tt>job-xml</tt> elements are allowed in order to specify multiple Hadoop JobConf <tt>job.xml</tt> files.</p>
+<p>The <tt>configuration</tt> element, if present, contains JobConf properties for the underlying Hadoop jobs.</p>
+<p>Properties specified in the <tt>configuration</tt> element override properties specified in the file specified in the <tt>job-xml</tt> element.</p>
+<p>External Stats can be turned on/off by specifying the property <i>oozie.action.external.stats.write</i> as <i>true</i> or <i>false</i> in the configuration element of workflow.xml. The default value for this property is <i>false</i>.</p>
+<p>The inline and job-xml configuration properties are passed to the Hadoop jobs submitted by Pig runtime.</p>
+<p>The <tt>script</tt> element contains the pig script to execute. The pig script can be templatized with variables of the form <tt>${VARIABLE}</tt>. The values of these variables can then be specified using the <tt>param</tt> elements.</p>
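+<p>For illustration, a pig script templatized this way might look like the following (a hypothetical script; <tt>INPUT</tt> and <tt>OUTPUT</tt> are placeholder variables, passed e.g. via the <tt>-param</tt> arguments shown in the schema 0.2 example below):</p>
+
+<div>
+<div>
+<pre class="source">A = LOAD '${INPUT}' USING PigStorage(',');
+B = FILTER A BY $0 IS NOT NULL;
+STORE B INTO '${OUTPUT}';
+</pre></div></div>
+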
+<p>NOTE: Oozie will perform the parameter substitution before firing the pig job. This is different from the <a class="externalLink" href="http://wiki.apache.org/pig/ParameterSubstitution">parameter substitution mechanism provided by Pig</a>, which has a few limitations.</p>
+<p>The <tt>param</tt> elements, if present, contain parameters to be passed to the pig script.</p>
+<p><b>In Oozie schema 0.2:</b> The <tt>argument</tt> elements, if present, contain arguments to be passed to the pig script.</p>
+<p>All the above elements can be parameterized (templatized) using EL expressions.</p>
+<p><b>Example for Oozie schema 0.2:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;sample-wf&quot; xmlns=&quot;uri:oozie:workflow:0.2&quot;&gt;
+    ...
+    &lt;action name=&quot;myfirstpigjob&quot;&gt;
+        &lt;pig&gt;
+            &lt;job-tracker&gt;foo:8021&lt;/job-tracker&gt;
+            &lt;name-node&gt;bar:8020&lt;/name-node&gt;
+            &lt;prepare&gt;
+                &lt;delete path=&quot;${jobOutput}&quot;/&gt;
+            &lt;/prepare&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.compress.map.output&lt;/name&gt;
+                    &lt;value&gt;true&lt;/value&gt;
+                &lt;/property&gt;
+                &lt;property&gt;
+                    &lt;name&gt;oozie.action.external.stats.write&lt;/name&gt;
+                    &lt;value&gt;true&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+            &lt;script&gt;/mypigscript.pig&lt;/script&gt;
+            &lt;argument&gt;-param&lt;/argument&gt;
+            &lt;argument&gt;INPUT=${inputDir}&lt;/argument&gt;
+            &lt;argument&gt;-param&lt;/argument&gt;
+            &lt;argument&gt;OUTPUT=${outputDir}/pig-output3&lt;/argument&gt;
+        &lt;/pig&gt;
+        &lt;ok to=&quot;myotherjob&quot;/&gt;
+        &lt;error to=&quot;errorcleanup&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p><b>Example for Oozie schema 0.1:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;sample-wf&quot; xmlns=&quot;uri:oozie:workflow:0.1&quot;&gt;
+    ...
+    &lt;action name=&quot;myfirstpigjob&quot;&gt;
+        &lt;pig&gt;
+            &lt;job-tracker&gt;foo:8021&lt;/job-tracker&gt;
+            &lt;name-node&gt;bar:8020&lt;/name-node&gt;
+            &lt;prepare&gt;
+                &lt;delete path=&quot;${jobOutput}&quot;/&gt;
+            &lt;/prepare&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.compress.map.output&lt;/name&gt;
+                    &lt;value&gt;true&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+            &lt;script&gt;/mypigscript.pig&lt;/script&gt;
+            &lt;param&gt;InputDir=/home/tucu/input-data&lt;/param&gt;
+            &lt;param&gt;OutputDir=${jobOutput}&lt;/param&gt;
+        &lt;/pig&gt;
+        &lt;ok to=&quot;myotherjob&quot;/&gt;
+        &lt;error to=&quot;errorcleanup&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p><a name="FsAction"></a></p></div>
+<div class="section">
+<h4><a name="a3.2.4_Fs_HDFS_action"></a>3.2.4 Fs (HDFS) action</h4>
+<p>The <tt>fs</tt> action allows a workflow application to manipulate files and directories in HDFS. The supported commands are <tt>move</tt>, <tt>delete</tt>, <tt>mkdir</tt>, <tt>chmod</tt>, <tt>touchz</tt>, <tt>setrep</tt> and <tt>chgrp</tt>.</p>
+<p>The FS commands are executed synchronously from within the FS action; the workflow job waits until the specified file commands have completed before continuing to the next action.</p>
+<p>Path names specified in the <tt>fs</tt> action can be parameterized (templatized) using EL expressions. Each path name should be specified as an absolute path. For the <tt>move</tt>, <tt>delete</tt>, <tt>chmod</tt> and <tt>chgrp</tt> commands, a glob pattern can also be specified instead of an absolute path. For <tt>move</tt>, a glob pattern can only be specified for the source path, not the target.</p>
+<p>Each file path must specify the file system URI; for move operations, the target must not specify the file system URI.</p>
+<p><b>IMPORTANT:</b> For copying files within a cluster, it is recommended to use the <a href="DG_DistCpActionExtension.html"><tt>distcp</tt></a> action instead.</p>
+<p><b>IMPORTANT:</b> The commands within an <tt>fs</tt> action are not executed atomically; if an <tt>fs</tt> action fails halfway through, the commands that were already executed are not rolled back. Before executing any command, the <tt>fs</tt> action checks that all source paths exist and that all target paths do not exist (the constraint on targets is relaxed for the <tt>move</tt> action; see below for details), and fails without executing any command otherwise. Because the validity of all paths specified in one <tt>fs</tt> action is evaluated before any file operation is executed, there is less chance of an error occurring while the <tt>fs</tt> action executes.</p>
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;fs&gt;
+            &lt;delete path='[PATH]' skip-trash='[true/false]'/&gt;
+            ...
+            &lt;mkdir path='[PATH]'/&gt;
+            ...
+            &lt;move source='[SOURCE-PATH]' target='[TARGET-PATH]'/&gt;
+            ...
+            &lt;chmod path='[PATH]' permissions='[PERMISSIONS]' dir-files='false' /&gt;
+            ...
+            &lt;touchz path='[PATH]' /&gt;
+            ...
+            &lt;chgrp path='[PATH]' group='[GROUP]' dir-files='false' /&gt;
+            ...
+            &lt;setrep path='[PATH]' replication-factor='2'/&gt;
+        &lt;/fs&gt;
+        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p>The <tt>delete</tt> command deletes the specified path; if it is a directory, it recursively deletes all its content and then deletes the directory. By default the deleted data skips the trash; it can be moved to the trash instead by setting the value of <tt>skip-trash</tt> to &#x2018;false&#x2019;. The command can also be used to drop HCatalog tables/partitions, and it is the only FS command that supports HCatalog URIs. For example:</p>
+
+<div>
+<div>
+<pre class="source">&lt;delete path='hcat://[metastore server]:[port]/[database name]/[table name]'/&gt;
+OR
+&lt;delete path='hcat://[metastore server]:[port]/[database name]/[table name]/[partkey1]=[value];[partkey2]=[value];...'/&gt;
+</pre></div></div>
+
+<p>The <tt>mkdir</tt> command creates the specified directory, creating all missing directories in the path. If the directory already exists, the command is a no-op.</p>
+<p>In the <tt>move</tt> command the <tt>source</tt> path must exist. The following scenarios are addressed for a <tt>move</tt>:</p>
+<ul>
+
+<li>The file system URI (e.g. <tt>hdfs://{nameNode}</tt>) can be skipped in the <tt>target</tt> path; it is understood to be the same as that of the source. But if the target path does contain a file system URI, it cannot be different from that of the source.</li>
+<li>The parent directory of the <tt>target</tt> path must exist.</li>
+<li>For the <tt>target</tt> path, if it is a file, then it must not already exist.</li>
+<li>However, if the <tt>target</tt> path is an already existing directory, the <tt>move</tt> action will place your <tt>source</tt> as a child of the <tt>target</tt> directory.</li>
+</ul>
+<p>The <tt>chmod</tt> command changes the permissions for the specified path. Permissions can be specified using the Unix Symbolic representation (e.g. -rwxrw-rw-) or an octal representation (755). When doing a <tt>chmod</tt> command on a directory, by default the command is applied to the directory and the files one level within the directory. To apply the <tt>chmod</tt> command to the directory, without affecting the files within it, the <tt>dir-files</tt> attribute must be set to <tt>false</tt>. To apply the <tt>chmod</tt> command recursively to all levels within a directory, put a <tt>recursive</tt> element inside the &lt;chmod&gt; element.</p>
+<p>The <tt>touchz</tt> command creates a zero length file in the specified path if none exists. If one already exists, then touchz will perform a touch operation. Touchz works only for absolute paths.</p>
+<p>The <tt>chgrp</tt> command changes the group for the specified path. When doing a <tt>chgrp</tt> command on a directory, by default the command is applied to the directory and the files one level within the directory. To apply the <tt>chgrp</tt> command to the directory, without affecting the files within it, the <tt>dir-files</tt> attribute must be set to <tt>false</tt>. To apply the <tt>chgrp</tt> command recursively to all levels within a directory, put a <tt>recursive</tt> element inside the &lt;chgrp&gt; element.</p>
+<p>The <tt>setrep</tt> command changes the replication factor of one or more HDFS files. Changing the replication factor of directories or symlinks is not supported; the command requires the <tt>replication-factor</tt> attribute.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;sample-wf&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;hdfscommands&quot;&gt;
+         &lt;fs&gt;
+            &lt;delete path='hdfs://foo:8020/usr/tucu/temp-data'/&gt;
+            &lt;mkdir path='archives/${wf:id()}'/&gt;
+            &lt;move source='${jobInput}' target='archives/${wf:id()}/processed-input'/&gt;
+            &lt;chmod path='${jobOutput}' permissions='-rwxrw-rw-' dir-files='true'&gt;&lt;recursive/&gt;&lt;/chmod&gt;
+            &lt;chgrp path='${jobOutput}' group='testgroup' dir-files='true'&gt;&lt;recursive/&gt;&lt;/chgrp&gt;
+            &lt;setrep path='archives/${wf:id()}/filename' replication-factor='2'/&gt;
+        &lt;/fs&gt;
+        &lt;ok to=&quot;myotherjob&quot;/&gt;
+        &lt;error to=&quot;errorcleanup&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p>In the above example, a directory named after the workflow job ID is created, and the input of the job, passed as a workflow configuration parameter, is archived under the previously created directory.</p>
+<p>As of schema 0.4, if a <tt>name-node</tt> element is specified, then it is not necessary for any of the paths to start with the file system URI as it is taken from the <tt>name-node</tt> element. This is also true if the name-node is specified in the global section (see <a href="WorkflowFunctionalSpec.html#GlobalConfigurations">Global Configurations</a>)</p>
+<p>As of schema 0.4, zero or more <tt>job-xml</tt> elements can be specified; these must refer to Hadoop JobConf <tt>job.xml</tt> formatted files bundled in the workflow application. They can be used to set additional properties for the FileSystem instance.</p>
+<p>As of schema 0.4, if a <tt>configuration</tt> element is specified, then it will also be used to set additional JobConf properties for the FileSystem instance. Properties specified in the <tt>configuration</tt> element override properties specified in the files specified by any <tt>job-xml</tt> elements.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;sample-wf&quot; xmlns=&quot;uri:oozie:workflow:0.4&quot;&gt;
+    ...
+    &lt;action name=&quot;hdfscommands&quot;&gt;
+        &lt;fs&gt;
+           &lt;name-node&gt;hdfs://foo:8020&lt;/name-node&gt;
+           &lt;job-xml&gt;fs-info.xml&lt;/job-xml&gt;
+           &lt;configuration&gt;
+             &lt;property&gt;
+               &lt;name&gt;some.property&lt;/name&gt;
+               &lt;value&gt;some.value&lt;/value&gt;
+             &lt;/property&gt;
+           &lt;/configuration&gt;
+           &lt;delete path='/usr/tucu/temp-data'/&gt;
+        &lt;/fs&gt;
+        &lt;ok to=&quot;myotherjob&quot;/&gt;
+        &lt;error to=&quot;errorcleanup&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p><a name="SubWorkflowAction"></a></p></div>
+<div class="section">
+<h4><a name="a3.2.5_Sub-workflow_Action"></a>3.2.5 Sub-workflow Action</h4>
+<p>The <tt>sub-workflow</tt> action runs a child workflow job.</p>
+<p>The parent workflow job will wait until the child workflow job has completed.</p>
+<p>There can be several sub-workflows defined within a single workflow, each under its own action element.</p>
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;sub-workflow&gt;
+            &lt;app-path&gt;[WF-APPLICATION-PATH]&lt;/app-path&gt;
+            &lt;propagate-configuration/&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;[PROPERTY-NAME]&lt;/name&gt;
+                    &lt;value&gt;[PROPERTY-VALUE]&lt;/value&gt;
+                &lt;/property&gt;
+                ...
+            &lt;/configuration&gt;
+        &lt;/sub-workflow&gt;
+        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p>The child workflow job runs in the same Oozie system instance where the parent workflow job is running.</p>
+<p>The <tt>app-path</tt> element specifies the path to the workflow application of the child workflow job.</p>
+<p>The <tt>propagate-configuration</tt> flag, if present, indicates that the workflow job configuration should be propagated to the child workflow.</p>
+<p>The <tt>configuration</tt> section can be used to specify the job properties that are required to run the child workflow job.</p>
+<p>The configuration of the <tt>sub-workflow</tt> action can be parameterized (templatized) using EL expressions.</p>
+<p><b>Example:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;sample-wf&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;a&quot;&gt;
+        &lt;sub-workflow&gt;
+            &lt;app-path&gt;child-wf&lt;/app-path&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;input.dir&lt;/name&gt;
+                    &lt;value&gt;${wf:id()}/second-mr-output&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+        &lt;/sub-workflow&gt;
+        &lt;ok to=&quot;end&quot;/&gt;
+        &lt;error to=&quot;kill&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p>In the above example, the workflow definition with the name <tt>child-wf</tt> will be run on the Oozie instance at <tt>http://myhost:11000/oozie</tt>. The specified workflow application must already be deployed on the target Oozie instance.</p>
+<p>The configuration parameter <tt>input.dir</tt> is passed as a job property to the child workflow job.</p>
+<p>The subworkflow can inherit the lib jars from the parent workflow by setting <tt>oozie.subworkflow.classpath.inheritance</tt> to true in oozie-site.xml or on a per-job basis by setting <tt>oozie.wf.subworkflow.classpath.inheritance</tt> to true in a job.properties file. If both are specified, <tt>oozie.wf.subworkflow.classpath.inheritance</tt> has priority.  If the subworkflow and the parent have conflicting jars, the subworkflow&#x2019;s jar has priority.  By default, <tt>oozie.wf.subworkflow.classpath.inheritance</tt> is set to false.</p>
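+<p>For example, the per-job setting can be added to the job properties used at submission time (a minimal sketch):</p>
+
+<div>
+<div>
+<pre class="source"># job.properties: subworkflows of this job inherit the parent workflow's lib jars
+oozie.wf.subworkflow.classpath.inheritance=true
+</pre></div></div>
+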
+<p>To prevent errant workflows from starting infinitely recursive subworkflows, <tt>oozie.action.subworkflow.max.depth</tt> can be specified in oozie-site.xml to set the maximum depth of subworkflow calls.  For example, if set to 3, then a workflow can start subwf1, which can start subwf2, which can start subwf3; but if subwf3 tries to start subwf4, then the action will fail.  The default is 50.</p>
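+<p>For illustration, the limit from the example above would be configured in oozie-site.xml like this (a sketch; 3 is an example value, not the default):</p>
+
+<div>
+<div>
+<pre class="source">&lt;property&gt;
+    &lt;name&gt;oozie.action.subworkflow.max.depth&lt;/name&gt;
+    &lt;value&gt;3&lt;/value&gt;
+&lt;/property&gt;
+</pre></div></div>
+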
+<p><a name="JavaAction"></a></p></div>
+<div class="section">
+<h4><a name="a3.2.6_Java_Action"></a>3.2.6 Java Action</h4>
+<p>The <tt>java</tt> action will execute the <tt>public static void main(String[] args)</tt> method of the specified main Java class.</p>
+<p>Java applications are executed in the Hadoop cluster as a map-reduce job with a single mapper task.</p>
+<p>The workflow job will wait until the java application completes its execution before continuing to the next action.</p>
+<p>The <tt>java</tt> action has to be configured with the resource-manager, name-node, main Java class, JVM options and arguments.</p>
+<p>To indicate an <tt>ok</tt> action transition, the main Java class must complete the <tt>main</tt> method invocation gracefully.</p>
+<p>To indicate an <tt>error</tt> action transition, the main Java class must throw an exception.</p>
+<p>The main Java class can call <tt>System.exit(int n)</tt>. Exit code zero is regarded as OK, while non-zero exit codes will cause the <tt>java</tt> action to do an <tt>error</tt> transition and exit.</p>
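+<p>A minimal sketch of such a main class is shown below (the class name <tt>MyJavaMain</tt> is hypothetical):</p>
+
+<div>
+<div>
+<pre class="source">public class MyJavaMain {
+    public static void main(String[] args) {
+        if (args.length == 0) {
+            // an uncaught exception makes the action take the error transition
+            throw new IllegalArgumentException(&quot;expected at least one argument&quot;);
+        }
+        System.out.println(&quot;processing &quot; + args[0]);
+        // returning normally makes the action take the ok transition
+    }
+}
+</pre></div></div>
+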
+<p>A <tt>java</tt> action can be configured to perform HDFS files/directories cleanup or HCatalog partitions cleanup before starting the Java application. This capability enables Oozie to retry a Java application in the situation of a transient or non-transient failure (This can be used to cleanup any temporary data which may have been created by the Java application in case of failure).</p>
+<p>A <tt>java</tt> action can create a Hadoop configuration for interacting with a cluster (e.g. launching a map-reduce job). Oozie prepares a Hadoop configuration file which includes the environment's site configuration files (e.g. hdfs-site.xml, mapred-site.xml, etc.) plus the properties added to the <tt>&lt;configuration&gt;</tt> section of the <tt>java</tt> action. The Hadoop configuration file is made available as a local file to the Java application in its running directory. The Java application can load this file into its Hadoop configuration by referencing the system property <tt>oozie.action.conf.xml</tt>. For example:</p>
+
+<div>
+<div>
+<pre class="source">// loading action conf prepared by Oozie
+Configuration actionConf = new Configuration(false);
+actionConf.addResource(new Path(&quot;file:///&quot;, System.getProperty(&quot;oozie.action.conf.xml&quot;)));
+</pre></div></div>
+
+<p>If <tt>oozie.action.conf.xml</tt> is not added, the job will pick up the mapred-default properties, and this may result in unexpected behaviour. For repeated configuration properties, later values override earlier ones.</p>
+<p>Inline property values can be parameterized (templatized) using EL expressions.</p>
+<p>The YARN <tt>yarn.resourcemanager.address</tt> (<tt>resource-manager</tt>) and HDFS <tt>fs.default.name</tt> (<tt>name-node</tt>) properties must not be present in the <tt>job-xml</tt> or in the inline configuration.</p>
+<p>As with <tt>map-reduce</tt> and <tt>pig</tt> actions, it is possible to add files and archives to be available to the Java application. Refer to the section <a href="WorkflowFunctionalSpec.html#FilesArchives">Adding Files and Archives for the Job</a>.</p>
+<p>The <tt>capture-output</tt> element can be used to propagate values back into the Oozie context, which can then be accessed via EL functions. The values need to be written out as a Java properties format file; the filename is obtained via the system property <tt>oozie.action.output.properties</tt>.</p>
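+<p>For example, a main class could write its output properties as follows (a minimal sketch; the class name and the <tt>myKey</tt>/<tt>myValue</tt> pair are hypothetical):</p>
+
+<div>
+<div>
+<pre class="source">import java.io.File;
+import java.io.FileOutputStream;
+import java.io.OutputStream;
+import java.util.Properties;
+
+public class CaptureOutputExample {
+    public static void main(String[] args) throws Exception {
+        // Oozie passes the location of the output properties file as a system property
+        File outputFile = new File(System.getProperty(&quot;oozie.action.output.properties&quot;));
+        Properties props = new Properties();
+        props.setProperty(&quot;myKey&quot;, &quot;myValue&quot;);
+        try (OutputStream os = new FileOutputStream(outputFile)) {
+            props.store(os, &quot;&quot;);
+        }
+    }
+}
+</pre></div></div>
+
+<p>Assuming the action declares <tt>&lt;capture-output/&gt;</tt>, the stored value can later be read in the workflow with an EL expression such as <tt>${wf:actionData('java-node')['myKey']}</tt>, where <tt>java-node</tt> is the name of the <tt>java</tt> action node.</p>
+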
+<p><b>IMPORTANT:</b> In order for a Java action to succeed on a secure cluster, it must propagate the Hadoop delegation token like in the following code snippet (this is benign on non-secure clusters):</p>
+
+<div>
+<div>
+<pre class="source">// propagate delegation related props from launcher job to MR job
+if (System.getenv(&quot;HADOOP_TOKEN_FILE_LOCATION&quot;) != null) {
+    jobConf.set(&quot;mapreduce.job.credentials.binary&quot;, System.getenv(&quot;HADOOP_TOKEN_FILE_LOCATION&quot;));
+}
+</pre></div></div>
+
+<p><b>IMPORTANT:</b> Because the Java application is run from within a Map-Reduce job, from Hadoop 0.20 onwards a queue must be assigned to it. The queue name must be specified as a configuration property.</p>
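+<p>For example, the queue can be assigned through the inline <tt>configuration</tt> element (a sketch; <tt>default</tt> is a placeholder queue name, and <tt>mapred.job.queue.name</tt> is the classic Hadoop property for the job queue):</p>
+
+<div>
+<div>
+<pre class="source">&lt;configuration&gt;
+    &lt;property&gt;
+        &lt;name&gt;mapred.job.queue.name&lt;/name&gt;
+        &lt;value&gt;default&lt;/value&gt;
+    &lt;/property&gt;
+&lt;/configuration&gt;
+</pre></div></div>
+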
+<p><b>IMPORTANT:</b> The Java application from a Java action is executed in a single map task.  If the task is abnormally terminated, such as due to a TaskTracker restart (e.g. during cluster maintenance), the task will be retried via the normal Hadoop task retry mechanism.  To avoid workflow failure, the application should be written in a fashion that is resilient to such retries, for example by detecting and deleting incomplete outputs or picking back up from complete outputs.  Furthermore, if a Java action spawns asynchronous activity outside the JVM of the action itself (such as by launching additional MapReduce jobs), the application must consider the possibility of collisions with activity spawned by the new instance.</p>
+<p><b>Syntax:</b></p>
+
+<div>
+<div>
+<pre class="source">&lt;workflow-app name=&quot;[WF-DEF-NAME]&quot; xmlns=&quot;uri:oozie:workflow:1.0&quot;&gt;
+    ...
+    &lt;action name=&quot;[NODE-NAME]&quot;&gt;
+        &lt;java&gt;
+            &lt;resource-manager&gt;[RESOURCE-MANAGER]&lt;/resource-manager&gt;
+            &lt;name-node&gt;[NAME-NODE]&lt;/name-node&gt;
+            &lt;prepare&gt;
+               &lt;delete path=&quot;[PATH]&quot;/&gt;
+               ...
+               &lt;mkdir path=&quot;[PATH]&quot;/&gt;
+               ...
+            &lt;/prepare&gt;
+            &lt;job-xml&gt;[JOB-XML]&lt;/job-xml&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;[PROPERTY-NAME]&lt;/name&gt;
+                    &lt;value&gt;[PROPERTY-VALUE]&lt;/value&gt;
+                &lt;/property&gt;
+                ...
+            &lt;/configuration&gt;
+            &lt;main-class&gt;[MAIN-CLASS]&lt;/main-class&gt;
+            &lt;java-opts&gt;[JAVA-STARTUP-OPTS]&lt;/java-opts&gt;
+            &lt;arg&gt;ARGUMENT&lt;/arg&gt;
+            ...
+            &lt;file&gt;[FILE-PATH]&lt;/file&gt;
+            ...
+            &lt;archive&gt;[FILE-PATH]&lt;/archive&gt;
+            ...
+            &lt;capture-output /&gt;
+        &lt;/java&gt;
+        &lt;ok to=&quot;[NODE-NAME]&quot;/&gt;
+        &lt;error to=&quot;[NODE-NAME]&quot;/&gt;
+    &lt;/action&gt;
+    ...
+&lt;/workflow-app&gt;
+</pre></div></div>
+
+<p>The <tt>prepare</tt> element, if present, indicates a list of paths to delete before starting the Java application. This should be used exclusively for directory cleanup, or for dropping HCatalog tables or table partitions, before the Java application is executed. In case of <tt>delete</tt>, a glob pattern can be used to specify the path. The format for specifying an HCatalog table URI is <tt>hcat://[metastore server]:[port]/[database name]/[table name]</tt>, and the format for an HCatalog table partition URI is <tt>hcat://[metastore server]:[port]/[database name]/[table name]/[partkey1]=[value];[partkey2]=[value]</tt>. In case of an HCatalog URI, the hive-site.xml needs to be shipped using the <tt>file</tt> tag, and the HCatalog and Hive jars need to be placed in the workflow lib directory or specified using the <tt>archive</tt> tag.</p>

[... 3526 lines stripped ...]