Posted to commits@oozie.apache.org by ka...@apache.org on 2012/08/08 03:56:54 UTC

svn commit: r1370632 - /incubator/oozie/site/publish/java-cookbook.html

Author: kamrul
Date: Wed Aug  8 01:56:54 2012
New Revision: 1370632

URL: http://svn.apache.org/viewvc?rev=1370632&view=rev
Log:
Committing new generated html files

Added:
    incubator/oozie/site/publish/java-cookbook.html

Added: incubator/oozie/site/publish/java-cookbook.html
URL: http://svn.apache.org/viewvc/incubator/oozie/site/publish/java-cookbook.html?rev=1370632&view=auto
==============================================================================
--- incubator/oozie/site/publish/java-cookbook.html (added)
+++ incubator/oozie/site/publish/java-cookbook.html Wed Aug  8 01:56:54 2012
@@ -0,0 +1,450 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<!-- Generated by Apache Maven Doxia at Aug 7, 2012 -->
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <title>Apache Oozie - Java Cookbook</title>
+    <style type="text/css" media="all">
+      @import url("./css/maven-base.css");
+      @import url("./css/maven-theme.css");
+      @import url("./css/site.css");
+    </style>
+    <link rel="stylesheet" href="./css/print.css" type="text/css" media="print" />
+        <meta name="author" content="$maven.build.timestamp" />
+        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
+      </head>
+  <body class="composite">
+    <div id="banner">
+                  <span id="bannerLeft">
+                 
+                </span>
+                    <div class="clear">
+        <hr/>
+      </div>
+    </div>
+    <div id="breadcrumbs">
+            
+                                <div class="xleft">
+        Last Published: 2012-08-07
+                          |                   <a href="index.html">Apache Oozie</a>
+        &gt;
+    Apache Oozie - Java Cookbook
+              </div>
+            <div class="xright">            <a href="http://www.apache.org/" class="externalLink">ASF</a>
+              
+                                 Version: 3.1.0-SNAPSHOT
+      </div>
+      <div class="clear">
+        <hr/>
+      </div>
+    </div>
+    <div id="leftColumn">
+      <div id="navcolumn">
+             
+                                                <h5>Project</h5>
+                  <ul>
+                  <li class="none">
+                  <a href="./index.html">Home</a>
+            </li>
+                  <li class="none">
+                  <a href="./Downloads.html">Downloads</a>
+            </li>
+                  <li class="none">
+                  <a href="./Credits.html">Credits</a>
+            </li>
+                  <li class="none">
+                  <a href="./MailingLists.html">Mailing Lists</a>
+            </li>
+                  <li class="none">
+                  <a href="./IssueTracking.html">Issue Tracking</a>
+            </li>
+                  <li class="none">
+                  <a href="./IRCChannel.html">IRC Channel</a>
+            </li>
+          </ul>
+                       <h5>Developers</h5>
+                  <ul>
+                  <li class="none">
+                  <a href="./VersionControl.html">Version Control</a>
+            </li>
+                  <li class="none">
+                  <a href="./HowToContribute.html">How To Contribute</a>
+            </li>
+                  <li class="none">
+                  <a href="HowToRelease.html">How to Release</a>
+            </li>
+          </ul>
+                       <h5>Documentation</h5>
+                  <ul>
+                  <li class="none">
+                  <a href="./docs/3.2.0-incubating/docs/index.html">3.2.0-incubating</a>
+            </li>
+                  <li class="none">
+                  <a href="./docs/3.2.0-incubating/docs/DG_QuickStart.html">Quick start</a>
+            </li>
+                  <li class="none">
+                  <a href="./overview.html">Overview</a>
+            </li>
+                  <li class="none">
+                  <a href="./map-reduce-cookbook.html">MapReduce Cookbook</a>
+            </li>
+                  <li class="none">
+                  <a href="./pig-cookbook.html">Pig Cookbook</a>
+            </li>
+                  <li class="none">
+                  <a href="./java-cookbook.html">Java Cookbook</a>
+            </li>
+                  <li class="none">
+                  <a href="./docs">Older releases</a>
+            </li>
+          </ul>
+                                 <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
+          <img alt="Built by Maven" src="./images/logos/maven-feather.png"/>
+        </a>
+                       
+                            </div>
+    </div>
+    <div id="bodyColumn">
+      <div id="contentBox">
+        <div class="section"><h2>Java Cookbook</h2>
+<p>This document describes how to run Java code using Oozie. It is intended for anyone who installs, uses, or operates Oozie.</p>
+<div class="section"><h3>Java Action specification</h3>
+<p>The Java action will execute the public static void main(String[] args) method of the specified main Java class. The required Java class(es) should be packaged in the form of a JAR and placed within your workflow application's <b>lib</b> directory.</p>
+<p>i.e.</p>
+<ul><li>wf-app-dir/workflow.xml</li>
+<li>wf-app-dir/lib</li>
+<li>wf-app-dir/lib/myJavaClasses.JAR</li>
+</ul>
+<p>When executing a Java application without Oozie, one specifies the class containing the main method (main-class), the run-time JVM options, and the arguments to be passed.</p>
+<div class="source"><pre>    $ java -Xms512m a.b.c.MyJavaMain arg1 arg2</pre>
+</div>
+<p>Now with Oozie, they are specified as inline tags in the <b>workflow.xml</b> file.</p>
+<div class="source"><pre>  &lt;action name='java1'&gt;
+    &lt;java&gt;
+  ...
+          &lt;main-class&gt; a.b.c.MyJavaMain &lt;/main-class&gt;
+          &lt;java-opts&gt; -Xms512m &lt;/java-opts&gt;
+          &lt;arg&gt; arg1 &lt;/arg&gt;
+          &lt;arg&gt; arg2 &lt;/arg&gt;
+  ...
+    &lt;/java&gt;
+  &lt;/action&gt;</pre>
+</div>
+<p>The <tt>java-opts</tt> element, if present, contains the command line parameters which are to be used to start the JVM that will execute the Java application. For multiple command line parameters, there may instead be a <tt>java-opt</tt> element for each.</p>
+<p>The <tt>arg</tt> elements, if present, contain the arguments for the main method. The value of each <tt>arg</tt> element is treated as a single argument, and the arguments are passed to the main method <i>in the same order</i>.</p>
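+<p>As a minimal sketch of how the receiving side sees these arguments, the class below lists each entry of <tt>args</tt> with its index. The class name <tt>ArgEcho</tt> and the helper <tt>describe</tt> are hypothetical and not part of Oozie; the sketch only assumes that each <tt>arg</tt> element arrives as one entry of the array, in document order.</p>

```java
public class ArgEcho {

    // Builds one line per argument, preserving the order in which they arrived.
    static String describe(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < args.length; i++) {
            sb.append("args[").append(i).append("] = ").append(args[i]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // When run as a Java action, this output ends up in the task log.
        System.out.print(describe(args));
    }
}
```

+<p>Run locally with <tt>java ArgEcho arg1 arg2</tt>, this prints <tt>args[0] = arg1</tt> followed by <tt>args[1] = arg2</tt>.</p>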
+<p>Java applications are executed in the Hadoop cluster as a map-reduce job with a single Mapper task. Hence the Java action has to be configured with the following XML elements:</p>
+<ul><li><tt>&lt;job-tracker&gt;</tt></li>
+<li><tt>&lt;name-node&gt;</tt></li>
+<li><tt>&lt;configuration&gt;</tt> element is used to specify key/value properties for the map-reduce job. Some common properties include:<ul><li><tt>mapred.job.queue.name</tt> specifies the queue-name that the job will be submitted to since the Java application is run from within a Map-Reduce job. If not mentioned, the default queue <i>default</i> is assumed.<div class="source"><pre>      &lt;action name='java1'&gt;
+          &lt;java&gt;
+              &lt;job-tracker&gt;foo.bar:8021&lt;/job-tracker&gt;
+              &lt;name-node&gt;foo1.bar:8020&lt;/name-node&gt;
+                                ...
+              &lt;configuration&gt;
+               &lt;property&gt;
+                    &lt;name&gt;abc&lt;/name&gt;
+                    &lt;value&gt;def&lt;/value&gt;
+               &lt;/property&gt;
+            &lt;/configuration&gt;
+          &lt;/java&gt;
+      &lt;/action&gt;</pre>
+</div>
+</li>
+</ul>
+<p><b>Prepare block</b></p>
+<p>A Java action can be configured to perform HDFS file and directory cleanup, such as deleting an existing output directory (<tt>&lt;delete&gt;</tt>) or creating a new one (<tt>&lt;mkdir&gt;</tt>), before starting the Java application. This capability enables Oozie to retry a Java application after a transient or non-transient failure, because it can be used to clean up any temporary data that the Java application may have created before failing.</p>
+<p>The prepare element, if present, indicates a list of paths to do file operations upon, before starting the Java application. This should be used exclusively for directory cleanup for the Java application to be executed.</p>
+<p><b>Capture-output element</b></p>
+<ul><li>The capture-output element can be used to propagate values back into Oozie context, which can then be accessed via EL-functions. Thus, in a workflow application with multiple actions, the output of the Java action can be accessed by subsequent actions.<ul><li>This needs to be written out as a Java properties format file specified via the property <tt>oozie.action.output.properties</tt>.</li>
+<li>A useful EL function for capturing action output data is<ul><li>Map wf:actionData(String node)</li>
+</ul>
+</li>
+</ul>
+<p>This function is only applicable to action nodes that produce output data on completion.</p>
+<p>The output data is in Java Properties format, and this EL function makes it available as a Map.</p>
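+<p>To make the Properties-to-Map relationship concrete, here is a small sketch that writes key/value pairs in Java Properties format and reads them back as a map. The class name <tt>OutputPropertiesSketch</tt> and its helpers are hypothetical. The read-back step only imitates, in plain Java, the key/value view that <tt>wf:actionData</tt> exposes; it is not how Oozie itself is implemented. The fallback to a temporary file is an assumption so the sketch can run outside Oozie.</p>

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class OutputPropertiesSketch {

    // Writes the given pairs to the file in Java Properties format.
    static void write(File file, Map<String, String> values) throws IOException {
        Properties props = new Properties();
        props.putAll(values);
        try (OutputStream os = new FileOutputStream(file)) {
            props.store(os, "");
        }
    }

    // Reads a Properties file back into a sorted map of strings.
    static Map<String, String> read(File file) throws IOException {
        Properties props = new Properties();
        try (InputStream is = new FileInputStream(file)) {
            props.load(is);
        }
        Map<String, String> result = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            result.put(name, props.getProperty(name));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumption for local runs: if the system property is unset, use a temporary file.
        String path = System.getProperty("oozie.action.output.properties");
        File file = (path != null)
                ? new File(path)
                : File.createTempFile("action-output", ".properties");

        Map<String, String> values = new TreeMap<>();
        values.put("PASS_ME", "123456");
        write(file, values);

        System.out.println(read(file));
    }
}
```

+<p>Unlike a version that catches and prints exceptions, this sketch lets an <tt>IOException</tt> propagate out of <tt>main</tt>, so a failed write is not silently ignored.</p>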
+<ul><li>From Oozie v3.3 onwards, more EL functions are available so your Java code can process this output in numerous formats (e.g. Properties, JSON etc.)<p>For more details, refer to the section on EL functions in the <a class="externalLink" href="http://incubator.apache.org/oozie/docs/3.1.3/docs/WorkflowFunctionalSpec.html#a4.2.2_Basic_EL_Functions">Workflow functional specification</a>.</p>
+</li>
+</ul>
+<p>The example below illustrates the use of <tt>&lt;capture-output&gt;</tt> element and the corresponding java main function definition.</p>
+<ul><li>In this example, we pass the PASS_ME variable between the java1 action and the pig1 action.</li>
+<li>The PASS_ME variable is given the value 123456 in the Java action named java1.</li>
+<li>The pig1 action subsequently reads the value of the PASS_ME variable and passes it to the Pig script.<div class="source"><pre>
+&lt;workflow-app xmlns='uri:oozie:workflow:0.1' name='java-wf'&gt;
+    &lt;start to='java1' /&gt;
+
+    &lt;action name='java1'&gt;
+        &lt;java&gt;
+            &lt;job-tracker&gt;${jobTracker}&lt;/job-tracker&gt;
+            &lt;name-node&gt;${nameNode}&lt;/name-node&gt;
+            &lt;configuration&gt;
+               &lt;property&gt;
+                    &lt;name&gt;mapred.job.queue.name&lt;/name&gt;
+                    &lt;value&gt;${queueName}&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+            &lt;main-class&gt;org.apache.oozie.test.MyTest&lt;/main-class&gt;
+            &lt;arg&gt;${outputFileName}&lt;/arg&gt;
+            &lt;capture-output/&gt;
+        &lt;/java&gt;
+        &lt;ok to=&quot;pig1&quot; /&gt;
+        &lt;error to=&quot;fail&quot; /&gt;
+    &lt;/action&gt;
+
+
+    &lt;action name='pig1'&gt;
+        &lt;pig&gt;
+            &lt;job-tracker&gt;${jobTracker}&lt;/job-tracker&gt;
+            &lt;name-node&gt;${nameNode}&lt;/name-node&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.job.queue.name&lt;/name&gt;
+                    &lt;value&gt;${queueName}&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+            &lt;script&gt;script.pig&lt;/script&gt;
+            &lt;param&gt;MY_VAR=${wf:actionData('java1')['PASS_ME']}&lt;/param&gt;
+        &lt;/pig&gt;
+        &lt;ok to=&quot;end&quot; /&gt;
+        &lt;error to=&quot;fail&quot; /&gt;
+    &lt;/action&gt;
+
+    &lt;kill name=&quot;fail&quot;&gt;
+        &lt;message&gt;Pig failed, error message[${wf:errorMessage(wf:lastErrorNode())}]&lt;/message&gt;
+    &lt;/kill&gt;
+    &lt;end name='end' /&gt;
+&lt;/workflow-app&gt;</pre>
+</div>
+<p><b>Java Main Class</b></p>
+<p>The main() method writes a Properties file to the path given by the <tt>oozie.action.output.properties</tt> Java system property.</p>
+<div class="source"><pre>
+package org.apache.oozie.test;
+
+import java.io.*;
+import java.util.Properties;
+
+public class MyTest {
+
+   ////////////////////////////////
+   // Do whatever you want in here
+   ////////////////////////////////
+   public static void main (String[] args)
+   {
+      String fileName = args[0];
+      try{
+         File file = new File(System.getProperty(&quot;oozie.action.output.properties&quot;));
+         Properties props = new Properties();
+         props.setProperty(&quot;PASS_ME&quot;, &quot;123456&quot;);
+
+         OutputStream os = new FileOutputStream(file);
+         props.store(os, &quot;&quot;);
+         os.close();
+         System.out.println(file.getAbsolutePath());
+      }
+      catch (Exception e) {
+         e.printStackTrace();
+      }
+   }
+}
+</pre>
+</div>
+</li>
+</ul>
+<p><b>Launcher configuration properties</b></p>
+<p>Oozie executes the Java action within a Launcher mapper on the compute node. Some commonly used <tt>&lt;configuration&gt;</tt> properties passed for the java action can be as follows:</p>
+<ul><li><tt>oozie.launcher.mapred.child.java.opts</tt>, which is similar to using the <tt>&lt;java-opts&gt;</tt> element described before</li>
+<li>Setting environment variables<div class="source"><pre>          &lt;property&gt;
+          &lt;name&gt;oozie.launcher.mapred.child.env&lt;/name&gt;
+          &lt;value&gt;A=foo&lt;/value&gt;
+          &lt;/property&gt;</pre>
+</div>
+</li>
+</ul>
+<p>In summary, mapred properties are applied to the Java action's launcher map job by prefixing &quot;oozie.launcher.&quot; to those property names.</p>
+</li>
+</ul>
+</li>
+</ul>
+</div>
+<div class="section"><h3>Java Action Transition</h3>
+<p>The workflow job will wait until the Java application completes its execution before continuing to the next action. To indicate an ok action transition, the main Java class must complete the main method invocation gracefully.</p>
+<p>To indicate an error action transition, the main Java class must throw an exception or exit with a non-zero code.</p>
+<p>For example, if the main Java class calls System.exit(int n) where n is <b>non-zero</b>, the Java action will take an <b>error transition</b> regardless of the exit code used.</p>
+<p>In short, a non-zero exit leads to an <tt>error</tt> transition, whereas System.exit(0) leads to an <tt>ok</tt> transition.</p>
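+<p>The following sketch shows one way to apply these rules: the work is done in a helper that throws on bad input, and <tt>main</tt> does not catch the exception. The class name <tt>TransitionSketch</tt> and the helper <tt>run</tt> are hypothetical. By the rules above, a main method that catches an exception, prints it, and then returns normally would be expected to take the ok transition, so the sketch deliberately avoids a catch-all block.</p>

```java
public class TransitionSketch {

    // Returns normally for valid input and throws for invalid input.
    static void run(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("expected at least one argument");
        }
        System.out.println("processing " + args[0]);
    }

    public static void main(String[] args) {
        // No try/catch: an uncaught exception leaves main abnormally,
        // which is what signals the error transition.
        run(args);
    }
}
```

+<p>If cleanup is needed before failing, an alternative is to catch the exception, clean up, and then rethrow it or call System.exit with a non-zero code.</p>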
+</div>
+<div class="section"><h3>Debugging inside Java classes</h3>
+<ul><li>If the Java class has any System.out.print statements, their output appears in the Hadoop JobTracker task log (accessible via the Oozie web console by clicking on the job in the list and then using the console URL for the JobTracker to view the log for your specific task).</li>
+<li>Alternatively, if you know your job ID from the CLI, you can look up your job's logs on the JobTracker web page and then open the 'map' task logs. The Oozie launcher writes informative log statements, such as job properties and exception traces, to these task logs.</li>
+<li>If you are performing incremental changes to your java classes and want to see the changes reflected, you need to update the JAR that is supposed to include your java class. In this case, the JAR is the one inside your workflow application's lib directory.<div class="source"><pre>        $ jar uf /path/to/jar -C /path/to/a/ a/b/c/myMainClass.class</pre>
+</div>
+<p>where a.b.c.myMainClass is the package structure you wish to maintain in the JAR.</p>
+</li>
+</ul>
+</div>
+<div class="section"><h3>Java System Properties</h3>
+<p>Within your Java Main class, you can query for the following system properties pertaining to your Oozie job.</p>
+<ul><li><b>oozie.job.id</b> : Workflow ID</li>
+<li><b>oozie.action.id</b> : Action ID</li>
+<li><b>oozie.action.conf.xml</b> : local path to the resolved action configuration</li>
+<li><b>oozie.action.output.properties</b> : path of the file to which the action writes its output as a Java Properties file (described further in the Java Action specification section; see the use-cases for an illustration)</li>
+</ul>
+<p>This provides for a convenient way to access these values directly within the action code.</p>
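+<p>As an illustration, the sketch below reads the four properties listed above with <tt>System.getProperty</tt>. The class name <tt>OozieProps</tt>, the helper <tt>collect</tt>, and the <tt>(not set)</tt> placeholder are hypothetical; the placeholder is only there so the sketch also runs outside Oozie, where these properties are not defined.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OozieProps {

    // The system property names listed in this section.
    static final String[] KEYS = {
        "oozie.job.id",
        "oozie.action.id",
        "oozie.action.conf.xml",
        "oozie.action.output.properties"
    };

    // Collects each property in listing order, using a placeholder when it is absent.
    static Map<String, String> collect() {
        Map<String, String> out = new LinkedHashMap<>();
        for (String key : KEYS) {
            out.put(key, System.getProperty(key, "(not set)"));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : collect().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```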
+</div>
+<div class="section"><h3>Examples and Use-Cases</h3>
+<p>Following are example workflow applications that illustrate use-cases of the Oozie Java action.</p>
+<p><b>I: Using Java-Main action to copy local file to HDFS</b></p>
+<p>Assume a local file <tt>${filename}</tt> can be accessed by all cluster nodes.</p>
+<ul><li>Define a java action in your workflow.xml<div class="source"><pre>&lt;workflow-app xmlns='uri:oozie:workflow:0.3' name='java-filecopy-wf'&gt;
+    &lt;start to='java1' /&gt;
+        &lt;action name='java1'&gt;
+        &lt;java&gt;
+            &lt;job-tracker&gt;${jobTracker}&lt;/job-tracker&gt;
+            &lt;name-node&gt;${nameNode}&lt;/name-node&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.job.queue.name&lt;/name&gt;
+                    &lt;value&gt;${queueName}&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+            &lt;main-class&gt;testCopyFromLocal&lt;/main-class&gt;
+            &lt;arg&gt;${filename}&lt;/arg&gt;
+            &lt;arg&gt;${nameNode}${testDir}&lt;/arg&gt;
+            &lt;capture-output/&gt;
+        &lt;/java&gt;
+        &lt;ok to=&quot;end&quot; /&gt;
+        &lt;error to=&quot;fail&quot; /&gt;
+    &lt;/action&gt;
+    &lt;kill name=&quot;fail&quot;&gt;
+        &lt;message&gt;Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
+            &lt;/message&gt;
+    &lt;/kill&gt;
+    &lt;end name='end' /&gt;
+&lt;/workflow-app&gt;</pre>
+</div>
+</li>
+<li>Here is a sample java main class.<div class="source"><pre>
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.conf.Configuration;
+import java.io.File;
+import java.io.IOException;
+
+public class testCopyFromLocal {
+
+        public static void main (String[] args) throws IOException {
+                String src = args[0];
+                String dst = args[1];
+                System.out.println(&quot;testCopyFromLocal, source= &quot; + src);
+                System.out.println(&quot;testCopyFromLocal, target= &quot; + dst);
+                Configuration conf = new Configuration();
+                Path src1 = new Path(src);
+                Path dst1 = new Path(dst);
+                FileSystem fs = FileSystem.get(conf);
+                try{
+                   //delete local file after copy
+                   fs.copyFromLocalFile(true, true, src1, dst1);
+                }
+                catch(IOException ex) {
+                   System.err.println(&quot;IOException during copy operation &quot; +
+                                       ex.toString());
+                   ex.printStackTrace();
+                   System.exit(1);
+                }
+        }
+}</pre>
+</div>
+</li>
+</ul>
+<p><b>II: Java-Main Action decision nodes</b></p>
+<p>This example illustrates how action output data captured with <tt>&lt;capture-output&gt;</tt> can be used in decision nodes.</p>
+<div class="source"><pre>&lt;workflow-app xmlns='uri:oozie:workflow:0.3' name='java-actionprops-wf'&gt;
+    &lt;start to='java3' /&gt;
+        &lt;action name='java3'&gt;
+        &lt;java&gt;
+            &lt;job-tracker&gt;${jobTracker}&lt;/job-tracker&gt;
+            &lt;name-node&gt;${nameNode}&lt;/name-node&gt;
+            &lt;configuration&gt;
+                &lt;property&gt;
+                    &lt;name&gt;mapred.job.queue.name&lt;/name&gt;
+                    &lt;value&gt;${queueName}&lt;/value&gt;
+                &lt;/property&gt;
+            &lt;/configuration&gt;
+            &lt;main-class&gt;exampleDecision&lt;/main-class&gt;
+            &lt;arg&gt;yes&lt;/arg&gt;
+            &lt;capture-output/&gt;
+        &lt;/java&gt;
+        &lt;ok to=&quot;decision1&quot; /&gt;
+        &lt;error to=&quot;fail&quot; /&gt;
+    &lt;/action&gt;
+
+        &lt;decision name=&quot;decision1&quot;&gt;
+       &lt;switch&gt;
+           &lt;case to=&quot;end&quot;&gt;${(wf:actionData('java3')['key1'] == &quot;value1&quot;) and (wf:actionData('java3')['key2'] == &quot;value2&quot;)}&lt;/case&gt;
+           &lt;default to=&quot;fail&quot; /&gt;
+       &lt;/switch&gt;
+    &lt;/decision&gt;
+
+    &lt;kill name=&quot;fail&quot;&gt;
+        &lt;message&gt;Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
+         &lt;/message&gt;
+    &lt;/kill&gt;
+    &lt;end name='end' /&gt;
+&lt;/workflow-app&gt;</pre>
+</div>
+<p>The corresponding Java class can be as below:</p>
+<div class="source"><pre>
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.Properties;
+
+public class exampleDecision {
+    public static void main (String[] args) {
+       String text = args[0];
+        try{
+          File file = new File(System.getProperty(&quot;oozie.action.output.properties&quot;));
+          Properties props = new Properties();
+
+          if (text.equals(&quot;yes&quot;)) {
+                props.setProperty(&quot;key1&quot;, &quot;value1&quot;);
+                props.setProperty(&quot;key2&quot;, &quot;value2&quot;);
+          } else {
+                props.setProperty(&quot;key1&quot;, &quot;novalue&quot;);
+                props.setProperty(&quot;key2&quot;, &quot;novalue&quot;);
+          }
+
+          OutputStream os = new FileOutputStream(file);
+          props.store(os, &quot;&quot;);
+          os.close();
+          System.out.println(file.getAbsolutePath());
+        }
+        catch (Exception e) {
+          e.printStackTrace();
+        }
+    }
+}</pre>
+</div>
+</div>
+</div>
+
+      </div>
+    </div>
+    <div class="clear">
+      <hr/>
+    </div>
+    <div id="footer">
+      <div class="xright">
+        &#169;            2012
+              Apache Software Foundation
+            
+                       - <a href="http://maven.apache.org/privacy-policy.html">Privacy Policy</a>.
+        Apache Maven, Maven, Apache, the Apache feather logo, and the Apache Maven project logos are trademarks of The Apache Software Foundation.
+      </div>
+      <div class="clear">
+        <hr/>
+      </div>
+    </div>
+  </body>
+</html>