Posted to commits@oodt.apache.org by bf...@apache.org on 2010/12/23 03:47:22 UTC

svn commit: r1052147 [12/12] - in /oodt/branches/wengine-branch/wengine: ./ src/ src/main/ src/main/assembly/ src/main/bin/ src/main/java/ src/main/java/org/ src/main/java/org/apache/ src/main/java/org/apache/oodt/ src/main/java/org/apache/oodt/cas/ sr...

Added: oodt/branches/wengine-branch/wengine/src/site/xdoc/tech/index.xml
URL: http://svn.apache.org/viewvc/oodt/branches/wengine-branch/wengine/src/site/xdoc/tech/index.xml?rev=1052147&view=auto
==============================================================================
--- oodt/branches/wengine-branch/wengine/src/site/xdoc/tech/index.xml (added)
+++ oodt/branches/wengine-branch/wengine/src/site/xdoc/tech/index.xml Thu Dec 23 02:47:16 2010
@@ -0,0 +1,327 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Copyright (c) 2006 California Institute of Technology.
+  ALL RIGHTS RESERVED. U.S. Government sponsorship acknowledged.
+
+  $Id$
+-->
+
+<document>
+  <properties>
+    <title>CAS Workflow Manager Technical Guide</title>
+    <author email="Brian.M.Foster@jpl.nasa.gov">Brian Foster</author>
+  </properties>
+
+  <body>
+    <section name="Introduction">
+      <p>Historically, data processing systems have been controlled primarily by file-based
+        triggering mechanisms. These systems function like a chain reaction: one file triggers
+        a process, which generates another file, which then triggers another process, and so
+        forth. While it is easy to add processes to and remove processes from such a system, the
+        user must thoroughly understand how the processes relate to each other in order to avoid
+        creating unwanted chain reactions. Recently, efforts have been made to move towards more
+        controlled processing models built on the concept of workflows. A workflow is essentially
+        a tightly grouped set of processes: it explicitly tells the processing system which
+        processes should be run and in what order. A workflow runs each process only after the
+        preceding processes in its mapping have completed successfully, which makes file
+        generation a criterion for the successful completion of a process instead of the
+        triggering mechanism for the next process. This separates the workflow from the files it
+        may generate and allows the processing system to perform more tasks than just file
+        processing. This guide explains how to use and configure a workflow processing system,
+        specifically CAS-Workflow2, and describes the design decisions behind it.
+      </p>
+    </section>
+    <section name="Workflows Structure">
+      <p>Workflows consist of three parts: pre-conditions, a list of tasks (or processes) to
+        perform, and post-conditions.</p>
+      <subsection name="Pre-Conditions">
+        <p>A pre-condition is a task whose purpose is to return a true/false answer to some
+          question. Pre-conditions are requirements that must be met before a workflow can run its
+          tasks. An example of a pre-condition might be checking for the existence of a particular
+          file. After all pre-conditions have been met, a workflow will execute its tasks.</p>
+      </subsection>
+      <subsection name="Tasks">
+        <p>A Task is an activity or piece of work that needs to be done. Tasks are the atomic level
+          of a workflow. The goal of any workflow is to run its tasks to successful completion. An
+          example of a task might be: creating a visual map for a data file. After all tasks have
+          completed, the workflow will then run its post-conditions.</p>
+      </subsection>
+      <subsection name="Post-Conditions">
+        <p>Post-conditions give the workflow the ability to evaluate whether its tasks
+          successfully performed all their required duties. An example of a post-condition might be
+          checking for the existence of a file that a task was responsible for generating.</p>
+      </subsection>
+    </section>
+    <section name="Workflow Lifecycle">
+      <p>Each workflow goes through a well-defined set of states, or lifecycle. We can easily
+        deduce a few of the states from what we know already. A workflow starts by evaluating its
+        pre-conditions, so we call this state PreConditionEval. It then executes its tasks, a
+        state we call Executing, and after that evaluates its post-conditions:
+        PostConditionEval. If any of these three steps fails, the workflow needs a failure state,
+        hence the state Failure. If everything goes as planned, the workflow ends in the state
+        Success. Figure 1 further describes this workflow lifecycle. There are other states, but
+        for simplicity's sake these are the only ones introduced for now; the others are
+        introduced later, since more workflow knowledge is required to understand them.</p>
+      <center>
+        <img src="../images/simplified-lifecycle.png" alt="Workflow Manager Lifecycle"/>
+      </center>
+      <subsection name="PreConditionEval">
+        <p>Workflow is executing its pre-conditions.</p>
+      </subsection>
+      <subsection name="Executing">
+        <p>Workflow is executing its tasks.</p>
+      </subsection>
+      <subsection name="PostConditionEval">
+        <p>Workflow is executing its post-conditions.</p>
+      </subsection>
+      <subsection name="Success">
+        <p>Workflow has successfully passed all pre-conditions, executed all tasks, and passed all
+          post-conditions.</p>
+      </subsection>
+      <subsection name="Failure">
+        <p>At least one of the workflow's pre-conditions, tasks, or post-conditions has failed.</p>
+      </subsection>
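+      <p>The simplified lifecycle described in this section can be sketched as a small state
+        machine. The Python below is only an illustration: the names State and run_workflow, and
+        the use of plain callables that return True on success, are assumptions made for this
+        example and are not part of CAS-Workflow2.</p>

```python
from enum import Enum


class State(Enum):
    PRE_CONDITION_EVAL = "PreConditionEval"
    EXECUTING = "Executing"
    POST_CONDITION_EVAL = "PostConditionEval"
    SUCCESS = "Success"
    FAILURE = "Failure"


def run_workflow(pre_conditions, tasks, post_conditions):
    """Run the three parts in order and return the list of states visited.

    Each pre-condition, task and post-condition is a hypothetical
    zero-argument callable that returns True on success.
    """
    visited = []
    phases = [
        (State.PRE_CONDITION_EVAL, pre_conditions),
        (State.EXECUTING, tasks),
        (State.POST_CONDITION_EVAL, post_conditions),
    ]
    for state, steps in phases:
        visited.append(state)
        # all() stops at the first step that fails.
        if not all(step() for step in steps):
            visited.append(State.FAILURE)
            return visited
    visited.append(State.SUCCESS)
    return visited
```

+      <p>A run in which every step passes visits PreConditionEval, Executing, and
+        PostConditionEval before ending in Success; a failing step ends the run in Failure
+        directly from whichever state was active.</p>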
+    </section>
+    <section name="Workflow Context">
+      <p>Workflows can have context, which acts as their knowledge base. This context is
+        also referred to as metadata. Metadata is a bucket of key/value(s) information that
+        workflows have access to. An example of a metadata field might be RunDate='2009-01-20'. At
+        times, tasks need to talk to other tasks, or conditions need to communicate something
+        to the tasks that run after them. Workflows not only control the flow of conditions and
+        tasks, they also control communication between them, and they accomplish this through
+        metadata. Conditions and tasks can also have their own metadata, which they do not share
+        with anyone else. A workflow has three categories of metadata: 1) static, 2) dynamic, and
+        3) local.</p>
+      <subsection name="Static">
+        <p>This is metadata that is the same for every run of a workflow. A task can always assume
+          this metadata will exist.</p>
+      </subsection>
+      <subsection name="Dynamic">
+        <p>This is metadata that is passed into the workflow when it is run and/or set by other
+          tasks and conditions when communicating with each other.</p>
+      </subsection>
+      <subsection name="Local">
+        <p>This is dynamic metadata that is local to a task or condition.</p>
+      </subsection>
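+      <p>One way to picture the three metadata categories is as three layered dictionaries. The
+        sketch below uses Python's collections.ChainMap; the lookup order (local, then dynamic,
+        then static), the keys Mission, Attempt and OutputFile, and the file path are all
+        assumptions made for illustration, since this guide does not specify how CAS-Workflow2
+        resolves a key that appears in more than one category.</p>

```python
from collections import ChainMap

# Static metadata: identical for every run of the workflow.
static = {"Mission": ["Example"]}
# Dynamic metadata: passed in at run time or set by other tasks and conditions.
dynamic = {"RunDate": ["2009-01-20"]}
# Local metadata: private to a single task or condition.
local = {"Attempt": ["1"]}

# Assumed lookup order for this sketch: local, then dynamic, then static.
task_view = ChainMap(local, dynamic, static)

# A task communicates with later tasks by writing to the shared dynamic bucket.
dynamic["OutputFile"] = ["/tmp/map.png"]

# A different task shares static and dynamic metadata but not the local bucket.
other_task_view = ChainMap({}, dynamic, static)
```

+      <p>Each key maps to a list because metadata is described above as key/value(s)
+        information, so a single key may carry several values.</p>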
+    </section>
+    <section name="Everything is a Workflow">
+      <p>In order to simplify how process control is configured, tasks and conditions were also
+        designed to be workflows. This means that almost anywhere we have used the word workflow
+        up until now, we could have replaced it with the word task, and vice versa. There are a
+        few exceptions, however: a task differs from a workflow in that it wraps an executable
+        class, which performs some activity, and it cannot have any children workflows.
+        Conditions are just specialized tasks, so the same applies to them as well. Conditions
+        differ from tasks in that they cannot have pre-conditions or post-conditions, since that
+        would mean you could have a pre-condition for a pre-condition. In other words, a workflow
+        is really just a workflow of workflows with pre-condition and post-condition
+        workflows.</p>
+    </section>
+    <section name="Workflow Listeners">
+      <p>We now know that workflows have three different parts (or buckets) into which other
+        workflows can be placed: pre-conditions, children workflows, and post-conditions. Workflows
+        placed into these buckets are treated like black boxes. A workflow has no idea what types of
+        workflows have been placed into these buckets. The workflow just knows that first the
+        workflows in the pre-conditions bucket must pass before running the workflows in the
+        children bucket, followed then by the workflows in the post-conditions bucket. The way a
+        workflow knows what is going on with the workflows in its buckets is by registering itself
+        as a listener for state changes in those workflows. When a workflow changes state, it will
+        notify its listeners about the change. The listening workflow will then adjust its state
+        depending on which bucket the state change notification came from. Earlier we learned about
+        the lifecycle which each workflow goes through. This lifecycle is not only followed by the
+        top workflow or root workflow, it is followed by every workflow in all of the different
+        buckets as well. A workflow changes state in its lifecycle when one of the workflows
+        in its buckets changes state. For example, if a workflow has a pre-condition workflow
+        that changes state to Executing, then upon notification the parent workflow changes its
+        state to PreConditionEval. This notion of workflow lifecycle changes affecting other
+        workflow lifecycles will be explained in greater detail later.</p>
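+      <p>The listener mechanism can be sketched as follows. This is a minimal illustration, not
+        the CAS-Workflow2 implementation: the class and method names are hypothetical, and only
+        the pre-condition rule (a pre-condition workflow entering Executing moves its parent to
+        PreConditionEval) comes from the text above. The rules for the other two buckets are
+        assumed by analogy.</p>

```python
class Workflow:
    """A workflow that listens for state changes in its three buckets."""

    # State a parent enters when a workflow in the given bucket starts
    # executing. Only the "pre" rule is stated in the guide; the other
    # two are assumptions made by analogy.
    ON_EXECUTING = {
        "pre": "PreConditionEval",
        "children": "Executing",
        "post": "PostConditionEval",
    }

    def __init__(self, name):
        self.name = name
        self.state = None  # no state assigned yet
        self.listeners = []  # (parent workflow, bucket name) pairs

    def add(self, bucket, child):
        """Place child in one of our buckets and register as its listener."""
        child.listeners.append((self, bucket))

    def set_state(self, new_state):
        """Change state, then notify every listener about the change."""
        self.state = new_state
        for parent, bucket in self.listeners:
            parent.notify(bucket, new_state)

    def notify(self, bucket, child_state):
        """React to a state change reported from one of our buckets."""
        if child_state == "Executing":
            self.set_state(self.ON_EXECUTING[bucket])


root = Workflow("root")
check = Workflow("check")
root.add("pre", check)
check.set_state("Executing")
print(root.state)  # PreConditionEval
```

+      <p>Because a parent that changes state notifies its own listeners in turn, a state change
+        deep in the tree propagates upward one level at a time.</p>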
+    </section>
+    <section name="Workflow Types">
+      <p>There are two categories of workflows: workflows that control the run order of
+        other workflows, and workflows that track the execution of some process or
+        activity. Two workflows that control the run order of other workflows are currently
+        implemented:</p>
+      <subsection name="Parallel">
+        <p>A workflow that runs all the workflows in its children bucket at the same time. Its
+          metadata (or context) becomes the merge of all metadata of workflows in its children
+          bucket.</p>
+      </subsection>
+      <subsection name="Sequential">
+        <p>A workflow that runs the workflows in its children bucket one at a time, only running the
+          next child workflow after its previous child workflow has finished. Its metadata (or
+          context) is updated after each workflow from its children bucket is run, then passed to
+          the next workflow to run from its children bucket.</p>
+      </subsection>
+      <p>We have already been introduced to the second category of workflows, which track the
+        running of some process. These are tasks and conditions:</p>
+      <subsection name="Task">
+        <p>Tracks some executing activity. Its metadata is synched with this process
+          periodically.</p>
+      </subsection>
+      <subsection name="Condition">
+        <p>Tracks some executing condition activity. Its metadata is synched with this executing
+          condition periodically.</p>
+      </subsection>
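+      <p>The difference in how Parallel and Sequential workflows handle metadata can be sketched
+        as below. This is an illustration under stated assumptions: each child is modelled as a
+        hypothetical callable that receives a metadata dictionary and returns the entries it
+        wants to add, and the rule that later children win when two parallel children write the
+        same key is an assumption, since the guide does not say how conflicts are merged.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequential(children, metadata):
    """Run children one at a time, handing each the metadata left by the last."""
    for child in children:
        metadata = {**metadata, **child(metadata)}
    return metadata


def run_parallel(children, metadata):
    """Run all children at once, then merge the metadata they produced."""
    with ThreadPoolExecutor() as pool:
        # Each child gets its own copy, so children cannot see each other's writes.
        results = list(pool.map(lambda child: child(dict(metadata)), children))
    merged = dict(metadata)
    for result in results:
        merged.update(result)  # assumed policy: later children win on conflicts
    return merged
```

+      <p>With a first child that sets a key and a second child that reads it, run_sequential
+        lets the second child see the first child's value, while run_parallel does not, because
+        both children start from the same input metadata.</p>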
+    </section>
+    <section name="Workflows in Workflows">
+      <p>Now that we understand the makeup of a workflow, let's look at an example. Let's say we
+        want a workflow that models going to the store to buy groceries. The first step is to make
+        sure we have our keys and wallet. These are pre-conditions, because we can't
+        drive without our keys, and we can't buy the groceries without our wallet. These
+        pre-conditions can be checked at the same time, since checking for our keys does not
+        depend on checking for our wallet. So these pre-conditions happen in 'parallel'. After
+        we've determined that we have our keys and wallet, we can perform the tasks we set out to
+        do: drive to the store, buy our groceries, and drive home. Since we can't do one of these
+        tasks without doing the one before it (that is, we can't buy our groceries without first
+        driving to the store), these tasks are 'sequential'. Our workflow model graph would look
+        something like:</p>
+      <pre>
+        [id='BuyGroceries' execution='sequential']
+          {PreCond: [id='FindWalletAndKeys' execution='parallel']
+            [id='FindWallet' execution='condition']
+            [id='FindKeys' execution='condition']}
+          [id='DriveToStore' execution='task']
+          [id='PurchaseGroceries' execution='task']
+          [id='DriveHome' execution='task']
+      </pre>
+      <p>Let's take this one step further. Say we brought a friend along to help with the
+        shopping and we split up our list to cut the shopping time in half. Now we have two people
+        shopping at the same time:</p>
+      <pre>
+        [id='BuyGroceries' execution='sequential']
+          {PreCond: [id='FindWalletAndKeys' execution='parallel']
+            [id='FindWallet' execution='condition']
+            [id='FindKeys' execution='condition']}
+          [id='DriveToStore' execution='task']
+          <strong>[id='PurchaseGroceries' execution='parallel']
+            [id='YouPurchaseGroceries' execution='task']
+            [id='FriendPurchaseGroceries' execution='task']</strong>
+          [id='DriveHome' execution='task'] 
+      </pre>
+      <p>Figure 2 shows the task mapping of this workflow. Usually, when you implement a
+        workflow in the system, you start with a task diagram, which you have to convert to a
+        workflow model graph similar to the grocery store example above. Being able to look at
+        one and derive the other is therefore essential.</p>
+      <center>
+        <img src="../images/grocery-store-workflow-1.png" alt="Grocery Store Workflow 1"/>
+      </center>
+      <p>The following figures enumerate the recommended thought process for identifying
+        workflows in a task graph:</p>
+      <center>
+        <img src="../images/grocery-store-workflow-2.png" alt="Grocery Store Workflow 2"/>
+      </center>
+      <center>
+        <img src="../images/grocery-store-workflow-3.png" alt="Grocery Store Workflow 3"/>
+      </center>
+      <center>
+        <img src="../images/grocery-store-workflow-4.png" alt="Grocery Store Workflow 4"/>
+      </center>
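+      <p>The grocery store model graph can also be written down as a nested data structure. The
+        sketch below is an assumption-laden illustration, not the CAS-Workflow2 configuration
+        format: the Node class and the render function are hypothetical names, and render simply
+        reproduces the bracket notation used in this section.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    execution: str  # 'sequential', 'parallel', 'task' or 'condition'
    pre: list = field(default_factory=list)       # pre-condition bucket
    children: list = field(default_factory=list)  # children bucket


def render(node, depth=0):
    """Return the lines of the bracket notation for node and its subtree."""
    pad = "  " * depth
    lines = [f"{pad}[id='{node.id}' execution='{node.execution}']"]
    for cond in node.pre:
        block = render(cond, depth + 1)
        # Wrap the pre-condition subtree in {PreCond: ...} as in the examples.
        block[0] = f"{pad}  {{PreCond: {block[0].lstrip()}"
        block[-1] += "}"
        lines.extend(block)
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


groceries = Node("BuyGroceries", "sequential",
    pre=[Node("FindWalletAndKeys", "parallel", children=[
        Node("FindWallet", "condition"),
        Node("FindKeys", "condition")])],
    children=[
        Node("DriveToStore", "task"),
        Node("PurchaseGroceries", "parallel", children=[
            Node("YouPurchaseGroceries", "task"),
            Node("FriendPurchaseGroceries", "task")]),
        Node("DriveHome", "task")])

print("\n".join(render(groceries)))
```

+      <p>Nesting a Node inside another Node's children list corresponds to indenting it one
+        level in the model graph, which is the step the figures above walk through.</p>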
+    </section>
+    <section name="Workflow Patterns">
+      <p>There are many complex workflow patterns out there. However, most patterns should be
+        implementable with careful usage of different combinations of parallel and sequential
+        workflows. In the unusual case where parallel and sequential won't cut it, custom workflows
+        can be written and plugged in (this is an advanced topic that will be discussed later). Here
+        we will cover how to create the most common workflow patterns. More advanced patterns will
+        be discussed later.</p>
+      <subsection name="Parallel Split">
+        <subsection name="- Description:">
+          <p>The divergence of a branch into two or more parallel branches, each of which executes
+            concurrently.</p>
+        </subsection>
+        <subsection name="- Diagram:">
+          <center>
+            <img src="../images/parallel-split-diagram.png" alt="Parallel Split Diagram"/>
+          </center>
+        </subsection>
+        <subsection name="- Model Graph:">
+          <pre>
+            [id='S1' execution='sequential']
+              [id='T1' execution='task']
+              [id='P1' execution='parallel']
+                [id='T2' execution='task']
+                [id='T3' execution='task']            
+          </pre>
+        </subsection>
+      </subsection>
+      <subsection name="Synchronization">
+        <subsection name="- Description:">
+          <p>The convergence of two or more branches into a single subsequent branch such that the
+            thread of control is passed to the subsequent branch when all input branches have been
+            enabled.</p>
+        </subsection>
+        <subsection name="- Diagram:">
+          <center>
+            <img src="../images/synchronization-diagram.png" alt="Synchronization Diagram"/>
+          </center>
+        </subsection>
+        <subsection name="- Model Graph:">
+          <pre>
+            [id='S1' execution='sequential']
+              [id='P1' execution='parallel']
+                [id='T1' execution='task']
+                [id='T2' execution='task']
+              [id='T3' execution='task']            
+           </pre>
+        </subsection>
+      </subsection>
+      <subsection name="Combination of a Parallel Split into a Synchronization">
+        <subsection name="- Description:">
+          <p>(See <strong>Parallel Split</strong> and <strong>Synchronization</strong>)</p>
+        </subsection>
+        <subsection name="- Diagram:">
+          <center>
+            <img src="../images/parallel-split-into-synchronization-diagram.png"
+              alt="Combination of a Parallel Split into a Synchronization Diagram"/>
+          </center>
+        </subsection>
+        <subsection name="- Model Graph:">
+          <pre>
+            [id='S1' execution='sequential']
+              [id='T1' execution='task']
+              [id='P1' execution='parallel']
+                [id='T2' execution='task']
+                [id='T3' execution='task']
+              [id='T4' execution='task']            
+          </pre>
+        </subsection>
+      </subsection>
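+      <p>To check that a model graph expresses the intended pattern, it can help to work out
+        which tasks may run at the same time. The sketch below does this for the combined
+        pattern above. It is a simplification under stated assumptions: nodes are hypothetical
+        tuples, every task is treated as taking one step, and the function name waves is
+        invented for this example.</p>

```python
from itertools import zip_longest


def waves(node):
    """Return a list of steps; tasks listed in the same step may run concurrently.

    node is a hypothetical tuple: ("task", id) or (kind, id, [children]),
    where kind is "sequential" or "parallel".
    """
    if node[0] == "task":
        return [[node[1]]]
    child_waves = [waves(child) for child in node[2]]
    if node[0] == "sequential":
        # Children run one after another: concatenate their steps.
        return [step for cw in child_waves for step in cw]
    # Parallel children start together: merge their steps position by position.
    return [sorted(set().union(*group))
            for group in zip_longest(*child_waves, fillvalue=())]


pattern = ("sequential", "S1", [
    ("task", "T1"),
    ("parallel", "P1", [("task", "T2"), ("task", "T3")]),
    ("task", "T4"),
])
print(waves(pattern))  # [['T1'], ['T2', 'T3'], ['T4']]
```

+      <p>The result shows the split after T1 and the synchronization before T4: T2 and T3 share
+        a step, and T4 appears only after both.</p>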
+    </section>
+    <section name="Lifecycles in Lifecycles">
+      <p>We learned above how each workflow goes through its own lifecycle, which depends on the
+        lifecycles of its pre-condition, children, and post-condition workflows. Here we will
+        learn how this actually works. First we are going to introduce a few more states: Queued,
+        PreConditionSuccess, WaitingOnResources, and ExecutionComplete. Figure 9 is an updated
+        lifecycle diagram.</p>
+      <center>
+        <img src="../images/almost-complete-lifecycle.png" alt="Almost Complete Lifecycle Diagram"/>
+      </center>
+      <subsection name="Queued">
+        <p>Workflow has been put on the main queue (assume this to be initial state for now).</p>
+      </subsection>
+      <subsection name="PreConditionSuccess">
+        <p>All of the workflow's pre-conditions have passed.</p>
+      </subsection>
+      <subsection name="WaitingOnResources">
+        <p>Workflow (or one of its pre-condition, children, or post-condition workflows) is ready
+          to run but cannot because the resources it needs are not available.</p>
+      </subsection>
+      <subsection name="ExecutionComplete">
+        <p>A workflow has completed executing or all workflows in its children bucket have completed
+          successfully.</p>
+      </subsection>
+      <p>Let's bring back the buying groceries example, but this time we will add in the states
+        (with everything starting in the Queued state):</p>
+      <pre>
+        [id='BuyGroceries' execution='sequential' state='Queued']
+          {PreCond:
+            [id='FindWalletAndKeys' execution='parallel' state='Queued']
+              [id='FindWallet' execution='condition' state='Queued']
+              [id='FindKeys' execution='condition' state='Queued']}
+          [id='DriveToStore' execution='task' state='Queued']
+          [id='PurchaseGroceries' execution='parallel' state='Queued']
+            [id='YouPurchaseGroceries' execution='task' state='Queued']
+            [id='FriendPurchaseGroceries' execution='task' state='Queued']
+          [id='DriveHome' execution='task' state='Queued']
+      </pre>
+    </section>
+  </body>
+
+</document>
\ No newline at end of file