Posted to commits@oodt.apache.org by bf...@apache.org on 2010/12/23 03:47:22 UTC
svn commit: r1052147 [12/12] - in /oodt/branches/wengine-branch/wengine: ./
src/ src/main/ src/main/assembly/ src/main/bin/ src/main/java/
src/main/java/org/ src/main/java/org/apache/ src/main/java/org/apache/oodt/
src/main/java/org/apache/oodt/cas/ sr...
Added: oodt/branches/wengine-branch/wengine/src/site/xdoc/tech/index.xml
URL: http://svn.apache.org/viewvc/oodt/branches/wengine-branch/wengine/src/site/xdoc/tech/index.xml?rev=1052147&view=auto
==============================================================================
--- oodt/branches/wengine-branch/wengine/src/site/xdoc/tech/index.xml (added)
+++ oodt/branches/wengine-branch/wengine/src/site/xdoc/tech/index.xml Thu Dec 23 02:47:16 2010
@@ -0,0 +1,327 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ Copyright (c) 2006 California Institute of Technology.
+ ALL RIGHTS RESERVED. U.S. Government sponsorship acknowledged.
+
+ $Id$
+-->
+
+<document>
+ <properties>
+ <title>CAS Workflow Manager Technical Guide</title>
+ <author email="Brian.M.Foster@jpl.nasa.gov">Brian Foster</author>
+ </properties>
+
+ <body>
+ <section name="Introduction">
+ <p>Historically, data processing systems have been controlled primarily by file-based
+ triggering mechanisms. Such systems behave like a chain reaction: one file triggers a
+ process, which generates another file, which in turn triggers another process, and so
+ forth. Although it is easy to add processes to and remove processes from these systems,
+ the user must thoroughly understand how the processes relate to each other in order to
+ avoid creating unwanted chain reactions. Recently, efforts have been made to move towards
+ more controlled processing system models built on the concept of workflows. A workflow is
+ a tightly grouped set of processes: it explicitly tells the processing system which
+ processes should be run and in what order. A workflow runs each process only after the
+ previous processes in its mapping have completed successfully, which makes file
+ generation a criterion for the successful completion of a process instead of the
+ triggering mechanism for the next process. This separates the workflow from the files it
+ may generate, allowing the processing system to perform more tasks than just file
+ processing. This guide explains how to use and configure a workflow processing system,
+ specifically CAS-Workflow2, and describes the design decisions behind it.
+ </p>
+ </section>
+ <section name="Workflows Structure">
+ <p>Workflows consist of three parts: pre-conditions, a list of tasks (or processes) to
+ perform, and post-conditions.</p>
+ <subsection name="Pre-Conditions">
+ <p>A pre-condition is a task whose purpose is to return a true/false answer to some
+ question. Pre-conditions are requirements that must be met before a workflow can run its
+ tasks. An example of a pre-condition might be checking for the existence of a particular
+ file. After all pre-conditions have been met, a workflow will execute its tasks.</p>
+ </subsection>
+ <subsection name="Tasks">
+ <p>A Task is an activity or piece of work that needs to be done. Tasks are the atomic level
+ of a workflow. The goal of any workflow is to run its tasks to successful completion. An
+ example of a task might be: creating a visual map for a data file. After all tasks have
+ completed, the workflow will then run its post-conditions.</p>
+ </subsection>
+ <subsection name="Post-Conditions">
+ <p>Post-conditions give the workflow the ability to evaluate whether its tasks
+ successfully performed all their required duties. An example of a post-condition might be
+ checking for the existence of a file that a task was responsible for generating.</p>
+ </subsection>
+ </section>
+ <section name="Workflow Lifecycle">
+ <p>Each workflow must go through a well-defined set of states, or lifecycle. We can
+ deduce a few of these states from what we already know. A workflow starts by evaluating its
+ pre-conditions, so we call this state PreConditionEval. It must then execute its tasks;
+ we call this state Executing. Next comes PostConditionEval. If any of these three steps
+ fails, we need a failure state: Failure. And if everything goes as planned, the workflow
+ reaches the state Success. Figure 1 illustrates this workflow lifecycle. There are other
+ states, but for simplicity these are the only ones introduced for now; the rest will be
+ introduced later, once more workflow knowledge is available to understand them.</p>
+ <center>
+ <img src="../images/simplified-lifecycle.png" alt="Workflow Manager Lifecycle"/>
+ </center>
+ <subsection name="PreConditionEval">
+ <p>Workflow is executing its pre-conditions.</p>
+ </subsection>
+ <subsection name="Executing">
+ <p>Workflow is executing its tasks.</p>
+ </subsection>
+ <subsection name="PostConditionEval">
+ <p>Workflow is executing its post-conditions.</p>
+ </subsection>
+ <subsection name="Success">
+ <p>Workflow has successfully passed all pre-conditions, executed all tasks, and passed all
+ post-conditions.</p>
+ </subsection>
+ <subsection name="Failure">
+ <p>At least one of the workflow's pre-conditions, tasks, or post-conditions has failed.</p>
+ </subsection>
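+ <p>The simplified lifecycle above can be sketched as a small state machine. This is only an
+ illustrative Python sketch, not the CAS-Workflow2 implementation: the names State,
+ NEXT_ON_SUCCESS, advance, and run are hypothetical, and each lifecycle step is assumed to
+ report a single pass/fail result.</p>
+ <pre>
```python
from enum import Enum


class State(Enum):
    """The five states of the simplified lifecycle (names taken from the guide)."""
    PRE_CONDITION_EVAL = "PreConditionEval"
    EXECUTING = "Executing"
    POST_CONDITION_EVAL = "PostConditionEval"
    SUCCESS = "Success"
    FAILURE = "Failure"


# Happy-path order of the lifecycle; Success and Failure are terminal states.
NEXT_ON_SUCCESS = {
    State.PRE_CONDITION_EVAL: State.EXECUTING,
    State.EXECUTING: State.POST_CONDITION_EVAL,
    State.POST_CONDITION_EVAL: State.SUCCESS,
}


def advance(state: State, step_passed: bool) -> State:
    """Return the state that follows `state` once its current step finishes."""
    if state not in NEXT_ON_SUCCESS:
        raise ValueError(f"{state.value} is a terminal state")
    return NEXT_ON_SUCCESS[state] if step_passed else State.FAILURE


def run(step_results: list[bool]) -> State:
    """Walk the lifecycle from PreConditionEval, consuming one result per step."""
    state = State.PRE_CONDITION_EVAL
    for passed in step_results:
        state = advance(state, passed)
        if state in (State.SUCCESS, State.FAILURE):
            break  # terminal state reached; ignore any remaining results
    return state
```
+ </pre>
+ <p>In this sketch, run([True, True, True]) ends in Success, while a failing step at any
+ point ends in Failure.</p>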
+ </section>
+ <section name="Workflow Context">
+ <p>Workflows can have context, which acts as their knowledge base. This context is
+ also referred to as metadata. Metadata is a bucket of key/value(s) information that
+ workflows have access to. An example of a metadata field might be: RunDate='2009-01-20'. At
+ times, tasks need to talk to other tasks, or conditions need to communicate something
+ to the tasks that run after them. Workflows not only control the flow of conditions and
+ tasks, they also control communication between them, and they do so through
+ metadata. Conditions and tasks can also have their own metadata, which they do not share with
+ anyone else. A workflow has three categories of metadata: 1) static, 2) dynamic, and 3)
+ local.</p>
+ <subsection name="Static">
+ <p>This is metadata that is the same for every run of a workflow. A task can always assume
+ this metadata will exist.</p>
+ </subsection>
+ <subsection name="Dynamic">
+ <p>This is metadata that is passed into the workflow when it is run and/or set by other tasks
+ and conditions when communicating with each other.</p>
+ </subsection>
+ <subsection name="Local">
+ <p>This is dynamic metadata that is local to a task or condition.</p>
+ </subsection>
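+ <p>One way to picture the three metadata categories is as three key/value(s) maps. The
+ Python sketch below is illustrative only: the class WorkflowContext and its methods are
+ hypothetical names, and the lookup precedence (local over dynamic over static) is an
+ assumption that this guide does not specify.</p>
+ <pre>
```python
from dataclasses import dataclass, field


@dataclass
class WorkflowContext:
    """Hypothetical container for the three metadata categories."""
    static: dict = field(default_factory=dict)   # same for every run
    dynamic: dict = field(default_factory=dict)  # passed in or set during a run
    local: dict = field(default_factory=dict)    # private to one task or condition

    def shared(self) -> dict:
        """Metadata that other tasks and conditions may see (local is excluded)."""
        merged = dict(self.static)
        merged.update(self.dynamic)
        return merged

    def visible(self) -> dict:
        """Everything the owning task or condition can see.

        Assumed precedence: local overrides dynamic, which overrides static.
        """
        merged = self.shared()
        merged.update(self.local)
        return merged


ctx = WorkflowContext(
    static={"Mission": ["Demo"]},
    dynamic={"RunDate": ["2009-01-20"]},
    local={"Attempt": ["1"]},
)
```
+ </pre>
+ <p>Here ctx.shared() contains Mission and RunDate only, while ctx.visible() also contains
+ the local Attempt field.</p>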
+ </section>
+ <section name="Everything is a Workflow">
+ <p>In order to simplify how process control is configured, tasks and conditions were also
+ designed to be workflows. This means that almost anywhere we have used the word workflow up
+ until now, we could have used the word task instead, and vice versa. There are a few
+ exceptions, however: a task differs from a workflow in that it wraps an executable class,
+ which performs some activity, and it cannot have any child workflows. Conditions are just
+ specialized tasks, so the same applies to them as well. Conditions differ from tasks in
+ that they cannot have pre-conditions or post-conditions, since that would mean you could
+ have a pre-condition for a pre-condition. In other words, a workflow is really just a
+ workflow of workflows with pre-condition and post-condition workflows.</p>
+ </section>
+ <section name="Workflow Listeners">
+ <p>We now know that workflows have three different parts (or buckets) into which other
+ workflows can be placed: pre-conditions, children workflows, and post-conditions. Workflows
+ placed into these buckets are treated like black boxes. A workflow has no idea what types of
+ workflows have been placed into these buckets. The workflow just knows that first the
+ workflows in the pre-conditions bucket must pass before running the workflows in the
+ children bucket, followed then by the workflows in the post-conditions bucket. The way a
+ workflow knows what is going on with the workflows in its buckets is by registering itself
+ as a listener for state changes in those workflows. When a workflow changes state, it will
+ notify its listeners about the change. The listening workflow will then adjust its state
+ depending on which bucket the state change notification came from. Earlier we learned about
+ the lifecycle which each workflow goes through. This lifecycle is not only followed by the
+ top workflow or root workflow, it is followed by every workflow in all of the different
+ buckets as well. A workflow changes state in its lifecycle when one of the workflows
+ in its buckets changes state. For example, if a workflow has a pre-condition workflow
+ which changes state to Executing, then upon notification it will change its own state to
+ PreConditionEval. This notion of workflow lifecycle changes affecting other workflow
+ lifecycles will be explained in greater detail later.</p>
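+ <p>The listener mechanism can be sketched in a few lines of Python. This is a minimal
+ illustration under stated assumptions, not the CAS-Workflow2 code: the Workflow class, its
+ methods, and the bucket keys "pre", "children", and "post" are hypothetical, and only two
+ reactions are modelled (a bucket member starting to execute, and a bucket member
+ failing).</p>
+ <pre>
```python
class Workflow:
    # State the listening workflow adopts when a member of a bucket starts Executing.
    ACTIVE_STATE = {
        "pre": "PreConditionEval",
        "children": "Executing",
        "post": "PostConditionEval",
    }

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = None      # no lifecycle state assigned yet (assumption)
        self._listeners = []   # (listener, bucket) pairs to notify on change

    def add(self, bucket: str, member: "Workflow") -> None:
        """Place `member` in one of this workflow's buckets and listen to it."""
        if bucket not in self.ACTIVE_STATE:
            raise ValueError(f"unknown bucket: {bucket}")
        member._listeners.append((self, bucket))

    def set_state(self, new_state: str) -> None:
        """Change state, then notify every registered listener."""
        self.state = new_state
        for listener, bucket in self._listeners:
            listener.on_state_change(bucket, new_state)

    def on_state_change(self, bucket: str, member_state: str) -> None:
        """Adjust our own state based on which bucket reported the change."""
        if member_state == "Failure":
            self.set_state("Failure")
        elif member_state == "Executing":
            self.set_state(self.ACTIVE_STATE[bucket])


root = Workflow("BuyGroceries")
find_keys = Workflow("FindKeys")
root.add("pre", find_keys)
find_keys.set_state("Executing")  # root is notified and moves to PreConditionEval
```
+ </pre>
+ <p>Because set_state notifies listeners in turn, a change deep inside nested buckets
+ propagates upward one listener at a time until it reaches the root workflow.</p>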
+ </section>
+ <section name="Workflow Types">
+ <p>There are two categories of workflows: workflows which control the run order of
+ other workflows, and workflows which track the execution of some process or
+ activity. Two workflows that control the run order of other workflows are currently
+ implemented:</p>
+ <subsection name="Parallel">
+ <p>A workflow that runs all the workflows in its children bucket at the same time. Its
+ metadata (or context) becomes the merge of all metadata of workflows in its children
+ bucket.</p>
+ </subsection>
+ <subsection name="Sequential">
+ <p>A workflow that runs the workflows in its children bucket one at a time, only running the
+ next child workflow after its previous child workflow has finished. Its metadata (or
+ context) is updated after each workflow from its children bucket is run, then passed to
+ the next workflow to run from its children bucket.</p>
+ </subsection>
+ <p>We have already been introduced to the second category of workflows, which track the
+ running of some process. These are tasks and conditions:</p>
+ <subsection name="Task">
+ <p>Tracks some executing activity. Its metadata is synched with this process
+ periodically.</p>
+ </subsection>
+ <subsection name="Condition">
+ <p>Tracks some executing condition activity. Its metadata is synched with this executing
+ condition periodically.</p>
+ </subsection>
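+ <p>The difference in how Parallel and Sequential workflows handle metadata can be sketched
+ as follows. This Python sketch is illustrative only: run_sequential and run_parallel are
+ hypothetical names, each child is modelled as a function that takes a metadata dictionary
+ and returns the metadata it produced, and the merge rule (later children overwrite earlier
+ ones on key clashes) is an assumption.</p>
+ <pre>
```python
from concurrent.futures import ThreadPoolExecutor


def run_sequential(children, metadata: dict) -> dict:
    """Run children one at a time, passing each the context left by the previous one."""
    context = dict(metadata)
    for child in children:
        # The context is updated after each child, then handed to the next child.
        context.update(child(dict(context)))
    return context


def run_parallel(children, metadata: dict) -> dict:
    """Run all children at the same time, then merge their metadata."""
    with ThreadPoolExecutor(max_workers=max(1, len(children))) as pool:
        # Every child starts from its own copy of the same incoming metadata.
        results = list(pool.map(lambda child: child(dict(metadata)), children))
    merged = dict(metadata)
    for result in results:  # merge in child order (assumed conflict rule)
        merged.update(result)
    return merged


def drive_to_store(md: dict) -> dict:
    return {"Location": "store"}


def purchase_groceries(md: dict) -> dict:
    # Depends on metadata written by the previous sequential child.
    return {"Purchased": md.get("Location") == "store"}


result = run_sequential([drive_to_store, purchase_groceries], {"RunDate": "2009-01-20"})
```
+ </pre>
+ <p>In the sequential run, purchase_groceries sees the Location field written by
+ drive_to_store; under run_parallel it would not, because every child starts from the same
+ incoming metadata.</p>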
+ </section>
+ <section name="Workflows in Workflows">
+ <p>Now that we understand the makeup of a workflow, let's look at an example. Let's say we want
+ a workflow that models going to the store to buy groceries. The first step is to make
+ sure we have our keys and wallet. These are pre-conditions, because we can't
+ drive without our keys, and we can't buy the groceries without our wallet. However, these
+ pre-conditions can be checked at the same time: we can check for our keys while
+ checking for our wallet, since neither check depends on the other.
+ So these pre-conditions happen in 'parallel'. After we've determined that we
+ have our keys and wallet, we can perform the tasks we set out to do: drive to the
+ store, buy our groceries, and drive home. Since we can't do one of these tasks without doing the
+ one before it (that is, we can't buy our groceries without driving to the store), these
+ tasks are 'sequential'. So our workflow model graph would look something like:</p>
+ <pre>
+ [id='BuyGroceries' execution='sequential']
+   {PreCond: [id='FindWalletAndKeys' execution='parallel']
+               [id='FindWallet' execution='condition']
+               [id='FindKeys' execution='condition']}
+   [id='DriveToStore' execution='task']
+   [id='PurchaseGroceries' execution='task']
+   [id='DriveHome' execution='task']
+ </pre>
+ <p>Let's take this one step further. Say we brought a friend along to help with the
+ shopping and split up our list to cut the shopping time in half. Now we have two people
+ shopping at the same time:</p>
+ <pre>
+ [id='BuyGroceries' execution='sequential']
+   {PreCond: [id='FindWalletAndKeys' execution='parallel']
+               [id='FindWallet' execution='condition']
+               [id='FindKeys' execution='condition']}
+   [id='DriveToStore' execution='task']
+   <strong>[id='PurchaseGroceries' execution='parallel']
+     [id='YouPurchaseGroceries' execution='task']
+     [id='FriendPurchaseGroceries' execution='task']</strong>
+   [id='DriveHome' execution='task']
+ </pre>
+ <p>Figure 2 shows the task mapping of this workflow. Usually, when you go to implement a
+ workflow in the system, you will have a task diagram, which you will have to convert to a
+ workflow model graph similar to the grocery store example above. So being able to look at
+ one and realize the other is essential.</p>
+ <center>
+ <img src="../images/grocery-store-workflow-1.png" alt="Grocery Store Workflow 1"/>
+ </center>
+ <p>The following figures walk through the recommended thought process for
+ identifying workflows from a task graph:</p>
+ <center>
+ <img src="../images/grocery-store-workflow-2.png" alt="Grocery Store Workflow 2"/>
+ </center>
+ <center>
+ <img src="../images/grocery-store-workflow-3.png" alt="Grocery Store Workflow 3"/>
+ </center>
+ <center>
+ <img src="../images/grocery-store-workflow-4.png" alt="Grocery Store Workflow 4"/>
+ </center>
+ </section>
+ <section name="Workflow Patterns">
+ <p>There are many complex workflow patterns out there. However, most patterns should be
+ implementable with careful usage of different combinations of parallel and sequential
+ workflows. In the unusual case where parallel and sequential won't cut it, custom workflows
+ can be written and plugged in (this is an advanced topic that will be discussed later). Here
+ we will cover how to create the most common workflow patterns. More advanced patterns will
+ be discussed later.</p>
+ <subsection name="Parallel Split">
+ <subsection name="- Description:">
+ <p>The divergence of a branch into two or more parallel branches, each of which executes
+ concurrently.</p>
+ </subsection>
+ <subsection name="- Diagram:">
+ <center>
+ <img src="../images/parallel-split-diagram.png" alt="Parallel Split Diagram"/>
+ </center>
+ </subsection>
+ <subsection name="- Model Graph:">
+ <pre>
+ [id='S1' execution='sequential']
+   [id='T1' execution='task']
+   [id='P1' execution='parallel']
+     [id='T2' execution='task']
+     [id='T3' execution='task']
+ </pre>
+ </subsection>
+ </subsection>
+ <subsection name="Synchronization">
+ <subsection name="- Description:">
+ <p>The convergence of two or more branches into a single subsequent branch such that the
+ thread of control is passed to the subsequent branch when all input branches have been
+ enabled.</p>
+ </subsection>
+ <subsection name="- Diagram:">
+ <center>
+ <img src="../images/synchronization-diagram.png" alt="Synchronization Diagram"/>
+ </center>
+ </subsection>
+ <subsection name="- Model Graph:">
+ <pre>
+ [id='S1' execution='sequential']
+   [id='P1' execution='parallel']
+     [id='T1' execution='task']
+     [id='T2' execution='task']
+   [id='T3' execution='task']
+ </pre>
+ </subsection>
+ </subsection>
+ <subsection name="Combination of a Parallel Split into a Synchronization">
+ <subsection name="- Description:">
+ <p>(See <strong>Parallel Split</strong> and <strong>Synchronization</strong>)</p>
+ </subsection>
+ <subsection name="- Diagram:">
+ <center>
+ <img src="../images/parallel-split-into-synchronization-diagram.png"
+ alt="Combination of a Parallel Split into a Synchronization Diagram"/>
+ </center>
+ </subsection>
+ <subsection name="- Model Graph:">
+ <pre>
+ [id='S1' execution='sequential']
+   [id='T1' execution='task']
+   [id='P1' execution='parallel']
+     [id='T2' execution='task']
+     [id='T3' execution='task']
+   [id='T4' execution='task']
+ </pre>
+ </subsection>
+ </subsection>
+ </section>
+ <section name="Lifecycles in Lifecycles">
+ <p>We learned above that each workflow goes through its own lifecycle, which depends on the
+ lifecycles of its pre-condition, child, and post-condition workflows. Here we will learn how
+ this actually works. First we are going to introduce a few more states: Queued,
+ PreConditionSuccess, WaitingOnResources, and ExecutionComplete. Figure 9 is an updated
+ lifecycle diagram.</p>
+ <center>
+ <img src="../images/almost-complete-lifecycle.png" alt="Almost Complete Lifecycle Diagram"/>
+ </center>
+ <subsection name="Queued">
+ <p>Workflow has been put on the main queue (assume this to be initial state for now).</p>
+ </subsection>
+ <subsection name="PreConditionSuccess">
+ <p>Workflow has successfully passed all of its pre-conditions.</p>
+ </subsection>
+ <subsection name="WaitingOnResources">
+ <p>Workflow (or one of its pre-condition, child, or post-condition workflows) is ready to run but
+ can't because the resources it needs are not available.</p>
+ </subsection>
+ <subsection name="ExecutionComplete">
+ <p>A workflow has completed executing or all workflows in its children bucket have completed
+ successfully.</p>
+ </subsection>
+ <p>Let's bring back the buying groceries example, but this time we will add in the states (with everything starting in the Queued state):</p>
+ <pre>
+ [id='BuyGroceries' execution='sequential' state='Queued']
+   {PreCond:
+     [id='FindWalletAndKeys' execution='parallel' state='Queued']
+       [id='FindWallet' execution='condition' state='Queued']
+       [id='FindKeys' execution='condition' state='Queued']}
+   [id='DriveToStore' execution='task' state='Queued']
+   [id='PurchaseGroceries' execution='parallel' state='Queued']
+     [id='YouPurchaseGroceries' execution='task' state='Queued']
+     [id='FriendPurchaseGroceries' execution='task' state='Queued']
+   [id='DriveHome' execution='task' state='Queued']
+ </pre>
+ </section>
+ </body>
+
+</document>
\ No newline at end of file