Posted to zeta-dev@incubator.apache.org by Marcel Blonk <ma...@blonk.org> on 2010/06/14 19:21:54 UTC

[zeta-dev] Thoughts on workflow

Hi All,

Recently I needed a workflow engine for our CakePHP-based project. Even 
though we needed relatively straightforward document approval 
workflows, we decided to go with a workflow engine, because the 
separation of concerns makes it easy to personalize workflows for 
different clients. After searching the web for a bit, we found that 
ezComponents has by far the most well-developed open source PHP 
workflow engine. Integrating the workflow engine with CakePHP was not a 
big issue, as the workflow component is fairly separate from the rest 
of ezComponents, and its API is well thought out and easy to adapt (in 
short, thanks to the developers of ezComponents for designing a great 
system!).

Anyway, along the way I had to make several modifications to the 
workflow source code to accommodate the type of environment I 
envisioned. I think many of the changes came from a different 
philosophy on how and where a workflow is instantiated. Correct me if 
I'm wrong, but it seems that the default way of instantiating a 
workflow is meant to be from code. This certainly is a valid approach, 
but I prefer to write the workflow directly in XML. The changes I made 
to accommodate that are:

    * allow for string node IDs
    * do not rewrite/renumber node IDs
    * do not require IDs for nodes with only one input that are only
      connected with the node before it

These 3 new rules make it much easier to write and debug workflows in 
XML. An example of readability improvement:

before:

<node id="1" type="Start">
  <outNode id="2"/>
</node>
<node id="2" type="MultiChoice">
  <outNode id="3"/>
  <outNode id="4"/>
</node>
<node id="3" type="Action">
  <outNode id="5"/>
</node>
<node id="4" type="Action">
  <outNode id="5"/>
</node>
<node id="5" type="SimpleMerge">
  <outNode id="6"/>
</node>
<node id="6" type="End"/>

after:

<node type="Start"/>
<node type="MultiChoice">
  <outNode id="accept"/>
  <outNode id="reject"/>
</node>
<node id="accept" type="Action">
  <outNode id="exit"/>
</node>
<node id="reject" type="Action">
  <outNode id="exit"/>
</node>
<node id="exit" type="SimpleMerge"/>
<node type="End"/>


Even for such a small example this makes the workflow far more legible 
and easier to maintain (for example, you can insert a node without 
having to renumber everything). It does impose a new rule: the order of 
nodes in the XML file matters.
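To make the three rules above concrete, here is a minimal PHP sketch of how a loader might resolve them. It is not the ezcWorkflow XML loader. The function name resolveWorkflowNodes, the generated "__node<N>" IDs, and the input shape are all hypothetical. The input is one array entry per <node>, already read from the XML in document order. The sketch also assumes that a node with no <outNode> children flows into the node that follows it, which is how the "after" example reads.

```php
<?php
// Hypothetical sketch, not the ezcWorkflow loader: each array entry mirrors one
// <node> element in document order, e.g. ['type' => 'Start'] or
// ['id' => 'accept', 'type' => 'Action', 'out' => ['exit']].

/**
 * Resolves string IDs and implicit connections.
 * Returns ['nodes' => [id => type, ...], 'edges' => [[from, to], ...]].
 */
function resolveWorkflowNodes(array $nodeList): array
{
    // Keep explicit string IDs exactly as written (no renumbering) and give
    // anonymous nodes a generated ID derived from their position.
    $entries = [];
    foreach (array_values($nodeList) as $i => $node) {
        $entries[] = [
            'id'   => $node['id'] ?? '__node' . $i,
            'type' => $node['type'],
            'out'  => $node['out'] ?? [],
        ];
    }

    $nodes = [];
    $edges = [];
    foreach ($entries as $i => $entry) {
        if (isset($nodes[$entry['id']])) {
            throw new InvalidArgumentException("Duplicate node id '{$entry['id']}'");
        }
        $nodes[$entry['id']] = $entry['type'];

        // Assumed rule: a node without explicit outNodes flows into the next
        // node in document order (an End node has no successor).
        $targets = $entry['out'];
        if ($targets === [] && $entry['type'] !== 'End' && isset($entries[$i + 1])) {
            $targets = [$entries[$i + 1]['id']];
        }
        foreach ($targets as $to) {
            $edges[] = [$entry['id'], $to];
        }
    }

    // Every referenced ID must name a declared node.
    foreach ($edges as [$from, $to]) {
        if (!isset($nodes[$to])) {
            throw new InvalidArgumentException("Node '$from' points to unknown id '$to'");
        }
    }
    return ['nodes' => $nodes, 'edges' => $edges];
}

// The "after" example above, in the same document order.
$result = resolveWorkflowNodes([
    ['type' => 'Start'],
    ['type' => 'MultiChoice', 'out' => ['accept', 'reject']],
    ['id' => 'accept', 'type' => 'Action', 'out' => ['exit']],
    ['id' => 'reject', 'type' => 'Action', 'out' => ['exit']],
    ['id' => 'exit', 'type' => 'SimpleMerge'],
    ['type' => 'End'],
]);
```

On the "after" example this yields edges such as __node0 to __node1 (Start to MultiChoice) and exit to __node5 (SimpleMerge to End). The readable IDs "accept", "reject" and "exit" are kept as written. Because the implicit edges depend on position, moving a node in the file changes the graph. This is the ordering rule described above.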

Other changes:

    * One need we had was to present the user with a choice based on the
      workflow state and the available branches. Although this can be
      derived from the current node in the workflow (by traveling along
      the out nodes until you reach a multiple-choice action), doing so
      makes things much more complicated than needed and imposes
      implicit restrictions on the workflow. For this I added a new
      action that has one input node and multiple (labeled) outputs,
      plus a method to get the list of those outputs. The action behaves
      as an input action: it waits for a specified workflow variable and
      uses its value to determine which output to take. This allows you
      to load the current workflow state and, if the currently active
      node is the new action, get the choices to display to the user.
      Example:

      <node type="InputSplit">
        <variable name="next_step"/>
        <condition type="IsEqual" value="Accept">
          <outNode id="accept"/>
        </condition>
        <condition type="IsEqual" value="Reject">
          <outNode id="reject"/>
        </condition>
      </node>

      and then

        $choices = array();
        foreach ($execution->getActivatedNodes() as $node) {
            if ($node instanceof ezcWorkflowNodeInputSplit) {
                $choices = array_merge($choices, $node->getChoices());
            }
        }

      leaves $choices as array("Accept", "Reject").


    * We also found that the threading paradigm made the workflow much
      more complicated when there are multiple intertwined loops (e.g.
      sending the document back to the previous reviewer, or back to the
      originator, in a review loop) but only one possible path. For that
      I made the new choice action described above not create new
      threads ($startNewThreadForBranch = false), and I added a converge
      action (the same as SimpleMerge, except that it assumes all
      incoming nodes are on the same thread).

    * I added a new variable node, "VariableAppend", that appends a value
      to an (array) workflow variable. This allows you, for example, to
      accumulate comments in a review loop, e.g.:

      <node type="VariableAppend">
      <variable name="comment">
      <string>new comment</string>
      </variable>
      </node>

    * I added a variable type that holds an indirect value, i.e. the
      value of the node is the name of a workflow variable, which allows
      you to assign the value of one workflow variable to another, e.g.:

      <node type="VariableSet">
      <variable name="one">
      <wfvariable name="two"/>
      </variable>
      </node>

      assigns the value of workflow variable "two" to the workflow
      variable "one". This is very useful with XML-based workflows, as
      it makes it easier to vary the input of certain workflow actions
      (which would otherwise be hard-coded in the XML).

    * I added a new optional label property to the node object (e.g.
      label="Waiting for review"). This allows you to label (input)
      nodes, so it becomes easy to display the current (wait) state of
      the workflow to the user by displaying the label of the currently
      active node.
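The data side of these node types can be sketched in a few lines of PHP. WfVariableRef, variableSet, variableAppend, InputSplitSketch and its selectBranch method are hypothetical stand-ins written for illustration. They are not the ezcWorkflow API and not the actual patch. Workflow variables are modelled as a plain array, and execution, threads and persistence are left out.

```php
<?php
// Hypothetical stand-ins, not ezcWorkflow classes: they model only the data
// behaviour of the proposed nodes on a plain array of workflow variables.

/** Indirect value (the <wfvariable name="..."/> idea): names another variable. */
final class WfVariableRef
{
    public string $name;
    public function __construct(string $name) { $this->name = $name; }
}

/** Resolves a literal, or looks up the variable a WfVariableRef points to. */
function resolveValue($value, array $vars)
{
    if ($value instanceof WfVariableRef) {
        if (!array_key_exists($value->name, $vars)) {
            throw new OutOfBoundsException("Unknown workflow variable '{$value->name}'");
        }
        return $vars[$value->name];
    }
    return $value;
}

/** VariableSet: assign a (possibly indirect) value to a variable. */
function variableSet(array $vars, string $name, $value): array
{
    $vars[$name] = resolveValue($value, $vars);
    return $vars;
}

/** VariableAppend: append a value to an array variable, creating it if unset. */
function variableAppend(array $vars, string $name, $value): array
{
    $current = $vars[$name] ?? [];
    if (!is_array($current)) {
        throw new UnexpectedValueException("Variable '$name' is not an array");
    }
    $current[] = resolveValue($value, $vars);
    $vars[$name] = $current;
    return $vars;
}

/** InputSplit: labeled outputs chosen by the value of one input variable. */
final class InputSplitSketch
{
    private string $variable;
    private array $branches; // choice value => target node id
    public string $label;    // e.g. "Waiting for review"

    public function __construct(string $variable, array $branches, string $label = '')
    {
        $this->variable = $variable;
        $this->branches = $branches;
        $this->label = $label;
    }

    /** The choices to present to the user, in declaration order. */
    public function getChoices(): array
    {
        return array_keys($this->branches);
    }

    /** Target node id for the chosen value, or null while still waiting. */
    public function selectBranch(array $vars): ?string
    {
        if (!isset($vars[$this->variable])) {
            return null; // input not supplied yet: keep waiting
        }
        $choice = (string) $vars[$this->variable];
        if (!isset($this->branches[$choice])) {
            throw new UnexpectedValueException("'$choice' is not a valid choice");
        }
        return $this->branches[$choice];
    }
}

// Usage, mirroring the XML examples above.
$split = new InputSplitSketch('next_step', ['Accept' => 'accept', 'Reject' => 'reject'], 'Waiting for review');
$vars = ['reviewer_comment' => 'Looks good'];
$vars = variableAppend($vars, 'comment', 'new comment');
$vars = variableAppend($vars, 'comment', new WfVariableRef('reviewer_comment'));
$vars = variableSet($vars, 'one', new WfVariableRef('reviewer_comment'));
$pending = $split->selectBranch($vars); // null: next_step is not set yet
$vars['next_step'] = 'Accept';
$next = $split->selectBranch($vars);    // the "accept" branch
```

In an engine, the node would presumably suspend the execution while selectBranch() returns null. That is what lets the application read getChoices() and the node's label while the workflow waits for the user.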


If there is any interest in any of these changes, I can of course make 
the code available.

One remaining issue I am still looking at, and on which I could use 
advice, is the notion of a current user. A workflow often involves 
handing the flow from one user to another. I added current_user as a 
first-class citizen to the execution state of a workflow, but it is of 
very limited use right now. A more generic notion of the current 
owner(s) of an execution (thread) would be useful, but I haven't come up 
with a good solution yet.

Marcel