Posted to dev@oodt.apache.org by Keith Cummings <kc...@nrao.edu> on 2012/02/28 17:37:28 UTC

Continuing a workflow after restarting the workflow manager

Hello,
I'm at NRAO and we are considering using OODT in the near future, and I
just started playing around with it, specifically the Workflow Manager.

I was wondering if the Workflow Manager is able to restart workflows 
that are partially complete.  For example, I started a workflow that has 
four tasks.  During the processing of the second task, I stopped the 
Workflow Manager, then restarted it.  The Workflow Manager does NOT 
continue processing this task/workflow.  Is there a way to have it pick 
up where it left off?

Thanks,
Keith Cummings
NRAO
Socorro, NM

Re: Continuing a workflow after restarting the workflow manager

Posted by Sheryl John <sh...@gmail.com>.
Yes, you're right. Sorry for the confusion.
The repo will have all the workflow instances executed so far, and you can
even access the instances' metadata. (There's a new command-line option for
this in 0.4.)
But, as you've seen, once you restart the wmgr it doesn't track the
previous instances.

I think it would be nice to have that functionality in workflow manager
too. I'm not aware of any other way to do this.

On Tue, Feb 28, 2012 at 2:23 PM, Keith Cummings <kc...@nrao.edu> wrote:

-- 
-Sheryl

Re: Continuing a workflow after restarting the workflow manager

Posted by Keith Cummings <kc...@nrao.edu>.
Hey Chris.
It's good to see that there are options on the table and an eventual 
path forward on this issue.  Thanks for all the info.
-Keith

Mattmann, Chris A (388J) wrote:

Re: Continuing a workflow after restarting the workflow manager

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Keith,

For whatever reason I cannot find the original email for this thread in my
email reader, gah. So I am going to reply to this particular version of the
thread; sorry to those looking at the mail archives who will have a hard
time finding this.

In short, the functionality that you're asking for can be supported by the 0.3
workflow manager; it's just more cumbersome to implement, and it's not
something that the config supports out of the box, so you'll have to play
with it a bit. The wengine branch [1] supports this more flexibly and more
natively, and we're in the process (in 0.4-SNAPSHOT) of porting over
this functionality so that it'll be supported natively in 0.4 (or 0.5) and beyond.
I think Paul Ramirez linked you over to OODT-215 [2], which is a good place
to check the overall status of that effort.

Right now, with 0.3, if you want to dynamically restart a workflow at some
point in the pipeline, here's how you can do it.
Let's assume that you have a 3-task workflow: t1, t2 and t3.

You would set up 4 workflow XML documents that contain the following
variations in those tasks:

w1.workflow.xml -> only runs t1
w2.workflow.xml -> only runs t2
w3.workflow.xml -> only runs t3
w4.workflow.xml -> runs t1, then t2, then t3

(note: you could also do the other permutations for t1->t2, or t2->t3, but
I'm leaving those out for brevity and simplicity.)

Once you set those workflow XML files up, you would then add 4 events
to the events.xml file:

e1->runs w1
e2->runs w2
e3->runs w3
e4->runs w4

With the above setup, you could theoretically recover a workflow whether
or not it's a restart, assuming that the context of the restart is at
the task level, and assuming that you could wire up the same metadata
context that was running by gleaning it from the WorkflowInstanceRepository.
The WorkflowInstanceRepository *does* store the current state of the WorkflowInstance
in terms of its:

* start/stop task ISO 8601 times
* start/stop overall workflow ISO 8601 times
* current workflow instance metadata (the dynamic "context" that tasks read from and write to)
* current task running
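A minimal sketch of the event-selection step in this workaround, assuming the setup above (w1..w3 each run a single task, w4 runs all three). The class RecoveryPlanner is hypothetical; reading the "current task running" out of the WorkflowInstanceRepository and actually sending the events (with the recovered metadata context) are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for the 0.3 workaround: given the task that was running when the WM
// stopped (as recorded in the WorkflowInstanceRepository), decide which events to send.
class RecoveryPlanner {

    // Ordered tasks of the full pipeline (w4: t1 -> t2 -> t3).
    static final List<String> PIPELINE = List.of("t1", "t2", "t3");

    // Events that run the single-task workflows (e1 -> w1, e2 -> w2, e3 -> w3).
    static final Map<String, String> SINGLE_TASK_EVENT = Map.of("t1", "e1", "t2", "e2", "t3", "e3");

    // Event that runs the whole pipeline (e4 -> w4).
    static final String FULL_PIPELINE_EVENT = "e4";

    /**
     * Returns the events to send, in order, to re-run the workflow from the task that was
     * running. Events trigger independent workflows, so each one should only be sent after
     * the workflow started by the previous event has finished.
     */
    static List<String> eventsToResumeFrom(String currentTask) {
        int start = PIPELINE.indexOf(currentTask);
        if (start < 0) {
            throw new IllegalArgumentException("Unknown task: " + currentTask);
        }
        if (start == 0) {
            // Interrupted in the first task: simply re-run the whole pipeline.
            return List.of(FULL_PIPELINE_EVENT);
        }
        // Otherwise run each remaining task through its single-task workflow.
        List<String> events = new ArrayList<>();
        for (String task : PIPELINE.subList(start, PIPELINE.size())) {
            events.add(SINGLE_TASK_EVENT.get(task));
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(eventsToResumeFrom("t1")); // [e4]
        System.out.println(eventsToResumeFrom("t2")); // [e2, e3]
        System.out.println(eventsToResumeFrom("t3")); // [e3]
    }
}
```

If the t2->t3 permutation mentioned in the note were defined as its own workflow and event, the t2 case could map to that single event instead of two sequential ones.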

The combination of the above information is enough to recover from a failure in an existing
workflow task at a coarse-grained level. At a finer-grained level, performing that kind
of checkpointing and recovery is difficult, and really specific to the type of
workflow task being written. I was just discussing this with Sheryl in OODT-212 [3],
and one methodology that could be employed is to force WorkflowTaskInstances to implement
a rollback method (perhaps guided by JTA or some other Java-based transaction model).
That would be a change affecting existing users and code bases with existing compiled tasks and
code. However, since CAS-PGE [4] is a common mechanism for folks to run workflows
and integrate science algorithms, making the change there could provide the necessary insulation
and a direct path to implementing a capability like OODT-212.
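The rollback idea could look roughly like the following sketch. RollbackableTask is a hypothetical interface, not an existing OODT or JTA type; it only illustrates a task that undoes its own partial side effects, so a runner can report where to restart without leaving half-written metadata behind.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contract (not an existing OODT or JTA type): a task that can undo its own
// partial side effects so that it can be re-run from the start after a failure.
interface RollbackableTask {
    void run(Map<String, String> context) throws Exception;

    void rollback(Map<String, String> context);
}

// Example task that writes an output key into the shared metadata context.
class ExampleTask implements RollbackableTask {
    private final String id;
    private final boolean fail;

    ExampleTask(String id, boolean fail) {
        this.id = id;
        this.fail = fail;
    }

    @Override
    public void run(Map<String, String> context) throws Exception {
        context.put(id + ".output", "partial"); // side effect made before the failure
        if (fail) {
            throw new Exception(id + " failed");
        }
        context.put(id + ".output", "done");
    }

    @Override
    public void rollback(Map<String, String> context) {
        context.remove(id + ".output"); // undo the partial side effect
    }
}

class RollbackRunner {

    /**
     * Runs the tasks in order against a shared context. On failure, the failed task's
     * rollback() is called and its index is returned so recovery can restart there;
     * returns -1 if every task succeeds.
     */
    static int runAll(List<? extends RollbackableTask> tasks, Map<String, String> context) {
        for (int i = 0; i < tasks.size(); i++) {
            RollbackableTask task = tasks.get(i);
            try {
                task.run(context);
            } catch (Exception e) {
                task.rollback(context);
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<String, String> context = new HashMap<>();
        List<RollbackableTask> tasks = List.of(
                new ExampleTask("t1", false), new ExampleTask("t2", true), new ExampleTask("t3", false));
        System.out.println(runAll(tasks, context)); // 1
        System.out.println(context);                // {t1.output=done}
    }
}
```

Only the failed task is rolled back here; whether earlier, completed tasks should also be undone is a design choice a real implementation would have to make per task type.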

To support the current functionality of the wengine branch in the trunk workflow manager, I was
thinking of a simple patch mechanism that would do the following:

* upon restart of the WM, interrogate the Workflow Instance Repository, find any
workflows that are still in one of the executing states (STARTED, PAUSED, PGE_EXEC,
etc.), roll them back to their started state, and run them through execution
again.
* provide the capability for resume to actually "resume" a workflow in any one of the
above states.
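A minimal sketch of the first bullet, assuming a simplified in-memory repository. WfState, InstanceRecord and RestartRecovery are hypothetical stand-ins; only the state names STARTED, PAUSED and PGE_EXEC come from the description above, and CREATED/FINISHED are added just for the example.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified workflow instance states for this sketch.
enum WfState { CREATED, STARTED, PAUSED, PGE_EXEC, FINISHED }

// Hypothetical stand-in for a persisted workflow instance record.
class InstanceRecord {
    final String id;
    WfState state;
    int currentTaskIndex;

    InstanceRecord(String id, WfState state, int currentTaskIndex) {
        this.id = id;
        this.state = state;
        this.currentTaskIndex = currentTaskIndex;
    }
}

class RestartRecovery {

    // States meaning "still executing when the WM went down".
    static final Set<WfState> EXECUTING = EnumSet.of(WfState.STARTED, WfState.PAUSED, WfState.PGE_EXEC);

    /**
     * Called once at WM startup: rolls every interrupted instance back to its started
     * state at the first task and returns the ids that should be queued for execution again.
     */
    static List<String> recover(Map<String, InstanceRecord> repository) {
        List<String> requeue = new ArrayList<>();
        for (InstanceRecord rec : repository.values()) {
            if (EXECUTING.contains(rec.state)) {
                rec.state = WfState.STARTED;
                rec.currentTaskIndex = 0;
                requeue.add(rec.id);
            }
        }
        return requeue;
    }

    public static void main(String[] args) {
        Map<String, InstanceRecord> repository = new LinkedHashMap<>();
        repository.put("a", new InstanceRecord("a", WfState.PGE_EXEC, 1));
        repository.put("b", new InstanceRecord("b", WfState.FINISHED, 2));
        repository.put("c", new InstanceRecord("c", WfState.PAUSED, 0));

        System.out.println(recover(repository));       // [a, c]
        System.out.println(repository.get("a").state); // STARTED
        System.out.println(repository.get("b").state); // FINISHED
    }
}
```

Finished instances are left alone; only the interrupted ones are reset and handed back for re-queuing.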

What you are seeing right now with resume and pause is that they only work on
workflows that the engine is currently tracking. The ThreadPoolWorkflowEngine
extension point only keeps tracked and queued workflows executing in memory
for as long as the WM is up and running. As soon as it goes down, even though the context and
state information *is* persisted in the WorkflowInstanceRepository, the ThreadPoolWorkflowEngine
loses track of the execution information in *its* area, and thus cannot pause or resume after
the fact. So pause/resume work right now, but only in an active state, not for
checkpointing *after* a WM restart.
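The behaviour described above can be modeled with a toy engine: the repository map survives a "restart" (a new engine object), but the engine's set of tracked instance ids does not. InMemoryTrackingEngine is hypothetical and is not the ThreadPoolWorkflowEngine code; its warning text is only modeled on the one Keith quoted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical, not the ThreadPoolWorkflowEngine source): the repository map
// stands in for the persisted WorkflowInstanceRepository, while the tracked set lives only
// in the engine's memory and is therefore lost whenever the engine is recreated.
class InMemoryTrackingEngine {
    private final Map<String, String> repository;        // instance id -> state, survives restarts
    private final Set<String> tracked = new HashSet<>(); // in-memory only

    InMemoryTrackingEngine(Map<String, String> repository) {
        this.repository = repository;
    }

    void start(String id) {
        repository.put(id, "STARTED");
        tracked.add(id);
    }

    boolean pause(String id) {
        return changeState(id, "pause", "PAUSED");
    }

    boolean resume(String id) {
        return changeState(id, "resume", "STARTED");
    }

    // Only instances this engine is tracking can be paused or resumed,
    // even if the repository still holds their state.
    private boolean changeState(String id, String action, String newState) {
        if (!tracked.contains(id)) {
            System.out.println("WARNING: attempt to " + action + " workflow instance id: " + id
                    + ", however, this engine is not tracking its execution");
            return false;
        }
        repository.put(id, newState);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> repository = new HashMap<>();
        InMemoryTrackingEngine engine = new InMemoryTrackingEngine(repository);
        engine.start("wf-1");
        System.out.println(engine.pause("wf-1"));     // true

        // Simulate a WM restart: a new engine over the same persisted repository.
        InMemoryTrackingEngine restarted = new InMemoryTrackingEngine(repository);
        System.out.println(repository.get("wf-1"));   // PAUSED
        System.out.println(restarted.resume("wf-1")); // false (after printing the warning)
    }
}
```

The proposed patch amounts to rebuilding the tracked set (or its equivalent) from the repository at startup, so that resume can find instances the previous engine was running.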

But we are working to support this in the trunk, and in various places around the OODT
ecosystem it is supported already. Hope that clarifies and answers your question. Thanks
much!

Cheers,
Chris


[1] http://svn.apache.org/repos/asf/oodt/branches/wengine-branch
[2] https://issues.apache.org/jira/browse/OODT-215
[3] https://issues.apache.org/jira/browse/OODT-212
[4] http://svn.apache.org/repos/asf/oodt/trunk/pge

On Feb 28, 2012, at 2:23 PM, Keith Cummings wrote:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Continuing a workflow after restarting the workflow manager

Posted by Keith Cummings <kc...@nrao.edu>.
Hi Sheryl.
I tried using the wmgr-client command line options to pause/resume/stop 
workflow instances as you suggested.  It worked great; thanks for 
pointing me there.  FYI, I'm using v0.3, if that matters.

As for the repo getting wiped out when the Workflow Manager is 
restarted, that's not what I see.  I'm using the CAS Workflow Manager 
Monitor Web App to view what's going on.  It continues to show the 
partially completed workflow after multiple restarts of the Workflow 
Manager.  So the repo is not getting wiped out, but the old workflows
don't seem to be accessible.  If I try to pause/resume/stop this 
workflow from the command line, I get the following message:

"WARNING: WorkflowEngine: Attempt to resume workflow instance id: 
119a9ebd-625a-11e1-b564-9bb1991d21af, however, this engine is not 
tracking its execution"

So it's still in the repo, but the current manager isn't tracking it, so
it can't modify it.

The use case I'm concerned about is losing partially completed workflows 
if the server crashes or is rebooted.  There are other ways to protect 
against this, but it would be nice if the Workflow Manager would simply 
start back up where it left off.

Thanks,
Keith



Sheryl John wrote:

Re: Continuing a workflow after restarting the workflow manager

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Brian,

I was hoping this one would get you commenting and out of the woodwork on this! :)

OK, so yes, it's my plan to support restoring state from the WorkflowInstanceRepository and from current state information,
like wengine provides. I'm not sure it'll be part of 0.4 (my main goal is to get the 80% of the wengine features running,
deliver them integrated into 0.4, and then start pulling over the stuff folks want and use in 0.4.1 or 0.5).

Keith: welcome! I'll reply to your original thread shortly...

Cheers,
Chris

On Feb 28, 2012, at 9:14 PM, Brian Foster wrote:

> wengine supports this because wengine only ever cached small amounts of the workflows in memory, the rest was save to disk and loaded on an "as needed" basis, so if the server went down it, when brought back up it just re-cached and kept going... the truck workflow only persists workflow task instances, the workflow itself in not persisted... this is something Chris will have to comment on if this is being ported or not (or has already been ported)... as far as starting a failed workflow from where it left off, wengine supports changing workflow states at the task level which allowed you to just set the tasks' states back to initial state and the wengine would start running it again... again Chris would be the one to answer if this is being ported or has already been ported
> 
> -Brian 
> 
> On Feb 28, 2012, at 8:01 PM, "Ramirez, Paul M (388J)" <pa...@jpl.nasa.gov> wrote:
> 
>> Hi Keith,
>> 
>> This type of functionality may end up getting ported back into the trunk as I believe it was available in the wengine branch. This porting is being actively worked on and can generally be tracked here https://issues.apache.org/jira/browse/OODT-215 and specifically for rollback here https://issues.apache.org/jira/browse/OODT-212. Currently, Chris is leading the charge in getting this work back into the trunk but Brian Foster may also be able to comment on what is needed to support rolling back directly.
>> 
>> 
>> --Paul Ramirez
>> 
>> 
>> On Feb 28, 2012, at 10:54 AM, Sheryl John wrote:
>> 
>> Hi Keith,
>> 
>> First of all welcome to the OODT world!
>> 
>> If you restart your Workflow Manager a new repo is created and all previous
>> workflow instances are wiped out and so, the engine does not track previous
>> tasks in your old workflow instance.
>> You can check this using the command line option : --getWorkflowInsts or
>> try ./wmgr-client --help for other cmd-lne options. If you're using
>> 0.4-SNAPSHOT you'll see the latest command line menu.
>> 
>> If you haven't restarted you wmgr you can start, pause and resume your
>> workflow instance ( and this should complete the tasks in that workflow)
>> with the cmd-line options.
>> I usually restart the wmgr after I've modified policy files or changed
>> environment variables/locations.
>> 
>> Hope this helps!
>> 
>> Sheryl
>> 
>> On Tue, Feb 28, 2012 at 8:37 AM, Keith Cummings <kc...@nrao.edu>> wrote:
>> 
>> Hello,
>> I'm at NRAO and we are considering using OODT in the near future and I
>> just started playing around with it, specifically the the Workflow Manager.
>> 
>> I was wondering if the Workflow Manager is able to restart workflows that
>> are partially complete.  For example, I started a workflow that has four
>> tasks.  During the processing of the second task, I stopped the Workflow
>> Manager, then restarted it.  The Workflow Manager does NOT continue
>> processing this task/workflow.  Is there a way to have it pick up where it
>> left off?
>> 
>> Thanks,
>> Keith Cummings
>> NRAO
>> Socorro, NM
>> 
>> 
>> 
>> 
>> --
>> -Sheryl
>> 


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Continuing a workflow after restarting the workflow manager

Posted by Brian Foster <ho...@me.com>.
wengine supports this because wengine only ever cached small amounts of the workflows in memory; the rest was saved to disk and loaded on an "as needed" basis, so if the server went down, when it was brought back up it just re-cached and kept going. The trunk workflow manager only persists workflow task instances; the workflow itself is not persisted. Chris will have to comment on whether this is being ported (or has already been ported).

As far as starting a failed workflow from where it left off, wengine supports changing workflow states at the task level, which allowed you to just set the tasks' states back to their initial state, and wengine would start running it again. Again, Chris would be the one to answer whether this is being ported or has already been ported.
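A toy model of the task-level reset described above, assuming per-task states are persisted and that the engine runs the first task still in its initial state, stopping at any failed or running task. TaskLevelWorkflow and TaskState are hypothetical names, not the wengine API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-task states for this sketch (not the wengine API).
enum TaskState { INITIAL, RUNNING, DONE, FAILED }

class TaskLevelWorkflow {
    // Insertion-ordered, so iteration follows pipeline order. In wengine the
    // equivalent state was kept on disk rather than only in memory.
    final Map<String, TaskState> taskStates = new LinkedHashMap<>();

    TaskLevelWorkflow(List<String> taskIds) {
        for (String id : taskIds) {
            taskStates.put(id, TaskState.INITIAL);
        }
    }

    /** Sets a task back to INITIAL so the engine will pick it up again. */
    void resetTask(String taskId) {
        if (!taskStates.containsKey(taskId)) {
            throw new IllegalArgumentException("Unknown task: " + taskId);
        }
        taskStates.put(taskId, TaskState.INITIAL);
    }

    /**
     * The next task the engine would run: completed tasks are skipped, the first INITIAL
     * task is returned, and a FAILED or RUNNING task blocks the rest (returns null).
     */
    String nextRunnable() {
        for (Map.Entry<String, TaskState> entry : taskStates.entrySet()) {
            TaskState state = entry.getValue();
            if (state == TaskState.DONE) {
                continue;
            }
            return state == TaskState.INITIAL ? entry.getKey() : null;
        }
        return null; // every task is DONE
    }

    public static void main(String[] args) {
        TaskLevelWorkflow wf = new TaskLevelWorkflow(List.of("t1", "t2", "t3"));
        wf.taskStates.put("t1", TaskState.DONE);
        wf.taskStates.put("t2", TaskState.FAILED);
        System.out.println(wf.nextRunnable()); // null: blocked on the failed t2

        wf.resetTask("t2");
        System.out.println(wf.nextRunnable()); // t2
    }
}
```

A task left in RUNNING by a crash blocks the pipeline the same way, so the same reset applies to it.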

-Brian 

On Feb 28, 2012, at 8:01 PM, "Ramirez, Paul M (388J)" <pa...@jpl.nasa.gov> wrote:

> Hi Keith,
> 
> This type of functionality may end up getting ported back into the trunk as I believe it was available in the wengine branch. This porting is being actively worked on and can generally be tracked here https://issues.apache.org/jira/browse/OODT-215 and specifically for rollback here https://issues.apache.org/jira/browse/OODT-212. Currently, Chris is leading the charge in getting this work back into the trunk but Brian Foster may also be able to comment on what is needed to support rolling back directly.
> 
> 
> --Paul Ramirez
> 
> 
> On Feb 28, 2012, at 10:54 AM, Sheryl John wrote:
> 
> Hi Keith,
> 
> First of all welcome to the OODT world!
> 
> If you restart your Workflow Manager a new repo is created and all previous
> workflow instances are wiped out and so, the engine does not track previous
> tasks in your old workflow instance.
> You can check this using the command line option : --getWorkflowInsts or
> try ./wmgr-client --help for other cmd-lne options. If you're using
> 0.4-SNAPSHOT you'll see the latest command line menu.
> 
> If you haven't restarted you wmgr you can start, pause and resume your
> workflow instance ( and this should complete the tasks in that workflow)
> with the cmd-line options.
> I usually restart the wmgr after I've modified policy files or changed
> environment variables/locations.
> 
> Hope this helps!
> 
> Sheryl
> 
> On Tue, Feb 28, 2012 at 8:37 AM, Keith Cummings <kc...@nrao.edu>> wrote:
> 
> Hello,
> I'm at NRAO and we are considering using OODT in the near future and I
> just started playing around with it, specifically the the Workflow Manager.
> 
> I was wondering if the Workflow Manager is able to restart workflows that
> are partially complete.  For example, I started a workflow that has four
> tasks.  During the processing of the second task, I stopped the Workflow
> Manager, then restarted it.  The Workflow Manager does NOT continue
> processing this task/workflow.  Is there a way to have it pick up where it
> left off?
> 
> Thanks,
> Keith Cummings
> NRAO
> Socorro, NM
> 
> 
> 
> 
> --
> -Sheryl
> 

Re: Continuing a workflow after restarting the workflow manager

Posted by "Ramirez, Paul M (388J)" <pa...@jpl.nasa.gov>.
Hi Keith,

This type of functionality may end up getting ported back into the trunk, as I believe it was available in the wengine branch. This porting is being actively worked on and can be tracked generally at https://issues.apache.org/jira/browse/OODT-215 and, specifically for rollback, at https://issues.apache.org/jira/browse/OODT-212. Currently, Chris is leading the charge in getting this work back into the trunk, but Brian Foster may also be able to comment on what is needed to support rolling back directly.


--Paul Ramirez


On Feb 28, 2012, at 10:54 AM, Sheryl John wrote:

Re: Continuing a workflow after restarting the workflow manager

Posted by Sheryl John <sh...@gmail.com>.
Hi Keith,

First of all welcome to the OODT world!

If you restart your Workflow Manager, a new repo is created and all previous
workflow instances are wiped out, so the engine does not track previous
tasks in your old workflow instance.
You can check this using the command-line option --getWorkflowInsts, or
try ./wmgr-client --help for other command-line options. If you're using
0.4-SNAPSHOT, you'll see the latest command-line menu.

If you haven't restarted your wmgr, you can start, pause and resume your
workflow instance (and this should complete the tasks in that workflow)
with the command-line options.
I usually restart the wmgr after I've modified policy files or changed
environment variables/locations.

Hope this helps!

Sheryl

On Tue, Feb 28, 2012 at 8:37 AM, Keith Cummings <kc...@nrao.edu> wrote:


-- 
-Sheryl