Posted to dev@oodt.apache.org by "Mallder, Valerie" <Va...@jhuapl.edu> on 2015/09/21 23:27:25 UTC

workflow control question

Hi All,

What is the easiest way to prevent an improper start of a workflow?

I have a cron job that sends an event (e.g. once an hour) to my workflow manager telling the workflow manager to start a workflow. But the workflow could take a long time to run, depending on how many files are available to be processed at that time. If the workflow takes longer than an hour to complete, the cron job is going to send another event to the workflow manager telling it to start the workflow again. But I don't want it to start the workflow again if the previous workflow hasn't completed yet. It's perfectly OK for the workflow manager to ignore that second request to start the workflow again and just wait for the next event to be sent by the cron job.

I don't want to reinvent the wheel. Has anyone already done something like this?  I've looked into the workflow preconditions, and I created a WorkflowStatusCondition class to use as a precondition. But I can't tell if it is possible to check the status of the first workflow instance from within a WorkflowConditionInstance object in a second workflow instance.

Does anyone know how I would do that?

Val


Valerie A. Mallder

New Horizons Deputy Mission System Engineer
The Johns Hopkins University/Applied Physics Laboratory
11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
240-228-7846 (Office) 410-504-2233 (Blackberry)

Re: workflow control question

Posted by Bruce Barkstrom <br...@gmail.com>.

Thanks for responding back.

Technically, it sounds as though you want to block process 2 until
process 1 completes.  You're asking for mutual exclusion (or 'mutex').
I don't have a specific suggestion in Java, but you might google
'mutex in java'.  Semaphores may offer one way of putting together
a mutex function.
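
As a minimal sketch of the semaphore idea, here is one way it could look in plain Java. The class name SingleRunGuard and the method runIfIdle are hypothetical, not part of OODT. The sketch uses the non-blocking tryAcquire call so that a second start request is ignored rather than queued, which is what the original question asks for.

```java
import java.util.concurrent.Semaphore;

/** Lets one run proceed at a time; extra start requests are dropped, not queued. */
final class SingleRunGuard {
    // One permit means at most one workflow run may be active.
    private final Semaphore permit = new Semaphore(1);

    /**
     * Runs the task only if no other run is active.
     *
     * @return true if the task ran, false if the request was ignored
     */
    boolean runIfIdle(Runnable task) {
        if (!permit.tryAcquire()) {   // non-blocking: returns false at once when busy
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            permit.release();         // always free the permit, even if the task throws
        }
    }
}
```

Note that a guard like this only works inside a single JVM. It would have to live in the process that actually starts the workflow, not in the cron job that sends the event.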

Bruce B.


Re: workflow control question

Posted by Chris Mattmann <ch...@gmail.com>.

Hey Val,

I will try and answer this today.

Cheers
Chris

—
Chris Mattmann
chris.mattmann@gmail.com







RE: workflow control question

Posted by "Mallder, Valerie" <Va...@jhuapl.edu>.

Hi Bruce,

Thank you for your well-thought-out response. Now I feel bad because my question is not nearly as big as your answer! :( I should have been a little clearer with my question. I am looking for a way within OODT to prevent a second instance of a workflow from starting before the first instance of the same workflow has finished.  I have been looking at the examples for using WorkflowConditions to control the operation of a workflow, but there are no specific examples that do what I would like to do. So, if anyone has an example of doing this kind of thing, please let me know. Otherwise I will have to grow my own.  I am currently building a custom WorkflowCondition, and from within that condition class I will try to see if I can query the workflow manager to get information about the last running workflow.
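
The check described above might be sketched roughly as follows. This is not OODT code: InstanceStatusSource is a hypothetical stand-in for whatever query the workflow manager really offers, NoOverlapCheck is an invented name, and the "FINISHED" status string is an assumption. In a real condition, the mayStart logic would sit inside the condition's evaluation method.

```java
import java.util.List;

/** Hypothetical stand-in for a query against the workflow manager. */
interface InstanceStatusSource {
    /** Returns the status of every OTHER instance of the named workflow. */
    List<String> statusesOf(String workflowName);
}

/** Decides whether a new instance of a workflow may start. */
final class NoOverlapCheck {
    // Assumption: an instance is done only when it reports this status.
    private static final String FINISHED = "FINISHED";

    private final InstanceStatusSource source;

    NoOverlapCheck(InstanceStatusSource source) {
        this.source = source;
    }

    /** True when every earlier instance of the workflow has finished. */
    boolean mayStart(String workflowName) {
        for (String status : source.statusesOf(workflowName)) {
            if (!FINISHED.equals(status)) {
                return false;   // an earlier instance is still in progress
            }
        }
        return true;            // nothing running, or no earlier instances at all
    }
}
```

One thing to watch: if the check runs inside the second instance, that instance will itself show up as unfinished, so the real lookup has to leave out the instance that is doing the checking.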

Thanks,

Val




Valerie A. Mallder
New Horizons Deputy Mission System Engineer
Johns Hopkins University/Applied Physics Laboratory


Re: workflow control question

Posted by Bruce Barkstrom <br...@gmail.com>.

The usual approach to this kind of problem is to use techniques
from concurrent programming that involve scheduling.  I'm most
familiar with Ada, where there's a long history of work in this area.
A classic text is

Klein, M. H., et al., 1993: A Practitioner's Handbook for Real-Time
Analysis: Guide to Rate Monotonic Analysis for Real-Time Systems,
Kluwer, Boston, MA

These scheduling problems are usually divided into soft problems,
where the consequences of missing the schedule are not catastrophic,
and hard problems, where missing the schedule causes a system
failure that is capable of hurting people.  The analysis in this reference
suggests that there are two kinds of approaches to scheduling that
can be guaranteed to work: Rate Monotonic with Earliest
Deadline First (EDF), which allows you to take up about 70%
of the production capacity, and scheduling with homogeneous
processes, which allows you to move to nearly 100% of capacity.
You can think of these as the difference between the traffic flow
of an interstate highway and a railroad.  In the former, each car
has some average distance between itself and the other vehicles,
but the car can move around within that average distance.
In the latter, the distance between cars is pretty close to fixed.
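
The "about 70%" figure matches the classic Liu and Layland utilization bound for rate monotonic scheduling, n * (2^(1/n) - 1) for n periodic tasks, which falls toward ln 2 (roughly 0.693) as n grows. Assuming that is the bound meant here, a small Java sketch can compute it. The class name RateMonotonicBound is just for illustration.

```java
/** Liu and Layland utilization bound for rate monotonic scheduling. */
final class RateMonotonicBound {
    /** Least upper bound on total utilization for n periodic tasks: n * (2^(1/n) - 1). */
    static double bound(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        return n * (Math.pow(2.0, 1.0 / n) - 1.0);
    }

    public static void main(String[] args) {
        // Print the bound for a few task counts to show it shrinking toward ln 2.
        for (int n : new int[] {1, 2, 3, 10, 100}) {
            System.out.printf("n=%d bound=%.3f%n", n, bound(n));
        }
    }
}
```

A task set whose total utilization stays under this bound is guaranteed to meet its deadlines under rate monotonic priorities. A set above it may or may not be schedulable and needs a closer analysis.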

Two more recent works are

Burns, A. and Wellings, A., 2007: Concurrent and Real-Time Programming
in Ada, Cambridge Univ. Press, Cambridge, UK

and

McCormick, J. W., Singhoff, F., and Hugues, J., 2011: Building Parallel,
Embedded, and Real-Time Applications with Ada, Cambridge Univ. Press,
Cambridge, UK

Both of these works cover various approaches to building a production
scheduling environment.  The concerns include deadlock, resource
starvation, and system component failures.  In cases where the system
uses priorities to help derive the schedule, you can also have priority
inversion.

The scheduling problem has a pretty large literature since it shows up
not just in the IT environment, but also in any organization that has to
deal with scheduling scarce resources.  You might also want to take a
look at the work by Leslie Lamport:

<http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html>

Lamport has an analysis tool known as TLA+ that has been used for
formal analysis of scheduling requirements.  This tool is available online.
You can go to the TLA Home Page

<http://research.microsoft.com/en-us/um/people/lamport/tla/tla.html>

and download it from there.

As you might expect, this kind of problem is not trivial - and even
experienced people make design mistakes.

I don't have an easy solution to suggest to you.  To do this kind of work
properly, you'll need to conduct an analysis based on the environment
you'll be working in.  Also, as Lamport explains, you have to worry about
the basic scheduling issues - and then you need to deal with scheduling
in the presence of unreliable components.  The difference between
professional scheduling analysis and simple analysis is whether the
consequences of failure can kill people or simply mean manually
restarting the system and then figuring out what got corrupted.

Bruce B
