Posted to dev@oodt.apache.org by "Wong, Cynthia L (388J)" <cy...@jpl.nasa.gov> on 2012/04/13 21:55:36 UTC

Start/Restart/Stop commands for FM, WM, RM

What is the behavior of these servers (FM, WM, RM) when the start/restart/stop commands are issued?

For example, when we issue the command "fmgr start", does it do the following?

Read properties files
Read policy files
Connect to database???

When we issue the command "fmgr stop", does it wait for the current file transfer to complete?

When we issue the command "wmgr stop", does it wait for the workflow tasks to complete and shut down gracefully?

Is there documentation or Javadoc that describes these commands in detail?

Thanks,
Cynthia

--
Cynthia L. Wong
Data Management Systems and Technologies
Jet Propulsion Laboratory
4800 Oak Grove Drive, M/S  171-264, Pasadena, CA  91109-8099
Phone:  818/393-2572, Email: Cynthia.L.Wong@jpl.nasa.gov


Re: Start/Restart/Stop commands for FM, WM, RM

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Cynthia,

Anytime, hope the info was useful.

Thanks!

Cheers,
Chris



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Start/Restart/Stop commands for FM, WM, RM

Posted by "Wong, Cynthia L (388J)" <cy...@jpl.nasa.gov>.
Chris,

Thanks,
Cynthia



Re: Start/Restart/Stop commands for FM, WM, RM

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Cynthia,

On Apr 13, 2012, at 12:55 PM, Wong, Cynthia L (388J) wrote:

> What is the behavior of these servers (FM, WM, RM) when the start/restart/stop commands are issued?
> 
> For example, when we issue the command "fmgr start", does it do the following?
> 
> Read properties files
> Read policy files
> Connect to database???

The answer depends largely on which extension points and which system
properties have been configured for the specific service component. A good
set of slides describing this process is:

http://s.apache.org/GoS

See, e.g., slide 22, which covers what happens when you configure the file manager with
a particular set of extension points.

You can find out precisely what happens by looking at:

http://s.apache.org/0Ww

Check out the #loadConfiguration method. #refreshConfiguration lets an external
client ask the FM to reload and refresh its configuration.
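
As a rough sketch of why the startup steps depend on configuration (this is
not the actual FM code; the property keys, factory classes, and the
load_configuration helper below are all made up for illustration), a server
that builds its components from factory-style properties only connects to a
database if a database-backed factory is the one configured:

```python
# Illustrative sketch only: the property keys and factory classes here are
# hypothetical stand-ins, not OODT's real configuration or API.

class InMemoryCatalogFactory:
    def create(self):
        return {"kind": "in-memory catalog", "connected_to_db": False}


class DatabaseCatalogFactory:
    def create(self):
        # A real factory of this kind is where a database connection
        # would be opened during startup.
        return {"kind": "database catalog", "connected_to_db": True}


FACTORIES = {
    "InMemoryCatalogFactory": InMemoryCatalogFactory,
    "DatabaseCatalogFactory": DatabaseCatalogFactory,
}


def load_configuration(properties):
    """Instantiate one component per '<name>.factory' property."""
    components = {}
    for key, factory_name in properties.items():
        if not key.endswith(".factory"):
            continue  # ordinary setting, not an extension point
        name = key[: -len(".factory")]
        components[name] = FACTORIES[factory_name]().create()
    return components
```

Swapping the value of the hypothetical "catalog.factory" property between the
two factory names is then the only thing that decides whether a database is
touched at startup.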

> 
> When we issue the command "fmgr stop", does it wait for the current file transfer to complete?

Nope. Did you check out the filemgr script, here?

http://s.apache.org/m4y

The stop command just issues a kill to the current FM PID.

Also, I'm not sure it would make sense to wait for the current file transfer to complete. That
could take an indefinite amount of time, and defeat the purpose of calling "stop".
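
To make the "stop is just a kill" behavior concrete, here is a minimal
sketch in Python rather than the real shell script. It assumes the server
wrote its PID to a file at startup; the stop_server name, the PID-file
layout, and the choice of SIGTERM are assumptions for illustration:

```python
import os
import signal


def stop_server(pid_file, kill=os.kill):
    """Signal the process named in pid_file and return its PID.

    Nothing here waits for in-flight work: the signal is sent and the
    function returns immediately. The kill parameter exists only so the
    sketch can be exercised without a real process.
    """
    with open(pid_file) as f:
        pid = int(f.read().strip())
    kill(pid, signal.SIGTERM)
    os.remove(pid_file)  # the PID is stale once the process is signalled
    return pid
```

Any file transfer in progress when the signal arrives is simply cut off,
which is why a graceful shutdown has to be arranged separately.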

> 
> When we issue the command "wmgr stop", does it wait for the workflow tasks to complete and shut down gracefully?

Nope, for the same reason as stated above -- "stop" means "stop", not "wait for workflow tasks to complete",
which, again, could take an indefinite amount of time.

> 
> Is there documentation or Javadoc that describes these commands in detail?

The wiki is probably the most active, up-to-date source of information. Besides that,
there are the component user guides, described here:

https://cwiki.apache.org/confluence/display/OODT/Component+Level+User+Guides

Cheers,
Chris



RE: Start/Restart/Stop commands for FM, WM, RM

Posted by "Resneck, Gabriel M (388J)" <Ga...@jpl.nasa.gov>.
I don't think that I have moved the functionality for pause/unpause or for reconstructing the RM state on startup over to the Apache repo yet.
I'll try to do that as soon as my schedule allows.

Gabe =)


Re: Start/Restart/Stop commands for FM, WM, RM

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Cecilia,

On Apr 18, 2012, at 9:23 AM, Cheng, Cecilia S (388K) wrote:

> Hi Cynthia,
> 
> I think the most important point about shutting down the components
> gracefully is that tasks / jobs aren't lost. There are ways to achieve
> that even though the 'stop' commands execute a brute 'kill'.

Agreed -- one way is simply to maintain state, which the current Apache
OODT trunk RM and WM both do: the RM in its JobRepository (interface),
and the WM in its WorkflowInstanceRepository (interface). These
are updated periodically at different stages of execution.

The goal is then to have some simple commands that read that state on
startup and decide what to do. This is what Brian did really well in his
wengine-branch, and what we are working to do in the trunk right now.

> 
> For example, you can pause the RM, so that no more jobs will be sent to
> the batch stubs, then wait until all those running jobs are done before
> you shut down the RM. Upon restart of the RM, the RM will rebuild its queue
> from the state before the shutdown. Please note that these capabilities
> are implemented in the branched RM.

For the community: when Cecilia says "branched RM", she is talking about
the work that the ACOS project is doing at JPL. I'm encouraging them to
work with the Apache community here on list to get those patches vetted
by the PMC, and into the next version of the trunk (hopefully 0.5 once
0.4 is released -- yes I know we are behind -- /flails self ;) ).

> ACOS has tested the pause capability,
> but not the rebuild capability.

All that's needed in trunk workflow is a command to read the current JobRepository
history and make some decisions as to what to do with Jobs that aren't finished, 
based on the information captured about them. Folks are welcome to file a JIRA
issue and work towards a solution for that. I'd be happy to help shepherd it in.
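
A minimal sketch of what such a command might decide, assuming the history
can be read as (job id, status) records in the order they were written. The
status names, the two actions, and the plan_recovery helper are hypothetical
and not part of the existing JobRepository interface:

```python
# Hypothetical status names; a real repository would define its own.
TERMINAL = {"FINISHED", "FAILED"}


def plan_recovery(history):
    """Map each unfinished job to an action.

    history is an iterable of (job_id, status) records in write order.
    """
    latest = {}
    for job_id, status in history:
        latest[job_id] = status  # later records overwrite earlier ones

    plan = {}
    for job_id, status in latest.items():
        if status in TERMINAL:
            continue  # nothing to recover
        # Jobs that never started can simply be queued again; jobs that were
        # mid-run when the server died need a policy decision, shown here
        # as a plain resubmit.
        plan[job_id] = "requeue" if status == "QUEUED" else "resubmit"
    return plan
```

What to do with a job that was mid-run (resubmit, mark failed, ask an
operator) is exactly the policy question a JIRA issue would need to settle.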

> 
> You can do something similar to that in the WEngine as well.

Agreed. This is what Brian Foster and I already suggested doing.

Cheers,
Chris



Re: Start/Restart/Stop commands for FM, WM, RM

Posted by "Cheng, Cecilia S (388K)" <ce...@jpl.nasa.gov>.
Hi Cynthia,

I think the most important point about shutting down the components
gracefully is that tasks / jobs aren't lost. There are ways to achieve
that even though the 'stop' commands execute a brute 'kill'.

For example, you can pause the RM, so that no more jobs will be sent to
the batch stubs, then wait until all those running jobs are done before
you shut down the RM. Upon restart of the RM, the RM will rebuild its queue
from the state before the shutdown. Please note that these capabilities
are implemented in the branched RM. ACOS has tested the pause capability,
but not the rebuild capability.
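
The pause-then-drain sequence can be sketched roughly as follows. This is
only an illustration of the idea, not the branched RM's code; the
PausableQueue class and all of its method names are invented for the
example:

```python
from collections import deque


class PausableQueue:
    """Toy job queue that can stop dispatching while running jobs finish."""

    def __init__(self):
        self.pending = deque()  # jobs not yet sent to a batch stub
        self.running = set()    # jobs currently executing
        self.paused = False

    def submit(self, job_id):
        self.pending.append(job_id)

    def pause(self):
        self.paused = True

    def dispatch_next(self):
        # While paused, nothing new is handed out; pending jobs stay queued.
        if self.paused or not self.pending:
            return None
        job_id = self.pending.popleft()
        self.running.add(job_id)
        return job_id

    def job_done(self, job_id):
        self.running.discard(job_id)

    def safe_to_stop(self):
        # Safe once dispatching is paused and every running job has finished.
        return self.paused and not self.running

    def snapshot(self):
        # The state to persist so the queue can be rebuilt after a restart.
        return list(self.pending)
```

An operator would call pause(), poll safe_to_stop() until it returns True,
save snapshot(), and only then issue the 'stop' command.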

You can do something similar to that in the WEngine as well.

-- cecilia
