You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Harvey Feng <h....@berkeley.edu> on 2012/05/02 15:00:12 UTC

Re: Review Request: Updates and additions to the MPI framework

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/
-----------------------------------------------------------

(Updated 2012-05-02 13:00:12.162674)


Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.


Changes
-------

-Fixed lots of style issues...
-Converted to a no-executor framework


Summary
-------

Some updates to point out:

-nmpiexec.py
  -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
-startmpd.py
  -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
  -> killtask() stops the mpd associated with the given tid. 
  -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
-Readme
  -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux


This addresses bug MESOS-183.
    https://issues.apache.org/jira/browse/MESOS-183


Diffs (updated)
-----

  frameworks/mpi/nmpiexec.py a5db9c0 
  frameworks/mpi/README.txt cdb4553 
  frameworks/mpi/startmpd.py 8eeba5e 
  frameworks/mpi/startmpd.sh 44faa05 

Diff: https://reviews.apache.org/r/4768/diff


Testing
-------


Thanks,

Harvey


Re: Review Request: Updates and additions to the MPI framework

Posted by Harvey Feng <h....@berkeley.edu>.

> On 2012-05-04 01:41:20, Benjamin Hindman wrote:
> > frameworks/mpi/README.txt, line 19
> > <https://reviews.apache.org/r/4768/diff/5/?file=105961#file105961line19>
> >
> >     I know it's obvious, but you might want to remind users that you'll need to install mpich2 on every machine in your cluster?.

Done.


> On 2012-05-04 01:41:20, Benjamin Hindman wrote:
> > frameworks/mpi/README.txt, line 23
> > <https://reviews.apache.org/r/4768/diff/5/?file=105961#file105961line23>
> >
> >     Kill whitespace.

Done.


> On 2012-05-04 01:41:20, Benjamin Hindman wrote:
> > frameworks/mpi/README.txt, line 25
> > <https://reviews.apache.org/r/4768/diff/5/?file=105961#file105961line25>
> >
> >     Kill whitespace.

Done.


> On 2012-05-04 01:41:20, Benjamin Hindman wrote:
> > frameworks/mpi/nmpiexec.py, line 26
> > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line26>
> >
> >     s/mpd slots/mpd(s)

Done


> On 2012-05-04 01:41:20, Benjamin Hindman wrote:
> > frameworks/mpi/nmpiexec.py, line 71
> > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line71>
> >
> >     If you move this check into the 'for offer in offers:' on line 60, then you'll only be doing the check and decline in one place (not also on lines 107 and 108).

Done


> On 2012-05-04 01:41:20, Benjamin Hindman wrote:
> > frameworks/mpi/nmpiexec.py, line 118
> > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line118>
> >
> >     Again, I'm not sure how ifhn_slave is going to be used. Can you elaborate?

I left this in pending Jessica's response...it's removed now.


> On 2012-05-04 01:41:20, Benjamin Hindman wrote:
> > frameworks/mpi/nmpiexec.py, line 121
> > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line121>
> >
> >     I love the long options! Thank you!


> On 2012-05-04 01:41:20, Benjamin Hindman wrote:
> > frameworks/mpi/nmpiexec.py, line 209
> > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line209>
> >
> >     +1 to Jessica's comment.

This simplifies the trailing '/' check/fix to just os.path.join(options.path, ""). 


> On 2012-05-04 01:41:20, Benjamin Hindman wrote:
> > frameworks/mpi/nmpiexec.py, line 221
> > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line221>
> >
> >     +1 to Jessica's comment.

Unchanged after using the above.


> On 2012-05-04 01:41:20, Benjamin Hindman wrote:
> > frameworks/mpi/nmpiexec.py, line 230
> > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line230>
> >
> >     mpdtraceerr is not used, kill it please.

Done.


- Harvey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review7541
-----------------------------------------------------------


On 2012-05-02 13:29:50, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-02 13:29:50)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/README.txt cdb4553 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review7541
-----------------------------------------------------------


Awesome refactor! Super excited to get this committed (and I think Jessica will be too!). Just a few more minor points (please address Jessica's comments too). Thank you!


frameworks/mpi/README.txt
<https://reviews.apache.org/r/4768/#comment16703>

    I know it's obvious, but you might want to remind users that you'll need to install mpich2 on every machine in your cluster?.



frameworks/mpi/README.txt
<https://reviews.apache.org/r/4768/#comment16701>

    Kill whitespace.



frameworks/mpi/README.txt
<https://reviews.apache.org/r/4768/#comment16702>

    Kill whitespace.



frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment16704>

    s/mpd slots/mpd(s)



frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment16707>

    If you move this check into the 'for offer in offers:' on line 60, then you'll only be doing the check and decline in one place (not also on lines 107 and 108).



frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment16705>

    Again, I'm not sure how ifhn_slave is going to be used. Can you elaborate?



frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment16706>

    I love the long options! Thank you!



frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment16710>

    +1 to Jessica's comment.



frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment16708>

    +1 to Jessica's comment.



frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment16709>

    mpdtraceerr is not used, kill it please.


- Benjamin


On 2012-05-02 13:29:50, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-02 13:29:50)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/README.txt cdb4553 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Jessica <je...@gmail.com>.

> On 2012-05-08 03:39:29, Benjamin Hindman wrote:
> > I'll get this checked in provided Jessica gives it a "Ship It". Thanks the the good work here, I intend to make it a demonstration of how to write frameworks on Mesos!

Scratch that. I voted to ship it and then remembered an issue that I don't think has been addressed yet. I posted this on the jira, but I haven't seen any changes for it: 

I'm running into the setuptools issue addressed in the test python framework: https://issues.apache.org/jira/browse/MESOS-130. The locations of the eggs added to PYTHONPATH in nmpiexec [now mpiexec-mesos?] need to be updated so that the Mesos/protobuf libraries (and setuptools) don't have to be installed on every node. 

There also seems to be an issue with Python detecting the Mesos module from the egg in src/python/dist--I couldn't import mesos until I unzipped the egg, no matter what directory I was in or how I modified the PYTHONPATH. [Update: I believe it's related to the fact that the mesos egg uses C/C++ extensions. I think it needs to use a setuptools module to list the package contents.]


- Jessica


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review7666
-----------------------------------------------------------


On 2012-05-08 01:29:06, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-08 01:29:06)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/README.txt cdb4553 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review7666
-----------------------------------------------------------

Ship it!


I'll get this checked in provided Jessica gives it a "Ship It". Thanks the the good work here, I intend to make it a demonstration of how to write frameworks on Mesos!

- Benjamin


On 2012-05-08 01:29:06, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-08 01:29:06)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/README.txt cdb4553 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Jessica <je...@gmail.com>.

> On 2012-05-25 18:12:45, Jessica wrote:
> > frameworks/mpi/mpiexec-mesos.py, line 61
> > <https://reviews.apache.org/r/4768/diff/8/?file=109962#file109962line61>
> >
> >     I've been puzzling over why the return is an issue with this revision since it wasn't with earlier revisions, and I believe it's due to the fact that the return is within the for loop. Before, this return was outside of the loop, so we'd always complete the loop. Once the loop completed, we'd check if we had enough mpds, and if so, we'd launch. With this revision, we may never get a chance to complete the loop and thus never check if we have enough resources. I think a break would solve the problem, provided it's acceptable not to respond to all of the offers. Otherwise, we need to make sure to decline all offers.
> 
> Harvey Feng wrote:
>     You're right, I missed this :(. A continue would make sure we decline all the offers if enough tasks are launched.

Yes; however, after further investigation, I've discovered that completing the function results in threading.Thread(target=mpiexec).start() getting called multiple times. So I guess it either needs to go back to how it was before (with the return before the loop) or there needs to be some kind of flag that indicates whether the thread has already been launched. (I used the flag approach, and it worked fine, but maybe you have a better idea.)


- Jessica


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review8116
-----------------------------------------------------------


On 2012-05-23 23:44:52, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-23 23:44:52)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/mpiexec-mesos PRE-CREATION 
>   frameworks/mpi/mpiexec-mesos.py PRE-CREATION 
>   frameworks/mpi/README.txt cdb4553 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Harvey Feng <h....@berkeley.edu>.

> On 2012-05-25 18:12:45, Jessica wrote:
> > frameworks/mpi/mpiexec-mesos.py, line 61
> > <https://reviews.apache.org/r/4768/diff/8/?file=109962#file109962line61>
> >
> >     I've been puzzling over why the return is an issue with this revision since it wasn't with earlier revisions, and I believe it's due to the fact that the return is within the for loop. Before, this return was outside of the loop, so we'd always complete the loop. Once the loop completed, we'd check if we had enough mpds, and if so, we'd launch. With this revision, we may never get a chance to complete the loop and thus never check if we have enough resources. I think a break would solve the problem, provided it's acceptable not to respond to all of the offers. Otherwise, we need to make sure to decline all offers.
> 
> Harvey Feng wrote:
>     You're right, I missed this :(. A continue would make sure we decline all the offers if enough tasks are launched.
> 
> Jessica wrote:
>     Yes; however, after further investigation, I've discovered that completing the function results in threading.Thread(target=mpiexec).start() getting called multiple times. So I guess it either needs to go back to how it was before (with the return before the loop) or there needs to be some kind of flag that indicates whether the thread has already been launched. (I used the flag approach, and it worked fine, but maybe you have a better idea.)

Fixed by adding a flag.


- Harvey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review8116
-----------------------------------------------------------


On 2012-05-23 23:44:52, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-23 23:44:52)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/mpiexec-mesos PRE-CREATION 
>   frameworks/mpi/mpiexec-mesos.py PRE-CREATION 
>   frameworks/mpi/README.txt cdb4553 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Harvey Feng <h....@berkeley.edu>.

> On 2012-05-25 18:12:45, Jessica wrote:
> > frameworks/mpi/mpiexec-mesos.py, line 61
> > <https://reviews.apache.org/r/4768/diff/8/?file=109962#file109962line61>
> >
> >     I've been puzzling over why the return is an issue with this revision since it wasn't with earlier revisions, and I believe it's due to the fact that the return is within the for loop. Before, this return was outside of the loop, so we'd always complete the loop. Once the loop completed, we'd check if we had enough mpds, and if so, we'd launch. With this revision, we may never get a chance to complete the loop and thus never check if we have enough resources. I think a break would solve the problem, provided it's acceptable not to respond to all of the offers. Otherwise, we need to make sure to decline all offers.

You're right, I missed this :(. A continue would make sure we decline all the offers if enough tasks are launched.


- Harvey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review8116
-----------------------------------------------------------


On 2012-05-23 23:44:52, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-23 23:44:52)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/mpiexec-mesos PRE-CREATION 
>   frameworks/mpi/mpiexec-mesos.py PRE-CREATION 
>   frameworks/mpi/README.txt cdb4553 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Jessica <je...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review8116
-----------------------------------------------------------



frameworks/mpi/mpiexec-mesos.py
<https://reviews.apache.org/r/4768/#comment17634>

    I've been puzzling over why the return is an issue with this revision since it wasn't with earlier revisions, and I believe it's due to the fact that the return is within the for loop. Before, this return was outside of the loop, so we'd always complete the loop. Once the loop completed, we'd check if we had enough mpds, and if so, we'd launch. With this revision, we may never get a chance to complete the loop and thus never check if we have enough resources. I think a break would solve the problem, provided it's acceptable not to respond to all of the offers. Otherwise, we need to make sure to decline all offers.


- Jessica


On 2012-05-23 23:44:52, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-23 23:44:52)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/mpiexec-mesos PRE-CREATION 
>   frameworks/mpi/mpiexec-mesos.py PRE-CREATION 
>   frameworks/mpi/README.txt cdb4553 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Benjamin Hindman <be...@berkeley.edu>.

> On June 1, 2012, 11:25 a.m., Jessica wrote:
> > Looks great--everything appears to work now. Thanks so much for your work!

I've just committed this, thanks Harvey!


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review8282
-----------------------------------------------------------


On June 1, 2012, 10:21 a.m., Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated June 1, 2012, 10:21 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Description
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/README.txt cdb4553 
>   frameworks/mpi/mpiexec-mesos PRE-CREATION 
>   frameworks/mpi/mpiexec-mesos.py PRE-CREATION 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
> 
> Diff: https://reviews.apache.org/r/4768/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey Feng
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Jessica <je...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review8282
-----------------------------------------------------------

Ship it!


Looks great--everything appears to work now. Thanks so much for your work!

- Jessica


On 2012-06-01 10:21:09, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-06-01 10:21:09)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/README.txt cdb4553 
>   frameworks/mpi/mpiexec-mesos PRE-CREATION 
>   frameworks/mpi/mpiexec-mesos.py PRE-CREATION 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Harvey Feng <h....@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/
-----------------------------------------------------------

(Updated 2012-06-01 10:21:09.375357)


Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.


Changes
-------

L59 should fix the start mpiexec problem


Summary
-------

Some updates to point out:

-nmpiexec.py
  -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
-startmpd.py
  -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
  -> killtask() stops the mpd associated with the given tid. 
  -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
-Readme
  -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux


This addresses bug MESOS-183.
    https://issues.apache.org/jira/browse/MESOS-183


Diffs (updated)
-----

  frameworks/mpi/README.txt cdb4553 
  frameworks/mpi/mpiexec-mesos PRE-CREATION 
  frameworks/mpi/mpiexec-mesos.py PRE-CREATION 
  frameworks/mpi/nmpiexec 517bdbc 
  frameworks/mpi/nmpiexec.py a5db9c0 
  frameworks/mpi/startmpd.py 8eeba5e 
  frameworks/mpi/startmpd.sh 44faa05 

Diff: https://reviews.apache.org/r/4768/diff


Testing
-------


Thanks,

Harvey


Re: Review Request: Updates and additions to the MPI framework

Posted by Harvey Feng <h....@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/
-----------------------------------------------------------

(Updated 2012-06-01 10:04:04.996783)


Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.


Changes
-------

-Fixed starting of mpiexec and declining of offers once started
-Added command line arg support for MPI_PROGRAM


Summary
-------

Some updates to point out:

-nmpiexec.py
  -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
-startmpd.py
  -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
  -> killtask() stops the mpd associated with the given tid. 
  -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
-Readme
  -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux


This addresses bug MESOS-183.
    https://issues.apache.org/jira/browse/MESOS-183


Diffs (updated)
-----

  frameworks/mpi/README.txt cdb4553 
  frameworks/mpi/mpiexec-mesos PRE-CREATION 
  frameworks/mpi/mpiexec-mesos.py PRE-CREATION 
  frameworks/mpi/nmpiexec 517bdbc 
  frameworks/mpi/nmpiexec.py a5db9c0 
  frameworks/mpi/startmpd.py 8eeba5e 
  frameworks/mpi/startmpd.sh 44faa05 

Diff: https://reviews.apache.org/r/4768/diff


Testing
-------


Thanks,

Harvey


Re: Review Request: Updates and additions to the MPI framework

Posted by Jessica <je...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review8115
-----------------------------------------------------------



frameworks/mpi/mpiexec-mesos.py
<https://reviews.apache.org/r/4768/#comment17633>

    I think maybe this needs to be a break rather than a return. Putting a return here means that we might never get to line 104, and the process will just sit there forever declining offers.


- Jessica


On 2012-05-23 23:44:52, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-23 23:44:52)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/mpiexec-mesos PRE-CREATION 
>   frameworks/mpi/mpiexec-mesos.py PRE-CREATION 
>   frameworks/mpi/README.txt cdb4553 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Harvey Feng <h....@berkeley.edu>.

> On 2012-05-31 15:42:48, Jessica wrote:
> > frameworks/mpi/mpiexec-mesos.py, line 25
> > <https://reviews.apache.org/r/4768/diff/8/?file=109962#file109962line25>
> >
> >     This is incorrect. Replace with
> >     
> >     call([MPICH2PATH + 'mpiexec', '-l', '-n', str(TOTAL_MPDS)] + MPI_PROGRAM)
> >     
> >     (note that MPI_PROGRAM is the entire program list appended to the first list)
> >     
> >     This allows MPI_PROGRAM to contain command line arguments.

Done


> On 2012-05-31 15:42:48, Jessica wrote:
> > frameworks/mpi/mpiexec-mesos.py, line 141
> > <https://reviews.apache.org/r/4768/diff/8/?file=109962#file109962line141>
> >
> >     To allow mpi_program to contain command line arguments, add the line 
> >     
> >     parser.disable_interspersed_args()
> >     
> >     (http://docs.python.org/library/optparse.html#querying-and-manipulating-your-option-parser)

Done


- Harvey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review8256
-----------------------------------------------------------


On 2012-05-23 23:44:52, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-23 23:44:52)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/mpiexec-mesos PRE-CREATION 
>   frameworks/mpi/mpiexec-mesos.py PRE-CREATION 
>   frameworks/mpi/README.txt cdb4553 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Jessica <je...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review8256
-----------------------------------------------------------



frameworks/mpi/mpiexec-mesos.py
<https://reviews.apache.org/r/4768/#comment17880>

    This is incorrect. Replace with
    
    call([MPICH2PATH + 'mpiexec', '-l', '-n', str(TOTAL_MPDS)] + MPI_PROGRAM)
    
    (note that MPI_PROGRAM is the entire program list appended to the first list)
    
    This allows MPI_PROGRAM to contain command line arguments.



frameworks/mpi/mpiexec-mesos.py
<https://reviews.apache.org/r/4768/#comment17881>

    To allow mpi_program to contain command line arguments, add the line 
    
    parser.disable_interspersed_args()
    
    (http://docs.python.org/library/optparse.html#querying-and-manipulating-your-option-parser)


- Jessica


On 2012-05-23 23:44:52, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-23 23:44:52)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/mpiexec-mesos PRE-CREATION 
>   frameworks/mpi/mpiexec-mesos.py PRE-CREATION 
>   frameworks/mpi/README.txt cdb4553 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Harvey Feng <h....@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/
-----------------------------------------------------------

(Updated 2012-05-23 23:44:52.955726)


Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.


Changes
-------

Added path to distribute egg in mpiexec-mesos and wrapped a line in the Readme. Also changed nmpiexec filenames to mpiexec-mesos.


Summary
-------

Some updates to point out:

-nmpiexec.py
  -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
-startmpd.py
  -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
  -> killtask() stops the mpd associated with the given tid. 
  -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
-Readme
  -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux


This addresses bug MESOS-183.
    https://issues.apache.org/jira/browse/MESOS-183


Diffs (updated)
-----

  frameworks/mpi/startmpd.py 8eeba5e 
  frameworks/mpi/startmpd.sh 44faa05 
  frameworks/mpi/nmpiexec 517bdbc 
  frameworks/mpi/nmpiexec.py a5db9c0 
  frameworks/mpi/mpiexec-mesos PRE-CREATION 
  frameworks/mpi/mpiexec-mesos.py PRE-CREATION 
  frameworks/mpi/README.txt cdb4553 

Diff: https://reviews.apache.org/r/4768/diff


Testing
-------


Thanks,

Harvey


Re: Review Request: Updates and additions to the MPI framework

Posted by Harvey Feng <h....@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/
-----------------------------------------------------------

(Updated 2012-05-21 18:16:49.284988)


Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.


Changes
-------

Added paths (with defaults) to mesos and protobuf eggs ($MESOS_EGG and $PROTOBUF_EGG) in mpiexec-mesos script (still named nmpiexec at the moment).


Summary
-------

Some updates to point out:

-nmpiexec.py
  -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
-startmpd.py
  -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
  -> killtask() stops the mpd associated with the given tid. 
  -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
-Readme
  -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux


This addresses bug MESOS-183.
    https://issues.apache.org/jira/browse/MESOS-183


Diffs (updated)
-----

  frameworks/mpi/nmpiexec 517bdbc 
  frameworks/mpi/nmpiexec.py a5db9c0 
  frameworks/mpi/README.txt cdb4553 
  frameworks/mpi/startmpd.py 8eeba5e 
  frameworks/mpi/startmpd.sh 44faa05 

Diff: https://reviews.apache.org/r/4768/diff


Testing
-------


Thanks,

Harvey


Re: Review Request: Updates and additions to the MPI framework

Posted by Harvey Feng <h....@berkeley.edu>.

> On 2012-05-08 12:04:40, Jessica wrote:
> > frameworks/mpi/nmpiexec.py, line 187
> > <https://reviews.apache.org/r/4768/diff/6/?file=107646#file107646line187>
> >
> >     Super minor issue, so I'm still voting to ship it, but convention and consistency say that the = in default = "" should not be surrounded by whitespace.

Fixed.


- Harvey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review7677
-----------------------------------------------------------


On 2012-05-21 18:16:49, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-21 18:16:49)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/README.txt cdb4553 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Jessica <je...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review7677
-----------------------------------------------------------

Ship it!


As far as I can tell, everything looks good. I won't have access to my test environment until Thursday, but you know you'll hear from me if there are any bugs. :) Thanks for your work on this!


frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment16907>

    Super minor issue, so I'm still voting to ship it, but convention and consistency say that the = in default = "" should not be surrounded by whitespace.



frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment16905>

    Clever technique. I definitely wouldn't have thought to join here. (I also verified that you won't have issues if the user doesn't define a path. os.path.join('','') == '', so it's perfect.)


- Jessica


On 2012-05-08 01:29:06, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-08 01:29:06)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/README.txt cdb4553 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Harvey Feng <h....@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/
-----------------------------------------------------------

(Updated 2012-05-08 01:29:06.075735)


Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.


Changes
-------

-Updated some of the logic from the previous diff.
-Better usage of os.path.join()
-References to "nmpiexec*" have been changed to "mpiexec-mesos*", but the filenames still need to be changed...


Summary
-------

Some updates to point out:

-nmpiexec.py
  -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
-startmpd.py
  -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
  -> killtask() stops the mpd associated with the given tid. 
  -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
-Readme
  -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux


This addresses bug MESOS-183.
    https://issues.apache.org/jira/browse/MESOS-183


Diffs (updated)
-----

  frameworks/mpi/README.txt cdb4553 
  frameworks/mpi/nmpiexec 517bdbc 
  frameworks/mpi/nmpiexec.py a5db9c0 
  frameworks/mpi/startmpd.py 8eeba5e 
  frameworks/mpi/startmpd.sh 44faa05 

Diff: https://reviews.apache.org/r/4768/diff


Testing
-------


Thanks,

Harvey


Re: Review Request: Updates and additions to the MPI framework

Posted by Harvey Feng <h....@berkeley.edu>.

> On 2012-05-07 19:19:17, Jessica wrote:
> > frameworks/mpi/nmpiexec.py, line 118
> > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line118>
> >
> >     I agree with Benjamin--ifhn_slave probably needs to go. I've created a new issue (MESOS-189) for configuring IP addresses that I think is relevant.
> >     
> >     Also... technically, you should use "is" and "is not" instead of "==" and "!=" to compare against None. This is the pythonic (and minutely faster) way.

Ok, switched from "==" to "is"


> On 2012-05-07 19:19:17, Jessica wrote:
> > frameworks/mpi/nmpiexec.py, line 186
> > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line186>
> >
> >     I'd recommend passing default='' to parser.add_option so that you don't have to check if path is None later in the code. (This will also simplify calls to os.path.join since you won't have to check the value of path first.)

Yup, this leaves just one "os.path.join()" call.


- Harvey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review7651
-----------------------------------------------------------


On 2012-05-08 01:29:06, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-08 01:29:06)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/README.txt cdb4553 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Jessica <je...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review7651
-----------------------------------------------------------



frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment16862>

    I agree with Benjamin--ifhn_slave probably needs to go. I've created a new issue (MESOS-189) for configuring IP addresses that I think is relevant.
    
    Also... technically, you should use "is" and "is not" instead of "==" and "!=" to compare against None. This is the pythonic (and minutely faster) way.



frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment16860>

    I'd recommend passing default='' to parser.add_option so that you don't have to check if path is None later in the code. (This will also simplify calls to os.path.join since you won't have to check the value of path first.)



frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment16861>

    setting path default to '' eliminates this check.


- Jessica


On 2012-05-02 13:29:50, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-02 13:29:50)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/README.txt cdb4553 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Harvey Feng <h....@berkeley.edu>.

> On 2012-05-02 14:58:27, Jessica wrote:
> > frameworks/mpi/nmpiexec.py, line 209
> > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line209>
> >
> >     os.path.join will handle this check and "do the right thing" wherever you use it, whether the user specifies the ending slash or not.

I used it to add a trailing '/' to the specified path, if necessary. Thanks!


> On 2012-05-02 14:58:27, Jessica wrote:
> > frameworks/mpi/nmpiexec.py, line 221
> > <https://reviews.apache.org/r/4768/diff/5/?file=105963#file105963line221>
> >
> >     mpd_cmd = os.path.join(MPICH2PATH, "mpd")

Unchanged after adding the above


- Harvey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review7469
-----------------------------------------------------------


On 2012-05-02 13:29:50, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-02 13:29:50)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/README.txt cdb4553 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Jessica <je...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/#review7469
-----------------------------------------------------------



frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment16516>

    os.path.join will handle this check and "do the right thing" wherever you use it, whether the user specifies the ending slash or not.



frameworks/mpi/nmpiexec.py
<https://reviews.apache.org/r/4768/#comment16517>

    mpd_cmd = os.path.join(MPICH2PATH, "mpd")


- Jessica


On 2012-05-02 13:29:50, Harvey Feng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4768/
> -----------------------------------------------------------
> 
> (Updated 2012-05-02 13:29:50)
> 
> 
> Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.
> 
> 
> Summary
> -------
> 
> Some updates to point out:
> 
> -nmpiexec.py
>   -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
> -startmpd.py
>   -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
>   -> killtask() stops the mpd associated with the given tid. 
>   -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
> -Readme
>   -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux
> 
> 
> This addresses bug MESOS-183.
>     https://issues.apache.org/jira/browse/MESOS-183
> 
> 
> Diffs
> -----
> 
>   frameworks/mpi/README.txt cdb4553 
>   frameworks/mpi/nmpiexec 517bdbc 
>   frameworks/mpi/nmpiexec.py a5db9c0 
>   frameworks/mpi/startmpd.py 8eeba5e 
>   frameworks/mpi/startmpd.sh 44faa05 
> 
> Diff: https://reviews.apache.org/r/4768/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Harvey
> 
>


Re: Review Request: Updates and additions to the MPI framework

Posted by Harvey Feng <h....@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4768/
-----------------------------------------------------------

(Updated 2012-05-02 13:29:50.636167)


Review request for mesos, Benjamin Hindman, Charles Reiss, and Jessica.


Changes
-------

I'm not sure how to get 'git diff' results using 'mpiexec-mesos*' filenames, so I left the two files with 'nmpiexec' names unchanged. All text/code uses mpiexec-mesos though...

This patch also deals with many style issues from the last one, and eliminates the executor code (startmpd).


Summary
-------

Some updates to point out:

-nmpiexec.py
  -> 'mpdallexit' should terminate all slaves' mpds in the ring. I moved 'driver.stop()' to statusUpdate() so that it stops when all tasks have been finished, which occurs when the executor's launched mpd processes have all exited. 
-startmpd.py
  -> Didn't remove cleanup(), and added code in shutdown() that manually kills mpd processes. They might be useful during abnormal (cleanup) and normal (shutdown) framework/executor termination...I think. cleanup() still terminates all mpd's in the slave, but shutdown doesn't. 
  -> killtask() stops the mpd associated with the given tid. 
  -> Task states update nicely now. They correspond to the state of a task's associated mpd process.
-Readme
  -> Included additional info on how to setup and run MPICH2 1.2 and nmpiexec on OS X and Ubuntu/Linux


This addresses bug MESOS-183.
    https://issues.apache.org/jira/browse/MESOS-183


Diffs (updated)
-----

  frameworks/mpi/README.txt cdb4553 
  frameworks/mpi/nmpiexec 517bdbc 
  frameworks/mpi/nmpiexec.py a5db9c0 
  frameworks/mpi/startmpd.py 8eeba5e 
  frameworks/mpi/startmpd.sh 44faa05 

Diff: https://reviews.apache.org/r/4768/diff


Testing
-------


Thanks,

Harvey