You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2012/08/30 23:25:35 UTC

Review Request: Short term fix for ignoring executor resources during launch

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.


Description
-------

This is a short term fix.

The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()


Diffs
-----

  src/launcher/launcher.hpp 89cfa8c 
  src/launcher/launcher.cpp 5267ac2 
  src/slave/cgroups_isolation_module.cpp 8a121e0 
  third_party/libprocess/include/stout/os.hpp ffe56b6 

Diff: https://reviews.apache.org/r/6863/diff/


Testing
-------

make check succeeds on linux

but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.


Thanks,

Vinod Kone


Re: Review Request: Short term fix for ignoring executor resources during launch

Posted by Vinod Kone <vi...@gmail.com>.

> On Aug. 31, 2012, 1:45 a.m., Jie Yu wrote:
> > I don't think you will be able to run the test on a nest machine as cgroups code requires root permission.
> > 
> > You may want to install a virtual machine using vmware fusion.
> > 
> > The ubuntu distribution I used was here:
> > http://releases.ubuntu.com/lucid/ubuntu-10.04.4-desktop-amd64.iso
> > 
> > For a temp fix, I think this one is OK. The only concern I have is that the download still happens in the child process. If the OS does not reclaim the memory promptly, the assignTask might fail because you try to assign a process that uses more memory than the memory limit of the target cgroup.
> > 
> > For the long term, I think we can refactor the ExecutorLauncher so that we always download executors in parent, and keep the logic in child process as simple as possible.

finally, was able to run Cgroups tests on my vm.

I'm getting the following error (below), which I think is unrelated to my change.

root@ubuntu:/home/vinod/workspace/mesos/build# GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="*Cgroups*" --verbose
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0910 15:51:01.905972  2118 process.cpp:1395] libprocess is initialized on 127.0.1.1:45161 for 1 cpus
I0910 15:51:01.906347  2118 logging.cpp:185] Logging to STDERR
Source directory: /home/vinod/workspace/mesos
Build directory: /home/vinod/workspace/mesos/build
Note: Google Test filter = *Cgroups*-
[==========] Running 17 tests from 3 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from CgroupsSimpleTest
[ RUN      ] CgroupsSimpleTest.ROOT_CGROUPS_Enabled
[       OK ] CgroupsSimpleTest.ROOT_CGROUPS_Enabled (0 ms)
[ RUN      ] CgroupsSimpleTest.ROOT_CGROUPS_Subsystems
[       OK ] CgroupsSimpleTest.ROOT_CGROUPS_Subsystems (1 ms)
[ RUN      ] CgroupsSimpleTest.ROOT_CGROUPS_CreateRemoveHierarchy
[       OK ] CgroupsSimpleTest.ROOT_CGROUPS_CreateRemoveHierarchy (1 ms)
[ RUN      ] CgroupsSimpleTest.ROOT_CGROUPS_CreateRemoveCgroup
[       OK ] CgroupsSimpleTest.ROOT_CGROUPS_CreateRemoveCgroup (3 ms)
[----------] 4 tests from CgroupsSimpleTest (5 ms total)

[----------] 12 tests from CgroupsTest
[ RUN      ] CgroupsTest.ROOT_CGROUPS_Busy
[       OK ] CgroupsTest.ROOT_CGROUPS_Busy (15 ms)
[ RUN      ] CgroupsTest.ROOT_CGROUPS_SubsystemsHierarchy
[       OK ] CgroupsTest.ROOT_CGROUPS_SubsystemsHierarchy (6 ms)
[ RUN      ] CgroupsTest.ROOT_CGROUPS_CheckHierarchy
[       OK ] CgroupsTest.ROOT_CGROUPS_CheckHierarchy (4 ms)
[ RUN      ] CgroupsTest.ROOT_CGROUPS_CheckHierarchySubsystems
[       OK ] CgroupsTest.ROOT_CGROUPS_CheckHierarchySubsystems (22 ms)
[ RUN      ] CgroupsTest.ROOT_CGROUPS_ReadControl
[       OK ] CgroupsTest.ROOT_CGROUPS_ReadControl (5 ms)
[ RUN      ] CgroupsTest.ROOT_CGROUPS_WriteControl
[       OK ] CgroupsTest.ROOT_CGROUPS_WriteControl (6 ms)
[ RUN      ] CgroupsTest.ROOT_CGROUPS_GetCgroups
[       OK ] CgroupsTest.ROOT_CGROUPS_GetCgroups (6 ms)
[ RUN      ] CgroupsTest.ROOT_CGROUPS_GetTasks
[       OK ] CgroupsTest.ROOT_CGROUPS_GetTasks (12 ms)
[ RUN      ] CgroupsTest.ROOT_CGROUPS_ListenEvent
../../src/tests/cgroups_tests.cpp:399: Failure
Value of: disableResult.isSome()
  Actual: false
Expected: true
[  FAILED  ] CgroupsTest.ROOT_CGROUPS_ListenEvent (6 ms)
[ RUN      ] CgroupsTest.ROOT_CGROUPS_Freezer
F0910 15:51:02.002341  2133 cgroups.cpp:1114] Unexpected state: THAWED
*** Check failure stack trace: ***
    @     0x7fc23a06e7ed  google::LogMessage::Fail()
    @     0x7fc23a07205f  google::LogMessage::SendToLog()
    @     0x7fc23a071597  google::LogMessage::Flush()
    @     0x7fc23a07253d  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fc239eb7f44  cgroups::internal::Freezer::watchFrozen()
    @     0x7fc239eb70a1  cgroups::internal::Freezer::freeze()
    @     0x7fc239eb6d74  cgroups::internal::Freezer::initialize()
    @     0x7fc239f80a4c  process::ProcessManager::resume()
    @     0x7fc239f781df  process::schedule()
    @     0x7fc23898f9ca  start_thread
    @     0x7fc2386eccdd  (unknown)
Aborted


Also, this results in a hung test process, which I'm unable to kill?

root@ubuntu:/home/vinod/workspace/mesos/build# ps aux | grep mesos
root      2137  0.0  0.2  79244  2856 pts/0    D    15:51   0:00 /home/vinod/workspace/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=*Cgroups* --verbose
root      2147  0.0  0.0   7624   936 pts/0    S+   15:51   0:00 grep --color=auto mesos

root@ubuntu:/home/vinod/workspace/mesos/build# kill -9 2137

root@ubuntu:/home/vinod/workspace/mesos/build# ps aux | grep mesos
root      2137  0.0  0.2  79244  2856 pts/0    D    15:51   0:00 /home/vinod/workspace/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=*Cgroups* --verbose
root      2161  0.0  0.0   7624   936 pts/0    S+   15:54   0:00 grep --color=auto mesos


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/#review10925
-----------------------------------------------------------


On Sept. 10, 2012, 9:39 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6863/
> -----------------------------------------------------------
> 
> (Updated Sept. 10, 2012, 9:39 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.
> 
> 
> Description
> -------
> 
> This is a short term fix.
> 
> The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()
> 
> 
> Diffs
> -----
> 
>   src/launcher/launcher.hpp 89cfa8c 
>   src/launcher/launcher.cpp 5267ac2 
>   src/slave/cgroups_isolation_module.cpp 8a121e0 
>   third_party/libprocess/include/stout/os.hpp 602db1f 
> 
> Diff: https://reviews.apache.org/r/6863/diff/
> 
> 
> Testing
> -------
> 
> make check succeeds on linux
> 
> but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Short term fix for ignoring executor resources during launch

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/#review10925
-----------------------------------------------------------

Ship it!


I don't think you will be able to run the test on a nest machine as cgroups code requires root permission.

You may want to install a virtual machine using vmware fusion.

The ubuntu distribution I used was here:
http://releases.ubuntu.com/lucid/ubuntu-10.04.4-desktop-amd64.iso

For a temp fix, I think this one is OK. The only concern I have is that the download still happens in the child process. If the OS does not reclaim the memory promptly, the assignTask might fail because you try to assign a process that uses more memory than the memory limit of the target cgroup.

For the long term, I think we can refactor the ExecutorLauncher so that we always download executors in parent, and keep the logic in child process as simple as possible.

- Jie Yu


On Aug. 30, 2012, 9:25 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6863/
> -----------------------------------------------------------
> 
> (Updated Aug. 30, 2012, 9:25 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.
> 
> 
> Description
> -------
> 
> This is a short term fix.
> 
> The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()
> 
> 
> Diffs
> -----
> 
>   src/launcher/launcher.hpp 89cfa8c 
>   src/launcher/launcher.cpp 5267ac2 
>   src/slave/cgroups_isolation_module.cpp 8a121e0 
>   third_party/libprocess/include/stout/os.hpp ffe56b6 
> 
> Diff: https://reviews.apache.org/r/6863/diff/
> 
> 
> Testing
> -------
> 
> make check succeeds on linux
> 
> but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Short term fix for ignoring executor resources during launch

Posted by Vinod Kone <vi...@gmail.com>.

> On Sept. 9, 2012, 11:01 p.m., Benjamin Hindman wrote:
> > src/launcher/launcher.cpp, line 84
> > <https://reviews.apache.org/r/6863/diff/2/?file=151318#file151318line84>
> >
> >     This now kills the slave (for the cgroups module) in the event something weird is going on. I agree that not being able to chdir is a rather serious issue but I'd rather not see us killing the slave in this circumstance (i.e., return bool/int here).

good call, fixed.


> On Sept. 9, 2012, 11:01 p.m., Benjamin Hindman wrote:
> > src/launcher/launcher.cpp, line 87
> > <https://reviews.apache.org/r/6863/diff/2/?file=151318#file151318line87>
> >
> >     My guess is this has some fatalerror issues as well? This one is loads of serious because a framework might have just given us a bad executor URI and we don't want to kill a slave for that.

done


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/#review11222
-----------------------------------------------------------


On Sept. 7, 2012, 9:51 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6863/
> -----------------------------------------------------------
> 
> (Updated Sept. 7, 2012, 9:51 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.
> 
> 
> Description
> -------
> 
> This is a short term fix.
> 
> The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()
> 
> 
> Diffs
> -----
> 
>   src/launcher/launcher.hpp 89cfa8c 
>   src/launcher/launcher.cpp 5267ac2 
>   src/slave/cgroups_isolation_module.cpp 8a121e0 
>   third_party/libprocess/include/stout/os.hpp df0f7ff 
> 
> Diff: https://reviews.apache.org/r/6863/diff/
> 
> 
> Testing
> -------
> 
> make check succeeds on linux
> 
> but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Short term fix for ignoring executor resources during launch

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/#review11222
-----------------------------------------------------------



src/launcher/launcher.cpp
<https://reviews.apache.org/r/6863/#comment24153>

    Not your bug, but please add a period at the end of comments you come across that don't have them. Thanks!



src/launcher/launcher.cpp
<https://reviews.apache.org/r/6863/#comment24154>

    This now kills the slave (for the cgroups module) in the event something weird is going on. I agree that not being able to chdir is a rather serious issue but I'd rather not see us killing the slave in this circumstance (i.e., return bool/int here).



src/launcher/launcher.cpp
<https://reviews.apache.org/r/6863/#comment24157>

    My guess is this has some fatalerror issues as well? This one is loads of serious because a framework might have just given us a bad executor URI and we don't want to kill a slave for that.



src/launcher/launcher.cpp
<https://reviews.apache.org/r/6863/#comment24155>

    Ditto above.



src/launcher/launcher.cpp
<https://reviews.apache.org/r/6863/#comment24156>

    Likewise, now this could kill the slave. Do me a favor and just kill initializeWorkingDirectory all together and inline this code into setup. Again, not your bug, but thanks!


- Benjamin Hindman


On Sept. 7, 2012, 9:51 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6863/
> -----------------------------------------------------------
> 
> (Updated Sept. 7, 2012, 9:51 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.
> 
> 
> Description
> -------
> 
> This is a short term fix.
> 
> The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()
> 
> 
> Diffs
> -----
> 
>   src/launcher/launcher.hpp 89cfa8c 
>   src/launcher/launcher.cpp 5267ac2 
>   src/slave/cgroups_isolation_module.cpp 8a121e0 
>   third_party/libprocess/include/stout/os.hpp df0f7ff 
> 
> Diff: https://reviews.apache.org/r/6863/diff/
> 
> 
> Testing
> -------
> 
> make check succeeds on linux
> 
> but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Short term fix for ignoring executor resources during launch

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/#review11307
-----------------------------------------------------------

Ship it!


Ship It!

- Benjamin Hindman


On Sept. 10, 2012, 9:39 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6863/
> -----------------------------------------------------------
> 
> (Updated Sept. 10, 2012, 9:39 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.
> 
> 
> Description
> -------
> 
> This is a short term fix.
> 
> The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()
> 
> 
> Diffs
> -----
> 
>   src/launcher/launcher.hpp 89cfa8c 
>   src/launcher/launcher.cpp 5267ac2 
>   src/slave/cgroups_isolation_module.cpp 8a121e0 
>   third_party/libprocess/include/stout/os.hpp 602db1f 
> 
> Diff: https://reviews.apache.org/r/6863/diff/
> 
> 
> Testing
> -------
> 
> make check succeeds on linux
> 
> but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Short term fix for ignoring executor resources during launch

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/
-----------------------------------------------------------

(Updated Sept. 10, 2012, 9:39 p.m.)


Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.


Changes
-------

Ben's comments.

diff'ed against apache/trunk


Description
-------

This is a short term fix.

The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()


Diffs (updated)
-----

  src/launcher/launcher.hpp 89cfa8c 
  src/launcher/launcher.cpp 5267ac2 
  src/slave/cgroups_isolation_module.cpp 8a121e0 
  third_party/libprocess/include/stout/os.hpp 602db1f 

Diff: https://reviews.apache.org/r/6863/diff/


Testing
-------

make check succeeds on linux

but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.


Thanks,

Vinod Kone


Re: Review Request: Short term fix for ignoring executor resources during launch

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/
-----------------------------------------------------------

(Updated Sept. 7, 2012, 9:51 p.m.)


Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.


Changes
-------

a sneak peak before i actually test this on (currently being installed vm) linux


Description
-------

This is a short term fix.

The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()


Diffs (updated)
-----

  src/launcher/launcher.hpp 89cfa8c 
  src/launcher/launcher.cpp 5267ac2 
  src/slave/cgroups_isolation_module.cpp 8a121e0 
  third_party/libprocess/include/stout/os.hpp df0f7ff 

Diff: https://reviews.apache.org/r/6863/diff/


Testing
-------

make check succeeds on linux

but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.


Thanks,

Vinod Kone