You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2012/08/30 23:25:35 UTC
Review Request: Short term fix for ignoring executor resources during launch
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/
-----------------------------------------------------------
Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.
Description
-------
This is a short term fix.
The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()
Diffs
-----
src/launcher/launcher.hpp 89cfa8c
src/launcher/launcher.cpp 5267ac2
src/slave/cgroups_isolation_module.cpp 8a121e0
third_party/libprocess/include/stout/os.hpp ffe56b6
Diff: https://reviews.apache.org/r/6863/diff/
Testing
-------
make check succeeds on linux
but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.
Thanks,
Vinod Kone
Re: Review Request: Short term fix for ignoring executor resources during
launch
Posted by Vinod Kone <vi...@gmail.com>.
> On Aug. 31, 2012, 1:45 a.m., Jie Yu wrote:
> > I don't think you will be able to run the test on a nest machine as cgroups code requires root permission.
> >
> > You may want to install a virtual machine using vmware fusion.
> >
> > The ubuntu distribution I used was here:
> > http://releases.ubuntu.com/lucid/ubuntu-10.04.4-desktop-amd64.iso
> >
> > For a temp fix, I think this one is OK. The only concern I have is that the download still happens in the child process. If the OS does not reclaim the memory promptly, the assignTask might fail because you try to assign a process that uses more memory than the memory limit of the target cgroup.
> >
> > For the long term, I think we can refactor the ExecutorLauncher so that we always download executors in parent, and keep the logic in child process as simple as possible.
finally, was able to run Cgroups tests on my vm.
I'm getting the following error (below), which I think is unrelated to my change.
root@ubuntu:/home/vinod/workspace/mesos/build# GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="*Cgroups*" --verbose
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0910 15:51:01.905972 2118 process.cpp:1395] libprocess is initialized on 127.0.1.1:45161 for 1 cpus
I0910 15:51:01.906347 2118 logging.cpp:185] Logging to STDERR
Source directory: /home/vinod/workspace/mesos
Build directory: /home/vinod/workspace/mesos/build
Note: Google Test filter = *Cgroups*-
[==========] Running 17 tests from 3 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from CgroupsSimpleTest
[ RUN ] CgroupsSimpleTest.ROOT_CGROUPS_Enabled
[ OK ] CgroupsSimpleTest.ROOT_CGROUPS_Enabled (0 ms)
[ RUN ] CgroupsSimpleTest.ROOT_CGROUPS_Subsystems
[ OK ] CgroupsSimpleTest.ROOT_CGROUPS_Subsystems (1 ms)
[ RUN ] CgroupsSimpleTest.ROOT_CGROUPS_CreateRemoveHierarchy
[ OK ] CgroupsSimpleTest.ROOT_CGROUPS_CreateRemoveHierarchy (1 ms)
[ RUN ] CgroupsSimpleTest.ROOT_CGROUPS_CreateRemoveCgroup
[ OK ] CgroupsSimpleTest.ROOT_CGROUPS_CreateRemoveCgroup (3 ms)
[----------] 4 tests from CgroupsSimpleTest (5 ms total)
[----------] 12 tests from CgroupsTest
[ RUN ] CgroupsTest.ROOT_CGROUPS_Busy
[ OK ] CgroupsTest.ROOT_CGROUPS_Busy (15 ms)
[ RUN ] CgroupsTest.ROOT_CGROUPS_SubsystemsHierarchy
[ OK ] CgroupsTest.ROOT_CGROUPS_SubsystemsHierarchy (6 ms)
[ RUN ] CgroupsTest.ROOT_CGROUPS_CheckHierarchy
[ OK ] CgroupsTest.ROOT_CGROUPS_CheckHierarchy (4 ms)
[ RUN ] CgroupsTest.ROOT_CGROUPS_CheckHierarchySubsystems
[ OK ] CgroupsTest.ROOT_CGROUPS_CheckHierarchySubsystems (22 ms)
[ RUN ] CgroupsTest.ROOT_CGROUPS_ReadControl
[ OK ] CgroupsTest.ROOT_CGROUPS_ReadControl (5 ms)
[ RUN ] CgroupsTest.ROOT_CGROUPS_WriteControl
[ OK ] CgroupsTest.ROOT_CGROUPS_WriteControl (6 ms)
[ RUN ] CgroupsTest.ROOT_CGROUPS_GetCgroups
[ OK ] CgroupsTest.ROOT_CGROUPS_GetCgroups (6 ms)
[ RUN ] CgroupsTest.ROOT_CGROUPS_GetTasks
[ OK ] CgroupsTest.ROOT_CGROUPS_GetTasks (12 ms)
[ RUN ] CgroupsTest.ROOT_CGROUPS_ListenEvent
../../src/tests/cgroups_tests.cpp:399: Failure
Value of: disableResult.isSome()
Actual: false
Expected: true
[ FAILED ] CgroupsTest.ROOT_CGROUPS_ListenEvent (6 ms)
[ RUN ] CgroupsTest.ROOT_CGROUPS_Freezer
F0910 15:51:02.002341 2133 cgroups.cpp:1114] Unexpected state: THAWED
*** Check failure stack trace: ***
@ 0x7fc23a06e7ed google::LogMessage::Fail()
@ 0x7fc23a07205f google::LogMessage::SendToLog()
@ 0x7fc23a071597 google::LogMessage::Flush()
@ 0x7fc23a07253d google::LogMessageFatal::~LogMessageFatal()
@ 0x7fc239eb7f44 cgroups::internal::Freezer::watchFrozen()
@ 0x7fc239eb70a1 cgroups::internal::Freezer::freeze()
@ 0x7fc239eb6d74 cgroups::internal::Freezer::initialize()
@ 0x7fc239f80a4c process::ProcessManager::resume()
@ 0x7fc239f781df process::schedule()
@ 0x7fc23898f9ca start_thread
@ 0x7fc2386eccdd (unknown)
Aborted
Also, this results in a hung test process, which I'm unable to kill?
root@ubuntu:/home/vinod/workspace/mesos/build# ps aux | grep mesos
root 2137 0.0 0.2 79244 2856 pts/0 D 15:51 0:00 /home/vinod/workspace/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=*Cgroups* --verbose
root 2147 0.0 0.0 7624 936 pts/0 S+ 15:51 0:00 grep --color=auto mesos
root@ubuntu:/home/vinod/workspace/mesos/build# kill -9 2137
root@ubuntu:/home/vinod/workspace/mesos/build# ps aux | grep mesos
root 2137 0.0 0.2 79244 2856 pts/0 D 15:51 0:00 /home/vinod/workspace/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=*Cgroups* --verbose
root 2161 0.0 0.0 7624 936 pts/0 S+ 15:54 0:00 grep --color=auto mesos
- Vinod
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/#review10925
-----------------------------------------------------------
On Sept. 10, 2012, 9:39 p.m., Vinod Kone wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6863/
> -----------------------------------------------------------
>
> (Updated Sept. 10, 2012, 9:39 p.m.)
>
>
> Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.
>
>
> Description
> -------
>
> This is a short term fix.
>
> The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()
>
>
> Diffs
> -----
>
> src/launcher/launcher.hpp 89cfa8c
> src/launcher/launcher.cpp 5267ac2
> src/slave/cgroups_isolation_module.cpp 8a121e0
> third_party/libprocess/include/stout/os.hpp 602db1f
>
> Diff: https://reviews.apache.org/r/6863/diff/
>
>
> Testing
> -------
>
> make check succeeds on linux
>
> but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.
>
>
> Thanks,
>
> Vinod Kone
>
>
Re: Review Request: Short term fix for ignoring executor resources during
launch
Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/#review10925
-----------------------------------------------------------
Ship it!
I don't think you will be able to run the test on a nest machine as cgroups code requires root permission.
You may want to install a virtual machine using vmware fusion.
The ubuntu distribution I used was here:
http://releases.ubuntu.com/lucid/ubuntu-10.04.4-desktop-amd64.iso
For a temp fix, I think this one is OK. The only concern I have is that the download still happens in the child process. If the OS does not reclaim the memory promptly, the assignTask might fail because you try to assign a process that uses more memory than the memory limit of the target cgroup.
For the long term, I think we can refactor the ExecutorLauncher so that we always download executors in parent, and keep the logic in child process as simple as possible.
- Jie Yu
On Aug. 30, 2012, 9:25 p.m., Vinod Kone wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6863/
> -----------------------------------------------------------
>
> (Updated Aug. 30, 2012, 9:25 p.m.)
>
>
> Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.
>
>
> Description
> -------
>
> This is a short term fix.
>
> The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()
>
>
> Diffs
> -----
>
> src/launcher/launcher.hpp 89cfa8c
> src/launcher/launcher.cpp 5267ac2
> src/slave/cgroups_isolation_module.cpp 8a121e0
> third_party/libprocess/include/stout/os.hpp ffe56b6
>
> Diff: https://reviews.apache.org/r/6863/diff/
>
>
> Testing
> -------
>
> make check succeeds on linux
>
> but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.
>
>
> Thanks,
>
> Vinod Kone
>
>
Re: Review Request: Short term fix for ignoring executor resources during
launch
Posted by Vinod Kone <vi...@gmail.com>.
> On Sept. 9, 2012, 11:01 p.m., Benjamin Hindman wrote:
> > src/launcher/launcher.cpp, line 84
> > <https://reviews.apache.org/r/6863/diff/2/?file=151318#file151318line84>
> >
> > This now kills the slave (for the cgroups module) in the event something weird is going on. I agree that not being able to chdir is a rather serious issue but I'd rather not see us killing the slave in this circumstance (i.e., return bool/int here).
good call, fixed.
> On Sept. 9, 2012, 11:01 p.m., Benjamin Hindman wrote:
> > src/launcher/launcher.cpp, line 87
> > <https://reviews.apache.org/r/6863/diff/2/?file=151318#file151318line87>
> >
> > My guess is this has some fatalerror issues as well? This one is loads of serious because a framework might have just given us a bad executor URI and we don't want to kill a slave for that.
done
- Vinod
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/#review11222
-----------------------------------------------------------
On Sept. 7, 2012, 9:51 p.m., Vinod Kone wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6863/
> -----------------------------------------------------------
>
> (Updated Sept. 7, 2012, 9:51 p.m.)
>
>
> Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.
>
>
> Description
> -------
>
> This is a short term fix.
>
> The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()
>
>
> Diffs
> -----
>
> src/launcher/launcher.hpp 89cfa8c
> src/launcher/launcher.cpp 5267ac2
> src/slave/cgroups_isolation_module.cpp 8a121e0
> third_party/libprocess/include/stout/os.hpp df0f7ff
>
> Diff: https://reviews.apache.org/r/6863/diff/
>
>
> Testing
> -------
>
> make check succeeds on linux
>
> but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.
>
>
> Thanks,
>
> Vinod Kone
>
>
Re: Review Request: Short term fix for ignoring executor resources during
launch
Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/#review11222
-----------------------------------------------------------
src/launcher/launcher.cpp
<https://reviews.apache.org/r/6863/#comment24153>
Not your bug, but please add a period at the end of comments you come across that don't have them. Thanks!
src/launcher/launcher.cpp
<https://reviews.apache.org/r/6863/#comment24154>
This now kills the slave (for the cgroups module) in the event something weird is going on. I agree that not being able to chdir is a rather serious issue but I'd rather not see us killing the slave in this circumstance (i.e., return bool/int here).
src/launcher/launcher.cpp
<https://reviews.apache.org/r/6863/#comment24157>
My guess is this has some fatalerror issues as well? This one is loads of serious because a framework might have just given us a bad executor URI and we don't want to kill a slave for that.
src/launcher/launcher.cpp
<https://reviews.apache.org/r/6863/#comment24155>
Ditto above.
src/launcher/launcher.cpp
<https://reviews.apache.org/r/6863/#comment24156>
Likewise, now this could kill the slave. Do me a favor and just kill initializeWorkingDirectory all together and inline this code into setup. Again, not your bug, but thanks!
- Benjamin Hindman
On Sept. 7, 2012, 9:51 p.m., Vinod Kone wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6863/
> -----------------------------------------------------------
>
> (Updated Sept. 7, 2012, 9:51 p.m.)
>
>
> Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.
>
>
> Description
> -------
>
> This is a short term fix.
>
> The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()
>
>
> Diffs
> -----
>
> src/launcher/launcher.hpp 89cfa8c
> src/launcher/launcher.cpp 5267ac2
> src/slave/cgroups_isolation_module.cpp 8a121e0
> third_party/libprocess/include/stout/os.hpp df0f7ff
>
> Diff: https://reviews.apache.org/r/6863/diff/
>
>
> Testing
> -------
>
> make check succeeds on linux
>
> but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.
>
>
> Thanks,
>
> Vinod Kone
>
>
Re: Review Request: Short term fix for ignoring executor resources during
launch
Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/#review11307
-----------------------------------------------------------
Ship it!
Ship It!
- Benjamin Hindman
On Sept. 10, 2012, 9:39 p.m., Vinod Kone wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6863/
> -----------------------------------------------------------
>
> (Updated Sept. 10, 2012, 9:39 p.m.)
>
>
> Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.
>
>
> Description
> -------
>
> This is a short term fix.
>
> The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()
>
>
> Diffs
> -----
>
> src/launcher/launcher.hpp 89cfa8c
> src/launcher/launcher.cpp 5267ac2
> src/slave/cgroups_isolation_module.cpp 8a121e0
> third_party/libprocess/include/stout/os.hpp 602db1f
>
> Diff: https://reviews.apache.org/r/6863/diff/
>
>
> Testing
> -------
>
> make check succeeds on linux
>
> but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.
>
>
> Thanks,
>
> Vinod Kone
>
>
Re: Review Request: Short term fix for ignoring executor resources during
launch
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/
-----------------------------------------------------------
(Updated Sept. 10, 2012, 9:39 p.m.)
Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.
Changes
-------
Ben's comments.
diff'ed against apache/trunk
Description
-------
This is a short term fix.
The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()
Diffs (updated)
-----
src/launcher/launcher.hpp 89cfa8c
src/launcher/launcher.cpp 5267ac2
src/slave/cgroups_isolation_module.cpp 8a121e0
third_party/libprocess/include/stout/os.hpp 602db1f
Diff: https://reviews.apache.org/r/6863/diff/
Testing
-------
make check succeeds on linux
but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.
Thanks,
Vinod Kone
Re: Review Request: Short term fix for ignoring executor resources during
launch
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6863/
-----------------------------------------------------------
(Updated Sept. 7, 2012, 9:51 p.m.)
Review request for mesos, Benjamin Hindman, Brian Wickman, Jie Yu, and Ben Mahler.
Changes
-------
a sneak peak before i actually test this on (currently being installed vm) linux
Description
-------
This is a short term fix.
The basic idea is that we split the launcher->run() into setup() and launch(), and assign the executor process to cgroup after the launch()
Diffs (updated)
-----
src/launcher/launcher.hpp 89cfa8c
src/launcher/launcher.cpp 5267ac2
src/slave/cgroups_isolation_module.cpp 8a121e0
third_party/libprocess/include/stout/os.hpp df0f7ff
Diff: https://reviews.apache.org/r/6863/diff/
Testing
-------
make check succeeds on linux
but, the cgroups tests seemed to be disabled on my nest machine. are there any flags that i should use? i suspect the reason for the disabled tests on linux is because i run bootstrap on my mac and build the bootstrapped branch on nest.
Thanks,
Vinod Kone