You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Qian Zhang <zh...@gmail.com> on 2017/09/06 13:00:41 UTC

Re: Review Request 62003: Added `network/ports` isolator nested container tests.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62003/#review184660
-----------------------------------------------------------




src/tests/containerizer/ports_isolator_tests.cpp
Lines 910 (patched)
<https://reviews.apache.org/r/62003/#comment260853>

    Why do we need this flag?



src/tests/containerizer/ports_isolator_tests.cpp
Lines 967-968 (patched)
<https://reviews.apache.org/r/62003/#comment260851>

    I do not think we need this, by default the executor without `ContainerInfo` will be handled by Mesos Containerizer.



src/tests/containerizer/ports_isolator_tests.cpp
Lines 1051-1052 (patched)
<https://reviews.apache.org/r/62003/#comment260852>

    Can you please elaborate a bit about this? What do you mean for `the original nested container status gets swallowed`?



src/tests/containerizer/ports_isolator_tests.cpp
Lines 1240 (patched)
<https://reviews.apache.org/r/62003/#comment260855>

    Please add a comment to describe the purpose of this test.



src/tests/containerizer/ports_isolator_tests.cpp
Lines 1251-1254 (patched)
<https://reviews.apache.org/r/62003/#comment260854>

    I see you set this flag below when the agent is restarted, so I think we do not need to set it here since when the agent is started for the first time `network/ports` isolator is not enabled at all.


- Qian Zhang


On Aug. 31, 2017, 7:16 a.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62003/
> -----------------------------------------------------------
> 
> (Updated Aug. 31, 2017, 7:16 a.m.)
> 
> 
> Review request for mesos, Qian Zhang and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7675
>     https://issues.apache.org/jira/browse/MESOS-7675
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added `network/ports` isolator nested container tests using the v1
> TaskGroups API. This tests that rogue port usage by a nested task is
> detected both with and without agent recovery, and that a well-behaved
> task is preserved across agent recovery.
> 
> 
> Diffs
> -----
> 
>   src/tests/containerizer/ports_isolator_tests.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/62003/diff/1/
> 
> 
> Testing
> -------
> 
> make check (Fedora 26)
> 
> 
> Thanks,
> 
> James Peach
> 
>


Re: Review Request 62003: Added `network/ports` isolator nested container tests.

Posted by Qian Zhang <zh...@gmail.com>.

> On Sept. 6, 2017, 9 p.m., Qian Zhang wrote:
> > src/tests/containerizer/ports_isolator_tests.cpp
> > Lines 1051-1052 (patched)
> > <https://reviews.apache.org/r/62003/diff/1/?file=1808287#file1808287line1051>
> >
> >     Can you please elaborate a bit about this? What do you mean for `the original nested container status gets swallowed`?
> 
> James Peach wrote:
>     When we are not using nested containers, we get a `REASON_CONTAINER_LIMITATION` status update which comes directly from the isolator. However, when we use nested containers, we actually get a `REASON_COMMAND_EXECUTOR_FAILED` reason; the task status that comes from the isolator never goes anywhere.
> 
> Qian Zhang wrote:
>     > However, when we use nested containers, we actually get a REASON_COMMAND_EXECUTOR_FAILED reason; the task status that comes from the isolator never goes anywhere.
>     
>     Is it because there is only one nested container and when that nested container is destroyed by Mesos containerizer, the default executor will know it and terminate itself, so it has no chance to send REASON_CONTAINER_LIMITATION status for that nested container before it's self termination? If so, what if we launch a task group which has two tasks? when one of the task is destroyed due to the limitation, will we get a `REASON_CONTAINER_LIMITATION` status update for it?
> 
> James Peach wrote:
>     I can reproduce this with `mesos-execute` so I filed [MESOS-7963](https://issues.apache.org/jira/browse/MESOS-7963) to handle this.
> 
> James Peach wrote:
>     The nested container status gets lost [here](https://github.com/apache/mesos/blob/master/src/slave/http.cpp#L2496), so I think the right approach is to always trigger the resource limitation on the root container.

I agree the nested container status gets lost [here](https://github.com/apache/mesos/blob/1.3.1/src/slave/http.cpp#L2420), the protobuf message `WaitNestedContainer` only has one field `exit_status`, but the protobuf message `ContainerTermination` has some other fields (`reasons` and `message`) which are lost when convert `ContainerTermination` to `WaitNestedContainer` in those code.

So I think the right approach is to add more fields (`reasons` and `message`) in `WaitNestedContainer` and set them based on the corresponding fields in `ContainerTermination`, and then in `DefaultExecutor::waited()`, we should get `reasons` and `message` from `WaitNestedContainer` and set them in the task status update so that the scheduler can receive them.


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62003/#review184660
-----------------------------------------------------------


On Sept. 9, 2017, 6:44 a.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62003/
> -----------------------------------------------------------
> 
> (Updated Sept. 9, 2017, 6:44 a.m.)
> 
> 
> Review request for mesos, Qian Zhang and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7675
>     https://issues.apache.org/jira/browse/MESOS-7675
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added `network/ports` isolator nested container tests using the v1
> TaskGroups API. This tests that rogue port usage by a nested task is
> detected both with and without agent recovery, and that a well-behaved
> task is preserved across agent recovery.
> 
> 
> Diffs
> -----
> 
>   src/tests/containerizer/ports_isolator_tests.cpp PRE-CREATION 
>   src/tests/mesos.hpp 444c75763aea995708db0b17eafee2d22c912554 
> 
> 
> Diff: https://reviews.apache.org/r/62003/diff/4/
> 
> 
> Testing
> -------
> 
> make check (Fedora 26)
> 
> 
> Thanks,
> 
> James Peach
> 
>


Re: Review Request 62003: Added `network/ports` isolator nested container tests.

Posted by James Peach <jp...@apache.org>.

> On Sept. 6, 2017, 1 p.m., Qian Zhang wrote:
> > src/tests/containerizer/ports_isolator_tests.cpp
> > Lines 1051-1052 (patched)
> > <https://reviews.apache.org/r/62003/diff/1/?file=1808287#file1808287line1051>
> >
> >     Can you please elaborate a bit about this? What do you mean for `the original nested container status gets swallowed`?
> 
> James Peach wrote:
>     When we are not using nested containers, we get a `REASON_CONTAINER_LIMITATION` status update which comes directly from the isolator. However, when we use nested containers, we actually get a `REASON_COMMAND_EXECUTOR_FAILED` reason; the task status that comes from the isolator never goes anywhere.
> 
> Qian Zhang wrote:
>     > However, when we use nested containers, we actually get a REASON_COMMAND_EXECUTOR_FAILED reason; the task status that comes from the isolator never goes anywhere.
>     
>     Is it because there is only one nested container and when that nested container is destroyed by Mesos containerizer, the default executor will know it and terminate itself, so it has no chance to send REASON_CONTAINER_LIMITATION status for that nested container before it's self termination? If so, what if we launch a task group which has two tasks? when one of the task is destroyed due to the limitation, will we get a `REASON_CONTAINER_LIMITATION` status update for it?

I can reproduce this with `mesos-execute` so I filed [MESOS-7963](https://issues.apache.org/jira/browse/MESOS-7963) to handle this.


- James


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62003/#review184660
-----------------------------------------------------------


On Sept. 8, 2017, 10:44 p.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62003/
> -----------------------------------------------------------
> 
> (Updated Sept. 8, 2017, 10:44 p.m.)
> 
> 
> Review request for mesos, Qian Zhang and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7675
>     https://issues.apache.org/jira/browse/MESOS-7675
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added `network/ports` isolator nested container tests using the v1
> TaskGroups API. This tests that rogue port usage by a nested task is
> detected both with and without agent recovery, and that a well-behaved
> task is preserved across agent recovery.
> 
> 
> Diffs
> -----
> 
>   src/tests/containerizer/ports_isolator_tests.cpp PRE-CREATION 
>   src/tests/mesos.hpp 444c75763aea995708db0b17eafee2d22c912554 
> 
> 
> Diff: https://reviews.apache.org/r/62003/diff/4/
> 
> 
> Testing
> -------
> 
> make check (Fedora 26)
> 
> 
> Thanks,
> 
> James Peach
> 
>


Re: Review Request 62003: Added `network/ports` isolator nested container tests.

Posted by James Peach <jp...@apache.org>.

> On Sept. 6, 2017, 1 p.m., Qian Zhang wrote:
> > src/tests/containerizer/ports_isolator_tests.cpp
> > Lines 1051-1052 (patched)
> > <https://reviews.apache.org/r/62003/diff/1/?file=1808287#file1808287line1051>
> >
> >     Can you please elaborate a bit about this? What do you mean for `the original nested container status gets swallowed`?
> 
> James Peach wrote:
>     When we are not using nested containers, we get a `REASON_CONTAINER_LIMITATION` status update which comes directly from the isolator. However, when we use nested containers, we actually get a `REASON_COMMAND_EXECUTOR_FAILED` reason; the task status that comes from the isolator never goes anywhere.
> 
> Qian Zhang wrote:
>     > However, when we use nested containers, we actually get a REASON_COMMAND_EXECUTOR_FAILED reason; the task status that comes from the isolator never goes anywhere.
>     
>     Is it because there is only one nested container and when that nested container is destroyed by Mesos containerizer, the default executor will know it and terminate itself, so it has no chance to send REASON_CONTAINER_LIMITATION status for that nested container before it's self termination? If so, what if we launch a task group which has two tasks? when one of the task is destroyed due to the limitation, will we get a `REASON_CONTAINER_LIMITATION` status update for it?
> 
> James Peach wrote:
>     I can reproduce this with `mesos-execute` so I filed [MESOS-7963](https://issues.apache.org/jira/browse/MESOS-7963) to handle this.

The nested container status gets lost [here](https://github.com/apache/mesos/blob/master/src/slave/http.cpp#L2496), so I think the right approach is to always trigger the resource limitation on the root container.


- James


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62003/#review184660
-----------------------------------------------------------


On Sept. 8, 2017, 10:44 p.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62003/
> -----------------------------------------------------------
> 
> (Updated Sept. 8, 2017, 10:44 p.m.)
> 
> 
> Review request for mesos, Qian Zhang and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7675
>     https://issues.apache.org/jira/browse/MESOS-7675
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added `network/ports` isolator nested container tests using the v1
> TaskGroups API. This tests that rogue port usage by a nested task is
> detected both with and without agent recovery, and that a well-behaved
> task is preserved across agent recovery.
> 
> 
> Diffs
> -----
> 
>   src/tests/containerizer/ports_isolator_tests.cpp PRE-CREATION 
>   src/tests/mesos.hpp 444c75763aea995708db0b17eafee2d22c912554 
> 
> 
> Diff: https://reviews.apache.org/r/62003/diff/4/
> 
> 
> Testing
> -------
> 
> make check (Fedora 26)
> 
> 
> Thanks,
> 
> James Peach
> 
>


Re: Review Request 62003: Added `network/ports` isolator nested container tests.

Posted by Qian Zhang <zh...@gmail.com>.

> On Sept. 6, 2017, 9 p.m., Qian Zhang wrote:
> > src/tests/containerizer/ports_isolator_tests.cpp
> > Lines 1051-1052 (patched)
> > <https://reviews.apache.org/r/62003/diff/1/?file=1808287#file1808287line1051>
> >
> >     Can you please elaborate a bit about this? What do you mean for `the original nested container status gets swallowed`?
> 
> James Peach wrote:
>     When we are not using nested containers, we get a `REASON_CONTAINER_LIMITATION` status update which comes directly from the isolator. However, when we use nested containers, we actually get a `REASON_COMMAND_EXECUTOR_FAILED` reason; the task status that comes from the isolator never goes anywhere.

> However, when we use nested containers, we actually get a REASON_COMMAND_EXECUTOR_FAILED reason; the task status that comes from the isolator never goes anywhere.

Is it because there is only one nested container and when that nested container is destroyed by Mesos containerizer, the default executor will know it and terminate itself, so it has no chance to send REASON_CONTAINER_LIMITATION status for that nested container before it's self termination? If so, what if we launch a task group which has two tasks? when one of the task is destroyed due to the limitation, will we get a `REASON_CONTAINER_LIMITATION` status update for it?


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62003/#review184660
-----------------------------------------------------------


On Aug. 31, 2017, 7:16 a.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62003/
> -----------------------------------------------------------
> 
> (Updated Aug. 31, 2017, 7:16 a.m.)
> 
> 
> Review request for mesos, Qian Zhang and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7675
>     https://issues.apache.org/jira/browse/MESOS-7675
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added `network/ports` isolator nested container tests using the v1
> TaskGroups API. This tests that rogue port usage by a nested task is
> detected both with and without agent recovery, and that a well-behaved
> task is preserved across agent recovery.
> 
> 
> Diffs
> -----
> 
>   src/tests/containerizer/ports_isolator_tests.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/62003/diff/1/
> 
> 
> Testing
> -------
> 
> make check (Fedora 26)
> 
> 
> Thanks,
> 
> James Peach
> 
>


Re: Review Request 62003: Added `network/ports` isolator nested container tests.

Posted by James Peach <jp...@apache.org>.

> On Sept. 6, 2017, 1 p.m., Qian Zhang wrote:
> > src/tests/containerizer/ports_isolator_tests.cpp
> > Lines 1051-1052 (patched)
> > <https://reviews.apache.org/r/62003/diff/1/?file=1808287#file1808287line1051>
> >
> >     Can you please elaborate a bit about this? What do you mean for `the original nested container status gets swallowed`?

When we are not using nested containers, we get a `REASON_CONTAINER_LIMITATION` status update which comes directly from the isolator. However, when we use nested containers, we actually get a `REASON_COMMAND_EXECUTOR_FAILED` reason; the task status that comes from the isolator never goes anywhere.


- James


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62003/#review184660
-----------------------------------------------------------


On Aug. 30, 2017, 11:16 p.m., James Peach wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62003/
> -----------------------------------------------------------
> 
> (Updated Aug. 30, 2017, 11:16 p.m.)
> 
> 
> Review request for mesos, Qian Zhang and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-7675
>     https://issues.apache.org/jira/browse/MESOS-7675
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added `network/ports` isolator nested container tests using the v1
> TaskGroups API. This tests that rogue port usage by a nested task is
> detected both with and without agent recovery, and that a well-behaved
> task is preserved across agent recovery.
> 
> 
> Diffs
> -----
> 
>   src/tests/containerizer/ports_isolator_tests.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/62003/diff/1/
> 
> 
> Testing
> -------
> 
> make check (Fedora 26)
> 
> 
> Thanks,
> 
> James Peach
> 
>