You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Jie Yu <yu...@gmail.com> on 2018/06/26 05:36:57 UTC
Review Request 67737: Updated CNI slave recovery test.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/
-----------------------------------------------------------
Review request for mesos and Qian Zhang.
Bugs: MESOS-9025
https://issues.apache.org/jira/browse/MESOS-9025
Repository: mesos
Description
-------
Updated the test so that it can catch regression described in
MESOS-9025. This test fails without the fix for MESOS-9025 and passes
after the fix.
Diffs
-----
src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee
Diff: https://reviews.apache.org/r/67737/diff/1/
Testing
-------
sudo make check
Thanks,
Jie Yu
Re: Review Request 67737: Updated CNI slave recovery test.
Posted by Qian Zhang <zh...@gmail.com>.
> On June 26, 2018, 3:44 p.m., Qian Zhang wrote:
> > Do we still need to kill the task and wait for `TASK_KILLED`?
And is it possible for CNI DEL command gets called after `reregisterExecutorMessage` is received?
- Qian
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/#review205352
-----------------------------------------------------------
On June 26, 2018, 1:36 p.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67737/
> -----------------------------------------------------------
>
> (Updated June 26, 2018, 1:36 p.m.)
>
>
> Review request for mesos and Qian Zhang.
>
>
> Bugs: MESOS-9025
> https://issues.apache.org/jira/browse/MESOS-9025
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Updated the test so that it can catch regression described in
> MESOS-9025. This test fails without the fix for MESOS-9025 and passes
> after the fix.
>
>
> Diffs
> -----
>
> src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee
>
>
> Diff: https://reviews.apache.org/r/67737/diff/1/
>
>
> Testing
> -------
>
> sudo make check
>
>
> Thanks,
>
> Jie Yu
>
>
Re: Review Request 67737: Updated CNI slave recovery test.
Posted by Qian Zhang <zh...@gmail.com>.
> On June 26, 2018, 3:44 p.m., Qian Zhang wrote:
> > Do we still need to kill the task and wait for `TASK_KILLED`?
>
> Qian Zhang wrote:
> And is it possible for CNI DEL command gets called after `reregisterExecutorMessage` is received?
>
> Jie Yu wrote:
> It's not possible. recover containerizer should happen first before executors are allowed to re-register.
Yes, agent will recover containerizer first before executors are allowed to re-register, but CNI DEL command is called in an asynchronous way which means it is possible that CNI DEL command gets called after executor re-registers agent (if the CNI plugin need a bit long time to handle the DEL command). I tried the following to prove it.
1. Revert the patch https://reviews.apache.org/r/67728/ .
2. Modify the mock CNI plugin used in this test to slow it down a bit.
```
diff --git a/src/tests/containerizer/cni_isolator_tests.cpp b/src/tests/containerizer/cni_isolator_tests.cpp
index b282e1070..d266fcb40 100644
--- a/src/tests/containerizer/cni_isolator_tests.cpp
+++ b/src/tests/containerizer/cni_isolator_tests.cpp
@@ -521,6 +521,7 @@ TEST_F(CniIsolatorTest, ROOT_SlaveRecovery)
echo ' }'
echo '}'
else
+ sleep 0.1
touch %s
fi
)~",
```
And then this test will succeed. That means this test may not be enough to catch the regression described in MESOS-9025.
- Qian
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/#review205352
-----------------------------------------------------------
On June 26, 2018, 1:36 p.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67737/
> -----------------------------------------------------------
>
> (Updated June 26, 2018, 1:36 p.m.)
>
>
> Review request for mesos and Qian Zhang.
>
>
> Bugs: MESOS-9025
> https://issues.apache.org/jira/browse/MESOS-9025
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Updated the test so that it can catch regression described in
> MESOS-9025. This test fails without the fix for MESOS-9025 and passes
> after the fix.
>
>
> Diffs
> -----
>
> src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee
>
>
> Diff: https://reviews.apache.org/r/67737/diff/1/
>
>
> Testing
> -------
>
> sudo make check
>
>
> Thanks,
>
> Jie Yu
>
>
Re: Review Request 67737: Updated CNI slave recovery test.
Posted by Qian Zhang <zh...@gmail.com>.
> On June 26, 2018, 3:44 p.m., Qian Zhang wrote:
> > Do we still need to kill the task and wait for `TASK_KILLED`?
>
> Qian Zhang wrote:
> And is it possible for CNI DEL command gets called after `reregisterExecutorMessage` is received?
>
> Jie Yu wrote:
> It's not possible. recover containerizer should happen first before executors are allowed to re-register.
>
> Qian Zhang wrote:
> Yes, agent will recover containerizer first before executors are allowed to re-register, but CNI DEL command is called in an asynchronous way which means it is possible that CNI DEL command gets called after executor re-registers agent (if the CNI plugin need a bit long time to handle the DEL command). I tried the following to prove it.
> 1. Revert the patch https://reviews.apache.org/r/67728/ .
> 2. Modify the mock CNI plugin used in this test to slow it down a bit.
> ```
> diff --git a/src/tests/containerizer/cni_isolator_tests.cpp b/src/tests/containerizer/cni_isolator_tests.cpp
> index b282e1070..d266fcb40 100644
> --- a/src/tests/containerizer/cni_isolator_tests.cpp
> +++ b/src/tests/containerizer/cni_isolator_tests.cpp
> @@ -521,6 +521,7 @@ TEST_F(CniIsolatorTest, ROOT_SlaveRecovery)
> echo ' }'
> echo '}'
> else
> + sleep 0.1
> touch %s
> fi
> )~",
> ```
> And then this test will succeed. That means this test may not be enough to catch the regression described in MESOS-9025.
Created a ticket https://issues.apache.org/jira/browse/MESOS-9039 to fix the issue that I mentioned above.
- Qian
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/#review205352
-----------------------------------------------------------
On June 26, 2018, 1:36 p.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67737/
> -----------------------------------------------------------
>
> (Updated June 26, 2018, 1:36 p.m.)
>
>
> Review request for mesos and Qian Zhang.
>
>
> Bugs: MESOS-9025
> https://issues.apache.org/jira/browse/MESOS-9025
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Updated the test so that it can catch regression described in
> MESOS-9025. This test fails without the fix for MESOS-9025 and passes
> after the fix.
>
>
> Diffs
> -----
>
> src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee
>
>
> Diff: https://reviews.apache.org/r/67737/diff/1/
>
>
> Testing
> -------
>
> sudo make check
>
>
> Thanks,
>
> Jie Yu
>
>
Re: Review Request 67737: Updated CNI slave recovery test.
Posted by Jie Yu <yu...@gmail.com>.
> On June 26, 2018, 7:44 a.m., Qian Zhang wrote:
> > Do we still need to kill the task and wait for `TASK_KILLED`?
>
> Qian Zhang wrote:
> And is it possible for CNI DEL command gets called after `reregisterExecutorMessage` is received?
It's not possible. recover containerizer should happen first before executors are allowed to re-register.
- Jie
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/#review205352
-----------------------------------------------------------
On June 26, 2018, 5:36 a.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67737/
> -----------------------------------------------------------
>
> (Updated June 26, 2018, 5:36 a.m.)
>
>
> Review request for mesos and Qian Zhang.
>
>
> Bugs: MESOS-9025
> https://issues.apache.org/jira/browse/MESOS-9025
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Updated the test so that it can catch regression described in
> MESOS-9025. This test fails without the fix for MESOS-9025 and passes
> after the fix.
>
>
> Diffs
> -----
>
> src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee
>
>
> Diff: https://reviews.apache.org/r/67737/diff/1/
>
>
> Testing
> -------
>
> sudo make check
>
>
> Thanks,
>
> Jie Yu
>
>
Re: Review Request 67737: Updated CNI slave recovery test.
Posted by Qian Zhang <zh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/#review205352
-----------------------------------------------------------
Ship it!
Do we still need to kill the task and wait for `TASK_KILLED`?
- Qian Zhang
On June 26, 2018, 1:36 p.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67737/
> -----------------------------------------------------------
>
> (Updated June 26, 2018, 1:36 p.m.)
>
>
> Review request for mesos and Qian Zhang.
>
>
> Bugs: MESOS-9025
> https://issues.apache.org/jira/browse/MESOS-9025
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Updated the test so that it can catch regression described in
> MESOS-9025. This test fails without the fix for MESOS-9025 and passes
> after the fix.
>
>
> Diffs
> -----
>
> src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee
>
>
> Diff: https://reviews.apache.org/r/67737/diff/1/
>
>
> Testing
> -------
>
> sudo make check
>
>
> Thanks,
>
> Jie Yu
>
>
Re: Review Request 67737: Updated CNI slave recovery test.
Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/#review205350
-----------------------------------------------------------
PASS: Mesos patch 67737 was successfully built and tested.
Reviews applied: `['67728', '67737']`
All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67737
- Mesos Reviewbot Windows
On June 26, 2018, 5:36 a.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67737/
> -----------------------------------------------------------
>
> (Updated June 26, 2018, 5:36 a.m.)
>
>
> Review request for mesos and Qian Zhang.
>
>
> Bugs: MESOS-9025
> https://issues.apache.org/jira/browse/MESOS-9025
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Updated the test so that it can catch regression described in
> MESOS-9025. This test fails without the fix for MESOS-9025 and passes
> after the fix.
>
>
> Diffs
> -----
>
> src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee
>
>
> Diff: https://reviews.apache.org/r/67737/diff/1/
>
>
> Testing
> -------
>
> sudo make check
>
>
> Thanks,
>
> Jie Yu
>
>
Re: Review Request 67737: Updated CNI slave recovery test.
Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/#review205353
-----------------------------------------------------------
Patch looks great!
Reviews applied: [67728, 67737]
Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh
- Mesos Reviewbot
On June 26, 2018, 5:36 a.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67737/
> -----------------------------------------------------------
>
> (Updated June 26, 2018, 5:36 a.m.)
>
>
> Review request for mesos and Qian Zhang.
>
>
> Bugs: MESOS-9025
> https://issues.apache.org/jira/browse/MESOS-9025
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Updated the test so that it can catch regression described in
> MESOS-9025. This test fails without the fix for MESOS-9025 and passes
> after the fix.
>
>
> Diffs
> -----
>
> src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee
>
>
> Diff: https://reviews.apache.org/r/67737/diff/1/
>
>
> Testing
> -------
>
> sudo make check
>
>
> Thanks,
>
> Jie Yu
>
>