You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Jie Yu <yu...@gmail.com> on 2018/06/26 05:36:57 UTC

Review Request 67737: Updated CNI slave recovery test.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/
-----------------------------------------------------------

Review request for mesos and Qian Zhang.


Bugs: MESOS-9025
    https://issues.apache.org/jira/browse/MESOS-9025


Repository: mesos


Description
-------

Updated the test so that it can catch regression described in
MESOS-9025. This test fails without the fix for MESOS-9025 and passes
after the fix.


Diffs
-----

  src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee 


Diff: https://reviews.apache.org/r/67737/diff/1/


Testing
-------

sudo make check


Thanks,

Jie Yu


Re: Review Request 67737: Updated CNI slave recovery test.

Posted by Qian Zhang <zh...@gmail.com>.

> On June 26, 2018, 3:44 p.m., Qian Zhang wrote:
> > Do we still need to kill the task and wait for `TASK_KILLED`?

And is it possible for CNI DEL command gets called after `reregisterExecutorMessage` is received?


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/#review205352
-----------------------------------------------------------


On June 26, 2018, 1:36 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67737/
> -----------------------------------------------------------
> 
> (Updated June 26, 2018, 1:36 p.m.)
> 
> 
> Review request for mesos and Qian Zhang.
> 
> 
> Bugs: MESOS-9025
>     https://issues.apache.org/jira/browse/MESOS-9025
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Updated the test so that it can catch regression described in
> MESOS-9025. This test fails without the fix for MESOS-9025 and passes
> after the fix.
> 
> 
> Diffs
> -----
> 
>   src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee 
> 
> 
> Diff: https://reviews.apache.org/r/67737/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 67737: Updated CNI slave recovery test.

Posted by Qian Zhang <zh...@gmail.com>.

> On June 26, 2018, 3:44 p.m., Qian Zhang wrote:
> > Do we still need to kill the task and wait for `TASK_KILLED`?
> 
> Qian Zhang wrote:
>     And is it possible for CNI DEL command gets called after `reregisterExecutorMessage` is received?
> 
> Jie Yu wrote:
>     It's not possible. recover containerizer should happen first before executors are allowed to re-register.

Yes, agent will recover containerizer first before executors are allowed to re-register, but CNI DEL command is called in an asynchronous way which means it is possible that CNI DEL command gets called after executor re-registers agent (if the CNI plugin need a bit long time to handle the DEL command). I tried the following to prove it.
1. Revert the patch https://reviews.apache.org/r/67728/ .
2. Modify the mock CNI plugin used in this test to slow it down a bit.
```
diff --git a/src/tests/containerizer/cni_isolator_tests.cpp b/src/tests/containerizer/cni_isolator_tests.cpp
index b282e1070..d266fcb40 100644
--- a/src/tests/containerizer/cni_isolator_tests.cpp
+++ b/src/tests/containerizer/cni_isolator_tests.cpp
@@ -521,6 +521,7 @@ TEST_F(CniIsolatorTest, ROOT_SlaveRecovery)
         echo '  }'
         echo '}'
       else
+        sleep 0.1
         touch %s
       fi
       )~",
```
And then this test will succeed. That means this test may not be enough to catch the regression described in MESOS-9025.


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/#review205352
-----------------------------------------------------------


On June 26, 2018, 1:36 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67737/
> -----------------------------------------------------------
> 
> (Updated June 26, 2018, 1:36 p.m.)
> 
> 
> Review request for mesos and Qian Zhang.
> 
> 
> Bugs: MESOS-9025
>     https://issues.apache.org/jira/browse/MESOS-9025
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Updated the test so that it can catch regression described in
> MESOS-9025. This test fails without the fix for MESOS-9025 and passes
> after the fix.
> 
> 
> Diffs
> -----
> 
>   src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee 
> 
> 
> Diff: https://reviews.apache.org/r/67737/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 67737: Updated CNI slave recovery test.

Posted by Qian Zhang <zh...@gmail.com>.

> On June 26, 2018, 3:44 p.m., Qian Zhang wrote:
> > Do we still need to kill the task and wait for `TASK_KILLED`?
> 
> Qian Zhang wrote:
>     And is it possible for CNI DEL command gets called after `reregisterExecutorMessage` is received?
> 
> Jie Yu wrote:
>     It's not possible. recover containerizer should happen first before executors are allowed to re-register.
> 
> Qian Zhang wrote:
>     Yes, agent will recover containerizer first before executors are allowed to re-register, but CNI DEL command is called in an asynchronous way which means it is possible that CNI DEL command gets called after executor re-registers agent (if the CNI plugin need a bit long time to handle the DEL command). I tried the following to prove it.
>     1. Revert the patch https://reviews.apache.org/r/67728/ .
>     2. Modify the mock CNI plugin used in this test to slow it down a bit.
>     ```
>     diff --git a/src/tests/containerizer/cni_isolator_tests.cpp b/src/tests/containerizer/cni_isolator_tests.cpp
>     index b282e1070..d266fcb40 100644
>     --- a/src/tests/containerizer/cni_isolator_tests.cpp
>     +++ b/src/tests/containerizer/cni_isolator_tests.cpp
>     @@ -521,6 +521,7 @@ TEST_F(CniIsolatorTest, ROOT_SlaveRecovery)
>              echo '  }'
>              echo '}'
>            else
>     +        sleep 0.1
>              touch %s
>            fi
>            )~",
>     ```
>     And then this test will succeed. That means this test may not be enough to catch the regression described in MESOS-9025.

Created a ticket https://issues.apache.org/jira/browse/MESOS-9039 to fix the issue that I mentioned above.


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/#review205352
-----------------------------------------------------------


On June 26, 2018, 1:36 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67737/
> -----------------------------------------------------------
> 
> (Updated June 26, 2018, 1:36 p.m.)
> 
> 
> Review request for mesos and Qian Zhang.
> 
> 
> Bugs: MESOS-9025
>     https://issues.apache.org/jira/browse/MESOS-9025
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Updated the test so that it can catch regression described in
> MESOS-9025. This test fails without the fix for MESOS-9025 and passes
> after the fix.
> 
> 
> Diffs
> -----
> 
>   src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee 
> 
> 
> Diff: https://reviews.apache.org/r/67737/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 67737: Updated CNI slave recovery test.

Posted by Jie Yu <yu...@gmail.com>.

> On June 26, 2018, 7:44 a.m., Qian Zhang wrote:
> > Do we still need to kill the task and wait for `TASK_KILLED`?
> 
> Qian Zhang wrote:
>     And is it possible for CNI DEL command gets called after `reregisterExecutorMessage` is received?

It's not possible. recover containerizer should happen first before executors are allowed to re-register.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/#review205352
-----------------------------------------------------------


On June 26, 2018, 5:36 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67737/
> -----------------------------------------------------------
> 
> (Updated June 26, 2018, 5:36 a.m.)
> 
> 
> Review request for mesos and Qian Zhang.
> 
> 
> Bugs: MESOS-9025
>     https://issues.apache.org/jira/browse/MESOS-9025
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Updated the test so that it can catch regression described in
> MESOS-9025. This test fails without the fix for MESOS-9025 and passes
> after the fix.
> 
> 
> Diffs
> -----
> 
>   src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee 
> 
> 
> Diff: https://reviews.apache.org/r/67737/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 67737: Updated CNI slave recovery test.

Posted by Qian Zhang <zh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/#review205352
-----------------------------------------------------------


Ship it!




Do we still need to kill the task and wait for `TASK_KILLED`?

- Qian Zhang


On June 26, 2018, 1:36 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67737/
> -----------------------------------------------------------
> 
> (Updated June 26, 2018, 1:36 p.m.)
> 
> 
> Review request for mesos and Qian Zhang.
> 
> 
> Bugs: MESOS-9025
>     https://issues.apache.org/jira/browse/MESOS-9025
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Updated the test so that it can catch regression described in
> MESOS-9025. This test fails without the fix for MESOS-9025 and passes
> after the fix.
> 
> 
> Diffs
> -----
> 
>   src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee 
> 
> 
> Diff: https://reviews.apache.org/r/67737/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 67737: Updated CNI slave recovery test.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/#review205350
-----------------------------------------------------------



PASS: Mesos patch 67737 was successfully built and tested.

Reviews applied: `['67728', '67737']`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67737

- Mesos Reviewbot Windows


On June 26, 2018, 5:36 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67737/
> -----------------------------------------------------------
> 
> (Updated June 26, 2018, 5:36 a.m.)
> 
> 
> Review request for mesos and Qian Zhang.
> 
> 
> Bugs: MESOS-9025
>     https://issues.apache.org/jira/browse/MESOS-9025
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Updated the test so that it can catch regression described in
> MESOS-9025. This test fails without the fix for MESOS-9025 and passes
> after the fix.
> 
> 
> Diffs
> -----
> 
>   src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee 
> 
> 
> Diff: https://reviews.apache.org/r/67737/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 67737: Updated CNI slave recovery test.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67737/#review205353
-----------------------------------------------------------



Patch looks great!

Reviews applied: [67728, 67737]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On June 26, 2018, 5:36 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67737/
> -----------------------------------------------------------
> 
> (Updated June 26, 2018, 5:36 a.m.)
> 
> 
> Review request for mesos and Qian Zhang.
> 
> 
> Bugs: MESOS-9025
>     https://issues.apache.org/jira/browse/MESOS-9025
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Updated the test so that it can catch regression described in
> MESOS-9025. This test fails without the fix for MESOS-9025 and passes
> after the fix.
> 
> 
> Diffs
> -----
> 
>   src/tests/containerizer/cni_isolator_tests.cpp b58a9caca136cfa42689159389bfdcb3f92f05ee 
> 
> 
> Diff: https://reviews.apache.org/r/67737/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>