You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Jie Yu <yu...@gmail.com> on 2018/06/25 19:00:02 UTC

Review Request 67728: Fixed orphan container cleanup issue in CNI isolator.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67728/
-----------------------------------------------------------

Review request for mesos, Qian Zhang and Sagar Patwardhan.


Bugs: MESOS-9025
    https://issues.apache.org/jira/browse/MESOS-9025


Repository: mesos


Description
-------

The bug was introduced in https://reviews.apache.org/r/65987/ when we
want to allow nested container to have a separate network namespace than
its parent (MESOS-8534).

The original code misses a `continue` statement after recovering regular
containers.

I am surprised that the unit tests didn't catch the bug. We'll add a new
unit test for catching the regression.


Diffs
-----

  src/slave/containerizer/mesos/isolators/network/cni/cni.cpp b6ad4fcf9f1596c07ddeb9bbb134f4619d189671 


Diff: https://reviews.apache.org/r/67728/diff/1/


Testing
-------

sudo make check


Thanks,

Jie Yu


Re: Review Request 67728: Fixed orphan container cleanup issue in CNI isolator.

Posted by Jie Yu <yu...@gmail.com>.

> On June 25, 2018, 7:55 p.m., Stéphane Cottin wrote:
> > src/slave/containerizer/mesos/isolators/network/cni/cni.cpp
> > Line 433 (original), 432 (patched)
> > <https://reviews.apache.org/r/67728/diff/1/?file=2044786#file2044786line433>
> >
> >     This test makes sense only if the containerID is removed from orphans during the _recover call. 
> >     
> >     I may miss something, but it seems _recover() only updates infos, not orphans.

The `containerIdToState.contains(containerId)` above makes sure that the else branch is for orphan containers (both known or unknown).

The `orphans` parameter to `recover` method are those containers that the containerizer knows about, but does not tie to any checkpointed executor or standalone container (this is called "known orphans").

So this check makes sure we only call `cleanup` for unknown orphans (containerizer does not know about, but isolator notices it from isolator perspective. this is possible for those pre-1.3 containers, or operator accidentally remove some checkpoint data in the runtime dir).


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67728/#review205314
-----------------------------------------------------------


On June 25, 2018, 7 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67728/
> -----------------------------------------------------------
> 
> (Updated June 25, 2018, 7 p.m.)
> 
> 
> Review request for mesos, Qian Zhang and Sagar Patwardhan.
> 
> 
> Bugs: MESOS-9025
>     https://issues.apache.org/jira/browse/MESOS-9025
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The bug was introduced in https://reviews.apache.org/r/65987/ when we
> want to allow nested container to have a separate network namespace than
> its parent (MESOS-8534).
> 
> The original code misses a `continue` statement after recovering regular
> containers.
> 
> I am surprised that the unit tests didn't catch the bug. We'll add a new
> unit test for catching the regression.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/network/cni/cni.cpp b6ad4fcf9f1596c07ddeb9bbb134f4619d189671 
> 
> 
> Diff: https://reviews.apache.org/r/67728/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 67728: Fixed orphan container cleanup issue in CNI isolator.

Posted by Stéphane Cottin via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67728/#review205314
-----------------------------------------------------------




src/slave/containerizer/mesos/isolators/network/cni/cni.cpp
Line 433 (original), 432 (patched)
<https://reviews.apache.org/r/67728/#comment288244>

    This test makes sense only if the containerID is removed from orphans during the _recover call. 
    
    I may miss something, but it seems _recover() only updates infos, not orphans.


- Stéphane Cottin


On June 25, 2018, 7 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67728/
> -----------------------------------------------------------
> 
> (Updated June 25, 2018, 7 p.m.)
> 
> 
> Review request for mesos, Qian Zhang and Sagar Patwardhan.
> 
> 
> Bugs: MESOS-9025
>     https://issues.apache.org/jira/browse/MESOS-9025
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The bug was introduced in https://reviews.apache.org/r/65987/ when we
> want to allow nested container to have a separate network namespace than
> its parent (MESOS-8534).
> 
> The original code misses a `continue` statement after recovering regular
> containers.
> 
> I am surprised that the unit tests didn't catch the bug. We'll add a new
> unit test for catching the regression.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/network/cni/cni.cpp b6ad4fcf9f1596c07ddeb9bbb134f4619d189671 
> 
> 
> Diff: https://reviews.apache.org/r/67728/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 67728: Fixed orphan container cleanup issue in CNI isolator.

Posted by Qian Zhang <zh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67728/#review205339
-----------------------------------------------------------


Ship it!




Ship It!

- Qian Zhang


On June 26, 2018, 3 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67728/
> -----------------------------------------------------------
> 
> (Updated June 26, 2018, 3 a.m.)
> 
> 
> Review request for mesos, Qian Zhang and Sagar Patwardhan.
> 
> 
> Bugs: MESOS-9025
>     https://issues.apache.org/jira/browse/MESOS-9025
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The bug was introduced in https://reviews.apache.org/r/65987/ when we
> want to allow nested container to have a separate network namespace than
> its parent (MESOS-8534).
> 
> The original code misses a `continue` statement after recovering regular
> containers.
> 
> I am surprised that the unit tests didn't catch the bug. We'll add a new
> unit test for catching the regression.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/network/cni/cni.cpp b6ad4fcf9f1596c07ddeb9bbb134f4619d189671 
> 
> 
> Diff: https://reviews.apache.org/r/67728/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 67728: Fixed orphan container cleanup issue in CNI isolator.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67728/#review205309
-----------------------------------------------------------



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['67728']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67728

Relevant logs:

- [stout-tests-cmake-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67728/logs/stout-tests-cmake-stdout.log):

```
       "D:\DCOS\mesos\3rdparty\bzip2-1.0.6.vcxproj" (default target) (6) ->
       (CustomBuild target) -> 
         C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 3. [D:\DCOS\mesos\3rdparty\bzip2-1.0.6.vcxproj]


       "D:\DCOS\mesos\3rdparty\stout\tests\stout-tests.vcxproj" (default target) (1) ->
       "D:\DCOS\mesos\3rdparty\libarchive-3.3.2.vcxproj" (default target) (3) ->
       "D:\DCOS\mesos\3rdparty\xz-5.2.3.vcxproj" (default target) (4) ->
         C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 3. [D:\DCOS\mesos\3rdparty\xz-5.2.3.vcxproj]


       "D:\DCOS\mesos\3rdparty\stout\tests\stout-tests.vcxproj" (default target) (1) ->
       "D:\DCOS\mesos\3rdparty\glog-da816ea70.vcxproj" (default target) (9) ->
         C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 3. [D:\DCOS\mesos\3rdparty\glog-da816ea70.vcxproj]


       "D:\DCOS\mesos\3rdparty\stout\tests\stout-tests.vcxproj" (default target) (1) ->
       "D:\DCOS\mesos\3rdparty\boost-1.65.0.vcxproj" (default target) (10) ->
         C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 3. [D:\DCOS\mesos\3rdparty\boost-1.65.0.vcxproj]


       "D:\DCOS\mesos\3rdparty\stout\tests\stout-tests.vcxproj" (default target) (1) ->
       "D:\DCOS\mesos\3rdparty\protobuf-3.5.0.vcxproj" (default target) (14) ->
         C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 3. [D:\DCOS\mesos\3rdparty\protobuf-3.5.0.vcxproj]

    1 Warning(s)
    5 Error(s)

Time Elapsed 00:03:17.53
```

- [libprocess-tests-cmake-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67728/logs/libprocess-tests-cmake-stdout.log):

```
         C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 3. [D:\DCOS\mesos\3rdparty\bzip2-1.0.6.vcxproj]


       "D:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj" (default target) (1) ->
       "D:\DCOS\mesos\3rdparty\libprocess\src\tests\benchmarks.vcxproj" (default target) (5) ->
       "D:\DCOS\mesos\3rdparty\libarchive-3.3.2.vcxproj" (default target) (13) ->
       "D:\DCOS\mesos\3rdparty\xz-5.2.3.vcxproj" (default target) (16) ->
         C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 3. [D:\DCOS\mesos\3rdparty\xz-5.2.3.vcxproj]


       "D:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj" (default target) (1) ->
       "D:\DCOS\mesos\3rdparty\glog-da816ea70.vcxproj" (default target) (12) ->
         C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 3. [D:\DCOS\mesos\3rdparty\glog-da816ea70.vcxproj]


       "D:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj" (default target) (1) ->
       "D:\DCOS\mesos\3rdparty\libprocess\examples\example.vcxproj" (default target) (9) ->
       "D:\DCOS\mesos\3rdparty\protobuf-3.5.0.vcxproj" (default target) (19) ->
         C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 3. [D:\DCOS\mesos\3rdparty\protobuf-3.5.0.vcxproj]


       "D:\DCOS\mesos\3rdparty\libprocess\src\tests\libprocess-tests.vcxproj" (default target) (1) ->
       "D:\DCOS\mesos\3rdparty\boost-1.65.0.vcxproj" (default target) (4) ->
         C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 3. [D:\DCOS\mesos\3rdparty\boost-1.65.0.vcxproj]

    113 Warning(s)
    5 Error(s)

Time Elapsed 00:03:36.63
```

- [mesos-tests-cmake-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/67728/logs/mesos-tests-cmake-stdout.log):

```

       "D:\DCOS\mesos\src\tests\mesos-tests.vcxproj" (default target) (1) ->
       "D:\DCOS\mesos\src\mesos.vcxproj" (default target) (16) ->
       "D:\DCOS\mesos\3rdparty\sasl2-2.1.27rc3.vcxproj" (default target) (27) ->
         C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 3. [D:\DCOS\mesos\3rdparty\sasl2-2.1.27rc3.vcxproj]


       "D:\DCOS\mesos\src\tests\mesos-tests.vcxproj" (default target) (1) ->
       "D:\DCOS\mesos\3rdparty\boost-1.65.0.vcxproj" (default target) (4) ->
         C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 3. [D:\DCOS\mesos\3rdparty\boost-1.65.0.vcxproj]


       "D:\DCOS\mesos\src\tests\mesos-tests.vcxproj" (default target) (1) ->
       "D:\DCOS\mesos\src\slave\mesos-agent.vcxproj" (default target) (12) ->
       "D:\DCOS\mesos\src\launcher\mesos-default-executor.vcxproj" (default target) (22) ->
       "D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj" (default target) (31) ->
         C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 3. [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]


       "D:\DCOS\mesos\src\tests\mesos-tests.vcxproj" (default target) (1) ->
       "D:\DCOS\mesos\src\slave\mesos-agent.vcxproj" (default target) (12) ->
       "D:\DCOS\mesos\src\slave\containerizer\mesos\mesos-containerizer.vcxproj" (default target) (19) ->
       "D:\DCOS\mesos\3rdparty\protobuf-3.5.0.vcxproj" (default target) (25) ->
         C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 3. [D:\DCOS\mesos\3rdparty\protobuf-3.5.0.vcxproj]

    1 Warning(s)
    7 Error(s)

Time Elapsed 00:00:20.40
```

- Mesos Reviewbot Windows


On June 25, 2018, noon, Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67728/
> -----------------------------------------------------------
> 
> (Updated June 25, 2018, noon)
> 
> 
> Review request for mesos, Qian Zhang and Sagar Patwardhan.
> 
> 
> Bugs: MESOS-9025
>     https://issues.apache.org/jira/browse/MESOS-9025
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The bug was introduced in https://reviews.apache.org/r/65987/ when we
> want to allow nested container to have a separate network namespace than
> its parent (MESOS-8534).
> 
> The original code misses a `continue` statement after recovering regular
> containers.
> 
> I am surprised that the unit tests didn't catch the bug. We'll add a new
> unit test for catching the regression.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/network/cni/cni.cpp b6ad4fcf9f1596c07ddeb9bbb134f4619d189671 
> 
> 
> Diff: https://reviews.apache.org/r/67728/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 67728: Fixed orphan container cleanup issue in CNI isolator.

Posted by Stéphane Cottin via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67728/#review205347
-----------------------------------------------------------


Ship it!




Ship It!

- Stéphane Cottin


On June 25, 2018, 7 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67728/
> -----------------------------------------------------------
> 
> (Updated June 25, 2018, 7 p.m.)
> 
> 
> Review request for mesos, Qian Zhang and Sagar Patwardhan.
> 
> 
> Bugs: MESOS-9025
>     https://issues.apache.org/jira/browse/MESOS-9025
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The bug was introduced in https://reviews.apache.org/r/65987/ when we
> want to allow nested container to have a separate network namespace than
> its parent (MESOS-8534).
> 
> The original code misses a `continue` statement after recovering regular
> containers.
> 
> I am surprised that the unit tests didn't catch the bug. We'll add a new
> unit test for catching the regression.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/network/cni/cni.cpp b6ad4fcf9f1596c07ddeb9bbb134f4619d189671 
> 
> 
> Diff: https://reviews.apache.org/r/67728/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 67728: Fixed orphan container cleanup issue in CNI isolator.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67728/#review205340
-----------------------------------------------------------



Patch looks great!

Reviews applied: [67728]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On June 25, 2018, 7 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67728/
> -----------------------------------------------------------
> 
> (Updated June 25, 2018, 7 p.m.)
> 
> 
> Review request for mesos, Qian Zhang and Sagar Patwardhan.
> 
> 
> Bugs: MESOS-9025
>     https://issues.apache.org/jira/browse/MESOS-9025
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The bug was introduced in https://reviews.apache.org/r/65987/ when we
> want to allow nested container to have a separate network namespace than
> its parent (MESOS-8534).
> 
> The original code misses a `continue` statement after recovering regular
> containers.
> 
> I am surprised that the unit tests didn't catch the bug. We'll add a new
> unit test for catching the regression.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/network/cni/cni.cpp b6ad4fcf9f1596c07ddeb9bbb134f4619d189671 
> 
> 
> Diff: https://reviews.apache.org/r/67728/diff/1/
> 
> 
> Testing
> -------
> 
> sudo make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>