You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Gilbert Song <so...@gmail.com> on 2019/01/07 12:22:33 UTC

Review Request 69681: Fixed the FD leak if containerizer::_launch() failed or discarded.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69681/
-----------------------------------------------------------

Review request for mesos, Andrei Budnik, Jie Yu, Qian Zhang, and Vinod Kone.


Bugs: MESOS-6632
    https://issues.apache.org/jira/browse/MESOS-6632


Repository: mesos


Description
-------

For the period between IOSB server is up and container process exec,
if the containerizer launch returns failure or discarded, the FD used
for signaling from the container process to the IOSB finish redirect
will be leaked, which would cause the IOSB stuck at io::redirect
forever. It would block the containerizer from cleaning up this
container at IOSB.

This issue could be exposed if there are frequent check containers
launch on an agent with heavy loads.


Diffs
-----

  include/mesos/slave/containerizer.hpp c34f1296586ea5d26619605f8f6a5ae9444f1a96 
  src/slave/containerizer/mesos/containerizer.cpp a5cf2da55c046c5c45e0c2ca3400f64de12de62b 
  src/slave/containerizer/mesos/io/switchboard.cpp c445a8f09d7671d5763281bac9881489b3cc9c39 
  src/slave/containerizer/mesos/utils.hpp bfd07e28c78fc140e395ffccd11d65545bf007fc 
  src/slave/containerizer/mesos/utils.cpp d9964f0b0e8ad8a41e5f5490a1c28bba9a972ae0 


Diff: https://reviews.apache.org/r/69681/diff/1/


Testing
-------

make check


Thanks,

Gilbert Song


Re: Review Request 69681: Fixed the FD leak if containerizer::_launch() failed or discarded.

Posted by Qian Zhang <zh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69681/#review211724
-----------------------------------------------------------




include/mesos/slave/containerizer.hpp
Lines 85 (patched)
<https://reviews.apache.org/r/69681/#comment297279>

    s/Option<int_fd>()/Option<int_fd>::::none()/



include/mesos/slave/containerizer.hpp
Lines 90 (patched)
<https://reviews.apache.org/r/69681/#comment297280>

    Ditto.



src/slave/containerizer/mesos/containerizer.cpp
Lines 1366-1371 (original), 1368-1373 (patched)
<https://reviews.apache.org/r/69681/#comment297284>

    Can we do the `repair` inside this `then()`? Something like:
    ```
          .then(defer(self(), [=](const Option<ContainerIO>& containerIO) {
            return _launch(containerId, containerIO, environment, pidCheckpointPath)
              .repair([&containerIO](
                  const Future<Containerizer::LaunchResult>& result) {
                CHECK(containerIO.isSome());
                closeContainerIoFds(containerIO.get());
                return result;
              });
          }));
    ```
    In this way we do not need the local variable `_containerIO`.



src/slave/containerizer/mesos/utils.cpp
Lines 22 (patched)
<https://reviews.apache.org/r/69681/#comment297282>

    Why do we need this header?



src/slave/containerizer/mesos/utils.cpp
Lines 153-154 (patched)
<https://reviews.apache.org/r/69681/#comment297281>

    We could merge these two lines into one.


- Qian Zhang


On Jan. 7, 2019, 8:22 p.m., Gilbert Song wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69681/
> -----------------------------------------------------------
> 
> (Updated Jan. 7, 2019, 8:22 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Jie Yu, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6632
>     https://issues.apache.org/jira/browse/MESOS-6632
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> For the period between IOSB server is up and container process exec,
> if the containerizer launch returns failure or discarded, the FD used
> for signaling from the container process to the IOSB finish redirect
> will be leaked, which would cause the IOSB stuck at io::redirect
> forever. It would block the containerizer from cleaning up this
> container at IOSB.
> 
> This issue could be exposed if there are frequent check containers
> launch on an agent with heavy loads.
> 
> 
> Diffs
> -----
> 
>   include/mesos/slave/containerizer.hpp c34f1296586ea5d26619605f8f6a5ae9444f1a96 
>   src/slave/containerizer/mesos/containerizer.cpp a5cf2da55c046c5c45e0c2ca3400f64de12de62b 
>   src/slave/containerizer/mesos/io/switchboard.cpp c445a8f09d7671d5763281bac9881489b3cc9c39 
>   src/slave/containerizer/mesos/utils.hpp bfd07e28c78fc140e395ffccd11d65545bf007fc 
>   src/slave/containerizer/mesos/utils.cpp d9964f0b0e8ad8a41e5f5490a1c28bba9a972ae0 
> 
> 
> Diff: https://reviews.apache.org/r/69681/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>


Re: Review Request 69681: Fixed the FD leak if containerizer::_launch() failed or discarded.

Posted by Andrei Budnik <ab...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69681/#review211726
-----------------------------------------------------------




src/slave/containerizer/mesos/containerizer.cpp
Lines 1375 (patched)
<https://reviews.apache.org/r/69681/#comment297288>

    This will *copy* the value of `_containerIO`, hence it is *always* empty.



src/slave/containerizer/mesos/containerizer.cpp
Lines 1377 (patched)
<https://reviews.apache.org/r/69681/#comment297289>

    This might fail in case of discard is called *before* calling `.then` handler for `prepare()`.
    
    So, it should be better to:
    ```
    if (_containerIO.isSome()) {
      closeContainerIoFds(_containerIO.get());
    }
    ```



src/slave/containerizer/mesos/containerizer.cpp
Lines 1378 (patched)
<https://reviews.apache.org/r/69681/#comment297290>

    Will this lead to problem with a double `close` of the same FDs? We close FDs here, but these FDs will be closed in destructor when the IOSB terminates and then removes the same IO item from its hashmap.


- Andrei Budnik


On Jan. 7, 2019, 12:22 p.m., Gilbert Song wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69681/
> -----------------------------------------------------------
> 
> (Updated Jan. 7, 2019, 12:22 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Jie Yu, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6632
>     https://issues.apache.org/jira/browse/MESOS-6632
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> For the period between IOSB server is up and container process exec,
> if the containerizer launch returns failure or discarded, the FD used
> for signaling from the container process to the IOSB finish redirect
> will be leaked, which would cause the IOSB stuck at io::redirect
> forever. It would block the containerizer from cleaning up this
> container at IOSB.
> 
> This issue could be exposed if there are frequent check containers
> launch on an agent with heavy loads.
> 
> 
> Diffs
> -----
> 
>   include/mesos/slave/containerizer.hpp c34f1296586ea5d26619605f8f6a5ae9444f1a96 
>   src/slave/containerizer/mesos/containerizer.cpp a5cf2da55c046c5c45e0c2ca3400f64de12de62b 
>   src/slave/containerizer/mesos/io/switchboard.cpp c445a8f09d7671d5763281bac9881489b3cc9c39 
>   src/slave/containerizer/mesos/utils.hpp bfd07e28c78fc140e395ffccd11d65545bf007fc 
>   src/slave/containerizer/mesos/utils.cpp d9964f0b0e8ad8a41e5f5490a1c28bba9a972ae0 
> 
> 
> Diff: https://reviews.apache.org/r/69681/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>


Re: Review Request 69681: Fixed the FD leak if containerizer::_launch() failed or discarded.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69681/#review211792
-----------------------------------------------------------



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['69681']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2730/mesos-review-69681

Relevant logs:

- [mesos-tests-cmake.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2730/mesos-review-69681/logs/mesos-tests-cmake.log):

```
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\zookeeper.c(3501): warning C4996: 'sprintf': This function or variable may be unsafe. Consider using sprintf_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\zookeeper.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\zookeeper.c(3479): warning C4101: 'addrstr': unreferenced local variable [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\zookeeper.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\recordio.c(170): warning C4267: '=': conversion from 'size_t' to 'int32_t', possible loss of data [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\zookeeper.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\mt_adaptor.c(496): warning C4244: '=': conversion from 'time_t' to 'int32_t', possible loss of data [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\zookeeper.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\winport.c(256): warning C4090: 'function': different 'const' qualifiers [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\zookeeper.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\winport.c(166): warning C4716: 'pthread_cond_broadcast': must return a value [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\zookeeper.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\winport.c(205): warning C4716: 'pthread_cond_wait': must return a value [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\zookeeper.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(124): warning C4996: 'fopen': This function or variable may be unsafe. Consider using fopen_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(128): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(279): warning C4267: 'function': conversion from 'size_t' to 'int', possible loss of data [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(301): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(368): warning C4267: 'function': conversion from 'size_t' to 'int', possible loss of data [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(372): warning C4267: 'function': conversion from 'size_t' to 'int', possible loss of data [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(512): warning C4267: 'function': conversion from 'size_t' to 'int', possible loss of data [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(543): warning C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(548): warning C4996: 'fopen': This function or variable may be unsafe. Consider using fopen_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]
         d:\dcos\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8\src\c\src\cli.c(569): warning C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8\src\zookeeper-3.4.8-build\cli.vcxproj] [D:\DCOS\mesos\3rdparty\zookeeper-3.4.8.vcxproj]


       "D:\DCOS\mesos\src\tests\mesos-tests.vcxproj" (default target) (1) ->
       "D:\DCOS\mesos\src\slave\mesos-agent.vcxproj" (default target) (13) ->
       (Link target) -> 
         mesos.lib(containerizer.cpp.obj) : error LNK2019: unresolved external symbol "void __cdecl mesos::internal::slave::closeContainerIoFds(struct mesos::slave::ContainerIO const &)" (?closeContainerIoFds@slave@internal@mesos@@YAXAEBUContainerIO@13@@Z) referenced in function "public: class process::Future<enum mesos::internal::slave::Containerizer::LaunchResult> __cdecl <lambda_82a3c2f76e4e4d542739148c6142fcb9>::operator()(class process::Future<enum mesos::internal::slave::Containerizer::LaunchResult> const &)const " (??R<lambda_82a3c2f76e4e4d542739148c6142fcb9>@@QEBA?AV?$Future@W4LaunchResult@Containerizer@slave@internal@mesos@@@process@@AEBV12@@Z) [D:\DCOS\mesos\src\slave\mesos-agent.vcxproj]
         D:\DCOS\mesos\src\mesos-agent.exe : fatal error LNK1120: 1 unresolved externals [D:\DCOS\mesos\src\slave\mesos-agent.vcxproj]

    172 Warning(s)
    2 Error(s)

Time Elapsed 00:20:26.28
```

- Mesos Reviewbot Windows


On Jan. 7, 2019, 12:22 p.m., Gilbert Song wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69681/
> -----------------------------------------------------------
> 
> (Updated Jan. 7, 2019, 12:22 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Jie Yu, Qian Zhang, and Vinod Kone.
> 
> 
> Bugs: MESOS-6632
>     https://issues.apache.org/jira/browse/MESOS-6632
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> For the period between IOSB server is up and container process exec,
> if the containerizer launch returns failure or discarded, the FD used
> for signaling from the container process to the IOSB finish redirect
> will be leaked, which would cause the IOSB stuck at io::redirect
> forever. It would block the containerizer from cleaning up this
> container at IOSB.
> 
> This issue could be exposed if there are frequent check containers
> launch on an agent with heavy loads.
> 
> 
> Diffs
> -----
> 
>   include/mesos/slave/containerizer.hpp c34f1296586ea5d26619605f8f6a5ae9444f1a96 
>   src/slave/containerizer/mesos/containerizer.cpp a5cf2da55c046c5c45e0c2ca3400f64de12de62b 
>   src/slave/containerizer/mesos/io/switchboard.cpp c445a8f09d7671d5763281bac9881489b3cc9c39 
>   src/slave/containerizer/mesos/utils.hpp bfd07e28c78fc140e395ffccd11d65545bf007fc 
>   src/slave/containerizer/mesos/utils.cpp d9964f0b0e8ad8a41e5f5490a1c28bba9a972ae0 
> 
> 
> Diff: https://reviews.apache.org/r/69681/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>