You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Kevin Klues <kl...@gmail.com> on 2017/02/02 00:04:02 UTC

Review Request 56195: Updated containerizer->launch path to close IOSwitchboard FDs on error.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56195/
-----------------------------------------------------------

Review request for mesos, Alexander Rukletsov, Gast�n Kleiman, Gilbert Song, and Jie Yu.


Repository: mesos


Description
-------

Previously, if the containizer launch path failed before actually
launching the container, the FDs allocated to the container by the
IOSwitchboard isolator would be leaked. This would lead to deadlock in
the destroy path because the IOSwitchboard does not shutdown until the
FDs it allocates to the container have been closed. Since the
switchboard doesn't shutdown, the future returned by its 'cleanup()'
function is never satisfied.

This commit makes sure to close the FDs under all failure cases in the
launch path.


Diffs
-----

  src/slave/containerizer/mesos/containerizer.cpp 4f0a773676da45fa40ad1ad9cdfab2a19249247d 

Diff: https://reviews.apache.org/r/56195/diff/


Testing
-------

Tests still pending. WIll update when complete.


Thanks,

Kevin Klues


Re: Review Request 56195: Updated containerizer->launch path to close IOSwitchboard FDs on error.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56195/#review164040
-----------------------------------------------------------




src/slave/containerizer/mesos/containerizer.cpp (line 1140)
<https://reviews.apache.org/r/56195/#comment235572>

    Hum, this could cause as FD being closed twice.
    
    Say `_launch` fails after `launcher->fork`. That means the subprocess call already close the parent end of the pipe. Here, you'll be closing it again.
    
    I think the only elegant fix is to introduce file descriptor abstraction and use RAII to make sure the fd will be closed in error case.


- Jie Yu


On Feb. 2, 2017, 5:29 a.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56195/
> -----------------------------------------------------------
> 
> (Updated Feb. 2, 2017, 5:29 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gast�n Kleiman, Gilbert Song, and Jie Yu.
> 
> 
> Bugs: MESOS-7050
>     https://issues.apache.org/jira/browse/MESOS-7050
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, if the containizer launch path failed before actually
> launching the container, the FDs allocated to the container by the
> IOSwitchboard isolator would be leaked. This would lead to deadlock in
> the destroy path because the IOSwitchboard does not shutdown until the
> FDs it allocates to the container have been closed. Since the
> switchboard doesn't shutdown, the future returned by its 'cleanup()'
> function is never satisfied.
> 
> This commit makes sure to close the FDs under all failure cases in the
> launch path.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/containerizer.cpp 4f0a773676da45fa40ad1ad9cdfab2a19249247d 
> 
> Diff: https://reviews.apache.org/r/56195/diff/
> 
> 
> Testing
> -------
> 
> Linux CentOS 7:
> ```
> GTEST_FILTER="" make -j check
> src/mesos-tests
> [----------] Global test environment tear-down
> [==========] 1477 tests from 167 test cases ran. (390804 ms total)
> [  PASSED  ] 1477 tests.
> ```
> 
> Mac OS X (El Capitan)
> ```
> GTEST_FILTER="" make -j check
> src/mesos-tests
> [  FAILED  ] 3 tests, listed below:
> [  FAILED  ] ExamplesTest.V1JavaFramework
> [  FAILED  ] ExamplesTest.PythonFramework
> ```
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>


Re: Review Request 56195: Fixed ContainerLogger / IOSwitchboard FD leaks.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56195/#review166030
-----------------------------------------------------------



Patch looks great!

Reviews applied: [56814, 56195]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Feb. 18, 2017, 6:39 p.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56195/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2017, 6:39 p.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gast�n Kleiman, Gilbert Song, Jie Yu, and Vinod Kone.
> 
> 
> Bugs: MESOS-7050
>     https://issues.apache.org/jira/browse/MESOS-7050
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, the containizer launch path would leak FDs if the
> containerizer launch path failed between successfully calling
> prepare() on either the ContainerLogger (in the case of the Docker
> containerizer) or the IOSwitchboard (in the case of the mesos
> containerizer) and forking the actual container.
> 
> These components relied on the Subprocess call inside launcher->fork()
> to close these FDS on their behalf. If the containerizer launch path
> failed somewhere between calling prepare() and making this fork()
> call, these FDs would never be closed.
> 
> In the case of the IOSwitchboard, this would lead to deadlock in the
> destroy path because the future returned by the IOSwitchboard's
> cleanup function would never be satisfied. The IOSwitchboard doesn't
> shutdown until the FDs it allocates to the container have been closed.
> 
> This commit fixes this problem by updating the
> ContainerLogger::SubprocessInfo::FD abstraction to change the way
> manages FDS. Instead of tagging each FD with the Subprocess::IO::OWNED
> label and forcing the launcher->fork() call to deal with closing the
> FDs once it's forked a new subprocess, we now do things slightly
> differently now.
> 
> We now keep the default DUP label on each FD (instead fo changing it
> to OWNED) to cause launcher->fork() to dup it before mapping it onto
> the stdin/stdout/stderr of the subprocess. It then only closes the
> duped FD, leaving the original one open.
> 
> In doing so, it s now the contaienrizer's responsibility to ensure
> that these FDs are closed properly (whether that's between a
> successful prepare() call and launcher->fork()) or after
> launcher->fork() has completed successfully. While this has the
> potential to complicate things slightly on the SUCCESS path,
> at least it is now the containerizers's responsibility to close these
> FDS in *all* cases, rather than splitting that responsibility across
> components.
> 
> In order to simplify this, we've also modified the
> ContainerLogger::SubprocessInfo::FD abstraction to hold a Shared
> pointer to its underlying file descriptor and (optionally) close it on
> destruction. With this, we can ensure that all file descriptors
> created through this abstraction will be automatically closed onced
> their final reference goes out of scope (even if its been copied
> around several times).
> 
> In essence, this release the containerizer from the burden of manually
> closing these FDS itself. So long as it holds the final reference to
> these FDs (which it does), they will be automatically closed along
> *any* path out of containerizer->launch(). These are exactly the
> semantics we wanted to achieve.
> 
> In the case of the the ContainerLogger, ownership of these FDs (and
> thus their final reference) is passed to the containerizer in the
> SubprocessInfo struct returned by prepare(). In the case of the
> IOSwitchboard, we had to add a new API call to transfer ownership
> (since it is an isolator and prepare() can only return a protobuf),
> but the end result is the same.
> 
> 
> Diffs
> -----
> 
>   include/mesos/slave/container_logger.hpp a3f619b79ca0188df9e231c600dfa396f39ab29a 
>   include/mesos/slave/containerizer.proto c70d437ac3f8f813fcb71c04275cc8eeeb946065 
>   src/slave/containerizer/mesos/containerizer.hpp 10a9b57660388ac2409458a4d07af64cc3b185e2 
>   src/slave/containerizer/mesos/containerizer.cpp d2b4f75a55dbe4746bc2dfc180335fa831a554ef 
>   src/slave/containerizer/mesos/io/switchboard.hpp 7a7ad2da047ef316f4382e45526d503c87a903a0 
>   src/slave/containerizer/mesos/io/switchboard.cpp b948f3c44dd2e1af2228ca1579f24ae3a94160b0 
> 
> Diff: https://reviews.apache.org/r/56195/diff/
> 
> 
> Testing
> -------
> 
> Linux CentOS 7:
> ```
> GTEST_FILTER="" make -j check
> sudo src/mesos-tests
> ```
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>


Re: Review Request 56195: Fixed ContainerLogger / IOSwitchboard FD leaks.

Posted by Kevin Klues <kl...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56195/
-----------------------------------------------------------

(Updated Feb. 19, 2017, 4:36 p.m.)


Review request for mesos, Alexander Rukletsov, Gast�n Kleiman, Gilbert Song, Jie Yu, and Vinod Kone.


Changes
-------

Fixed memory leak.


Bugs: MESOS-7050
    https://issues.apache.org/jira/browse/MESOS-7050


Repository: mesos


Description
-------

Previously, the containizer launch path would leak FDs if the
containerizer launch path failed between successfully calling
prepare() on either the ContainerLogger (in the case of the Docker
containerizer) or the IOSwitchboard (in the case of the mesos
containerizer) and forking the actual container.

These components relied on the Subprocess call inside launcher->fork()
to close these FDS on their behalf. If the containerizer launch path
failed somewhere between calling prepare() and making this fork()
call, these FDs would never be closed.

In the case of the IOSwitchboard, this would lead to deadlock in the
destroy path because the future returned by the IOSwitchboard's
cleanup function would never be satisfied. The IOSwitchboard doesn't
shutdown until the FDs it allocates to the container have been closed.

This commit fixes this problem by updating the
ContainerLogger::SubprocessInfo::FD abstraction to change the way
manages FDS. Instead of tagging each FD with the Subprocess::IO::OWNED
label and forcing the launcher->fork() call to deal with closing the
FDs once it's forked a new subprocess, we now do things slightly
differently now.

We now keep the default DUP label on each FD (instead fo changing it
to OWNED) to cause launcher->fork() to dup it before mapping it onto
the stdin/stdout/stderr of the subprocess. It then only closes the
duped FD, leaving the original one open.

In doing so, it s now the contaienrizer's responsibility to ensure
that these FDs are closed properly (whether that's between a
successful prepare() call and launcher->fork()) or after
launcher->fork() has completed successfully. While this has the
potential to complicate things slightly on the SUCCESS path,
at least it is now the containerizers's responsibility to close these
FDS in *all* cases, rather than splitting that responsibility across
components.

In order to simplify this, we've also modified the
ContainerLogger::SubprocessInfo::FD abstraction to hold a Shared
pointer to its underlying file descriptor and (optionally) close it on
destruction. With this, we can ensure that all file descriptors
created through this abstraction will be automatically closed onced
their final reference goes out of scope (even if its been copied
around several times).

In essence, this release the containerizer from the burden of manually
closing these FDS itself. So long as it holds the final reference to
these FDs (which it does), they will be automatically closed along
*any* path out of containerizer->launch(). These are exactly the
semantics we wanted to achieve.

In the case of the the ContainerLogger, ownership of these FDs (and
thus their final reference) is passed to the containerizer in the
SubprocessInfo struct returned by prepare(). In the case of the
IOSwitchboard, we had to add a new API call to transfer ownership
(since it is an isolator and prepare() can only return a protobuf),
but the end result is the same.


Diffs (updated)
-----

  include/mesos/slave/container_logger.hpp a3f619b79ca0188df9e231c600dfa396f39ab29a 
  include/mesos/slave/containerizer.proto c70d437ac3f8f813fcb71c04275cc8eeeb946065 
  src/slave/containerizer/mesos/containerizer.hpp 10a9b57660388ac2409458a4d07af64cc3b185e2 
  src/slave/containerizer/mesos/containerizer.cpp d2b4f75a55dbe4746bc2dfc180335fa831a554ef 
  src/slave/containerizer/mesos/io/switchboard.hpp 7a7ad2da047ef316f4382e45526d503c87a903a0 
  src/slave/containerizer/mesos/io/switchboard.cpp b948f3c44dd2e1af2228ca1579f24ae3a94160b0 

Diff: https://reviews.apache.org/r/56195/diff/


Testing
-------

Linux CentOS 7:
```
GTEST_FILTER="" make -j check
sudo src/mesos-tests
```


Thanks,

Kevin Klues


Re: Review Request 56195: Fixed ContainerLogger / IOSwitchboard FD leaks.

Posted by Kevin Klues <kl...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56195/
-----------------------------------------------------------

(Updated Feb. 18, 2017, 6:39 p.m.)


Review request for mesos, Alexander Rukletsov, Gast�n Kleiman, Gilbert Song, Jie Yu, and Vinod Kone.


Changes
-------

Updated based on discussion of a better logn-term solution to solving this problem.


Summary (updated)
-----------------

Fixed ContainerLogger / IOSwitchboard FD leaks.


Bugs: MESOS-7050
    https://issues.apache.org/jira/browse/MESOS-7050


Repository: mesos


Description (updated)
-------

Previously, the containizer launch path would leak FDs if the
containerizer launch path failed between successfully calling
prepare() on either the ContainerLogger (in the case of the Docker
containerizer) or the IOSwitchboard (in the case of the mesos
containerizer) and forking the actual container.

These components relied on the Subprocess call inside launcher->fork()
to close these FDS on their behalf. If the containerizer launch path
failed somewhere between calling prepare() and making this fork()
call, these FDs would never be closed.

In the case of the IOSwitchboard, this would lead to deadlock in the
destroy path because the future returned by the IOSwitchboard's
cleanup function would never be satisfied. The IOSwitchboard doesn't
shutdown until the FDs it allocates to the container have been closed.

This commit fixes this problem by updating the
ContainerLogger::SubprocessInfo::FD abstraction to change the way
manages FDS. Instead of tagging each FD with the Subprocess::IO::OWNED
label and forcing the launcher->fork() call to deal with closing the
FDs once it's forked a new subprocess, we now do things slightly
differently now.

We now keep the default DUP label on each FD (instead fo changing it
to OWNED) to cause launcher->fork() to dup it before mapping it onto
the stdin/stdout/stderr of the subprocess. It then only closes the
duped FD, leaving the original one open.

In doing so, it s now the contaienrizer's responsibility to ensure
that these FDs are closed properly (whether that's between a
successful prepare() call and launcher->fork()) or after
launcher->fork() has completed successfully. While this has the
potential to complicate things slightly on the SUCCESS path,
at least it is now the containerizers's responsibility to close these
FDS in *all* cases, rather than splitting that responsibility across
components.

In order to simplify this, we've also modified the
ContainerLogger::SubprocessInfo::FD abstraction to hold a Shared
pointer to its underlying file descriptor and (optionally) close it on
destruction. With this, we can ensure that all file descriptors
created through this abstraction will be automatically closed onced
their final reference goes out of scope (even if its been copied
around several times).

In essence, this release the containerizer from the burden of manually
closing these FDS itself. So long as it holds the final reference to
these FDs (which it does), they will be automatically closed along
*any* path out of containerizer->launch(). These are exactly the
semantics we wanted to achieve.

In the case of the the ContainerLogger, ownership of these FDs (and
thus their final reference) is passed to the containerizer in the
SubprocessInfo struct returned by prepare(). In the case of the
IOSwitchboard, we had to add a new API call to transfer ownership
(since it is an isolator and prepare() can only return a protobuf),
but the end result is the same.


Diffs (updated)
-----

  include/mesos/slave/container_logger.hpp a3f619b79ca0188df9e231c600dfa396f39ab29a 
  include/mesos/slave/containerizer.proto c70d437ac3f8f813fcb71c04275cc8eeeb946065 
  src/slave/containerizer/mesos/containerizer.hpp 10a9b57660388ac2409458a4d07af64cc3b185e2 
  src/slave/containerizer/mesos/containerizer.cpp d2b4f75a55dbe4746bc2dfc180335fa831a554ef 
  src/slave/containerizer/mesos/io/switchboard.hpp 7a7ad2da047ef316f4382e45526d503c87a903a0 
  src/slave/containerizer/mesos/io/switchboard.cpp b948f3c44dd2e1af2228ca1579f24ae3a94160b0 

Diff: https://reviews.apache.org/r/56195/diff/


Testing (updated)
-------

Linux CentOS 7:
```
GTEST_FILTER="" make -j check
sudo src/mesos-tests
```


Thanks,

Kevin Klues


Re: Review Request 56195: Updated containerizer->launch path to close IOSwitchboard FDs on error.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56195/#review163956
-----------------------------------------------------------



Patch looks great!

Reviews applied: [56195]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Feb. 2, 2017, 5:29 a.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56195/
> -----------------------------------------------------------
> 
> (Updated Feb. 2, 2017, 5:29 a.m.)
> 
> 
> Review request for mesos, Alexander Rukletsov, Gast�n Kleiman, Gilbert Song, and Jie Yu.
> 
> 
> Bugs: MESOS-7050
>     https://issues.apache.org/jira/browse/MESOS-7050
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Previously, if the containizer launch path failed before actually
> launching the container, the FDs allocated to the container by the
> IOSwitchboard isolator would be leaked. This would lead to deadlock in
> the destroy path because the IOSwitchboard does not shutdown until the
> FDs it allocates to the container have been closed. Since the
> switchboard doesn't shutdown, the future returned by its 'cleanup()'
> function is never satisfied.
> 
> This commit makes sure to close the FDs under all failure cases in the
> launch path.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/containerizer.cpp 4f0a773676da45fa40ad1ad9cdfab2a19249247d 
> 
> Diff: https://reviews.apache.org/r/56195/diff/
> 
> 
> Testing
> -------
> 
> Linux CentOS 7:
> ```
> GTEST_FILTER="" make -j check
> src/mesos-tests
> [----------] Global test environment tear-down
> [==========] 1477 tests from 167 test cases ran. (390804 ms total)
> [  PASSED  ] 1477 tests.
> ```
> 
> Mac OS X (El Capitan)
> ```
> GTEST_FILTER="" make -j check
> src/mesos-tests
> [  FAILED  ] 3 tests, listed below:
> [  FAILED  ] ExamplesTest.V1JavaFramework
> [  FAILED  ] ExamplesTest.PythonFramework
> ```
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>


Re: Review Request 56195: Updated containerizer->launch path to close IOSwitchboard FDs on error.

Posted by Kevin Klues <kl...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56195/
-----------------------------------------------------------

(Updated Feb. 2, 2017, 5:29 a.m.)


Review request for mesos, Alexander Rukletsov, Gast�n Kleiman, Gilbert Song, and Jie Yu.


Changes
-------

Updated to account for the fact that `container->launchInfos` is a `Future<list>` and not just a `list`. Also filled in the test section.


Bugs: MESOS-7050
    https://issues.apache.org/jira/browse/MESOS-7050


Repository: mesos


Description
-------

Previously, if the containizer launch path failed before actually
launching the container, the FDs allocated to the container by the
IOSwitchboard isolator would be leaked. This would lead to deadlock in
the destroy path because the IOSwitchboard does not shutdown until the
FDs it allocates to the container have been closed. Since the
switchboard doesn't shutdown, the future returned by its 'cleanup()'
function is never satisfied.

This commit makes sure to close the FDs under all failure cases in the
launch path.


Diffs (updated)
-----

  src/slave/containerizer/mesos/containerizer.cpp 4f0a773676da45fa40ad1ad9cdfab2a19249247d 

Diff: https://reviews.apache.org/r/56195/diff/


Testing (updated)
-------

Linux CentOS 7:
```
GTEST_FILTER="" make -j check
src/mesos-tests
[----------] Global test environment tear-down
[==========] 1477 tests from 167 test cases ran. (390804 ms total)
[  PASSED  ] 1477 tests.
```

Mac OS X (El Capitan)
```
GTEST_FILTER="" make -j check
src/mesos-tests
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] ExamplesTest.V1JavaFramework
[  FAILED  ] ExamplesTest.PythonFramework
```


Thanks,

Kevin Klues


Re: Review Request 56195: Updated containerizer->launch path to close IOSwitchboard FDs on error.

Posted by Kevin Klues <kl...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56195/
-----------------------------------------------------------

(Updated Feb. 2, 2017, 1:29 a.m.)


Review request for mesos, Alexander Rukletsov, Gast�n Kleiman, Gilbert Song, and Jie Yu.


Changes
-------

Protect calls to access `launchInfo` with a check for `launchInfo.isSome()`.


Bugs: MESOS-7050
    https://issues.apache.org/jira/browse/MESOS-7050


Repository: mesos


Description
-------

Previously, if the containizer launch path failed before actually
launching the container, the FDs allocated to the container by the
IOSwitchboard isolator would be leaked. This would lead to deadlock in
the destroy path because the IOSwitchboard does not shutdown until the
FDs it allocates to the container have been closed. Since the
switchboard doesn't shutdown, the future returned by its 'cleanup()'
function is never satisfied.

This commit makes sure to close the FDs under all failure cases in the
launch path.


Diffs (updated)
-----

  src/slave/containerizer/mesos/containerizer.cpp 4f0a773676da45fa40ad1ad9cdfab2a19249247d 

Diff: https://reviews.apache.org/r/56195/diff/


Testing
-------

Tests still pending. WIll update when complete.


Thanks,

Kevin Klues


Re: Review Request 56195: Updated containerizer->launch path to close IOSwitchboard FDs on error.

Posted by Kevin Klues <kl...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56195/
-----------------------------------------------------------

(Updated Feb. 2, 2017, 12:06 a.m.)


Review request for mesos, Alexander Rukletsov, Gast�n Kleiman, Gilbert Song, and Jie Yu.


Bugs: MESOS-7050
    https://issues.apache.org/jira/browse/MESOS-7050


Repository: mesos


Description
-------

Previously, if the containizer launch path failed before actually
launching the container, the FDs allocated to the container by the
IOSwitchboard isolator would be leaked. This would lead to deadlock in
the destroy path because the IOSwitchboard does not shutdown until the
FDs it allocates to the container have been closed. Since the
switchboard doesn't shutdown, the future returned by its 'cleanup()'
function is never satisfied.

This commit makes sure to close the FDs under all failure cases in the
launch path.


Diffs
-----

  src/slave/containerizer/mesos/containerizer.cpp 4f0a773676da45fa40ad1ad9cdfab2a19249247d 

Diff: https://reviews.apache.org/r/56195/diff/


Testing
-------

Tests still pending. WIll update when complete.


Thanks,

Kevin Klues