You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@mesos.apache.org by "Joseph Wu (JIRA)" <ji...@apache.org> on 2016/07/22 20:57:20 UTC

[jira] [Commented] (MESOS-5882) `os::cloexec` does not exist on Windows

    [ https://issues.apache.org/jira/browse/MESOS-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390182#comment-15390182 ] 

Joseph Wu commented on MESOS-5882:
----------------------------------

Here's a list of callsites for {{os::cloexec}} and a summary of why {{os::cloexec}} matters in the callsite:
|| Location || Reason ||
| https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/subprocess.cpp#L140-L144 | When we open a PIPE to a subprocess, those pipes are kept open in the parent.  We need CLOEXEC to prevent future subprocesses from inheriting these incorrectly. |
| https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/io.cpp#L264
  https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/io.cpp#L419
  https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/io.cpp#L476
  https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/io.cpp#L535 | {{io::peek}}, {{io::read}}, {{io::write}}, and {{io::redirect}} duplicate an FD to control the FD lifecycle asynchronously.  We CLOEXEC to prevent FD leaks. |
| https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/socket.cpp#L62 | Sockets will CLOEXEC on create to prevent leaks. |
| https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/libevent_ssl_socket.cpp#L490
  https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/libevent_ssl_socket.cpp#L827 | SSL sockets CLOEXEC to prevent leaks.  This happens on socket create (because we duplicate the FD to control lifetime separately) and accept. |
| https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/libprocess/src/poll_socket.cpp#L69 | Poll sockets CLOEXEC to prevent leaks.  This happens on accept. |
| https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/stout/include/stout/os/posix/write.hpp#L66-L69 | A {{os::write}} helper CLOEXECs FDs to prevent leaks. |
| https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/stout/include/stout/os/open.hpp#L63-L81 | {{os::open}} has some CLOEXEC logic, but doesn't CLOEXEC by default. |
| All calls to {{os::open}} except in:
  https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/stout/include/stout/os/sunos.hpp
  https://github.com/apache/mesos/blob/17a1e58d3f48d866ac5132cc28b2f33c2e287aac/3rdparty/stout/include/stout/os/touch.hpp#L34
  A couple tests | Except for some Solaris code (possibly not supported anymore) and {{os::touch}} (which opens and immediately closes an FD; this appears to be a possible leak), we CLOEXEC everywhere outside of tests to prevent leaks. |

> `os::cloexec` does not exist on Windows
> ---------------------------------------
>
>                 Key: MESOS-5882
>                 URL: https://issues.apache.org/jira/browse/MESOS-5882
>             Project: Mesos
>          Issue Type: Bug
>          Components: stout
>            Reporter: Alex Clemmer
>            Assignee: Alex Clemmer
>              Labels: mesosphere, stout
>
> `os::cloexec` does not work on Windows. It will never work at the OS level. Because of this, there are likely many important and hard-to-detect bugs hanging around the agent.
> This is extremely important to fix. Some possible solutions to investigate (some of which are _extremely_ risky):
> * Abstract out file descriptors into a class, implement cloexec in that class on Windows (since we can't rely on the OS to do it).
> * Refactor all the code that relies on `os::cloexec` to not rely on it.
> Of the two, the first seems less risky in the short term, because the cloexec code only affects Windows. Depending on the semantics of the implementation of the `FileDescriptor` class, it is possible that this is riskier to Windows in the longer term, as the semantics of `cloexec` may have subtle difference between Linux and Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)