You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Alexander Rukletsov <ru...@gmail.com> on 2017/08/15 15:45:50 UTC

Review Request 61664: Libprocess: Added a timeout for send socket operation.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61664/
-----------------------------------------------------------

Review request for mesos, Benno Evers, Benjamin Mahler, Jie Yu, and Vinod Kone.


Bugs: MESOS-7748
    https://issues.apache.org/jira/browse/MESOS-7748


Repository: mesos


Description
-------

Prior to this patch, a send socket operation can wait forever for
a send to complete. Clients that drop connections or stop reading
incoming data, aka "slow reader" attack, can eventually exhaust the
resources of a libprocess-based application and cause denial of
service or an OOM event.

This patch introduces an obligatory timeout for all send socket
operations, after which the stalled connection is dropped. The
timeout can be adjusted via the `LIBPROCESS_SOCKET_SEND_TIMEOUT`
environment variable.


Diffs
-----

  3rdparty/libprocess/src/process.cpp dcd9c6738816764aae066fe56cd5f468c98fc9bd 


Diff: https://reviews.apache.org/r/61664/diff/1/


Testing
-------

Manual testing with a rogue client.


Thanks,

Alexander Rukletsov


Re: Review Request 61664: Libprocess: Added a timeout for send socket operation.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61664/#review182955
-----------------------------------------------------------



Patch looks great!

Reviews applied: [61664]

Passed command: support\windows-build.bat

- Mesos Reviewbot Windows


On Aug. 15, 2017, 3:45 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61664/
> -----------------------------------------------------------
> 
> (Updated Aug. 15, 2017, 3:45 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Benjamin Mahler, Jie Yu, and Vinod Kone.
> 
> 
> Bugs: MESOS-7748
>     https://issues.apache.org/jira/browse/MESOS-7748
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Prior to this patch, a send socket operation can wait forever for
> a send to complete. Clients that drop connections or stop reading
> incoming data, aka "slow reader" attack, can eventually exhaust the
> resources of a libprocess-based application and cause denial of
> service or an OOM event.
> 
> This patch introduces an obligatory timeout for all send socket
> operations, after which the stalled connection is dropped. The
> timeout can be adjusted via the `LIBPROCESS_SOCKET_SEND_TIMEOUT`
> environment variable.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/process.cpp dcd9c6738816764aae066fe56cd5f468c98fc9bd 
> 
> 
> Diff: https://reviews.apache.org/r/61664/diff/1/
> 
> 
> Testing
> -------
> 
> Manual testing with a rogue client.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Re: Review Request 61664: Libprocess: Added a timeout for send socket operation.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61664/#review183390
-----------------------------------------------------------



Thanks Alex! A couple of things of note:

(1) Looks like the libevent ssl socket does not support send discards? Can you implement that?
(2) Can we safely add a default here given the master's health checking and http heartbeating? If users set this flag, does it conflict with the master's timeouts?

See below for details!


3rdparty/libprocess/src/process.cpp
Lines 251 (patched)
<https://reviews.apache.org/r/61664/#comment259417>

    Maybe http_socket_send_timeout so that we can namespace all http related configuration together?



3rdparty/libprocess/src/process.cpp
Lines 252-255 (patched)
<https://reviews.apache.org/r/61664/#comment259416>

    Hm.. how about clarifying that this relates to the clients and that the connection will be closed?
    
    ```
    Timeout between successive writes (send / sendfile) when transmitting a response to a client. If the client does not make progress within this time, the connection is closed.
    ```



3rdparty/libprocess/src/process.cpp
Lines 256-261 (patched)
<https://reviews.apache.org/r/61664/#comment259415>

    I don't think we can choose a default for this without conflicting with the master's health checking of agents (and http schedulers?), no?
    
    On that note, what impact does setting this flag have on existing functionality that has related timeouts (i.e. the master's health checking, and http scheduler heartbeats?)



3rdparty/libprocess/src/process.cpp
Lines 2183-2188 (patched)
<https://reviews.apache.org/r/61664/#comment259413>

    I did an audit of the poll_socket.cpp path and it handles discards correctly FWICT, however the libevent_ssl_socket.cpp path does not :(
    
    Before this patch, can you implement discard in the libevent ssl socket code?
    
    As an aside, I'm still hoping to remove libevent from the project and have a single event loop choice since libev is a better designed library for our needs (it looks like nghttp2 may make this an easier task). :)



3rdparty/libprocess/src/process.cpp
Lines 2187 (patched)
<https://reviews.apache.org/r/61664/#comment259414>

    Hm.. this should be returning the future itself:
    
    ```
      auto discard = [](Future<size_t> future) {
        future.discard();
        return future;
      };
    ```
    
    That way, you don't mask the underlying result. It's possible we try to discard but it's already returned some length, in which case we should just continue sending. Or, it's possible that some other failure occurred and we mask it. Also, if the discard is buggy and hangs, I would not want to mask that issue by proceeding immediately here.
    
    Then, in `_send`, we handle the DISCARDED case as triggering the timeout. To clean it up a bit, you could then use `.recover` that benh is introducing to convert DISCARDED to the failure message you want (but for now it seems ok to have the timeout message within `_send`).


- Benjamin Mahler


On Aug. 15, 2017, 3:45 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61664/
> -----------------------------------------------------------
> 
> (Updated Aug. 15, 2017, 3:45 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Benjamin Mahler, Jie Yu, and Vinod Kone.
> 
> 
> Bugs: MESOS-7748
>     https://issues.apache.org/jira/browse/MESOS-7748
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Prior to this patch, a send socket operation can wait forever for
> a send to complete. Clients that drop connections or stop reading
> incoming data, aka "slow reader" attack, can eventually exhaust the
> resources of a libprocess-based application and cause denial of
> service or an OOM event.
> 
> This patch introduces an obligatory timeout for all send socket
> operations, after which the stalled connection is dropped. The
> timeout can be adjusted via the `LIBPROCESS_SOCKET_SEND_TIMEOUT`
> environment variable.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/process.cpp dcd9c6738816764aae066fe56cd5f468c98fc9bd 
> 
> 
> Diff: https://reviews.apache.org/r/61664/diff/1/
> 
> 
> Testing
> -------
> 
> Manual testing with a rogue client.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Re: Review Request 61664: Libprocess: Added a timeout for send socket operation.

Posted by Benno Evers <be...@yandex-team.ru>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61664/#review183125
-----------------------------------------------------------




3rdparty/libprocess/src/process.cpp
Lines 254 (patched)
<https://reviews.apache.org/r/61664/#comment259134>

    I think in the case where the client drops the connection, the send will fail anyways before the timeout with `ECONNRESET` (unless I misunderstand what you mean by dropping a connection)



3rdparty/libprocess/src/process.cpp
Lines 261 (patched)
<https://reviews.apache.org/r/61664/#comment259131>

    If you're hardcoding the value anyways, you could just make the "10" a part of the string. But probably it's better to make these named constants instead.
    
    Also, I'm not sure that we want to prohibit low values: Presumably someone who wants to set a non-default value knows what he's doing, so maybe a warning is enough here.



3rdparty/libprocess/src/process.cpp
Lines 2188 (patched)
<https://reviews.apache.org/r/61664/#comment259138>

    I'm not sure if I completely understand how SocketManager works here, but it looks like this does not reset the underlying TCP connection. So in case of an accidental very long network delay, does the intended receiver actually notice that the request was discarded and retry?
    
    If not, maybe we can think about setting the `SO_RCVTIMEO` and `SO_SNDTIMEO` socket options to enforce this timeout at the OS level.


- Benno Evers


On Aug. 15, 2017, 3:45 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61664/
> -----------------------------------------------------------
> 
> (Updated Aug. 15, 2017, 3:45 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Benjamin Mahler, Jie Yu, and Vinod Kone.
> 
> 
> Bugs: MESOS-7748
>     https://issues.apache.org/jira/browse/MESOS-7748
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Prior to this patch, a send socket operation can wait forever for
> a send to complete. Clients that drop connections or stop reading
> incoming data, aka "slow reader" attack, can eventually exhaust the
> resources of a libprocess-based application and cause denial of
> service or an OOM event.
> 
> This patch introduces an obligatory timeout for all send socket
> operations, after which the stalled connection is dropped. The
> timeout can be adjusted via the `LIBPROCESS_SOCKET_SEND_TIMEOUT`
> environment variable.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/process.cpp dcd9c6738816764aae066fe56cd5f468c98fc9bd 
> 
> 
> Diff: https://reviews.apache.org/r/61664/diff/1/
> 
> 
> Testing
> -------
> 
> Manual testing with a rogue client.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>


Re: Review Request 61664: Libprocess: Added a timeout for send socket operation.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61664/#review183002
-----------------------------------------------------------




3rdparty/libprocess/src/process.cpp
Lines 256-261 (patched)
<https://reviews.apache.org/r/61664/#comment258985>

    How did you pick these values?



3rdparty/libprocess/src/process.cpp
Lines 2184 (patched)
<https://reviews.apache.org/r/61664/#comment258986>

    s/, see/; see/



3rdparty/libprocess/src/process.cpp
Lines 2170-2182 (original), 2195-2209 (patched)
<https://reviews.apache.org/r/61664/#comment258989>

    I guess this means if writing a very large message takes greater than the timeout we abort? Do you have a rought back of the envelope calculation for what the max data size could be given the minimum timeout?


- Vinod Kone


On Aug. 15, 2017, 3:45 p.m., Alexander Rukletsov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61664/
> -----------------------------------------------------------
> 
> (Updated Aug. 15, 2017, 3:45 p.m.)
> 
> 
> Review request for mesos, Benno Evers, Benjamin Mahler, Jie Yu, and Vinod Kone.
> 
> 
> Bugs: MESOS-7748
>     https://issues.apache.org/jira/browse/MESOS-7748
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Prior to this patch, a send socket operation can wait forever for
> a send to complete. Clients that drop connections or stop reading
> incoming data, aka "slow reader" attack, can eventually exhaust the
> resources of a libprocess-based application and cause denial of
> service or an OOM event.
> 
> This patch introduces an obligatory timeout for all send socket
> operations, after which the stalled connection is dropped. The
> timeout can be adjusted via the `LIBPROCESS_SOCKET_SEND_TIMEOUT`
> environment variable.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/process.cpp dcd9c6738816764aae066fe56cd5f468c98fc9bd 
> 
> 
> Diff: https://reviews.apache.org/r/61664/diff/1/
> 
> 
> Testing
> -------
> 
> Manual testing with a rogue client.
> 
> 
> Thanks,
> 
> Alexander Rukletsov
> 
>