Posted to reviews@mesos.apache.org by Greg Mann <gr...@mesosphere.io> on 2020/04/11 05:21:56 UTC

Review Request 72354: Fixed libevent SSL socket shutdown race condition.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72354/
-----------------------------------------------------------

Review request for mesos, Andrei Sekretenko and Benjamin Mahler.


Bugs: MESOS-10111
    https://issues.apache.org/jira/browse/MESOS-10111


Repository: mesos


Description
-------

This fixes an issue where the functions `shutdown()` and
`event_callback()` race to access the bufferevent held by
our libevent SSL socket implementation, leading to a
CHECK failure.


Diffs
-----

  3rdparty/libprocess/src/posix/libevent/libevent_ssl_socket.cpp dcb6d8e6c82005145c853afa9c24a61d7d0f04a9 


Diff: https://reviews.apache.org/r/72354/diff/1/


Testing
-------

This fix is tested in https://reviews.apache.org/r/72355/, though it's likely the test code will not be merged since it involves unsightly modifications to the socket interface.


Thanks,

Greg Mann


Re: Review Request 72354: Fixed libevent SSL socket shutdown race condition.

Posted by Greg Mann <gr...@mesosphere.io>.

> On April 13, 2020, 5:20 p.m., Benjamin Mahler wrote:
> > Perhaps describing an example of such a race in the description would be helpful for posterity? Ideally the one we encountered in practice with the check failure?

Good call, updated.


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72354/#review220299
-----------------------------------------------------------




Re: Review Request 72354: Fixed libevent SSL socket shutdown race condition.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72354/#review220299
-----------------------------------------------------------


Ship it!




Perhaps describing an example of such a race in the description would be helpful for posterity? Ideally the one we encountered in practice with the check failure?

- Benjamin Mahler




Re: Review Request 72354: Fixed libevent SSL socket shutdown race condition.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72354/
-----------------------------------------------------------

(Updated April 13, 2020, 8:11 p.m.)


Review request for mesos, Andrei Sekretenko and Benjamin Mahler.


Bugs: MESOS-10111
    https://issues.apache.org/jira/browse/MESOS-10111


Repository: mesos


Description (updated)
-------

This fixes an issue where the functions `shutdown()` and
`event_callback()` race to access the bufferevent held by
our libevent SSL socket implementation, leading to a
CHECK failure.

This race resulted in MESOS-10111, where multiple rapid
changes in ZooKeeper (ZK) membership led one master to
re-link to another multiple times in RECONNECT mode.
Re-linking causes `shutdown()` to be called on the
existing socket while it is still attempting a
connection, at which point a failure to connect can
produce the CHECK failure.
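
For readers without the diff at hand, the general shape of such a
race can be sketched as follows. This is only an illustration under
stated assumptions, not the change made in this review: the names
`FakeBufferevent`, `SketchSocket` and `onConnectFailed()` are
hypothetical, the struct merely stands in for libevent's
`bufferevent`, and guarding with a mutex plus a null check is one
generic way to tolerate either ordering, not necessarily the approach
taken in libevent_ssl_socket.cpp.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for libevent's `bufferevent`; not the real type.
struct FakeBufferevent {
  bool shutdownRequested = false;
};

// Hypothetical socket wrapper; none of these names come from libprocess.
class SketchSocket {
public:
  SketchSocket() : bev(new FakeBufferevent()) {}

  // Models the event callback observing a failed connection attempt:
  // the bufferevent is released while holding the lock.
  void onConnectFailed() {
    std::lock_guard<std::mutex> guard(lock);
    bev.reset();
  }

  // Models a shutdown request arriving from another thread. Rather than
  // asserting that the bufferevent still exists, it checks under the lock
  // and reports whether there was anything left to shut down.
  bool shutdown() {
    std::lock_guard<std::mutex> guard(lock);
    if (bev == nullptr) {
      return false;  // The callback already tore the connection down.
    }
    bev->shutdownRequested = true;
    return true;
  }

  // Reports whether the bufferevent is still held.
  bool hasBufferevent() {
    std::lock_guard<std::mutex> guard(lock);
    return bev != nullptr;
  }

private:
  std::mutex lock;
  std::unique_ptr<FakeBufferevent> bev;
};
```

In this sketch, calling `shutdown()` before `onConnectFailed()` marks
the bufferevent for shutdown, while calling it afterwards simply
returns false. Neither ordering trips an assertion, which is the
property a fix for this kind of race needs to provide.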


Diffs
-----

  3rdparty/libprocess/src/posix/libevent/libevent_ssl_socket.cpp dcb6d8e6c82005145c853afa9c24a61d7d0f04a9 


Diff: https://reviews.apache.org/r/72354/diff/1/


Testing
-------

This fix is tested in https://reviews.apache.org/r/72355/, though it's likely the test code will not be merged since it involves unsightly modifications to the socket interface.


Thanks,

Greg Mann