Posted to reviews@mesos.apache.org by Greg Mann <gr...@mesosphere.io> on 2020/04/11 05:21:56 UTC
Review Request 72354: Fixed libevent SSL socket shutdown race condition.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72354/
-----------------------------------------------------------
Review request for mesos, Andrei Sekretenko and Benjamin Mahler.
Bugs: MESOS-10111
https://issues.apache.org/jira/browse/MESOS-10111
Repository: mesos
Description
-------
This fixes an issue where the functions `shutdown()` and
`event_callback()` race to access the bufferevent held by
our libevent SSL socket implementation, leading to a
CHECK failure.
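The following is a minimal sketch of the general guard pattern. It assumes the race is between two code paths that both tear down one shared bufferevent. All names are hypothetical, and this is not the libprocess code or the change under review: `FakeBufferevent` stands in for libevent's `struct bufferevent`, and `SslSocketSketch` is an invented class. Both paths take the same mutex and re-check the pointer. Whichever path runs second therefore finds the bufferevent already released and returns, rather than failing a CHECK.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for libevent's `struct bufferevent`.
struct FakeBufferevent {
  bool connected = false;
};

// Hypothetical sketch; not the libprocess implementation.
class SslSocketSketch {
public:
  SslSocketSketch() : bev_(new FakeBufferevent()) {}

  ~SslSocketSketch() {
    std::lock_guard<std::mutex> guard(mutex_);
    delete bev_;  // Deleting nullptr is a no-op.
  }

  // User-initiated teardown.
  // Returns true if this call released the bufferevent.
  bool shutdown() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (bev_ == nullptr) {
      return false;  // The event callback already released it.
    }
    delete bev_;
    bev_ = nullptr;
    return true;
  }

  // Event-loop path, e.g. after a failed connection attempt.
  // Returns true if this call released the bufferevent.
  bool event_callback_on_error() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (bev_ == nullptr) {
      return false;  // shutdown() got here first; nothing to do.
    }
    delete bev_;
    bev_ = nullptr;
    return true;
  }

private:
  std::mutex mutex_;
  FakeBufferevent* bev_;
};
```

Both paths are serialized on one lock, and a null pointer is treated as "already handled". As a result, exactly one path frees the bufferevent, regardless of which one wins. The real socket may use a different mechanism to achieve this.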
Diffs
-----
3rdparty/libprocess/src/posix/libevent/libevent_ssl_socket.cpp dcb6d8e6c82005145c853afa9c24a61d7d0f04a9
Diff: https://reviews.apache.org/r/72354/diff/1/
Testing
-------
This fix is tested in https://reviews.apache.org/r/72355/, though it's likely the test code will not be merged since it involves unsightly modifications to the socket interface.
Thanks,
Greg Mann
Re: Review Request 72354: Fixed libevent SSL socket shutdown race condition.
Posted by Greg Mann <gr...@mesosphere.io>.
> On April 13, 2020, 5:20 p.m., Benjamin Mahler wrote:
> > Perhaps describing an example of such a race in the description would be helpful for posterity? Ideally the one we encountered in practice with the check failure?
Good call, updated.
- Greg
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72354/#review220299
-----------------------------------------------------------
Re: Review Request 72354: Fixed libevent SSL socket shutdown race condition.
Posted by Benjamin Mahler <bm...@apache.org>.
Ship it!
Perhaps describing an example of such a race in the description would be helpful for posterity? Ideally the one we encountered in practice with the check failure?
- Benjamin Mahler
Re: Review Request 72354: Fixed libevent SSL socket shutdown race condition.
Posted by Greg Mann <gr...@mesosphere.io>.
(Updated April 13, 2020, 8:11 p.m.)
Review request for mesos, Andrei Sekretenko and Benjamin Mahler.
Bugs: MESOS-10111
https://issues.apache.org/jira/browse/MESOS-10111
Repository: mesos
Description (updated)
-------
This fixes an issue where the functions `shutdown()` and
`event_callback()` race to access the bufferevent held by
our libevent SSL socket implementation, leading to a
CHECK failure.
This race resulted in MESOS-10111, where several rapid
changes in ZooKeeper membership led one master to re-link
to another multiple times in RECONNECT mode. Re-linking in
this mode causes `shutdown()` to be called on the existing
socket while it is still attempting a connection, at which
point a failure to connect can produce the CHECK failure.
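The interleaving reported in MESOS-10111 can be modeled as a tiny state machine. This is a hypothetical sketch: `State`, `ConnectingSocketSketch`, and `on_connect_failed()` are invented names, and the real socket tracks more state than this. The sketch shows one point: a connection-failure event that arrives after `shutdown()` has to be tolerated as stale, not asserted against.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical connection states; not taken from libprocess.
enum class State { CONNECTING, SHUTDOWN };

class ConnectingSocketSketch {
public:
  // Called when a re-link in RECONNECT mode abandons this socket,
  // possibly while its connection attempt is still in flight.
  void shutdown() {
    std::lock_guard<std::mutex> guard(mutex_);
    state_ = State::SHUTDOWN;
  }

  // Called from the event loop when the connection attempt fails.
  // Returns false if the event was stale because shutdown() already ran.
  bool on_connect_failed() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (state_ == State::SHUTDOWN) {
      // A handler that instead asserted `state_ == State::CONNECTING`
      // would abort on exactly this interleaving.
      return false;
    }
    state_ = State::SHUTDOWN;
    ++failures_;
    return true;
  }

  State state() {
    std::lock_guard<std::mutex> guard(mutex_);
    return state_;
  }

  int failures() {
    std::lock_guard<std::mutex> guard(mutex_);
    return failures_;
  }

private:
  std::mutex mutex_;
  State state_ = State::CONNECTING;
  int failures_ = 0;
};
```

The sketch can replay the reported order: start connecting, call `shutdown()`, then deliver the connection failure. In that case `on_connect_failed()` returns false and leaves the socket in `SHUTDOWN` instead of aborting.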
Diffs
-----
3rdparty/libprocess/src/posix/libevent/libevent_ssl_socket.cpp dcb6d8e6c82005145c853afa9c24a61d7d0f04a9
Diff: https://reviews.apache.org/r/72354/diff/1/
Testing
-------
This fix is tested in https://reviews.apache.org/r/72355/, though it's likely the test code will not be merged since it involves unsightly modifications to the socket interface.
Thanks,
Greg Mann