Posted to reviews@mesos.apache.org by Benjamin Hindman <be...@berkeley.edu> on 2018/07/15 17:09:33 UTC
Review Request 67921: Bug fix for semaphore decomission "deadlock".
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67921/
-----------------------------------------------------------
Review request for mesos and Benjamin Mahler.
Bugs: MESOS-8239
https://issues.apache.org/jira/browse/MESOS-8239
Repository: mesos
Description
-------
Fixes MESOS-8239.
When using the DecomissionableLastInFirstOutFixedSizeSemaphore, it's
possible that waiting threads are never properly signaled. This bug
fix makes sure that every waiting thread gets a signal after a
decommission.
Diffs
-----
3rdparty/libprocess/src/semaphore.hpp 50501b9797894ad274eb73f74b3eed00cd719114
Diff: https://reviews.apache.org/r/67921/diff/1/
Testing
-------
make check
Thanks,
Benjamin Hindman
Re: Review Request 67921: Bug fix for semaphore decomission "deadlock".
Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67921/#review206091
-----------------------------------------------------------
PASS: Mesos patch 67921 was successfully built and tested.
Reviews applied: `['67921']`
All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/1926/mesos-review-67921
- Mesos Reviewbot Windows
Re: Review Request 67921: Bug fix for semaphore decomission "deadlock".
Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67921/#review213006
-----------------------------------------------------------
Patch looks great!
Reviews applied: [67921]
Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers --disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh
- Mesos Reviewbot
Re: Review Request 67921: Bug fix for semaphore decomission "deadlock".
Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67921/#review206144
-----------------------------------------------------------
3rdparty/libprocess/src/semaphore.hpp
Lines 267-274 (original), 267-288 (patched)
<https://reviews.apache.org/r/67921/#comment289046>
Despite the comment below, it looks like we do want to keep some of the if-condition restructuring here, but ideally in a separate patch? That would make the fix simpler and clearer (i.e., decommission doesn't "finish" because a new wait() can come in and grab a signal since the semaphore is not FIFO, so we now loop to completion).
3rdparty/libprocess/src/semaphore.hpp
Lines 379-386 (original), 393-400 (patched)
<https://reviews.apache.org/r/67921/#comment289045>
As discussed offline, the fix seems a little odd in its current place, since signal was doing its job correctly: it's not a FIFO semaphore, so someone can arrive and steal a signal before an existing waiter.
So, maybe we put the burden on decommission to signal until it's definitely finished? E.g.
```
void decomission()
{
  comissioned.store(false);
  while (waiters.load() > 0) {
    signal();
  }
}
```
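To make the idea concrete, here is a minimal, self-contained sketch of a "signal until nobody is waiting" decommission. It is not the libprocess implementation: ToyDecomissionableSemaphore, waiting() and decomission_demo() are hypothetical names, and the toy uses a mutex and condition variable rather than whatever semaphore.hpp actually does. It also assumes the commissioned flag is flipped under the same lock that wait() uses, so a waiter cannot register itself after the loop has already observed zero waiters.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for illustration only; not the libprocess class.
class ToyDecomissionableSemaphore
{
public:
  // Blocks until a signal is available, unless already decommissioned.
  void wait()
  {
    std::unique_lock<std::mutex> lock(mutex);
    if (!comissioned) {
      return; // After decommission nobody is allowed to block.
    }
    waiters.fetch_add(1);
    available.wait(lock, [this]() { return count > 0; });
    --count;
    waiters.fetch_sub(1);
  }

  // Makes one signal available and wakes at most one blocked thread.
  void signal()
  {
    {
      std::lock_guard<std::mutex> lock(mutex);
      ++count;
    }
    available.notify_one();
  }

  // Puts the burden on decommission: keep signaling until no thread
  // is registered as waiting, however the signals end up distributed.
  void decomission()
  {
    {
      // Flip the flag under the lock so wait() either registered
      // before this point or sees the flag and returns immediately.
      std::lock_guard<std::mutex> lock(mutex);
      comissioned = false;
    }
    while (waiters.load() > 0) {
      signal();
      std::this_thread::yield();
    }
  }

  int waiting() const { return waiters.load(); }

private:
  std::mutex mutex;
  std::condition_variable available;
  int count = 0;           // Signals not yet consumed (guarded by mutex).
  bool comissioned = true; // Guarded by mutex.
  std::atomic<int> waiters{0};
};

// Starts `threads` waiters, decommissions, and returns how many
// threads are still registered as waiting afterwards.
int decomission_demo(int threads)
{
  assert(threads > 0);
  ToyDecomissionableSemaphore semaphore;
  std::vector<std::thread> pool;
  for (int i = 0; i < threads; ++i) {
    pool.emplace_back([&semaphore]() { semaphore.wait(); });
  }
  // Wait until every thread has registered itself as a waiter.
  while (semaphore.waiting() < threads) {
    std::this_thread::yield();
  }
  semaphore.decomission();
  for (std::thread& thread : pool) {
    thread.join();
  }
  semaphore.wait(); // Returns immediately once decommissioned.
  return semaphore.waiting();
}
```

Calling decomission_demo(4) from a test should join all four threads. Note that the loop may leave unconsumed signals behind, which is harmless in this toy because no thread blocks after decommission.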
- Benjamin Mahler
Re: Review Request 67921: Bug fix for semaphore decomission "deadlock".
Posted by Dario Rexin <da...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67921/#review206111
-----------------------------------------------------------
Ship it!
- Dario Rexin