Posted to reviews@mesos.apache.org by Benjamin Hindman <be...@berkeley.edu> on 2018/07/15 17:09:33 UTC

Review Request 67921: Bug fix for semaphore decomission "deadlock".

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67921/
-----------------------------------------------------------

Review request for mesos and Benjamin Mahler.


Bugs: MESOS-8239
    https://issues.apache.org/jira/browse/MESOS-8239


Repository: mesos


Description
-------

Fixes MESOS-8239.

When using the DecomissionableLastInFirstOutFixedSizeSemaphore, waiting
threads may never be properly signaled. This bug fix makes sure that
every waiting thread gets a signal after the semaphore is
decommissioned.


Diffs
-----

  3rdparty/libprocess/src/semaphore.hpp 50501b9797894ad274eb73f74b3eed00cd719114 


Diff: https://reviews.apache.org/r/67921/diff/1/


Testing
-------

make check


Thanks,

Benjamin Hindman


Re: Review Request 67921: Bug fix for semaphore decomission "deadlock".

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67921/#review206091
-----------------------------------------------------------



PASS: Mesos patch 67921 was successfully built and tested.

Reviews applied: `['67921']`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/1926/mesos-review-67921

- Mesos Reviewbot Windows




Re: Review Request 67921: Bug fix for semaphore decomission "deadlock".

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67921/#review213006
-----------------------------------------------------------



Patch looks great!

Reviews applied: [67921]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers --disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot




Re: Review Request 67921: Bug fix for semaphore decomission "deadlock".

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67921/#review206144
-----------------------------------------------------------




3rdparty/libprocess/src/semaphore.hpp
Lines 267-274 (original), 267-288 (patched)
<https://reviews.apache.org/r/67921/#comment289046>

    Despite the comment below, it looks like we do want to keep some of the if-condition restructuring here, but ideally in a separate patch? That would make the fix simpler and clearer: decommission doesn't "finish" because the semaphore is not FIFO, so a new wait() can come in and grab a signal; therefore we now loop to completion.
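
To make the counting argument in the comment above concrete, here is a small single-threaded model. It is only an illustration under stated assumptions, not libprocess code: `stranded`, `blocked`, `late`, and `loop` are hypothetical names, and the model assumes each late wait() takes exactly one signal ahead of an existing waiter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model (not libprocess code). `blocked` threads are already
// waiting; `late` threads call wait() during the decommission and each take
// one signal ahead of an existing waiter, which a non-FIFO semaphore allows.
// With `loop == false` the decommission issues one signal per waiter it saw
// at the start; with `loop == true` it keeps signaling until nobody waits.
// Returns how many of the original waiters are still blocked at the end.
std::size_t stranded(std::size_t blocked, std::size_t late, bool loop)
{
  std::size_t waiters = blocked;
  std::size_t thieves = late;
  std::size_t budget = blocked; // Only used when `loop` is false.

  while (waiters > 0 && (loop || budget > 0)) {
    if (!loop) {
      --budget;
    }

    // One signal is issued; a late arrival takes it if one is pending,
    // otherwise an original waiter wakes up.
    if (thieves > 0) {
      --thieves;
    } else {
      --waiters;
    }
  }

  return waiters;
}
```

In this model, three blocked waiters and one late arrival leave one waiter stranded when the number of signals is fixed up front, and none when the decommission loops until the waiter count reaches zero.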



3rdparty/libprocess/src/semaphore.hpp
Lines 379-386 (original), 393-400 (patched)
<https://reviews.apache.org/r/67921/#comment289045>

    As discussed offline, the fix seems a little odd in its current place, since signal was doing its job correctly: it's not a FIFO semaphore, so someone can arrive and steal a signal before an existing waiter.
    
    So, maybe we put the burden on decommission to signal until it's definitely finished? E.g.
    
    ```
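      void decomission()
      {
        // Mark the semaphore as decommissioned.
        comissioned.store(false);
    
        // The semaphore is not FIFO, so a late wait() can take a signal
        // ahead of an existing waiter; keep signaling until none remain.
        while (waiters.load() > 0) {
          signal();
        }
      }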
    ```
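
The suggestion above can be tried out in isolation with a simplified stand-in. This is a minimal sketch under assumptions: `SketchSemaphore` and `wake_all_after_decomission` are hypothetical names, and the class uses a mutex and condition variable rather than the lock-free LIFO design in semaphore.hpp. Only the shape of `decomission()` mirrors the snippet; the `yield()` call is an addition to avoid a hot spin.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the libprocess class (assumption: a plain
// mutex/condition-variable counting semaphore, not the lock-free LIFO one).
class SketchSemaphore
{
public:
  void signal()
  {
    {
      std::lock_guard<std::mutex> lock(mutex);
      ++count;
    }
    cv.notify_one();
  }

  void wait()
  {
    // Register as a waiter BEFORE checking `comissioned`, so that either
    // this thread sees the decommission or decomission() sees this waiter.
    waiters.fetch_add(1);
    if (!comissioned.load()) {
      waiters.fetch_sub(1);
      return;
    }

    std::unique_lock<std::mutex> lock(mutex);
    cv.wait(lock, [this] { return count > 0; });
    --count;
    waiters.fetch_sub(1);
  }

  void decomission()
  {
    comissioned.store(false);

    // Signal until every registered waiter has left wait().
    while (waiters.load() > 0) {
      signal();
      std::this_thread::yield();
    }
  }

private:
  std::mutex mutex;
  std::condition_variable cv;
  std::size_t count = 0; // Guarded by `mutex`.
  std::atomic<std::size_t> waiters{0};
  std::atomic<bool> comissioned{true};
};

// Starts `n` threads that call wait(), decommissions the semaphore, and
// returns how many threads came back from wait().
std::size_t wake_all_after_decomission(std::size_t n)
{
  SketchSemaphore semaphore;
  std::atomic<std::size_t> woken{0};
  std::vector<std::thread> threads;

  for (std::size_t i = 0; i < n; i++) {
    threads.emplace_back([&semaphore, &woken] {
      semaphore.wait();
      woken.fetch_add(1);
    });
  }

  semaphore.decomission();

  for (std::thread& thread : threads) {
    thread.join();
  }

  return woken.load();
}
```

Threads that have not reached wait() when the decommission happens return immediately, and threads that are already blocked are signaled by the loop, so every thread can be joined either way. The loop may issue more signals than strictly needed; in this sketch the extra count is simply left unused.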


- Benjamin Mahler




Re: Review Request 67921: Bug fix for semaphore decomission "deadlock".

Posted by Dario Rexin <da...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67921/#review206111
-----------------------------------------------------------


Ship It!

- Dario Rexin

