You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Alexander Rojas <al...@mesosphere.io> on 2017/01/16 14:32:31 UTC

Review Request 55576: Fixes FutureTest.After3 flakiness.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55576/
-----------------------------------------------------------

Review request for mesos, Benjamin Bannier, Benjamin Hindman, and Joris Van Remoortere.


Bugs: MESOS-6907
    https://issues.apache.org/jira/browse/MESOS-6907


Repository: mesos


Description
-------

Fixed a  flakiness in `FutureTest.After3` related a timer not being destroyed
on time by libprocess before verifying the expectations.


Diffs
-----

  3rdparty/libprocess/src/tests/future_tests.cpp 380755f9a2329d548969cfb2332c79aacbf7fff2 

Diff: https://reviews.apache.org/r/55576/diff/


Testing
-------

```sh
# Put the machine under stress.
$ stress -c 4 -t 2600 -d 2 -i 2 &
$ ./${MESOS_SRC_ROOT}/build/3rdparty/libprocess/libprocess-tests \
      --gtest_filter="FutureTest.After3" --gtest_repeat=1000000 --gtest_break_on_failure
$ ./${MESOS_SRC_ROOT}/build/build/src/mesos-tests
```


Thanks,

Alexander Rojas


Re: Review Request 55576: Fixes FutureTest.After3 flakiness.

Posted by Mesos ReviewBot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55576/#review161752
-----------------------------------------------------------



Patch looks great!

Reviews applied: [55576]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos ReviewBot


On Jan. 16, 2017, 2:34 p.m., Alexander Rojas wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55576/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2017, 2:34 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Hindman, and Joris Van Remoortere.
> 
> 
> Bugs: MESOS-6907
>     https://issues.apache.org/jira/browse/MESOS-6907
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Fixed a flakiness in `FutureTest.After3` related a timer not being
> destroyed on time by libprocess before verifying the expectations.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/tests/future_tests.cpp 380755f9a2329d548969cfb2332c79aacbf7fff2 
> 
> Diff: https://reviews.apache.org/r/55576/diff/
> 
> 
> Testing
> -------
> 
> ```sh
> # Put the machine under stress.
> $ stress -c 4 -t 2600 -d 2 -i 2 &
> $ ./${MESOS_SRC_ROOT}/build/3rdparty/libprocess/libprocess-tests \
>       --gtest_filter="FutureTest.After3" --gtest_repeat=1000000 --gtest_break_on_failure
> $ ./${MESOS_SRC_ROOT}/build/build/src/mesos-tests
> ```
> 
> 
> Thanks,
> 
> Alexander Rojas
> 
>


Re: Review Request 55576: Fixes FutureTest.After3 flakiness.

Posted by Alexander Rojas <al...@mesosphere.io>.

> On Feb. 21, 2017, 9:34 p.m., Benjamin Mahler wrote:
> > 3rdparty/libprocess/src/clock.cpp, line 188
> > <https://reviews.apache.org/r/55576/diff/3/?file=1640537#file1640537line188>
> >
> >     Can you pull this out into a separate patch?

Broken down with first part into [r/57093/](https://reviews.apache.org/r/57093/)


- Alexander


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55576/#review166216
-----------------------------------------------------------


On Feb. 21, 2017, 9:34 p.m., Alexander Rojas wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55576/
> -----------------------------------------------------------
> 
> (Updated Feb. 21, 2017, 9:34 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Hindman, Benjamin Mahler, and Joris Van Remoortere.
> 
> 
> Bugs: MESOS-6907
>     https://issues.apache.org/jira/browse/MESOS-6907
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Fixed a flakiness in `FutureTest.After3` caused by the way
> libprocess processes timers.
> 
> Timers are not clear before setting `clock::settle` to 
> `false`, which make cause a race condition for callers which
> expect timers' thunks to be destroyed after the timer is
> called and the clock is settled.
> 
> The issue is addressed by explicitly calling 
> `std::list<T>::clear()` on the `timedout` timer's list
> once the timers are executed.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/clock.cpp 6116c82377b1c2f33c5f4bc14890d35f34cc973d 
>   3rdparty/libprocess/src/tests/future_tests.cpp 380755f9a2329d548969cfb2332c79aacbf7fff2 
> 
> Diff: https://reviews.apache.org/r/55576/diff/
> 
> 
> Testing
> -------
> 
> ```sh
> # Put the machine under stress.
> $ stress -c 4 -t 2600 -d 2 -i 2 &
> $ ./${MESOS_SRC_ROOT}/build/3rdparty/libprocess/libprocess-tests \
>       --gtest_filter="FutureTest.After3" --gtest_repeat=1000000 --gtest_break_on_failure
> $ ./${MESOS_SRC_ROOT}/build/build/src/mesos-tests
> ```
> 
> 
> Thanks,
> 
> Alexander Rojas
> 
>


Re: Review Request 55576: Fixes FutureTest.After3 flakiness.

Posted by Benjamin Mahler <bm...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55576/#review166216
-----------------------------------------------------------



Looks good, thanks! Can you split the `timedout.clear()` change from the test changes? These seem orthogonal to me, and for posterity it would be nice if the change to `Clock::tick` is not described as a flaky test fix (although we can definitely reference the flaky test issue that motivated the change).


3rdparty/libprocess/src/clock.cpp (line 188)
<https://reviews.apache.org/r/55576/#comment238112>

    Can you pull this out into a separate patch?



3rdparty/libprocess/src/tests/future_tests.cpp (lines 271 - 287)
<https://reviews.apache.org/r/55576/#comment238111>

    Do we still need the additional scope here?



3rdparty/libprocess/src/tests/future_tests.cpp (line 283)
<https://reviews.apache.org/r/55576/#comment238110>

    Hm.. why is this `Seconds(2)`? Shouldn't this just be `Milliseconds(1)`?


- Benjamin Mahler


On Feb. 20, 2017, 1:41 p.m., Alexander Rojas wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55576/
> -----------------------------------------------------------
> 
> (Updated Feb. 20, 2017, 1:41 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Hindman, and Joris Van Remoortere.
> 
> 
> Bugs: MESOS-6907
>     https://issues.apache.org/jira/browse/MESOS-6907
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Fixed a flakiness in `FutureTest.After3` caused by the way
> libprocess processes timers.
> 
> Timers are not clear before setting `clock::settle` to 
> `false`, which make cause a race condition for callers which
> expect timers' thunks to be destroyed after the timer is
> called and the clock is settled.
> 
> The issue is addressed by explicitly calling 
> `std::list<T>::clear()` on the `timedout` timer's list
> once the timers are executed.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/clock.cpp 6116c82377b1c2f33c5f4bc14890d35f34cc973d 
>   3rdparty/libprocess/src/tests/future_tests.cpp 380755f9a2329d548969cfb2332c79aacbf7fff2 
> 
> Diff: https://reviews.apache.org/r/55576/diff/
> 
> 
> Testing
> -------
> 
> ```sh
> # Put the machine under stress.
> $ stress -c 4 -t 2600 -d 2 -i 2 &
> $ ./${MESOS_SRC_ROOT}/build/3rdparty/libprocess/libprocess-tests \
>       --gtest_filter="FutureTest.After3" --gtest_repeat=1000000 --gtest_break_on_failure
> $ ./${MESOS_SRC_ROOT}/build/build/src/mesos-tests
> ```
> 
> 
> Thanks,
> 
> Alexander Rojas
> 
>


Re: Review Request 55576: Fixes FutureTest.After3 flakiness.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55576/#review166107
-----------------------------------------------------------



Patch looks great!

Reviews applied: [55576]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Feb. 20, 2017, 1:41 p.m., Alexander Rojas wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55576/
> -----------------------------------------------------------
> 
> (Updated Feb. 20, 2017, 1:41 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Hindman, and Joris Van Remoortere.
> 
> 
> Bugs: MESOS-6907
>     https://issues.apache.org/jira/browse/MESOS-6907
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Fixed a flakiness in `FutureTest.After3` caused by the way
> libprocess processes timers.
> 
> Timers are not clear before setting `clock::settle` to 
> `false`, which make cause a race condition for callers which
> expect timers' thunks to be destroyed after the timer is
> called and the clock is settled.
> 
> The issue is addressed by explicitly calling 
> `std::list<T>::clear()` on the `timedout` timer's list
> once the timers are executed.
> 
> 
> Diffs
> -----
> 
>   3rdparty/libprocess/src/clock.cpp 6116c82377b1c2f33c5f4bc14890d35f34cc973d 
>   3rdparty/libprocess/src/tests/future_tests.cpp 380755f9a2329d548969cfb2332c79aacbf7fff2 
> 
> Diff: https://reviews.apache.org/r/55576/diff/
> 
> 
> Testing
> -------
> 
> ```sh
> # Put the machine under stress.
> $ stress -c 4 -t 2600 -d 2 -i 2 &
> $ ./${MESOS_SRC_ROOT}/build/3rdparty/libprocess/libprocess-tests \
>       --gtest_filter="FutureTest.After3" --gtest_repeat=1000000 --gtest_break_on_failure
> $ ./${MESOS_SRC_ROOT}/build/build/src/mesos-tests
> ```
> 
> 
> Thanks,
> 
> Alexander Rojas
> 
>


Re: Review Request 55576: Fixes FutureTest.After3 flakiness.

Posted by Alexander Rojas <al...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55576/
-----------------------------------------------------------

(Updated Feb. 20, 2017, 2:41 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Hindman, and Joris Van Remoortere.


Bugs: MESOS-6907
    https://issues.apache.org/jira/browse/MESOS-6907


Repository: mesos


Description (updated)
-------

Fixed a flakiness in `FutureTest.After3` caused by the way
libprocess processes timers.

Timers are not clear before setting `clock::settle` to 
`false`, which make cause a race condition for callers which
expect timers' thunks to be destroyed after the timer is
called and the clock is settled.

The issue is addressed by explicitly calling 
`std::list<T>::clear()` on the `timedout` timer's list
once the timers are executed.


Diffs (updated)
-----

  3rdparty/libprocess/src/clock.cpp 6116c82377b1c2f33c5f4bc14890d35f34cc973d 
  3rdparty/libprocess/src/tests/future_tests.cpp 380755f9a2329d548969cfb2332c79aacbf7fff2 

Diff: https://reviews.apache.org/r/55576/diff/


Testing
-------

```sh
# Put the machine under stress.
$ stress -c 4 -t 2600 -d 2 -i 2 &
$ ./${MESOS_SRC_ROOT}/build/3rdparty/libprocess/libprocess-tests \
      --gtest_filter="FutureTest.After3" --gtest_repeat=1000000 --gtest_break_on_failure
$ ./${MESOS_SRC_ROOT}/build/build/src/mesos-tests
```


Thanks,

Alexander Rojas


Re: Review Request 55576: Fixes FutureTest.After3 flakiness.

Posted by Alexander Rojas <al...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55576/
-----------------------------------------------------------

(Updated Feb. 17, 2017, 10:51 a.m.)


Review request for mesos, Benjamin Bannier, Benjamin Hindman, and Joris Van Remoortere.


Bugs: MESOS-6907
    https://issues.apache.org/jira/browse/MESOS-6907


Repository: mesos


Description (updated)
-------

Fixed a flakiness in `FutureTest.After3` caused by the way
libprocess processes timers. 

Timers are executed in batch and also discarded in batch, 
so if a given timer's thunk is executed as the first of its
batch, the callback and all objects in the callback will
be kept around until the last timer's thunk in the batch
is executed.

To get around the issue, we ensure that the first batch
is executed, by waiting on the timer's `Future` to be set,
then schedule a second timer and wait until it begins to
be executed.

Note that a solution based in `Clock` would not work since
the timers are executed in, either `libev` or `libevent` 
loops for which libprocess has no control over.


Diffs
-----

  3rdparty/libprocess/src/tests/future_tests.cpp 380755f9a2329d548969cfb2332c79aacbf7fff2 

Diff: https://reviews.apache.org/r/55576/diff/


Testing
-------

```sh
# Put the machine under stress.
$ stress -c 4 -t 2600 -d 2 -i 2 &
$ ./${MESOS_SRC_ROOT}/build/3rdparty/libprocess/libprocess-tests \
      --gtest_filter="FutureTest.After3" --gtest_repeat=1000000 --gtest_break_on_failure
$ ./${MESOS_SRC_ROOT}/build/build/src/mesos-tests
```


Thanks,

Alexander Rojas


Re: Review Request 55576: Fixes FutureTest.After3 flakiness.

Posted by Alexander Rojas <al...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55576/
-----------------------------------------------------------

(Updated Jan. 16, 2017, 3:34 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Hindman, and Joris Van Remoortere.


Bugs: MESOS-6907
    https://issues.apache.org/jira/browse/MESOS-6907


Repository: mesos


Description (updated)
-------

Fixed a flakiness in `FutureTest.After3` related a timer not being
destroyed on time by libprocess before verifying the expectations.


Diffs
-----

  3rdparty/libprocess/src/tests/future_tests.cpp 380755f9a2329d548969cfb2332c79aacbf7fff2 

Diff: https://reviews.apache.org/r/55576/diff/


Testing
-------

```sh
# Put the machine under stress.
$ stress -c 4 -t 2600 -d 2 -i 2 &
$ ./${MESOS_SRC_ROOT}/build/3rdparty/libprocess/libprocess-tests \
      --gtest_filter="FutureTest.After3" --gtest_repeat=1000000 --gtest_break_on_failure
$ ./${MESOS_SRC_ROOT}/build/build/src/mesos-tests
```


Thanks,

Alexander Rojas


Re: Review Request 55576: Fixes FutureTest.After3 flakiness.

Posted by Alexander Rojas <al...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55576/
-----------------------------------------------------------

(Updated Jan. 16, 2017, 3:34 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Hindman, and Joris Van Remoortere.


Bugs: MESOS-6907
    https://issues.apache.org/jira/browse/MESOS-6907


Repository: mesos


Description (updated)
-------

Fixed a flakiness in `FutureTest.After3` related a timer not being
destroyed on time by libprocess before verifying the expectations.


Diffs (updated)
-----

  3rdparty/libprocess/src/tests/future_tests.cpp 380755f9a2329d548969cfb2332c79aacbf7fff2 

Diff: https://reviews.apache.org/r/55576/diff/


Testing
-------

```sh
# Put the machine under stress.
$ stress -c 4 -t 2600 -d 2 -i 2 &
$ ./${MESOS_SRC_ROOT}/build/3rdparty/libprocess/libprocess-tests \
      --gtest_filter="FutureTest.After3" --gtest_repeat=1000000 --gtest_break_on_failure
$ ./${MESOS_SRC_ROOT}/build/build/src/mesos-tests
```


Thanks,

Alexander Rojas