Posted to reviews@mesos.apache.org by Benjamin Bannier <be...@mesosphere.io> on 2018/05/03 11:46:04 UTC

Review Request 66931: Fixed a race in resource provider resubscription test.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66931/
-----------------------------------------------------------

Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.


Bugs: MESOS-8874
    https://issues.apache.org/jira/browse/MESOS-8874


Repository: mesos


Description
-------

We previously did not ensure that after the simulated agent
failover in
`ResourceProviderManagerHttpApiTest.ResubscribeResourceProvider` the
mock resource provider created as part of the test did not reconnect
to the restarted agent (as opposed to the newly initialized resource
provider). This led to unmet test expectations.

With this patch we now explicitly tear down the mock resource provider
after we have detected that the agent went away to prevent the race.
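
The ordering described above can be modelled without any Mesos types. The following is a minimal sketch, assuming a hypothetical `Agent` that only records who subscribed and a hypothetical `Provider` that resubscribes whenever an agent becomes reachable; none of these names come from the patch, and the real test uses libprocess and gmock instead.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an agent: it only records subscriber names.
struct Agent {
  std::vector<std::string> subscribers;
};

// Hypothetical stand-in for a resource provider whose driver resubscribes
// on its own whenever an agent becomes reachable.
struct Provider {
  explicit Provider(std::string name_) : name(std::move(name_)) {}

  void agentAvailable(Agent& agent) const {
    agent.subscribers.push_back(name);
  }

  std::string name;
};

// Simulates an agent failover. If `tearDownFirst` is true, the old provider
// is destroyed before the restarted agent appears, so it cannot resubscribe.
std::vector<std::string> failover(bool tearDownFirst) {
  std::unique_ptr<Provider> provider(new Provider("old"));

  if (tearDownFirst) {
    provider.reset();
  }

  Agent restarted;

  // Any provider still alive at this point reconnects by itself.
  if (provider) {
    provider->agentAvailable(restarted);
  }

  // The test then creates a fresh provider, which is the only one that
  // is expected to subscribe.
  provider.reset(new Provider("new"));
  provider->agentAvailable(restarted);

  return restarted.subscribers;
}
```

Calling `failover(false)` yields two subscribers (the stale provider and the new one), which corresponds to the unmet expectation; `failover(true)` yields only the new one.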


Diffs
-----

  src/tests/resource_provider_manager_tests.cpp e8ca377fd0a927b99fdaf6a8ee0139025a41298e 


Diff: https://reviews.apache.org/r/66931/diff/1/


Testing
-------

`make check`

Ran the test repeatedly under high system load without triggering the issue again with this patch.


Thanks,

Benjamin Bannier


Re: Review Request 66931: Fixed a race in resource provider resubscription test.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66931/#review202351
-----------------------------------------------------------



Patch looks great!

Reviews applied: [66931]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


Re: Review Request 66931: Fixed a race in resource provider resubscription test.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66931/#review202435
-----------------------------------------------------------



PASS: Mesos patch 66931 was successfully built and tested.

Reviews applied: `['66931']`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66931

- Mesos Reviewbot Windows


Re: Review Request 66931: Fixed a race in resource provider resubscription test.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66931/#review202452
-----------------------------------------------------------


Fix it, then Ship it!





src/tests/resource_provider_manager_tests.cpp
Lines 1133 (patched)
<https://reviews.apache.org/r/66931/#comment284285>

    Let's satisfy the future after the RP is destructed.
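
A minimal sketch of that ordering, using only the C++ standard library rather than libprocess or gmock. `Provider`, `teardownThenSignal`, and the thread standing in for the asynchronous `disconnected` callback are hypothetical names chosen for illustration; the actual patch is not reproduced here.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <thread>

// Hypothetical stand-in for the mock resource provider. It flips a flag
// on destruction so the waiter can check whether teardown has happened.
struct Provider {
  explicit Provider(bool* destroyed_) : destroyed(destroyed_) {}
  ~Provider() { *destroyed = true; }

  bool* destroyed;
};

// Returns true if the provider was already destroyed at the moment the
// waiter observed the future as ready.
bool teardownThenSignal() {
  bool destroyed = false;
  std::unique_ptr<Provider> provider(new Provider(&destroyed));

  std::promise<void> disconnected;
  std::future<void> future = disconnected.get_future();

  // Models the asynchronous `disconnected` callback.
  std::thread callback([&]() {
    provider.reset();          // 1. Destroy the provider.
    disconnected.set_value();  // 2. Only then satisfy the future.
  });

  // Blocks until step 2, which by construction happens after step 1.
  future.wait();
  const bool observed = destroyed;

  callback.join();
  return observed;
}
```

Because the promise is satisfied last, anything that runs after `future.wait()` (restarting the agent, in the test's case) can rely on the old provider being gone.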


- Chun-Hung Hsiao


Re: Review Request 66931: Fixed a race in resource provider resubscription test.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66931/
-----------------------------------------------------------

(Updated May 4, 2018, 12:37 p.m.)


Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.


Changes
-------

Addressed comments.


Bugs: MESOS-8874
    https://issues.apache.org/jira/browse/MESOS-8874


Repository: mesos


Description
-------

We previously did not ensure that after the simulated agent
failover in
`ResourceProviderManagerHttpApiTest.ResubscribeResourceProvider` the
mock resource provider created as part of the test did not reconnect
to the restarted agent (as opposed to the newly initialized resource
provider). This led to unmet test expectations.

With this patch we now explicitly tear down the mock resource provider
after we have detected that the agent went away to prevent the race.


Diffs (updated)
-----

  src/tests/resource_provider_manager_tests.cpp e8ca377fd0a927b99fdaf6a8ee0139025a41298e 


Diff: https://reviews.apache.org/r/66931/diff/2/

Changes: https://reviews.apache.org/r/66931/diff/1-2/


Testing
-------

`make check`

Ran the test repeatedly under high system load without triggering the issue again with this patch.


Thanks,

Benjamin Bannier


Re: Review Request 66931: Fixed a race in resource provider resubscription test.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66931/#review202347
-----------------------------------------------------------



PASS: Mesos patch 66931 was successfully built and tested.

Reviews applied: `['66931']`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/66931

- Mesos Reviewbot Windows


Re: Review Request 66931: Fixed a race in resource provider resubscription test.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On May 3, 2018, 1:52 p.m., Jan Schlicht wrote:
> > src/tests/resource_provider_manager_tests.cpp
> > Lines 1130-1131 (patched)
> > <https://reviews.apache.org/r/66931/diff/1/?file=2016240#file2016240line1130>
> >
> >     Nit: `disconnected` is called asynchronously. The `Clock::settle()` below will make sure that it is called. Not having the settle would introduce another source of flakiness though. To be extra sure, we could set a future and await that one before we reset `resourceProvider` with a new instance.

+1 for awaiting a future instead of using `Clock::settle()`. This would make the intention more clear.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66931/#review202348
-----------------------------------------------------------


Re: Review Request 66931: Fixed a race in resource provider resubscription test.

Posted by Benjamin Bannier <be...@mesosphere.io>.

> On May 3, 2018, 3:52 p.m., Jan Schlicht wrote:
> > src/tests/resource_provider_manager_tests.cpp
> > Lines 1130-1131 (patched)
> > <https://reviews.apache.org/r/66931/diff/1/?file=2016240#file2016240line1130>
> >
> >     Nit: `disconnected` is called asynchronously. The `Clock::settle()` below will make sure that it is called. Not having the settle would introduce another source of flakiness though. To be extra sure, we could set a future and await that one before we reset `resourceProvider` with a new instance.
> 
> Chun-Hung Hsiao wrote:
>     +1 for awaiting a future instead of using `Clock::settle()`. This would make the intention more clear.

Added an explicit expectation on `disconnected` before restarting the agent.

@chun The `Clock::settle` is still needed for the agent registration.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66931/#review202348
-----------------------------------------------------------


Re: Review Request 66931: Fixed a race in resource provider resubscription test.

Posted by Jan Schlicht <ja...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66931/#review202348
-----------------------------------------------------------


Ship it!





src/tests/resource_provider_manager_tests.cpp
Lines 1130-1131 (patched)
<https://reviews.apache.org/r/66931/#comment284116>

    Nit: `disconnected` is called asynchronously. The `Clock::settle()` below will make sure that it is called. Not having the settle would introduce another source of flakiness though. To be extra sure, we could set a future and await that one before we reset `resourceProvider` with a new instance.


- Jan Schlicht


Re: Review Request 66931: Fixed a race in resource provider resubscription test.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On May 4, 2018, 3:22 a.m., Chun-Hung Hsiao wrote:
> > src/tests/resource_provider_manager_tests.cpp
> > Lines 1132 (patched)
> > <https://reviews.apache.org/r/66931/diff/1/?file=2016240#file2016240line1132>
> >
> >     This action will be invoked in the Driver's context. Could you explain why this could happen?
> 
> Benjamin Bannier wrote:
>     The action will be invoked on the thread running the test body while the `disconnected` callback is triggered from the driver. Since the `reset` tears down the driver, the "spurious calls" here could in principle be multiple invocations of `disconnected` before the driver is cleaned up -- looking at the code this does not appear to be possible right now (thanks Jan for the help!), but I think being a little more defensive here does not hurt.
>     
>     What do you think?

IIUC the callback is run on the caller's thread, not the test body thread. But we currently use `process::async` to call the callback, meaning that all calls to the callback will be run on different actors. So this defensive action LGTM.
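
One way to be defensive against repeated callback invocations can be sketched with the standard library alone. This is an illustration under assumptions, not the code in the patch: `OnceSignal` and `countEffectiveNotifications` are hypothetical names, and `std::promise` stands in for a libprocess promise. Setting a `std::promise` twice throws `std::future_error`, so the sketch guards the call with an atomic flag.

```cpp
#include <atomic>
#include <cassert>
#include <future>

// Wraps a promise so that repeated callback invocations are harmless.
class OnceSignal {
public:
  std::future<void> future() { return promise.get_future(); }

  // Returns true only for the invocation that satisfied the promise.
  // `exchange` makes the check-and-set atomic, so concurrent callers
  // cannot both reach `set_value`.
  bool notify() {
    if (fired.exchange(true)) {
      return false;
    }
    promise.set_value();
    return true;
  }

private:
  std::promise<void> promise;
  std::atomic<bool> fired{false};
};

// Invokes the callback `calls` times (at least once is required, otherwise
// the wait below never returns) and reports how many invocations actually
// satisfied the future.
int countEffectiveNotifications(int calls) {
  OnceSignal signal;
  std::future<void> future = signal.future();

  int effective = 0;
  for (int i = 0; i < calls; ++i) {
    if (signal.notify()) {
      ++effective;
    }
  }

  future.wait();
  return effective;
}
```

However many times the callback fires, only the first invocation has an effect, so spurious extra calls cannot fail the test.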


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66931/#review202409
-----------------------------------------------------------


Re: Review Request 66931: Fixed a race in resource provider resubscription test.

Posted by Benjamin Bannier <be...@mesosphere.io>.

> On May 4, 2018, 5:22 a.m., Chun-Hung Hsiao wrote:
> > src/tests/resource_provider_manager_tests.cpp
> > Lines 1132 (patched)
> > <https://reviews.apache.org/r/66931/diff/1/?file=2016240#file2016240line1132>
> >
> >     This action will be invoked in the Driver's context. Could you explain why this could happen?

The action will be invoked on the thread running the test body while the `disconnected` callback is triggered from the driver. Since the `reset` tears down the driver, the "spurious calls" here could in principle be multiple invocations of `disconnected` before the driver is cleaned up -- looking at the code this does not appear to be possible right now (thanks Jan for the help!), but I think being a little more defensive here does not hurt.

What do you think?


> On May 4, 2018, 5:22 a.m., Chun-Hung Hsiao wrote:
> > src/tests/resource_provider_manager_tests.cpp
> > Line 1140 (original), 1146 (patched)
> > <https://reviews.apache.org/r/66931/diff/1/?file=2016240#file2016240line1147>
> >
> >     Instead of creating a new mock resource provider, could we restart the agent with the same pid, and let the same mock resource provider subscribe to the new agent?
> >     
> >     Feel free to drop this if you have concerns with this approach.

There are two reasons for this. First, the resource provider's driver will automatically try to resubscribe and explicitly restarting it here will allow us to more clearly manage test expectations. Second, in non-test code the agent's lifetime always exceeds the RP's and we are approximating that setup here.

Please let me know if you think this part should be rewritten, dropping for now.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66931/#review202409
-----------------------------------------------------------


Re: Review Request 66931: Fixed a race in resource provider resubscription test.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66931/#review202409
-----------------------------------------------------------




src/tests/resource_provider_manager_tests.cpp
Lines 1132 (patched)
<https://reviews.apache.org/r/66931/#comment284226>

    This action will be invoked in the Driver's context. Could you explain why this could happen?



src/tests/resource_provider_manager_tests.cpp
Line 1140 (original), 1146 (patched)
<https://reviews.apache.org/r/66931/#comment284227>

    Instead of creating a new mock resource provider, could we restart the agent with the same pid, and let the same mock resource provider subscribe to the new agent?
    
    Feel free to drop this if you have concerns with this approach.


- Chun-Hung Hsiao

