You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Chun-Hung Hsiao <ch...@apache.org> on 2018/09/19 05:14:55 UTC

Re: Review Request 68763: WIP: Stopped resource providers when removing resource provider configs.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68763/
-----------------------------------------------------------

(Updated Sept. 19, 2018, 5:14 a.m.)


Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.


Summary (updated)
-----------------

WIP: Stopped resource providers when removing resource provider configs.


Bugs: MESOS-9228
    https://issues.apache.org/jira/browse/MESOS-9228


Repository: mesos


Description
-------

This patch introduces a `LocalResourceProvider::stop` function to
perform RP-specific stop procedures. SLRP uses this function to kill
plugin container before its config is removed.


Diffs
-----

  src/resource_provider/daemon.cpp 0a76cc6d1d34413674d1af1aa514679a4d2b7b55 
  src/resource_provider/local.hpp 20bcc78d3fe847e03526fa59116bdbac92ec1e29 
  src/resource_provider/storage/provider.hpp 5a371b19289c6e39fedd4cda65fa8be432d095e6 
  src/resource_provider/storage/provider.cpp 6475f653263337c381b6080695d09c49e5ea8fcf 


Diff: https://reviews.apache.org/r/68763/diff/1/


Testing
-------

make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 68763: Cleaned up residual containers when removing resource provider configs.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On Sept. 21, 2018, 10:50 a.m., Benjamin Bannier wrote:
> > src/resource_provider/daemon.cpp
> > Lines 566 (patched)
> > <https://reviews.apache.org/r/68763/diff/2/?file=2090658#file2090658line571>
> >
> >     Let's assert here instead of letting the exception propagate.
> 
> Chun-Hung Hsiao wrote:
>     Can you explain more about what assertion to add? You mean making `cleanupContainers` a `void` function and handling errors here? That would mean that the corresponding entry in `providers` needs to be erased here.

Oh this is about asserting `cid_prefix`. Sure I'll do that.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68763/#review208846
-----------------------------------------------------------


On Sept. 20, 2018, 11:21 p.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68763/
> -----------------------------------------------------------
> 
> (Updated Sept. 20, 2018, 11:21 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-9228
>     https://issues.apache.org/jira/browse/MESOS-9228
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When processing `REMOVE_RESOURCE_PROVIDER_CONFIG`, the local resource
> provider daemon now performs a best-effort cleanup by killing all
> standalone containers prefixed by the 'cid_prefix' of the resource
> provider. During the cleanup, no resource provider config with the same
> type and name can be added.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/daemon.cpp 0a76cc6d1d34413674d1af1aa514679a4d2b7b55 
> 
> 
> Diff: https://reviews.apache.org/r/68763/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 68763: Cleaned up residual containers when removing resource provider configs.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On Sept. 21, 2018, 10:50 a.m., Benjamin Bannier wrote:
> > src/resource_provider/daemon.cpp
> > Lines 431-433 (original), 483-485 (patched)
> > <https://reviews.apache.org/r/68763/diff/2/?file=2090658#file2090658line488>
> >
> >     How does this hold? It seems we should e.g., insert a `remove` call before continuing with `_launch`.
> 
> Chun-Hung Hsiao wrote:
>     I assume you're talking about an `erase`. No we cannot simply erase it, because the `providers` struct is the source of truth whether we should accept or reject an add, an update, or a remove when requesting the auth token.

Oh now I get it. You're suggesting that we should explicitly call a `remove`? But in the case of `update`, we don't need to actually do the cleanup as we do in `remove`.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68763/#review208846
-----------------------------------------------------------


On Sept. 20, 2018, 11:21 p.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68763/
> -----------------------------------------------------------
> 
> (Updated Sept. 20, 2018, 11:21 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-9228
>     https://issues.apache.org/jira/browse/MESOS-9228
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When processing `REMOVE_RESOURCE_PROVIDER_CONFIG`, the local resource
> provider daemon now performs a best-effort cleanup by killing all
> standalone containers prefixed by the 'cid_prefix' of the resource
> provider. During the cleanup, no resource provider config with the same
> type and name can be added.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/daemon.cpp 0a76cc6d1d34413674d1af1aa514679a4d2b7b55 
> 
> 
> Diff: https://reviews.apache.org/r/68763/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 68763: Cleaned up residual containers when removing resource provider configs.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On Sept. 21, 2018, 10:50 a.m., Benjamin Bannier wrote:
> > src/resource_provider/daemon.cpp
> > Lines 426-427 (original), 478-481 (patched)
> > <https://reviews.apache.org/r/68763/diff/2/?file=2090658#file2090658line481>
> >
> >     Wouldn't this be a situation where we'd like to respond with a `Failure`?

Thanks for catching this. In the current logic, `data.removing` is set iff a previous remove returned a success. Since `launch` is called synchronously in `add` or `start`, this should always be false. I'll change this to a check.


> On Sept. 21, 2018, 10:50 a.m., Benjamin Bannier wrote:
> > src/resource_provider/daemon.cpp
> > Lines 431-433 (original), 483-485 (patched)
> > <https://reviews.apache.org/r/68763/diff/2/?file=2090658#file2090658line488>
> >
> >     How does this hold? It seems we should e.g., insert a `remove` call before continuing with `_launch`.

I assume you're talking about an `erase`. No we cannot simply erase it, because the `providers` struct is the source of truth whether we should accept or reject an add, an update, or a remove when requesting the auth token.


> On Sept. 21, 2018, 10:50 a.m., Benjamin Bannier wrote:
> > src/resource_provider/daemon.cpp
> > Lines 505-508 (patched)
> > <https://reviews.apache.org/r/68763/diff/2/?file=2090658#file2090658line510>
> >
> >     Wouldn't this be a situation where we'd like to respond with a `Failure`?

Even if we return a failure here, this failure would only surface as an error log, since the add/update/remove APIs doesn't depend on the launch.

Also, `data.remove` is set only if there is a remove call between `launch` and `_launch`, which is allowed, so we simply abort the launch here.


> On Sept. 21, 2018, 10:50 a.m., Benjamin Bannier wrote:
> > src/resource_provider/daemon.cpp
> > Lines 566 (patched)
> > <https://reviews.apache.org/r/68763/diff/2/?file=2090658#file2090658line571>
> >
> >     Let's assert here instead of letting the exception propagate.

Can you explain more about what assertion to add? You mean making `cleanupContainers` a `void` function and handling errors here? That would mean that the corresponding entry in `providers` needs to be erased here.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68763/#review208846
-----------------------------------------------------------


On Sept. 20, 2018, 11:21 p.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68763/
> -----------------------------------------------------------
> 
> (Updated Sept. 20, 2018, 11:21 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-9228
>     https://issues.apache.org/jira/browse/MESOS-9228
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When processing `REMOVE_RESOURCE_PROVIDER_CONFIG`, the local resource
> provider daemon now performs a best-effort cleanup by killing all
> standalone containers prefixed by the 'cid_prefix' of the resource
> provider. During the cleanup, no resource provider config with the same
> type and name can be added.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/daemon.cpp 0a76cc6d1d34413674d1af1aa514679a4d2b7b55 
> 
> 
> Diff: https://reviews.apache.org/r/68763/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 68763: Cleaned up residual containers when removing resource provider configs.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68763/#review208846
-----------------------------------------------------------




src/resource_provider/daemon.cpp
Lines 426-427 (original), 478-481 (patched)
<https://reviews.apache.org/r/68763/#comment293083>

    Wouldn't this be a situation where we'd like to respond with a `Failure`?



src/resource_provider/daemon.cpp
Lines 431-433 (original), 483-485 (patched)
<https://reviews.apache.org/r/68763/#comment293080>

    How does this hold? It seems we should e.g., insert a `remove` call before continuing with `_launch`.



src/resource_provider/daemon.cpp
Lines 505-508 (patched)
<https://reviews.apache.org/r/68763/#comment293082>

    Wouldn't this be a situation where we'd like to respond with a `Failure`?



src/resource_provider/daemon.cpp
Lines 566 (patched)
<https://reviews.apache.org/r/68763/#comment293078>

    Let's assert here instead of letting the exception propagate.



src/resource_provider/daemon.cpp
Lines 626 (patched)
<https://reviews.apache.org/r/68763/#comment293079>

    Nit: we could use an early `continue` here to reduce nesting,
    ```
    if (!strings::startsWith(containerId.value(), cidPrefix)) {
      continue;
    }
    ```


- Benjamin Bannier


On Sept. 21, 2018, 1:21 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68763/
> -----------------------------------------------------------
> 
> (Updated Sept. 21, 2018, 1:21 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-9228
>     https://issues.apache.org/jira/browse/MESOS-9228
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When processing `REMOVE_RESOURCE_PROVIDER_CONFIG`, the local resource
> provider daemon now performs a best-effort cleanup by killing all
> standalone containers prefixed by the 'cid_prefix' of the resource
> provider. During the cleanup, no resource provider config with the same
> type and name can be added.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/daemon.cpp 0a76cc6d1d34413674d1af1aa514679a4d2b7b55 
> 
> 
> Diff: https://reviews.apache.org/r/68763/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 68763: Cleaned up residual containers when removing resource provider configs.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68763/#review208891
-----------------------------------------------------------


Fix it, then Ship it!





src/resource_provider/daemon.cpp
Lines 129 (patched)
<https://reviews.apache.org/r/68763/#comment293164>

    s/removed/removing/ ?



src/resource_provider/daemon.cpp
Lines 556 (patched)
<https://reviews.apache.org/r/68763/#comment293183>

    It's very unfortunate that we have this SLRP specific code in the general LRP daemon. I'd add a very big TODO here about the proposed solution for refactoring.
    
    I think it's possible that we can intorduce a virtual method for LRP interface (e.g., `shutdown`). If the RP instance is not yet available, the `remove` will simply skip the `shutdown` step.
    
    Since the launch of an RP is async (due to `generateAuthToken(data.info)`), as a result, we need to be careful there. We can either fail the launch by checking the state in `_launch`, or chain remove under launch. Sounds like the former is easier.
    
    Anyway, let's record a big TODO here.



src/resource_provider/daemon.cpp
Lines 686-688 (patched)
<https://reviews.apache.org/r/68763/#comment293184>

    Can we LOG all errors in the `futures` list, not just the first one? You need to combine the error messages.


- Jie Yu


On Sept. 21, 2018, 7:47 p.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68763/
> -----------------------------------------------------------
> 
> (Updated Sept. 21, 2018, 7:47 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-9228
>     https://issues.apache.org/jira/browse/MESOS-9228
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When processing `REMOVE_RESOURCE_PROVIDER_CONFIG`, the local resource
> provider daemon now performs a best-effort cleanup by killing all
> standalone containers prefixed by the 'cid_prefix' of the resource
> provider. During the cleanup, no resource provider config with the same
> type and name can be added.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/daemon.cpp 0a76cc6d1d34413674d1af1aa514679a4d2b7b55 
> 
> 
> Diff: https://reviews.apache.org/r/68763/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 68763: Cleaned up residual containers when removing resource provider configs.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68763/
-----------------------------------------------------------

(Updated Sept. 22, 2018, 12:17 a.m.)


Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.


Changes
-------

Addressed Jie' comments.


Bugs: MESOS-9228
    https://issues.apache.org/jira/browse/MESOS-9228


Repository: mesos


Description
-------

When processing `REMOVE_RESOURCE_PROVIDER_CONFIG`, the local resource
provider daemon now performs a best-effort cleanup by killing all
standalone containers prefixed by the 'cid_prefix' of the resource
provider. During the cleanup, no resource provider config with the same
type and name can be added.


Diffs (updated)
-----

  src/resource_provider/daemon.cpp 0a76cc6d1d34413674d1af1aa514679a4d2b7b55 


Diff: https://reviews.apache.org/r/68763/diff/4/

Changes: https://reviews.apache.org/r/68763/diff/3-4/


Testing
-------

make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 68763: Cleaned up residual containers when removing resource provider configs.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68763/
-----------------------------------------------------------

(Updated Sept. 21, 2018, 7:47 p.m.)


Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.


Changes
-------

Addressed Benjamin's comments.


Bugs: MESOS-9228
    https://issues.apache.org/jira/browse/MESOS-9228


Repository: mesos


Description
-------

When processing `REMOVE_RESOURCE_PROVIDER_CONFIG`, the local resource
provider daemon now performs a best-effort cleanup by killing all
standalone containers prefixed by the 'cid_prefix' of the resource
provider. During the cleanup, no resource provider config with the same
type and name can be added.


Diffs (updated)
-----

  src/resource_provider/daemon.cpp 0a76cc6d1d34413674d1af1aa514679a4d2b7b55 


Diff: https://reviews.apache.org/r/68763/diff/3/

Changes: https://reviews.apache.org/r/68763/diff/2-3/


Testing
-------

make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 68763: Cleaned up residual containers when removing resource provider configs.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68763/#review208828
-----------------------------------------------------------



PASS: Mesos patch 68763 was successfully built and tested.

Reviews applied: `['68755', '68756', '68757', '68777', '68758', '68790', '68762', '68763']`

All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2366/mesos-review-68763

- Mesos Reviewbot Windows


On Sept. 20, 2018, 11:21 p.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68763/
> -----------------------------------------------------------
> 
> (Updated Sept. 20, 2018, 11:21 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-9228
>     https://issues.apache.org/jira/browse/MESOS-9228
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When processing `REMOVE_RESOURCE_PROVIDER_CONFIG`, the local resource
> provider daemon now performs a best-effort cleanup by killing all
> standalone containers prefixed by the 'cid_prefix' of the resource
> provider. During the cleanup, no resource provider config with the same
> type and name can be added.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/daemon.cpp 0a76cc6d1d34413674d1af1aa514679a4d2b7b55 
> 
> 
> Diff: https://reviews.apache.org/r/68763/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 68763: Cleaned up residual containers when removing resource provider configs.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68763/
-----------------------------------------------------------

(Updated Sept. 20, 2018, 11:21 p.m.)


Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.


Changes
-------

Addressed Benjamin's comments and changed the solution to a best-effort cleanup.


Summary (updated)
-----------------

Cleaned up residual containers when removing resource provider configs.


Bugs: MESOS-9228
    https://issues.apache.org/jira/browse/MESOS-9228


Repository: mesos


Description (updated)
-------

When processing `REMOVE_RESOURCE_PROVIDER_CONFIG`, the local resource
provider daemon now performs a best-effort cleanup by killing all
standalone containers prefixed by the 'cid_prefix' of the resource
provider. During the cleanup, no resource provider config with the same
type and name can be added.


Diffs (updated)
-----

  src/resource_provider/daemon.cpp 0a76cc6d1d34413674d1af1aa514679a4d2b7b55 


Diff: https://reviews.apache.org/r/68763/diff/2/

Changes: https://reviews.apache.org/r/68763/diff/1-2/


Testing
-------

make check


Thanks,

Chun-Hung Hsiao


Re: Review Request 68763: WIP: Stopped resource providers when removing resource provider configs.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On Sept. 19, 2018, 3:35 p.m., Benjamin Bannier wrote:
> > src/resource_provider/daemon.cpp
> > Lines 310-314 (patched)
> > <https://reviews.apache.org/r/68763/diff/1/?file=2090244#file2090244line312>
> >
> >     This is a pretty bare-bones API. I believe if we'd instead schedule removal once the launch has finished, we'd not only remove the burden to retry from the caller, but probably also make this function safer to use.
> >     
> >     This would likely require us to maintain `Promise<Owned<LocalResourceProvider>>` instead.
> >     
> >     We should probably move this into below `then` continuation.

Dropping this since this no longer fits the current solution.


> On Sept. 19, 2018, 3:35 p.m., Benjamin Bannier wrote:
> > src/resource_provider/daemon.cpp
> > Line 290 (original), 319 (patched)
> > <https://reviews.apache.org/r/68763/diff/1/?file=2090244#file2090244line321>
> >
> >     Not sure how this will look like once we handle agent failover, but it seems this removal should only happen after successful `stop`. That way we might e.g., be able to maintain consistent state (config exists <==> container running).

I'm now aiming for a temporarily fix. We should have a more thorough thinking around the lifecycle management of the standalone containers.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68763/#review208767
-----------------------------------------------------------


On Sept. 19, 2018, 5:14 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68763/
> -----------------------------------------------------------
> 
> (Updated Sept. 19, 2018, 5:14 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-9228
>     https://issues.apache.org/jira/browse/MESOS-9228
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch introduces a `LocalResourceProvider::stop` function to
> perform RP-specific stop procedures. SLRP uses this function to kill
> plugin container before its config is removed.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/daemon.cpp 0a76cc6d1d34413674d1af1aa514679a4d2b7b55 
>   src/resource_provider/local.hpp 20bcc78d3fe847e03526fa59116bdbac92ec1e29 
>   src/resource_provider/storage/provider.hpp 5a371b19289c6e39fedd4cda65fa8be432d095e6 
>   src/resource_provider/storage/provider.cpp 6475f653263337c381b6080695d09c49e5ea8fcf 
> 
> 
> Diff: https://reviews.apache.org/r/68763/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 68763: Cleaned up residual containers when removing resource provider configs.

Posted by Benjamin Bannier <be...@mesosphere.io>.

> On Sept. 19, 2018, 5:35 p.m., Benjamin Bannier wrote:
> > src/resource_provider/daemon.cpp
> > Line 290 (original), 319 (patched)
> > <https://reviews.apache.org/r/68763/diff/1/?file=2090244#file2090244line321>
> >
> >     Not sure how this will look like once we handle agent failover, but it seems this removal should only happen after successful `stop`. That way we might e.g., be able to maintain consistent state (config exists <==> container running).
> 
> Chun-Hung Hsiao wrote:
>     I'm now aiming for a temporarily fix. We should have a more thorough thinking around the lifecycle management of the standalone containers.

I am still in favor of only removing the config after "successful" cleanup. That way we have a shot at having consistent state after failover which can be fixed by users with the expected API (`remove_resource_provider_config`).


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68763/#review208767
-----------------------------------------------------------


On Sept. 21, 2018, 1:21 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68763/
> -----------------------------------------------------------
> 
> (Updated Sept. 21, 2018, 1:21 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-9228
>     https://issues.apache.org/jira/browse/MESOS-9228
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> When processing `REMOVE_RESOURCE_PROVIDER_CONFIG`, the local resource
> provider daemon now performs a best-effort cleanup by killing all
> standalone containers prefixed by the 'cid_prefix' of the resource
> provider. During the cleanup, no resource provider config with the same
> type and name can be added.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/daemon.cpp 0a76cc6d1d34413674d1af1aa514679a4d2b7b55 
> 
> 
> Diff: https://reviews.apache.org/r/68763/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


Re: Review Request 68763: WIP: Stopped resource providers when removing resource provider configs.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68763/#review208767
-----------------------------------------------------------




src/resource_provider/daemon.cpp
Line 115 (original), 115 (patched)
<https://reviews.apache.org/r/68763/#comment292929>

    I believe using the `None` state to denote `is being removed` is hard to follow below. I'd much prefer an explicit field, even though that would mean we'd need to maintain consistent state.



src/resource_provider/daemon.cpp
Lines 310-314 (patched)
<https://reviews.apache.org/r/68763/#comment292930>

    This is a pretty bare-bones API. I believe if we'd instead schedule removal once the launch has finished, we'd not only remove the burden to retry from the caller, but probably also make this function safer to use.
    
    This would likely require us to maintain `Promise<Owned<LocalResourceProvider>>` instead.
    
    We should probably move this into below `then` continuation.



src/resource_provider/daemon.cpp
Line 290 (original), 319 (patched)
<https://reviews.apache.org/r/68763/#comment292931>

    Not sure how this will look like once we handle agent failover, but it seems this removal should only happen after successful `stop`. That way we might e.g., be able to maintain consistent state (config exists <==> container running).



src/resource_provider/daemon.cpp
Lines 489 (patched)
<https://reviews.apache.org/r/68763/#comment292927>

    Let's not enforce this being set here; just comparing an `Option<UUID>` with an `UUID` is fine below.



src/resource_provider/daemon.cpp
Lines 498 (patched)
<https://reviews.apache.org/r/68763/#comment292928>

    Do we need to check this precondition here? Assigning over an `Owned` would be fine and not leak. Let's at least add a comment why this is needed otherwise.


- Benjamin Bannier


On Sept. 19, 2018, 7:14 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68763/
> -----------------------------------------------------------
> 
> (Updated Sept. 19, 2018, 7:14 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-9228
>     https://issues.apache.org/jira/browse/MESOS-9228
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch introduces a `LocalResourceProvider::stop` function to
> perform RP-specific stop procedures. SLRP uses this function to kill
> plugin container before its config is removed.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/daemon.cpp 0a76cc6d1d34413674d1af1aa514679a4d2b7b55 
>   src/resource_provider/local.hpp 20bcc78d3fe847e03526fa59116bdbac92ec1e29 
>   src/resource_provider/storage/provider.hpp 5a371b19289c6e39fedd4cda65fa8be432d095e6 
>   src/resource_provider/storage/provider.cpp 6475f653263337c381b6080695d09c49e5ea8fcf 
> 
> 
> Diff: https://reviews.apache.org/r/68763/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>


***UNCHECKED*** Re: Review Request 68763: WIP: Stopped resource providers when removing resource provider configs.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68763/#review208751
-----------------------------------------------------------



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['68755', '68756', '68757', '68758', '68759', '68760', '68761', '68762', '68763']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2355/mesos-review-68763

Relevant logs:

- [mesos-tests.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2355/mesos-review-68763/logs/mesos-tests.log):

```
I0919 07:11:49.885728 36588 executor.cpp:796] Shutting down
I0919 07:11:49.885728 36588 executor.cpp:909] Sending SIGTERM to process tree at pid 352.aa0q4n2kgcyefckmv0xukjvy4f.xx.internal.cloudapp.net)
I0919 07:11:49.884730 20484 slave.cpp:909] Agent terminating
W0919 07:11:49.885728 20484 slave.cpp:3917] Ignoring shutdown framework 4a53b145-8606-41c5-9e85-86217dadee35-0000 because it is terminating
I0919 07:11:49.887737 38804 master.cpp:1251] Agent 4a53b145-8606-41c5-9e85-86217dadee35-S0 at slave(462)@192.10.1.5:55614 (windows-02.aa0q4n2kgcyefckmv0xukjvy4f.xx.internal.cloudapp.net) disconnected
I0919 07:11:49.887737 38804 master.cpp:3267] Disconnecting agent 4a53b145-8606-41c5-9e85-86217dadee35-S0 at slave(462)@192.10.1.5:55614 (windows-02.aa0q4n2kgcyefckmv0xukjvy4f.xx.internal.cloudapp.net)
I0919 07:11:49.887737 38804 master.cpp:3286] Deactivating agent 4a53b145-8606-41c5-9e85-86217dadee35-S0 at slave(462)@192.10.1.5:55614 (windows-02.aa0q4n2kgcyefckmv0xukjvy4f.xx.internal.cloudapp.net)
I0919 07:11:49.888741 36112 hierarchical.cpp:359] Removed framework 4a53b145-8606-41c5-9e85-86217dadee35-0000
I0919 07:11:49.888741 36112 hierarchical.cpp:795] Agent 4[       OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (682 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (701 ms total)

[----------] Global test environment tear-down
[==========] 1051 tests from 103 test cases ran. (497522 ms total)
[  PASSED  ] 1049 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] DockerFetcherPluginTest.INTERNET_CURL_FetchImage
[  FAILED  ] DockerFetcherPluginTest.INTERNET_CURL_InvokeFetchByName

 2 FAILED TESTS
  YOU HAVE 231 DISABLED TESTS

a53b145-8606-41c5-9e85-86217dadee35-S0 deactivated
I0919 07:11:49.889865 38632 containerizer.cpp:2455] Destroying container 6617092f-436a-457c-b642-6df6fa5facd0 in RUNNING state
I0919 07:11:49.889865 38632 containerizer.cpp:3118] Transitioning the state of container 6617092f-436a-457c-b642-6df6fa5facd0 from RUNNING to DESTROYING
I0919 07:11:49.890739 38632 launcher.cpp:166] Asked to destroy container 6617092f-436a-457c-b642-6df6fa5facd0
I0919 07:11:49.989167 38804 containerizer.cpp:2957] Container 6617092f-436a-457c-b642-6df6fa5facd0 has exited
I0919 07:11:50.020202 35144 master.cpp:1093] Master terminating
I0919 07:11:50.022158 20484 hierarchical.cpp:637] Removed agent 4a53b145-8606-41c5-9e85-86217dadee35-S0
I0919 07:11:50.528112 38004 process.cpp:926] Stopped the socket accept loop
```

- Mesos Reviewbot Windows


On Sept. 19, 2018, 5:14 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68763/
> -----------------------------------------------------------
> 
> (Updated Sept. 19, 2018, 5:14 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Jie Yu, and Jan Schlicht.
> 
> 
> Bugs: MESOS-9228
>     https://issues.apache.org/jira/browse/MESOS-9228
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch introduces a `LocalResourceProvider::stop` function to
> perform RP-specific stop procedures. SLRP uses this function to kill
> plugin container before its config is removed.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/daemon.cpp 0a76cc6d1d34413674d1af1aa514679a4d2b7b55 
>   src/resource_provider/local.hpp 20bcc78d3fe847e03526fa59116bdbac92ec1e29 
>   src/resource_provider/storage/provider.hpp 5a371b19289c6e39fedd4cda65fa8be432d095e6 
>   src/resource_provider/storage/provider.cpp 6475f653263337c381b6080695d09c49e5ea8fcf 
> 
> 
> Diff: https://reviews.apache.org/r/68763/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>