You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Benjamin Bannier <be...@mesosphere.io> on 2018/08/01 15:47:41 UTC

Review Request 68147: Added agent support to remove local resource providers.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/
-----------------------------------------------------------

Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.


Bugs: MESOS-8403
    https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
-------

This patch adds support for triggering permanent removal of local
resource providers. We also add authorization and tests as part of this
patch.


Diffs
-----

  include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
  include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
  src/slave/http.hpp dcfd0d9831775912942afb746ce6704383868468 
  src/slave/http.cpp ab5864d9fd2fde478ed7da2ca7ed8abedc72c7c5 
  src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
  src/slave/slave.cpp e574c249f81e0e77abe982c126fe210a6ee8b591 
  src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
  src/tests/api_tests.cpp 182622a62d350ebefc891a385de3f2d35a7c0243 


Diff: https://reviews.apache.org/r/68147/diff/1/


Testing
-------

`make check`


Thanks,

Benjamin Bannier


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207312
-----------------------------------------------------------



Patch looks great!

Reviews applied: [68143, 68144, 68145, 68146, 68147]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Aug. 1, 2018, 11:47 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Aug. 1, 2018, 11:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/slave/http.hpp dcfd0d9831775912942afb746ce6704383868468 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp e574c249f81e0e77abe982c126fe210a6ee8b591 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/2/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207277
-----------------------------------------------------------




include/mesos/agent/agent.proto
Lines 100 (patched)
<https://reviews.apache.org/r/68147/#comment290610>

    Would `MARK_RESOURCE_PROVIDER_GONE` be a more consistent (w.r.t. the `MARK_AGENT_GONE` master API) and less confusing name?


- Chun-Hung Hsiao


On Aug. 1, 2018, 3:47 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Aug. 1, 2018, 3:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/slave/http.hpp dcfd0d9831775912942afb746ce6704383868468 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp e574c249f81e0e77abe982c126fe210a6ee8b591 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/2/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review206789
-----------------------------------------------------------



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['68143', '68144', '68145', '68146', '68147']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2028/mesos-review-68147

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2028/mesos-review-68147/logs/mesos-tests-stdout.log):

```
[----------] 9 tests from Endpoint/SlaveEndpointTest (980 ms total)

[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest
[ RUN      ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0
[       OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/0 (31 ms)
[ RUN      ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1
[       OK ] ContainerizerType/DefaultContainerDNSFlagTest.ValidateFlag/1 (36 ms)
[----------] 2 tests from ContainerizerType/DefaultContainerDNSFlagTest (70 ms total)

[----------] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN      ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[       OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (1166 ms)
[----------] 1 test from IsolationFlag/CpuIsolatorTest (1187 ms total)

[----------] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN      ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[       OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (1891 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (1914 ms total)

[----------] Global test environment tear-down
[==========] 1017 tests from 98 test cases ran. (737321 ms total)
[  PASSED  ] 1015 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] DockerTest.ROOT_DOCKER_interface
[  FAILED  ] DockerTest.ROOT_DOCKER_kill

 2 FAILED TESTS
  YOU HAVE 222 DISABLED TESTS

```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2028/mesos-review-68147/logs/mesos-tests-stderr.log):

```
I0802 11:12:05.241119 12580 slave.cpp:3939] Shutting down framework b05da931-60cd-4f01-95c3-fde8eb96c9e7-0000
I0802 11:12:05.241119  1596 master.cpp:10963] Updating the state of task ea44f51e-006c-4f2f-95a7-c1de36c6229f of framework b05da931-60cd-4f01-95c3-fde8eb96c9e7-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I0802 11:12:05.241119 12580 slave.cpp:6658] Shutting down executor 'ea44f51e-006c-4f2f-95a7-c1de36c6229f' of framework b05da931-60cd-I0802 11:12:04.184103 11216 exec.cpp:162] Version: 1.7.0
I0802 11:12:04.209116 12352 exec.cpp:236] Executor registered on agent b05da931-60cd-4f01-95c3-fde8eb96c9e7-S0
I0802 11:12:04.214114 11908 executor.cpp:182] Received SUBSCRIBED event
I0802 11:12:04.218117 11908 executor.cpp:186] Subscribed executor on windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net
I0802 11:12:04.219120 11908 executor.cpp:182] Received LAUNCH event
I0802 11:12:04.223115 11908 executor.cpp:679] Starting task ea44f51e-006c-4f2f-95a7-c1de36c6229f
I0802 11:12:04.308123 11908 executor.cpp:499] Running 'D:\DCOS\mesos\src\mesos-containerizer.exe launch <POSSIBLY-SENSITIVE-DATA>'
I0802 11:12:05.215116 11908 executor.cpp:693] Forked command at 14264
I0802 11:12:05.244125  9860 exec.cpp:445] Executor asked to shutdown
I0802 11:12:05.245126 11908 executor.cpp:182] Received SHUTDOWN event
I0802 11:12:05.245126 11908 executor.cpp:796] Shutting down
I0802 11:12:05.245126 11908 executor.cpp:909] Sending SIGTERM to process tree at pid 144f01-95c3-fde8eb96c9e7-0000 at executor(1)@192.10.1.6:63036
I0802 11:12:05.243126  1596 master.cpp:11061] Removing task ea44f51e-006c-4f2f-95a7-c1de36c6229f with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: *):[31000-32000] of framework b05da931-60cd-4f01-95c3-fde8eb96c9e7-0000 on agent b05da931-60cd-4f01-95c3-fde8eb96c9e7-S0 at slave(464)@192.10.1.6:61259 (windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net)
I0802 11:12:05.243126 12580 slave.cpp:931] Agent terminating
W0802 11:12:05.244125 12580 slave.cpp:3935] Ignoring shutdown framework b05da931-60cd-4f01-95c3-fde8eb96c9e7-0000 because it is terminating
I0802 11:12:05.246127  1396 master.cpp:1338] Agent b05da931-60cd-4f01-95c3-fde8eb96c9e7-S0 at slave(464)@192.10.1.6:61259 (windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net) disconnected
I0802 11:12:05.246127  1396 master.cpp:3354] Disconnecting agent b05da931-60cd-4f01-95c3-fde8eb96c9e7-S0 at slave(464)@192.10.1.6:61259 (windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net)
I0802 11:12:05.247131  1396 master.cpp:3373] Deactivating agent b05da931-60cd-4f01-95c3-fde8eb96c9e7-S0 at slave(464)@192.10.1.6:61259 (windows-02.enofukwu14ruplxn0gs3yzmsgf.xx.internal.cloudapp.net)
I0802 11:12:05.248129 12580 containerizer.cpp:2407] Destroying container 14686d38-4041-4dc0-89cd-f6cf8e967204 in RUNNING state
I0802 11:12:05.248129 12580 containerizer.cpp:3021] Transitioning the state of container 14686d38-4041-4dc0-89cd-f6cf8e967204 from RUNNING to DESTROYING
I0802 11:12:05.249122 12580 launcher.cpp:155] Asked to destroy container 14686d38-4041-4dc0-89cd-f6cf8e967204
I0802 11:12:05.249122 12700 hierarchical.cpp:359] Removed framework b05da931-60cd-4f01-95c3-fde8eb96c9e7-0000
I0802 11:12:05.250120 12700 hierarchical.cpp:795] Agent b05da931-60cd-4f01-95c3-fde8eb96c9e7-S0 deactivated
I0802 11:12:05.320128 11900 containerizer.cpp:2860] Container 14686d38-4041-4dc0-89cd-f6cf8e967204 has exited
I0802 11:12:05.348127  6352 master.cpp:1180] Master terminating
I0802 11:12:05.350123  1596 hierarchical.cpp:637] Removed agent b05da931-60cd-4f01-95c3-fde8eb96c9e7-S0
I0802 11:12:05.691124  8044 process.cpp:926] Stopped the socket accept loop
```

- Mesos Reviewbot Windows


On Aug. 1, 2018, 3:47 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Aug. 1, 2018, 3:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/slave/http.hpp dcfd0d9831775912942afb746ce6704383868468 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp e574c249f81e0e77abe982c126fe210a6ee8b591 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/2/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Benjamin Bannier <be...@mesosphere.io>.

> On Aug. 15, 2018, 10:54 p.m., Chun-Hung Hsiao wrote:
> > src/slave/http.cpp
> > Lines 3356 (patched)
> > <https://reviews.apache.org/r/68147/diff/3/?file=2073008#file2073008line3356>
> >
> >     We should return an `InternalServerError` on failed/discarded.

I don't think this is required as a failed `Future` is automatically mapped onto an `InternalServerError`.


> On Aug. 15, 2018, 10:54 p.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8107-8108 (patched)
> > <https://reviews.apache.org/r/68147/diff/3/?file=2073010#file2073010line8107>
> >
> >     `"Resource provider manager is not initialized"`

Wouldn't that be leaking internal information? I tried to map this onto an operator-relevant issue here.


> On Aug. 15, 2018, 10:54 p.m., Chun-Hung Hsiao wrote:
> > src/tests/api_tests.cpp
> > Lines 7171-7179 (patched)
> > <https://reviews.apache.org/r/68147/diff/3/?file=2073012#file2073012line7171>
> >
> >     ```
> >     Future<http::Response> v1Response =
> >       post(slave.get()->pid, v1Call, contentType, DEFAULT_CREDENTIAL_2);
> >     ```

I would like to explicitly test the return HTTP status code, and using the `post` helper just maps a `Forbidden` on a `Failure`. Leave as is?


> On Aug. 15, 2018, 10:54 p.m., Chun-Hung Hsiao wrote:
> > src/tests/api_tests.cpp
> > Lines 7183-7191 (patched)
> > <https://reviews.apache.org/r/68147/diff/3/?file=2073012#file2073012line7183>
> >
> >     ```
> >     v1Response = post(slave.get()->pid, v1Call, contentType);
> >     ```

Ditto.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207351
-----------------------------------------------------------


On Aug. 16, 2018, 4:31 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Aug. 16, 2018, 4:31 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/resource_provider/manager.cpp abd7e38e5517ea600f9fc9b8a96c7d0d26df0620 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/4/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Benjamin Bannier <be...@mesosphere.io>.

> On Aug. 15, 2018, 10:54 p.m., Chun-Hung Hsiao wrote:
> > src/slave/http.cpp
> > Lines 3356 (patched)
> > <https://reviews.apache.org/r/68147/diff/3/?file=2073008#file2073008line3356>
> >
> >     We should return an `InternalServerError` on failed/discarded.
> 
> Benjamin Bannier wrote:
>     I don't think this is required as a failed `Future` is automatically mapped onto an `InternalServerError`.
> 
> Chun-Hung Hsiao wrote:
>     You are right about this. Which one is more appropriate in your opinion: `InternalServerError` or `ServiceUnavailable`? Feel free to drop this if you prefer the former.

I prefer `InternalServerError` as this is very likely a non-transient error.


> On Aug. 15, 2018, 10:54 p.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8107-8108 (patched)
> > <https://reviews.apache.org/r/68147/diff/3/?file=2073010#file2073010line8107>
> >
> >     `"Resource provider manager is not initialized"`
> 
> Benjamin Bannier wrote:
>     Wouldn't that be leaking internal information? I tried to map this onto an operator-relevant issue here.
> 
> Chun-Hung Hsiao wrote:
>     How about `"Agent has not registered yet"`?

Okay.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207351
-----------------------------------------------------------


On Aug. 16, 2018, 4:31 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Aug. 16, 2018, 4:31 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/5/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On Aug. 15, 2018, 8:54 p.m., Chun-Hung Hsiao wrote:
> > src/slave/http.cpp
> > Lines 3356 (patched)
> > <https://reviews.apache.org/r/68147/diff/3/?file=2073008#file2073008line3356>
> >
> >     We should return an `InternalServerError` on failed/discarded.
> 
> Benjamin Bannier wrote:
>     I don't think this is required as a failed `Future` is automatically mapped onto an `InternalServerError`.

You are right about this. Which one is more appropriate in your opinion: `InternalServerError` or `ServiceUnavailable`? Feel free to drop this if you prefer the former.


> On Aug. 15, 2018, 8:54 p.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8107-8108 (patched)
> > <https://reviews.apache.org/r/68147/diff/3/?file=2073010#file2073010line8107>
> >
> >     `"Resource provider manager is not initialized"`
> 
> Benjamin Bannier wrote:
>     Wouldn't that be leaking internal information? I tried to map this onto an operator-relevant issue here.

How about `"Agent has not registered yet"`?


> On Aug. 15, 2018, 8:54 p.m., Chun-Hung Hsiao wrote:
> > src/tests/api_tests.cpp
> > Lines 7171-7179 (patched)
> > <https://reviews.apache.org/r/68147/diff/3/?file=2073012#file2073012line7171>
> >
> >     ```
> >     Future<http::Response> v1Response =
> >       post(slave.get()->pid, v1Call, contentType, DEFAULT_CREDENTIAL_2);
> >     ```
> 
> Benjamin Bannier wrote:
>     I would like to explicitly test the return HTTP status code, and using the `post` helper just maps a `Forbidden` on a `Failure`. Leave as is?

Make sense. Dropping this. Maybe we could slightly modify the helper in the future.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207351
-----------------------------------------------------------


On Aug. 16, 2018, 2:31 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Aug. 16, 2018, 2:31 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/5/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207351
-----------------------------------------------------------




src/slave/http.cpp
Lines 3356 (patched)
<https://reviews.apache.org/r/68147/#comment290708>

    We should return an `InternalServerError` on failed/discarded.



src/slave/slave.cpp
Lines 8107-8108 (patched)
<https://reviews.apache.org/r/68147/#comment290712>

    `"Resource provider manager is not initialized"`



src/tests/api_tests.cpp
Lines 7171-7179 (patched)
<https://reviews.apache.org/r/68147/#comment290716>

    ```
    Future<http::Response> v1Response =
      post(slave.get()->pid, v1Call, contentType, DEFAULT_CREDENTIAL_2);
    ```



src/tests/api_tests.cpp
Lines 7183-7191 (patched)
<https://reviews.apache.org/r/68147/#comment290717>

    ```
    v1Response = post(slave.get()->pid, v1Call, contentType);
    ```


- Chun-Hung Hsiao


On Aug. 15, 2018, 1:53 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Aug. 15, 2018, 1:53 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/3/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Benjamin Bannier <be...@mesosphere.io>.

> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 7934 (patched)
> > <https://reviews.apache.org/r/68147/diff/6/?file=2074578#file2074578line7934>
> >
> >     `OPERATION_GONE_BY_OPERATOR` is not a terminal state:
> >     https://github.com/apache/mesos/blob/808485da01387bd27a51cb82a90b1f8301d613ee/include/mesos/mesos.proto#L2342-L2349
> 
> Benjamin Bannier wrote:
>     From https://reviews.apache.org/r/68410/#comment_rc290945-207560,
>     
>     > ... It seems that OPERATOR_GONE_BY_OPERATOR does not carry the semantics we want. Looking at the other possible error codes, it seems we could either use OPERATION_ERROR or OPERATION_FAILED instead where OPERATION_ERROR possibly better fits our requirements as the operation's side effects might have materialized. That would mean we should drop this patch completely.  
>     > WDYT?
> 
> Chun-Hung Hsiao wrote:
>     I'm not sure if either is a good choice:
>     
>     `OPERATION_ERROR` usually means some validation failure before applying the operation, the consumed resource remains (i.e., can be used by other operations), but the operation is non-retryable (i.e., the caller must fix the error).
>     `OPERATION_FAILED` usually means something goes wrong after applying the operation, the consumed resource remains, and the operation is retryable.
>     
>     In the case of mark RP gone, although we can argue that any operation becomes an non-retryable one and we don't care about the consumed resoure because it will then be removed. There still are some issues:
>     
>     1. The reason of the operation being non-retryable is not due to any client-side error, but a server-side "error," and this seems inconsistent to the current `OPERATION_ERROR` usage.
>     2. We transition an operation to `OPERATION_GONE_BY_OPERATOR` when marking an agent as gone, but to `OPERATION_ERROR` when marking a local RP as gone. This sounds inconsistent to me from an API perspective.
>     3. For `OPERATION_ERROR` and `OPERATION_FAILED`, Mesos provides reliable retry (at least under "normal" circumstances), and expects acks. For `OPERATION_GONE_BY_OPERATOR`, we could just do a best-effort notification (the implementation is missing though) and don't need any ack, similar to `TASK_GONE_BY_OPERATOR`. In the case of mark RP as gone, we probably want to go with the later.
>     
>     So it seems better if we still go with `OPERATION_GONE_BY_OPERATOR` and make the proper master changes. And we need to make sure that we call `allocator->updateSlave` before calling `allocator->recoverResources` to avoid the consumed resource being offered to other framework in between.

I added changes to use `OPERATION_GONE_BY_OPERATOR` here, and to treat it as terminal.


> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > <https://reviews.apache.org/r/68147/diff/6/?file=2074578#file2074578line8195>
> >
> >     Let's validate that there is no task using the resources provided by this RP before doing the removal, or fail the call.
> 
> Benjamin Bannier wrote:
>     I believe this should be treated as orthogonal issue. It might e.g., happen that the user triggering the RP removal cannot kill tasks or prevent them from being rescheduled. I'd suggest to leaves as is.
>     
>     WDYT?
> 
> Chun-Hung Hsiao wrote:
>     The scenario I am worried about is that when the RP is marked as gone, the RP may fail to tear itself down if there is any task using its resources.
> 
> Benjamin Bannier wrote:
>     We don't seem to have a good story on incompatible resource changes ATM. If we e.g., update a RP config in an incompatible way, we make not attempt to detect or rectify this in any way, either. At least in the agent we were in general we were operating under the assumption that tasks would die on incompatible changes to their resources.
>     
>     I'd suggest to drop this issue for now and potentially file a JIRA to track draining and situations where it might be needed.

Dropping this, see above comment.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
-----------------------------------------------------------


On Oct. 25, 2018, 12:47 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Oct. 25, 2018, 12:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp 2ae37ac610894652a0214022a0cf04c4365396dd 
>   src/master/master.cpp f88c7c1f03f0de7236aad9e3bf4bfac82e91bc65 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp a4db532cc6477c386fcd9bf563895214e95a475a 
>   src/slave/slave.hpp 0bd340176e2a8cefdfa7ef71e059441fb171aff6 
>   src/slave/slave.cpp 74f6fb9036a9ac4f587f53ec2df04eeb4c167bfb 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp fdd9f871f75617fc26a28679e2a1e41f506c6133 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/10/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On Aug. 17, 2018, 10:29 p.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 7934 (patched)
> > <https://reviews.apache.org/r/68147/diff/6/?file=2074578#file2074578line7934>
> >
> >     `OPERATION_GONE_BY_OPERATOR` is not a terminal state:
> >     https://github.com/apache/mesos/blob/808485da01387bd27a51cb82a90b1f8301d613ee/include/mesos/mesos.proto#L2342-L2349
> 
> Benjamin Bannier wrote:
>     From https://reviews.apache.org/r/68410/#comment_rc290945-207560,
>     
>     > ... It seems that OPERATOR_GONE_BY_OPERATOR does not carry the semantics we want. Looking at the other possible error codes, it seems we could either use OPERATION_ERROR or OPERATION_FAILED instead where OPERATION_ERROR possibly better fits our requirements as the operation's side effects might have materialized. That would mean we should drop this patch completely.  
>     > WDYT?

I'm not sure if either is a good choice:

`OPERATION_ERROR` usually means some validation failure before applying the operation, the consumed resource remains (i.e., can be used by other operations), but the operation is non-retryable (i.e., the caller must fix the error).
`OPERATION_FAILED` usually means something goes wrong after applying the operation, the consumed resource remains, and the operation is retryable.

In the case of mark RP gone, although we can argue that any operation becomes an non-retryable one and we don't care about the consumed resoure because it will then be removed. There still are some issues:

1. The reason of the operation being non-retryable is not due to any client-side error, but a server-side "error," and this seems inconsistent to the current `OPERATION_ERROR` usage.
2. We transition an operation to `OPERATION_GONE_BY_OPERATOR` when marking an agent as gone, but to `OPERATION_ERROR` when marking a local RP as gone. This sounds inconsistent to me from an API perspective.
3. For `OPERATION_ERROR` and `OPERATION_FAILED`, Mesos provides reliable retry (at least under "normal" circumstances), and expects acks. For `OPERATION_GONE_BY_OPERATOR`, we could just do a best-effort notification (the implementation is missing though) and don't need any ack, similar to `TASK_GONE_BY_OPERATOR`. In the case of mark RP as gone, we probably want to go with the later.

So it seems better if we still go with `OPERATION_GONE_BY_OPERATOR` and make the proper master changes. And we need to make sure that we call `allocator->updateSlave` before calling `allocator->recoverResources` to avoid the consumed resource being offered to other framework in between.


> On Aug. 17, 2018, 10:29 p.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > <https://reviews.apache.org/r/68147/diff/6/?file=2074578#file2074578line8195>
> >
> >     Let's validate that there is no task using the resources provided by this RP before doing the removal, or fail the call.
> 
> Benjamin Bannier wrote:
>     I believe this should be treated as orthogonal issue. It might e.g., happen that the user triggering the RP removal cannot kill tasks or prevent them from being rescheduled. I'd suggest to leaves as is.
>     
>     WDYT?

The scenario I am worried about is that when the RP is marked as gone, the RP may fail to tear itself down if there is any task using its resources.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
-----------------------------------------------------------


On Aug. 20, 2018, 6:47 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Aug. 20, 2018, 6:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 0420109ac93e1249906c52437e5859c5ee033fb6 
>   src/slave/slave.cpp 679394a549cdfe84d64a102164c8652ad96f1eb2 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/7/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On Aug. 17, 2018, 10:29 p.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > <https://reviews.apache.org/r/68147/diff/6/?file=2074578#file2074578line8195>
> >
> >     Let's validate that there is no task using the resources provided by this RP before doing the removal, or fail the call.
> 
> Benjamin Bannier wrote:
>     I believe this should be treated as orthogonal issue. It might e.g., happen that the user triggering the RP removal cannot kill tasks or prevent them from being rescheduled. I'd suggest to leaves as is.
>     
>     WDYT?
> 
> Chun-Hung Hsiao wrote:
>     The scenario I am worried about is that when the RP is marked as gone, the RP may fail to tear itself down if there is any task using its resources.
> 
> Benjamin Bannier wrote:
>     We don't seem to have a good story on incompatible resource changes ATM. If we e.g., update a RP config in an incompatible way, we make not attempt to detect or rectify this in any way, either. At least in the agent we were in general we were operating under the assumption that tasks would die on incompatible changes to their resources.
>     
>     I'd suggest to drop this issue for now and potentially file a JIRA to track draining and situations where it might be needed.
> 
> Benjamin Bannier wrote:
>     Dropping this, see above comment.
> 
> Chun-Hung Hsiao wrote:
>     Reopening this. Here's my suggestion:
>     
>     1. Create a ticket for a proper design to drain the resource provider.
>     2. For now, let's fail the call here if `slave->resourceProviders.at(resourceProviderId)->totalResources` is not empty. In other words, the operator must first reconfigure the resource provider to "drain" itself before making the `MARK_RESOURCE_PROVIDER_GONE` call.
> 
> Benjamin Bannier wrote:
>     Okay.
>     
>     1. I created https://issues.apache.org/jira/browse/MESOS-9522.
>     2. We had discussed this some time ago, and I think agreed that preventing operators from marking resource providers as gone whenever a task was running on them was not good (especially if operators cannot drain the RP). We have prior art in `MARK_AGENT_GONE` which asks operators to drain agent before marking as gone. I would suggest for now with MESOS-9522 not implemented we allow operators to mark resource providers as gone even if some of their resources are in use; if needed they can drain them by filtering all tasks using their resources. This also wouldn't expose a race.
>     
>     Does this address your concerns?

I agreed that checking if there is any task running on resources of an RP when marking it as gone would lead to some complexity, so I'm suggesting to use a stronger condition: there must be no resource.

I'm not convinced that asking the operator to drain the RP is a bad idea. My main concern is that, since `MARK_RESOURCE_PROVIDER_GONE` cannot be undone, we should only perform it when it is safe to do, so the operator can have a chance to fix the problem. Also, this restriction would ensure that no framework will have an inconsistent view of cluster resources. WDYT?


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
-----------------------------------------------------------


On Jan. 15, 2019, 3:03 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Jan. 15, 2019, 3:03 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a0159fed8d325808c5e8519da06173441debbbbb 
>   src/master/master.cpp 2339207149a85578ea47cf66f28392182f9075f2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp e0dc09103e9186aa3f7328d2052fc6747f5be9bb 
>   src/slave/slave.hpp 2eadf5fce9a314f1ec0ac5d51820c6381f5f1468 
>   src/slave/slave.cpp 10cbc190acc7eea5734efa0066541168545a66b6 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 79a2d2966ff6533629aeeefaaaf19e849b76e0e6 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/13/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Chun-Hung Hsiao <ch...@apache.org>.

> On Aug. 17, 2018, 10:29 p.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > <https://reviews.apache.org/r/68147/diff/6/?file=2074578#file2074578line8195>
> >
> >     Let's validate that there is no task using the resources provided by this RP before doing the removal, or fail the call.
> 
> Benjamin Bannier wrote:
>     I believe this should be treated as orthogonal issue. It might e.g., happen that the user triggering the RP removal cannot kill tasks or prevent them from being rescheduled. I'd suggest to leaves as is.
>     
>     WDYT?
> 
> Chun-Hung Hsiao wrote:
>     The scenario I am worried about is that when the RP is marked as gone, the RP may fail to tear itself down if there is any task using its resources.
> 
> Benjamin Bannier wrote:
>     We don't seem to have a good story on incompatible resource changes ATM. If we e.g., update a RP config in an incompatible way, we make not attempt to detect or rectify this in any way, either. At least in the agent we were in general we were operating under the assumption that tasks would die on incompatible changes to their resources.
>     
>     I'd suggest to drop this issue for now and potentially file a JIRA to track draining and situations where it might be needed.
> 
> Benjamin Bannier wrote:
>     Dropping this, see above comment.

Reopening this. Here's my suggestion:

1. Create a ticket for a proper design to drain the resource provider.
2. For now, let's fail the call here if `slave->resourceProviders.at(resourceProviderId)->totalResources` is not empty. In other words, the operator must first reconfigure the resource provider to "drain" itself before making the `MARK_RESOURCE_PROVIDER_GONE` call.


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
-----------------------------------------------------------


On Oct. 25, 2018, 10:47 a.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Oct. 25, 2018, 10:47 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a45607eed4c4bae5010bcc3f3ffeabd6d911062a 
>   src/master/master.cpp 3b3824a67f46866cd64e32d7f9f92484b5891aa2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp ba389263ef7ab311682d542424eb56360841d24c 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp edde48a4b4b46c9f47bc44de389c97b77322b8e8 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/12/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Benjamin Bannier <be...@mesosphere.io>.

> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > <https://reviews.apache.org/r/68147/diff/6/?file=2074578#file2074578line8195>
> >
> >     Let's validate that there is no task using the resources provided by this RP before doing the removal, or fail the call.
> 
> Benjamin Bannier wrote:
>     I believe this should be treated as orthogonal issue. It might e.g., happen that the user triggering the RP removal cannot kill tasks or prevent them from being rescheduled. I'd suggest to leaves as is.
>     
>     WDYT?
> 
> Chun-Hung Hsiao wrote:
>     The scenario I am worried about is that when the RP is marked as gone, the RP may fail to tear itself down if there is any task using its resources.
> 
> Benjamin Bannier wrote:
>     We don't seem to have a good story on incompatible resource changes ATM. If we e.g., update a RP config in an incompatible way, we make not attempt to detect or rectify this in any way, either. At least in the agent we were in general we were operating under the assumption that tasks would die on incompatible changes to their resources.
>     
>     I'd suggest to drop this issue for now and potentially file a JIRA to track draining and situations where it might be needed.
> 
> Benjamin Bannier wrote:
>     Dropping this, see above comment.
> 
> Chun-Hung Hsiao wrote:
>     Reopening this. Here's my suggestion:
>     
>     1. Create a ticket for a proper design to drain the resource provider.
>     2. For now, let's fail the call here if `slave->resourceProviders.at(resourceProviderId)->totalResources` is not empty. In other words, the operator must first reconfigure the resource provider to "drain" itself before making the `MARK_RESOURCE_PROVIDER_GONE` call.

Okay.

1. I created https://issues.apache.org/jira/browse/MESOS-9522.
2. We had discussed this some time ago, and I think agreed that preventing operators from marking resource providers as gone whenever a task was running on them was not good (especially if operators cannot drain the RP). We have prior art in `MARK_AGENT_GONE` which asks operators to drain agent before marking as gone. I would suggest for now with MESOS-9522 not implemented we allow operators to mark resource providers as gone even if some of their resources are in use; if needed they can drain them by filtering all tasks using their resources. This also wouldn't expose a race.

Does this address your concerns?


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
-----------------------------------------------------------


On Jan. 15, 2019, 3:53 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Jan. 15, 2019, 3:53 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a45607eed4c4bae5010bcc3f3ffeabd6d911062a 
>   src/master/master.cpp 3b3824a67f46866cd64e32d7f9f92484b5891aa2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp ba389263ef7ab311682d542424eb56360841d24c 
>   src/slave/slave.hpp edf7269d4057ec8c95bb54c855210ad00d002a50 
>   src/slave/slave.cpp 858b78620e1ef33f3587d0bd95a684917aaf5bbb 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp edde48a4b4b46c9f47bc44de389c97b77322b8e8 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/12/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Benjamin Bannier <be...@mesosphere.io>.

> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 7934 (patched)
> > <https://reviews.apache.org/r/68147/diff/6/?file=2074578#file2074578line7934>
> >
> >     `OPERATION_GONE_BY_OPERATOR` is not a terminal state:
> >     https://github.com/apache/mesos/blob/808485da01387bd27a51cb82a90b1f8301d613ee/include/mesos/mesos.proto#L2342-L2349

From https://reviews.apache.org/r/68410/#comment_rc290945-207560,

> ... It seems that OPERATOR_GONE_BY_OPERATOR does not carry the semantics we want. Looking at the other possible error codes, it seems we could either use OPERATION_ERROR or OPERATION_FAILED instead where OPERATION_ERROR possibly better fits our requirements as the operation's side effects might have materialized. That would mean we should drop this patch completely.  
> WDYT?


> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > <https://reviews.apache.org/r/68147/diff/6/?file=2074578#file2074578line8195>
> >
> >     Let's validate that there is no task using the resources provided by this RP before doing the removal, or fail the call.

I believe this should be treated as orthogonal issue. It might e.g., happen that the user triggering the RP removal cannot kill tasks or prevent them from being rescheduled. I'd suggest to leaves as is.

WDYT?


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
-----------------------------------------------------------


On Aug. 20, 2018, 11:52 a.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Aug. 20, 2018, 11:52 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 0420109ac93e1249906c52437e5859c5ee033fb6 
>   src/slave/slave.cpp 679394a549cdfe84d64a102164c8652ad96f1eb2 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/7/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Benjamin Bannier <be...@mesosphere.io>.

> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > <https://reviews.apache.org/r/68147/diff/6/?file=2074578#file2074578line8195>
> >
> >     Let's validate that there is no task using the resources provided by this RP before doing the removal, or fail the call.
> 
> Benjamin Bannier wrote:
>     I believe this should be treated as orthogonal issue. It might e.g., happen that the user triggering the RP removal cannot kill tasks or prevent them from being rescheduled. I'd suggest to leaves as is.
>     
>     WDYT?
> 
> Chun-Hung Hsiao wrote:
>     The scenario I am worried about is that when the RP is marked as gone, the RP may fail to tear itself down if there is any task using its resources.

We don't seem to have a good story on incompatible resource changes ATM. If we e.g., update a RP config in an incompatible way, we make not attempt to detect or rectify this in any way, either. At least in the agent we were in general we were operating under the assumption that tasks would die on incompatible changes to their resources.

I'd suggest to drop this issue for now and potentially file a JIRA to track draining and situations where it might be needed.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
-----------------------------------------------------------


On Aug. 20, 2018, 8:47 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Aug. 20, 2018, 8:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp b1d695b59a63a5af34b19faff16bf6c82b7e7d88 
>   src/slave/slave.cpp e6c7e686f287fb4448a0074d4e99298665fc866d 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp ee82350f7a6c6d44ba0590608e7d9a223c0be169 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/8/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Benjamin Bannier <be...@mesosphere.io>.

> On Aug. 18, 2018, 12:29 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 8195 (patched)
> > <https://reviews.apache.org/r/68147/diff/6/?file=2074578#file2074578line8195>
> >
> >     Let's validate that there is no task using the resources provided by this RP before doing the removal, or fail the call.
> 
> Benjamin Bannier wrote:
>     I believe this should be treated as orthogonal issue. It might e.g., happen that the user triggering the RP removal cannot kill tasks or prevent them from being rescheduled. I'd suggest to leaves as is.
>     
>     WDYT?
> 
> Chun-Hung Hsiao wrote:
>     The scenario I am worried about is that when the RP is marked as gone, the RP may fail to tear itself down if there is any task using its resources.
> 
> Benjamin Bannier wrote:
>     We don't seem to have a good story on incompatible resource changes ATM. If we e.g., update a RP config in an incompatible way, we make not attempt to detect or rectify this in any way, either. At least in the agent we were in general we were operating under the assumption that tasks would die on incompatible changes to their resources.
>     
>     I'd suggest to drop this issue for now and potentially file a JIRA to track draining and situations where it might be needed.
> 
> Benjamin Bannier wrote:
>     Dropping this, see above comment.
> 
> Chun-Hung Hsiao wrote:
>     Reopening this. Here's my suggestion:
>     
>     1. Create a ticket for a proper design to drain the resource provider.
>     2. For now, let's fail the call here if `slave->resourceProviders.at(resourceProviderId)->totalResources` is not empty. In other words, the operator must first reconfigure the resource provider to "drain" itself before making the `MARK_RESOURCE_PROVIDER_GONE` call.
> 
> Benjamin Bannier wrote:
>     Okay.
>     
>     1. I created https://issues.apache.org/jira/browse/MESOS-9522.
>     2. We had discussed this some time ago, and I think agreed that preventing operators from marking resource providers as gone whenever a task was running on them was not good (especially if operators cannot drain the RP). We have prior art in `MARK_AGENT_GONE` which asks operators to drain agent before marking as gone. I would suggest for now with MESOS-9522 not implemented we allow operators to mark resource providers as gone even if some of their resources are in use; if needed they can drain them by filtering all tasks using their resources. This also wouldn't expose a race.
>     
>     Does this address your concerns?
> 
> Chun-Hung Hsiao wrote:
>     I agreed that checking if there is any task running on resources of an RP when marking it as gone would lead to some complexity, so I'm suggesting to use a stronger condition: there must be no resource.
>     
>     I'm not convinced that asking the operator to drain the RP is a bad idea. My main concern is that, since `MARK_RESOURCE_PROVIDER_GONE` cannot be undone, we should only perform it when it is safe to do, so the operator can have a chance to fix the problem. Also, this restriction would ensure that no framework will have an inconsistent view of cluster resources. WDYT?

Sorry for not reading your comment -- only allowing marking RPs without resources as gone sounds good to me. Updated this and the follow-up patch & marking as _Fixed_.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
-----------------------------------------------------------


On Jan. 16, 2019, 11:51 a.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2019, 11:51 a.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao, Gastón Kleiman, Greg Mann, and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/common/protobuf_utils.cpp a0159fed8d325808c5e8519da06173441debbbbb 
>   src/master/master.cpp 2339207149a85578ea47cf66f28392182f9075f2 
>   src/slave/http.hpp 5b113fa2f7e3421d4219e0ece113010937b204c1 
>   src/slave/http.cpp e0dc09103e9186aa3f7328d2052fc6747f5be9bb 
>   src/slave/slave.hpp 2eadf5fce9a314f1ec0ac5d51820c6381f5f1468 
>   src/slave/slave.cpp 10cbc190acc7eea5734efa0066541168545a66b6 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 79a2d2966ff6533629aeeefaaaf19e849b76e0e6 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/14/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207559
-----------------------------------------------------------




src/slave/slave.cpp
Lines 7934 (patched)
<https://reviews.apache.org/r/68147/#comment290949>

    `OPERATION_GONE_BY_OPERATOR` is not a terminal state:
    https://github.com/apache/mesos/blob/808485da01387bd27a51cb82a90b1f8301d613ee/include/mesos/mesos.proto#L2342-L2349



src/slave/slave.cpp
Lines 8195 (patched)
<https://reviews.apache.org/r/68147/#comment290963>

    Let's validate that there is no task using the resources provided by this RP before doing the removal, or fail the call.


- Chun-Hung Hsiao


On Aug. 17, 2018, 1:54 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Aug. 17, 2018, 1:54 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/6/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/
-----------------------------------------------------------

(Updated Aug. 20, 2018, 11:52 a.m.)


Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.


Changes
-------

Use `OPERATION_ERROR` instead of `OPERATION_GONE_BY_OPERATOR`.


Bugs: MESOS-8403
    https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
-------

This patch adds support for triggering permanent removal of local
resource providers. We also add authorization and tests as part of this
patch.


Diffs (updated)
-----

  include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
  include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
  src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
  src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
  src/slave/slave.hpp 0420109ac93e1249906c52437e5859c5ee033fb6 
  src/slave/slave.cpp 679394a549cdfe84d64a102164c8652ad96f1eb2 
  src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
  src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 


Diff: https://reviews.apache.org/r/68147/diff/7/

Changes: https://reviews.apache.org/r/68147/diff/6-7/


Testing
-------

`make check`


Thanks,

Benjamin Bannier


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/
-----------------------------------------------------------

(Updated Aug. 17, 2018, 3:54 p.m.)


Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.


Bugs: MESOS-8403
    https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
-------

This patch adds support for triggering permanent removal of local
resource providers. We also add authorization and tests as part of this
patch.


Diffs (updated)
-----

  include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
  include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
  src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
  src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
  src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
  src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
  src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
  src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 


Diff: https://reviews.apache.org/r/68147/diff/6/

Changes: https://reviews.apache.org/r/68147/diff/5-6/


Testing
-------

`make check`


Thanks,

Benjamin Bannier


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Benjamin Bannier <be...@mesosphere.io>.

> On Aug. 17, 2018, 1:27 a.m., Chun-Hung Hsiao wrote:
> > src/slave/slave.cpp
> > Lines 7934 (patched)
> > <https://reviews.apache.org/r/68147/diff/3-5/?file=2073010#file2073010line7934>
> >
> >     This line combined with Line 8079 will crash the agent.

Ouch, added https://reviews.apache.org/r/68410.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207450
-----------------------------------------------------------


On Aug. 16, 2018, 4:31 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Aug. 16, 2018, 4:31 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/5/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Chun-Hung Hsiao <ch...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review207450
-----------------------------------------------------------




src/slave/slave.cpp
Lines 7934 (patched)
<https://reviews.apache.org/r/68147/#comment290851>

    This line combined with Line 8079 will crash the agent.


- Chun-Hung Hsiao


On Aug. 16, 2018, 2:31 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Aug. 16, 2018, 2:31 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
>   src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/5/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/
-----------------------------------------------------------

(Updated Aug. 16, 2018, 4:31 p.m.)


Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.


Changes
-------

Improved handling of removed resource providers in agent; rebased.


Bugs: MESOS-8403
    https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
-------

This patch adds support for triggering permanent removal of local
resource providers. We also add authorization and tests as part of this
patch.


Diffs (updated)
-----

  include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
  include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
  src/resource_provider/manager.cpp abd7e38e5517ea600f9fc9b8a96c7d0d26df0620 
  src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
  src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
  src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
  src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
  src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
  src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 


Diff: https://reviews.apache.org/r/68147/diff/4/

Changes: https://reviews.apache.org/r/68147/diff/3-4/


Testing
-------

`make check`


Thanks,

Benjamin Bannier


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/
-----------------------------------------------------------

(Updated Aug. 15, 2018, 3:53 p.m.)


Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.


Changes
-------

Addressed issues raised by Chun.


Bugs: MESOS-8403
    https://issues.apache.org/jira/browse/MESOS-8403


Repository: mesos


Description
-------

This patch adds support for triggering permanent removal of local
resource providers. We also add authorization and tests as part of this
patch.


Diffs (updated)
-----

  include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
  include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
  src/slave/http.hpp 78200874cb03a3d5af4ee0443c03a1d94cd494cc 
  src/slave/http.cpp 1b6d266a1a7821c9de6871cbca43317b3c392a32 
  src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
  src/slave/slave.cpp 78e8666f402be58af5b6e20a715da4998af2615c 
  src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
  src/tests/api_tests.cpp 9c9fa9105a27ac7bb7f30d7eb67512d1e1d63d15 


Diff: https://reviews.apache.org/r/68147/diff/3/

Changes: https://reviews.apache.org/r/68147/diff/2-3/


Testing
-------

`make check`


Thanks,

Benjamin Bannier


Re: Review Request 68147: Added agent support to remove local resource providers.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68147/#review206753
-----------------------------------------------------------



PASS: Mesos patch 68147 was successfully built and tested.

Reviews applied: `['68143', '68144', '68145', '68146', '68147']`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2021/mesos-review-68147

- Mesos Reviewbot Windows


On Aug. 1, 2018, 3:47 p.m., Benjamin Bannier wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68147/
> -----------------------------------------------------------
> 
> (Updated Aug. 1, 2018, 3:47 p.m.)
> 
> 
> Review request for mesos, Chun-Hung Hsiao and Jan Schlicht.
> 
> 
> Bugs: MESOS-8403
>     https://issues.apache.org/jira/browse/MESOS-8403
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch adds support for triggering permanent removal of local
> resource providers. We also add authorization and tests as part of this
> patch.
> 
> 
> Diffs
> -----
> 
>   include/mesos/agent/agent.proto 74488e873cbf99ca487403b70691912cf3788288 
>   include/mesos/v1/agent/agent.proto 5d1ab6fd7f6f21593444467c66ed35ee90aea07f 
>   src/slave/http.hpp dcfd0d9831775912942afb746ce6704383868468 
>   src/slave/http.cpp ab5864d9fd2fde478ed7da2ca7ed8abedc72c7c5 
>   src/slave/slave.hpp 802d4eb9e9eba2f1175dd85e56dcc80a61e32f74 
>   src/slave/slave.cpp e574c249f81e0e77abe982c126fe210a6ee8b591 
>   src/slave/validation.cpp df5e1373dbe497bc859455dcaf4e064e923bd72e 
>   src/tests/api_tests.cpp 182622a62d350ebefc891a385de3f2d35a7c0243 
> 
> 
> Diff: https://reviews.apache.org/r/68147/diff/1/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Benjamin Bannier
> 
>