Posted to reviews@mesos.apache.org by Gastón Kleiman <ga...@mesosphere.io> on 2019/03/22 20:37:45 UTC
Review Request 70283: Improved handling of resources consumed by
orphan operations.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70283/
-----------------------------------------------------------
Review request for mesos, Benjamin Bannier, Greg Mann, Joseph Wu, and Meng Zhu.
Bugs: MESOS-9635
https://issues.apache.org/jira/browse/MESOS-9635
Repository: mesos
Description
-------
This patch makes the master's `UpdateSlaveMessage` handler include
resources consumed by orphan operations when calling
`allocator->addResourceProvider()`.
The change prevents some races that lead to the master reoffering the
resources consumed by the operations and makes the
`OperationReconciliationTest.AgentPendingOperationAfterMasterFailover`
test stable.
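To illustrate the accounting described above, here is a minimal sketch. It is not the actual Mesos code: `ToyAllocator`, `Operation`, `usedByOperations`, and the scalar `consumedDiskMb` are hypothetical stand-ins. It also assumes that an "orphan" operation is one whose framework the master does not currently know. The sketch only shows why omitting orphan operations from the used resources passed to an `addResourceProvider()`-style call leaves their resources offerable.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an operation reported by an agent.
struct Operation {
  std::string id;
  double consumedDiskMb;  // Resources the operation consumes (toy scalar).
  bool frameworkKnown;    // false => treated as an "orphan" in this sketch.
};

// Toy allocator that tracks total and used resources per resource provider.
// This is NOT the Mesos allocator interface, only an illustration.
class ToyAllocator {
public:
  void addResourceProvider(
      const std::string& providerId, double total, double used) {
    assert(used <= total);
    providers_[providerId] = {total, used};
  }

  // Whatever is not marked as used could be offered to frameworks.
  double offerable(const std::string& providerId) const {
    const auto it = providers_.find(providerId);
    return it == providers_.end() ? 0.0 : it->second.total - it->second.used;
  }

private:
  struct Entry {
    double total;
    double used;
  };

  std::map<std::string, Entry> providers_;
};

// Sums the resources consumed by the given operations. Orphan operations
// are counted only if `includeOrphans` is true.
double usedByOperations(
    const std::vector<Operation>& operations, bool includeOrphans) {
  double used = 0.0;
  for (const Operation& operation : operations) {
    if (operation.frameworkKnown || includeOrphans) {
      used += operation.consumedDiskMb;
    }
  }
  return used;
}
```

In this toy model, a provider with 1024 MB, one 256 MB operation from a known framework, and one 512 MB orphan operation would expose 768 MB as offerable if orphans are skipped, but only 256 MB once they are included.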
Diffs
-----
src/master/master.cpp 9c4a9e83da94535873d72c902835f229c4f96320
Diff: https://reviews.apache.org/r/70283/diff/1/
Testing
-------
`OperationReconciliationTest.AgentPendingOperationAfterMasterFailover` passed over 5000 iterations under stress. Other tests still pass on GNU/Linux.
Thanks,
Gastón Kleiman
Re: Review Request 70283: Improved handling of resources consumed by
orphan operations.
Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70283/#review213941
-----------------------------------------------------------
Patch looks great!
Reviews applied: [70283]
Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers --disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh
- Mesos Reviewbot
Re: Review Request 70283: Improved handling of resources consumed by
orphan operations.
Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70283/#review213931
-----------------------------------------------------------
PASS: Mesos patch 70283 was successfully built and tested.
Reviews applied: `['70283']`
All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2998/mesos-review-70283
- Mesos Reviewbot Windows
Re: Review Request 70283: Improved handling of resources consumed by
orphan operations.
Posted by Meng Zhu <mz...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70283/#review213938
-----------------------------------------------------------
As pointed out by Greg earlier, this patch violates the comment here: https://github.com/apache/mesos/blob/4580834471fb3bc0b95e2b96e04a63d34faef724/src/master/allocator/mesos/hierarchical.cpp#L769-L770
While the current code seems to work with this patch (apart from the comment mentioned above), I think we should check with @Bbannier, in particular about the `TODO` here: https://github.com/apache/mesos/blob/4580834471fb3bc0b95e2b96e04a63d34faef724/src/master/master.cpp#L8330-L8334
```
// TODO(bbannier): Consider introducing ways of making sure an agent
// always knows the `FrameworkInfo` of operations triggered on its
// resources, e.g., by adding an explicit `FrameworkInfo` to
// operations like is already done for `RunTaskMessage`, see
// MESOS-8582.
```
Ideally, we would fix the above TODO and ensure that `FrameworkInfo` is always available.
src/master/master.cpp
Line 8384 (original), 8371 (patched)
<https://reviews.apache.org/r/70283/#comment300050>
Can we separate out the unrelated changes?
- Meng Zhu