Posted to reviews@mesos.apache.org by Greg Mann <gr...@mesosphere.io> on 2019/02/05 16:24:17 UTC
Review Request 69891: Sent operation updates to schedulers when
agents are removed.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/
-----------------------------------------------------------
Review request for mesos, Gastón Kleiman and Joseph Wu.
Bugs: MESOS-9541
https://issues.apache.org/jira/browse/MESOS-9541
Repository: mesos
Description
-------
This patch makes the master send operation updates when
an agent is removed and a framework has requested
feedback for a pending operation on that agent.
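As a rough illustration of the behaviour described above, here is a minimal sketch. It is not the code from the patch: `Operation`, `OperationUpdate`, and `updatesForRemovedAgent` are hypothetical, simplified stand-ins, and an empty `operationId` is assumed to mean that the framework did not request feedback. The UNREACHABLE state is taken from the review discussion of this diff.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the Mesos types; not the real API.
enum class OperationState { PENDING, FINISHED, UNREACHABLE };

struct Operation {
  std::string uuid;
  std::string frameworkId;
  // Assumption: an empty ID means the framework did not request feedback.
  std::string operationId;
  OperationState state;
};

struct OperationUpdate {
  std::string frameworkId;
  std::string operationId;
  OperationState state;
};

// Returns the updates a master could send when the agent that holds
// `operations` is removed.
std::vector<OperationUpdate> updatesForRemovedAgent(
    const std::vector<Operation>& operations)
{
  std::vector<OperationUpdate> updates;

  for (const Operation& op : operations) {
    // In this sketch every operation belongs to some framework.
    assert(!op.frameworkId.empty());

    // Only operations that are still pending need an update.
    if (op.state != OperationState::PENDING) {
      continue;
    }

    // Skip operations for which no feedback was requested.
    if (op.operationId.empty()) {
      continue;
    }

    updates.push_back(
        {op.frameworkId, op.operationId, OperationState::UNREACHABLE});
  }

  return updates;
}
```

Filtering on both conditions keeps the sketch close to the wording of the description: an update is produced only for an operation that is pending and for which feedback was requested.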
Diffs
-----
src/master/master.cpp f74b7c280569e1c24e0940463bb28bd795d429d5
src/tests/master_tests.cpp acc6096239e4992bdca084d88880d644ab4a2385
Diff: https://reviews.apache.org/r/69891/diff/1/
Testing
-------
`make check`
`bin/mesos-tests.sh --gtest_filter="*OperationUpdatesAfterAgentShutdown*" --gtest_repeat=-1 --gtest_break_on_failure`
Thanks,
Greg Mann
Re: Review Request 69891: Sent operation updates to schedulers when
agents are removed.
Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/#review212559
-----------------------------------------------------------
PASS: Mesos patch 69891 was successfully built and tested.
Reviews applied: `['69876', '69880', '69891']`
All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2852/mesos-review-69891
- Mesos Reviewbot Windows
Re: Review Request 69891: Sent operation updates to schedulers when
agents are removed.
Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/#review212728
-----------------------------------------------------------
Patch looks great!
Reviews applied: [69876, 69880, 69891]
Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers --disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh
- Mesos Reviewbot
Re: Review Request 69891: Sent operation updates to schedulers when
agents are removed.
Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/#review212770
-----------------------------------------------------------
FAIL: Some of the unit tests failed. Please check the relevant logs.
Reviews applied: `['69891']`
Failed command: `Start-MesosCITesting`
All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2877/mesos-review-69891
Relevant logs:
- [mesos-tests.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2877/mesos-review-69891/logs/mesos-tests.log):
```
W0212 22:48:10.210011 15552 slave.cpp:3928] Ignoring shutdown framework 3fa56582-0ae7-4da5-92a0-5877dcfa075c-0000 because it is terminating
I0212 22:48:10.213016 22360 master.cpp:1269] Agent 3fa56582-0ae7-4da5-92a0-5877dcfa075c-S0 at slave(491)@192.10.1.4:50261 (windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net) disconnected
I0212 22:48:10.213016 22360 master.cpp:3272] Disconnecting agent 3fa56582-0ae7-4da5-92a0-5877dcfa075c-S0 at slave(491)@192.10.1.4:50261 (windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I0212 22:48:10.213016 22360 master.cpp:3291] Deactivating agent 3fa56582-0ae7-4da5-92a0-5877dcfa075c-S0 at slave(491)@192.10.1.4:50261 (windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I0212 22:48:10.215008 12120 hierarchical.cpp:358] Removed framework 3fa56582-0ae7-4da5-92a0-5877dcfa075c-0000
I0212 22:48:10.215008 12120 hierarchical.cpp:793] Agent 3fa56582-0ae7-4da5-92a0-5877dcfa075c-S0 deactivated
I0212 22:48:10.217051 15552 containerizer.cpp:2526] Destroying container 480015c6-acf4-4f5c-898f-f6789020156c in RUNNING state
I0212 22:48:10.217051 15552 containerizer.cpp:3193] Transitioning the state of container 480015c6-acf4-4f5c-898f-f6789020156c from RUNNING to DESTROYING
I0212 22:48:10.217051 15552 launcher.cpp:161] Asked to destroy container 480015c6-acf4-4f5c-898f-f6789020156c
W0212 22:48:10.219003 19932 process.cpp:1423] Failed to recv on socket WindowsFD::Type::SOCKET=10288 to peer '192.10.1.4:52185': IO failed with error code: The specified network name is no longer available.
W0212 22:48:10.220007 19932 process.cpp:838] Failed to recv on socket WindowsFD::Type::SOCKET=10564 to peer '192.10.1.4:52186': IO failed with error code: The specified network name is no longer available.
I0212 22:48:10.234997 22360 containerizer.cpp:3032] Container 480015c6-acf4-4f5c-898f-f6789020156c has exited
I0212 22:48:10.274006 20964 master.cpp:1109] Master terminating
I0212 22:48:10.275992 11940 hierarchical.cpp:644] Removed agent 3fa56582-0ae7-4da5-92a0-5877dcfa075c-S0
I0212 22:48:10.702548 19932 process.cpp:927] Stopped the socket accept loop
[ OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (787 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (810 ms total)
[----------] Global test environment tear-down
[==========] 1108 tests from 105 test cases ran. (564233 ms total)
[ PASSED ] 1107 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] DockerFetcherPluginTest.INTERNET_CURL_FetchImage
1 FAILED TEST
YOU HAVE 232 DISABLED TESTS
```
- Mesos Reviewbot Windows
Re: Review Request 69891: Sent operation updates to schedulers when
agents are removed.
Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/
-----------------------------------------------------------
(Updated Feb. 12, 2019, 9:38 p.m.)
Review request for mesos, Gastón Kleiman and Joseph Wu.
Bugs: MESOS-9541
https://issues.apache.org/jira/browse/MESOS-9541
Repository: mesos
Description
-------
This patch makes the master send operation updates when
an agent is removed and a framework has requested
feedback for a pending operation on that agent.
Diffs
-----
src/master/master.cpp f74b7c280569e1c24e0940463bb28bd795d429d5
src/tests/master_tests.cpp acc6096239e4992bdca084d88880d644ab4a2385
Diff: https://reviews.apache.org/r/69891/diff/1/
Testing
-------
`make check`
`bin/mesos-tests.sh --gtest_filter="*OperationUpdatesAfterAgentShutdown*" --gtest_repeat=-1 --gtest_break_on_failure`
Thanks,
Greg Mann
Re: Review Request 69891: Sent operation updates to schedulers when
agents are removed.
Posted by Greg Mann <gr...@mesosphere.io>.
> On Feb. 5, 2019, 4:57 p.m., Vinod Kone wrote:
> > src/master/master.cpp
> > Lines 10950 (patched)
> > <https://reviews.apache.org/r/69891/diff/1/?file=2123900#file2123900line10950>
> >
> > I don't know if sending UNREACHABLE for all the 3 cases when `_removeSlave` is called is the right way. I think we need to have a discussion around the agent state / task state / operation state for each of the cases.
> >
> > What happens if a framework reconciles the operation after this code gets executed? Will it always get OPERATION_UNREACHABLE? If not, then that would be confusing.
> >
> > Also, the operation status is not changed in-memory here. Is that intentional?
>
> Greg Mann wrote:
> Continuing this discussion on Slack: https://mesos.slack.com/archives/C8NN4M0CT/p1549066656027000
>
> Yep, I intended not to update the operation status in this patch. The JIRA issue is purely for sending updates to frameworks, and I intend to address all of the operation state updates as part of https://issues.apache.org/jira/browse/MESOS-9546 since that involves more significant code changes.
I'm working on an update for this patch which will ensure that we return correct reconciliation results for these operations.
- Greg
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/#review212551
-----------------------------------------------------------
Re: Review Request 69891: Sent operation updates to schedulers when
agents are removed.
Posted by Greg Mann <gr...@mesosphere.io>.
> On Feb. 5, 2019, 4:57 p.m., Vinod Kone wrote:
> > src/master/master.cpp
> > Lines 10950 (patched)
> > <https://reviews.apache.org/r/69891/diff/1/?file=2123900#file2123900line10950>
> >
> > I don't know if sending UNREACHABLE for all the 3 cases when `_removeSlave` is called is the right way. I think we need to have a discussion around the agent state / task state / operation state for each of the cases.
> >
> > What happens if a framework reconciles the operation after this code gets executed? Will it always get OPERATION_UNREACHABLE? If not, then that would be confusing.
> >
> > Also, the operation status is not changed in-memory here. Is that intentional?
Continuing this discussion on Slack: https://mesos.slack.com/archives/C8NN4M0CT/p1549066656027000
Yep, I intended not to update the operation status in this patch. The JIRA issue is purely for sending updates to frameworks, and I intend to address all of the operation state updates as part of https://issues.apache.org/jira/browse/MESOS-9546 since that involves more significant code changes.
- Greg
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/#review212551
-----------------------------------------------------------
Re: Review Request 69891: Sent operation updates to schedulers when
agents are removed.
Posted by Vinod Kone <vi...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/#review212551
-----------------------------------------------------------
src/master/master.cpp
Lines 10950 (patched)
<https://reviews.apache.org/r/69891/#comment298369>
I don't know if sending UNREACHABLE for all the 3 cases when `_removeSlave` is called is the right way. I think we need to have a discussion around the agent state / task state / operation state for each of the cases.
What happens if a framework reconciles the operation after this code gets executed? Will it always get OPERATION_UNREACHABLE? If not, then that would be confusing.
Also, the operation status is not changed in-memory here. Is that intentional?
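The reconciliation concern can be illustrated with a toy model. This is a hypothetical sketch, not Mesos code: `ToyMaster`, `sendUnreachable`, and `reconcile` are invented names, and the real master's handling of operations on removed agents may well differ. It only shows how a sent update and a later reconciliation answer can disagree when the stored state is left untouched.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified operation states; not the real Mesos enum.
enum class OperationState { PENDING, UNREACHABLE };

// Toy model of a master that tracks one stored state per operation ID.
class ToyMaster {
public:
  void addPendingOperation(const std::string& operationId)
  {
    stored_[operationId] = OperationState::PENDING;
  }

  // Models sending an UNREACHABLE update and returns the state that was
  // sent. The stored state changes only when `updateInMemory` is true.
  OperationState sendUnreachable(
      const std::string& operationId, bool updateInMemory)
  {
    // The operation must be known to this toy master.
    assert(stored_.count(operationId) == 1);

    if (updateInMemory) {
      stored_[operationId] = OperationState::UNREACHABLE;
    }

    return OperationState::UNREACHABLE;
  }

  // Models reconciliation: answers with whatever state is stored.
  OperationState reconcile(const std::string& operationId) const
  {
    return stored_.at(operationId);
  }

private:
  std::map<std::string, OperationState> stored_;
};
```

In this model, calling `sendUnreachable("op-a", false)` and then `reconcile("op-a")` yields two different states for the same operation, which is the kind of mismatch a framework would find confusing.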
- Vinod Kone