Posted to reviews@mesos.apache.org by Greg Mann <gr...@mesosphere.io> on 2019/02/05 16:24:17 UTC

Review Request 69891: Sent operation updates to schedulers when agents are removed.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/
-----------------------------------------------------------

Review request for mesos, Gastón Kleiman and Joseph Wu.


Bugs: MESOS-9541
    https://issues.apache.org/jira/browse/MESOS-9541


Repository: mesos


Description
-------

This patch makes the master send operation updates when
an agent is removed and a framework has requested
feedback for a pending operation on that agent.
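
As a rough illustration of the behavior described above, here is a minimal,
self-contained sketch. It is not the code in `src/master/master.cpp`: the
`Operation` and `OperationUpdate` structs, the three-value `OperationState`
enum, and `updatesForRemovedAgent` are all hypothetical stand-ins. It assumes
that a framework requests feedback by setting an operation ID, and that every
non-terminal operation can be modelled as a single `PENDING` state. The
`UNREACHABLE` state mirrors the one mentioned in the review comments on this
request.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified operation states (not the Mesos protobuf enum).
enum class OperationState { PENDING, FINISHED, UNREACHABLE };

// Hypothetical record of an operation tracked for one agent.
struct Operation {
  std::string frameworkId;
  std::string operationId;  // Empty means the framework did not ask for feedback.
  OperationState state;
};

// Hypothetical update message addressed to a framework's scheduler.
struct OperationUpdate {
  std::string frameworkId;
  std::string operationId;
  OperationState state;
};

// Returns the updates that would be sent when the agent holding
// `operations` is removed: one per pending operation with feedback requested.
std::vector<OperationUpdate> updatesForRemovedAgent(
    const std::vector<Operation>& operations) {
  std::vector<OperationUpdate> updates;
  for (const Operation& operation : operations) {
    // Assumed invariant of this sketch: every operation has an owner.
    assert(!operation.frameworkId.empty());

    const bool pending = operation.state == OperationState::PENDING;
    const bool wantsFeedback = !operation.operationId.empty();
    if (pending && wantsFeedback) {
      updates.push_back(OperationUpdate{operation.frameworkId,
                                        operation.operationId,
                                        OperationState::UNREACHABLE});
    }
  }
  return updates;
}
```

Operations that are already terminal, or for which no feedback was requested,
produce no update in this sketch.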


Diffs
-----

  src/master/master.cpp f74b7c280569e1c24e0940463bb28bd795d429d5 
  src/tests/master_tests.cpp acc6096239e4992bdca084d88880d644ab4a2385 


Diff: https://reviews.apache.org/r/69891/diff/1/


Testing
-------

`make check`
`bin/mesos-tests.sh --gtest_filter="*OperationUpdatesAfterAgentShutdown*" --gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann


Re: Review Request 69891: Sent operation updates to schedulers when agents are removed.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/#review212559
-----------------------------------------------------------



PASS: Mesos patch 69891 was successfully built and tested.

Reviews applied: `['69876', '69880', '69891']`

All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2852/mesos-review-69891

- Mesos Reviewbot Windows




Re: Review Request 69891: Sent operation updates to schedulers when agents are removed.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/#review212728
-----------------------------------------------------------



Patch looks great!

Reviews applied: [69876, 69880, 69891]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers --disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot




Re: Review Request 69891: Sent operation updates to schedulers when agents are removed.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/#review212770
-----------------------------------------------------------



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['69891']`

Failed command: `Start-MesosCITesting`

All the build artifacts available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2877/mesos-review-69891

Relevant logs:

- [mesos-tests.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2877/mesos-review-69891/logs/mesos-tests.log):

```
W0212 22:48:10.210011 15552 slave.cpp:3928] Ignoring shutdown framework 3fa56582-0ae7-4da5-92a0-5877dcfa075c-0000 because it is terminating
I0212 22:48:10.213016 22360 master.cpp:1269] Agent 3fa56582-0ae7-4da5-92a0-5877dcfa075c-S0 at slave(491)@192.10.1.4:50261 (windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net) disconnected
I0212 22:48:10.213016 22360 master.cpp:3272] Disconnecting agent 3fa56582-0ae7-4da5-92a0-5877dcfa075c-S0 at slave(491)@192.10.1.4:50261 (windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I0212 22:48:10.213016 22360 master.cpp:3291] Deactivating agent 3fa56582-0ae7-4da5-92a0-5877dcfa075c-S0 at slave(491)@192.10.1.4:50261 (windows-01.chtsmhjxogyevckjfayqqcnjda.xx.internal.cloudapp.net)
I0212 22:48:10.215008 12120 hierarchical.cpp:358] Removed framework 3fa56582-0ae7-4da5-92a0-5877dcfa075c-0000
I0212 22:48:10.215008 12120 hierarchical.cpp:793] Agent 3fa56582-0ae7-4da5-92a0-5877dcfa075c-S0 deactivated
I0212 22:48:10.217051 15552 containerizer.cpp:2526] Destroying container 480015c6-acf4-4f5c-898f-f6789020156c in RUNNING state
I0212 22:48:10.217051 15552 containerizer.cpp:3193] Transitioning the state of container 480015c6-acf4-4f5c-898f-f6789020156c from RUNNING to DESTROYING
I0212 22:48:10.217051 15552 launcher.cpp:161] Asked to destroy container 480015c6-acf4-4f5c-898f-f6789020156c
W0212 22:48:10.219003 19932 process.cpp:1423] Failed to recv on socket WindowsFD::Type::SOCKET=10288 to peer '192.10.1.4:52185': IO failed with error code: The specified network name is no longer available.

W0212 22:48:10.220007 19932 process.cpp:838] Failed to recv on socket WindowsFD::Type::SOCKET=10564 to peer '192.10.1.4:52186': IO failed with error code: The specified network name is no longer available.

[       OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (787 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (810 ms total)

[----------] Global test environment tear-down
[==========] 1108 tests from 105 test cases ran. (564233 ms total)
[  PASSED  ] 1107 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] DockerFetcherPluginTest.INTERNET_CURL_FetchImage

 1 FAILED TEST
  YOU HAVE 232 DISABLED TESTS

I0212 22:48:10.234997 22360 containerizer.cpp:3032] Container 480015c6-acf4-4f5c-898f-f6789020156c has exited
I0212 22:48:10.274006 20964 master.cpp:1109] Master terminating
I0212 22:48:10.275992 11940 hierarchical.cpp:644] Removed agent 3fa56582-0ae7-4da5-92a0-5877dcfa075c-S0
I0212 22:48:10.702548 19932 process.cpp:927] Stopped the socket accept loop
```

- Mesos Reviewbot Windows




Re: Review Request 69891: Sent operation updates to schedulers when agents are removed.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/
-----------------------------------------------------------

(Updated Feb. 12, 2019, 9:38 p.m.)


Review request for mesos, Gastón Kleiman and Joseph Wu.


Bugs: MESOS-9541
    https://issues.apache.org/jira/browse/MESOS-9541


Repository: mesos


Description
-------

This patch makes the master send operation updates when
an agent is removed and a framework has requested
feedback for a pending operation on that agent.


Diffs
-----

  src/master/master.cpp f74b7c280569e1c24e0940463bb28bd795d429d5 
  src/tests/master_tests.cpp acc6096239e4992bdca084d88880d644ab4a2385 


Diff: https://reviews.apache.org/r/69891/diff/1/


Testing
-------

`make check`
`bin/mesos-tests.sh --gtest_filter="*OperationUpdatesAfterAgentShutdown*" --gtest_repeat=-1 --gtest_break_on_failure`


Thanks,

Greg Mann


Re: Review Request 69891: Sent operation updates to schedulers when agents are removed.

Posted by Greg Mann <gr...@mesosphere.io>.

> On Feb. 5, 2019, 4:57 p.m., Vinod Kone wrote:
> > src/master/master.cpp
> > Lines 10950 (patched)
> > <https://reviews.apache.org/r/69891/diff/1/?file=2123900#file2123900line10950>
> >
> >     I'm not sure that sending UNREACHABLE in all three cases where `_removeSlave` is called is the right approach. I think we need to discuss the agent state / task state / operation state for each case.
> >     
> >     What happens if a framework reconciles the operation after this code has executed? Will it always get OPERATION_UNREACHABLE? If not, that would be confusing.
> >     
> >     Also, the operation status is not changed in memory here. Is that intentional?
> 
> Greg Mann wrote:
>     Continuing this discussion on Slack: https://mesos.slack.com/archives/C8NN4M0CT/p1549066656027000
>     
>     Yep, I intended not to update the operation status in this patch. The JIRA issue is purely for sending updates to frameworks, and I intend to address all of the operation state updates as part of https://issues.apache.org/jira/browse/MESOS-9546 since that involves more significant code changes.

I'm working on an update for this patch which will ensure that we return correct reconciliation results for these operations.


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/#review212551
-----------------------------------------------------------




Re: Review Request 69891: Sent operation updates to schedulers when agents are removed.

Posted by Greg Mann <gr...@mesosphere.io>.

> On Feb. 5, 2019, 4:57 p.m., Vinod Kone wrote:
> > src/master/master.cpp
> > Lines 10950 (patched)
> > <https://reviews.apache.org/r/69891/diff/1/?file=2123900#file2123900line10950>
> >
> >     I'm not sure that sending UNREACHABLE in all three cases where `_removeSlave` is called is the right approach. I think we need to discuss the agent state / task state / operation state for each case.
> >     
> >     What happens if a framework reconciles the operation after this code has executed? Will it always get OPERATION_UNREACHABLE? If not, that would be confusing.
> >     
> >     Also, the operation status is not changed in memory here. Is that intentional?

Continuing this discussion on Slack: https://mesos.slack.com/archives/C8NN4M0CT/p1549066656027000

Yep, I intended not to update the operation status in this patch. The JIRA issue is purely for sending updates to frameworks, and I intend to address all of the operation state updates as part of https://issues.apache.org/jira/browse/MESOS-9546 since that involves more significant code changes.


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/#review212551
-----------------------------------------------------------




Re: Review Request 69891: Sent operation updates to schedulers when agents are removed.

Posted by Vinod Kone <vi...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69891/#review212551
-----------------------------------------------------------




src/master/master.cpp
Lines 10950 (patched)
<https://reviews.apache.org/r/69891/#comment298369>

    I'm not sure that sending UNREACHABLE in all three cases where `_removeSlave` is called is the right approach. I think we need to discuss the agent state / task state / operation state for each case.
    
    What happens if a framework reconciles the operation after this code has executed? Will it always get OPERATION_UNREACHABLE? If not, that would be confusing.
    
    Also, the operation status is not changed in memory here. Is that intentional?


- Vinod Kone
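
The reconciliation concern raised above can be shown with a toy model. This is
only a sketch under stated assumptions, not how the Mesos master stores or
reconciles operations: `ToyMaster`, `addOperation`, `removeAgent`, `reconcile`,
and the `recordState` flag are all hypothetical names. The point is that if an
update is sent without recording the new state in memory, a later
reconciliation can report something different from what the framework was just
told.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified operation states (not the Mesos protobuf enum).
enum class OperationState { PENDING, UNREACHABLE, UNKNOWN };

// Toy stand-in for the master's in-memory operation bookkeeping.
class ToyMaster {
public:
  void addOperation(const std::string& id) {
    assert(!id.empty());  // This sketch only tracks operations with an ID.
    operations[id] = OperationState::PENDING;
  }

  // "Sends" UNREACHABLE for each pending operation and returns what was sent.
  // When `recordState` is false, the in-memory state is left untouched.
  std::vector<std::pair<std::string, OperationState>> removeAgent(
      bool recordState) {
    std::vector<std::pair<std::string, OperationState>> sent;
    for (auto& entry : operations) {
      if (entry.second != OperationState::PENDING) {
        continue;
      }
      sent.emplace_back(entry.first, OperationState::UNREACHABLE);
      if (recordState) {
        entry.second = OperationState::UNREACHABLE;
      }
    }
    return sent;
  }

  // Answers a reconciliation request from the in-memory state.
  OperationState reconcile(const std::string& id) const {
    auto it = operations.find(id);
    return it == operations.end() ? OperationState::UNKNOWN : it->second;
  }

private:
  std::map<std::string, OperationState> operations;
};
```

With `recordState` set to false, the framework receives `UNREACHABLE` but a
subsequent `reconcile` still answers `PENDING`; with it set to true, the two
agree.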

