Posted to reviews@mesos.apache.org by Benjamin Bannier <be...@mesosphere.io> on 2017/12/15 13:15:58 UTC

Review Request 64648: Fixed handling of resource versions in agent oversubscribed updates.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64648/
-----------------------------------------------------------

Review request for mesos, Jie Yu and Jan Schlicht.


Repository: mesos


Description
-------

Agents can use 'UpdateSlaveMessage' to send updates on their
oversubscribed resources, their resource provider state, or both. We
previously assumed that 'UpdateSlaveMessage' from a resource
provider-capable agent would always contain the most recent resource
version of the agent even though the field is marked 'optional'.

This patch simplifies the handling in the master so that it no longer
asserts that a resource version is set in an 'UpdateSlaveMessage' from
a resource provider-capable agent. Instead we explicitly and
unconditionally check whether the field is set and handle only set
values.
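
As a rough illustration of the check described above (not the actual
change to src/master/master.cpp), the pattern could look like the
sketch below. The class 'UpdateSlaveMessage' here is a hand-written
stand-in for the generated protobuf class, and the field name
'resource_version', the struct 'AgentState', and the function
'applyResourceVersion' are all hypothetical names chosen for this
example.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the generated protobuf class. Only the
// 'has_*()' / getter shape of an 'optional' field is modeled here;
// the field name 'resource_version' is an assumption.
class UpdateSlaveMessage
{
public:
  bool has_resource_version() const { return hasResourceVersion; }
  const std::string& resource_version() const { return resourceVersion; }

  void set_resource_version(const std::string& value)
  {
    resourceVersion = value;
    hasResourceVersion = true;
  }

private:
  bool hasResourceVersion = false;
  std::string resourceVersion;
};

// Hypothetical stand-in for the master's bookkeeping of one agent.
struct AgentState
{
  std::string resourceVersion;
};

// Rather than asserting that the version is present, check the
// optional field and handle it only when it is set. Returns true if
// the stored version was updated.
bool applyResourceVersion(AgentState& agent, const UpdateSlaveMessage& message)
{
  if (!message.has_resource_version()) {
    // E.g., an update that only carries oversubscribed resources.
    return false;
  }

  agent.resourceVersion = message.resource_version();
  return true;
}
```

With this shape, a message that only reports oversubscribed resources
leaves the stored version untouched, while a message that carries a
version replaces it.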


Diffs
-----

  src/master/master.cpp e082da8267fa22c26818c67bd6da573fe1808696 


Diff: https://reviews.apache.org/r/64648/diff/1/


Testing
-------

Tested on a number of platforms and setups in internal CI.


Thanks,

Benjamin Bannier


Re: Review Request 64648: Fixed handling of resource versions in agent oversubscribed updates.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64648/#review193932
-----------------------------------------------------------


Ship it!





- Jie Yu




Re: Review Request 64648: Fixed handling of resource versions in agent oversubscribed updates.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64648/#review193912
-----------------------------------------------------------



FAIL: Some Mesos tests failed.

Reviews applied: `['64648']`

Failed command: `D:\DCOS\mesos\src\mesos-tests.exe --verbose`

All the build artifacts are available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64648

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64648/logs/mesos-tests-stdout.log):

```

[----------] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN      ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[       OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2425 ms)
[----------] 1 test from IsolationFlag/CpuIsolatorTest (2450 ms total)

[----------] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN      ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[       OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2374 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (2400 ms total)

[----------] Global test environment tear-down
[==========] 835 tests from 85 test cases ran. (319626 ms total)
[  PASSED  ] 825 tests.
[  FAILED  ] 10 tests, listed below:
[  FAILED  ] OfferOperationStatusUpdateManagerTest.UpdateAndAckNonTerminalUpdate
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RecoverCheckpointedStream
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RecoverEmptyFile
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RecoverTerminatedStream
[  FAILED  ] OfferOperationStatusUpdateManagerTest.IgnoreDuplicateUpdate
[  FAILED  ] OfferOperationStatusUpdateManagerTest.IgnoreDuplicateUpdateAfterRecover
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RejectDuplicateAck
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RejectDuplicateAckAfterRecover
[  FAILED  ] OfferOperationStatusUpdateManagerTest.NonStrictRecoveryCorruptedFile
[  FAILED  ] OfferOperationStatusUpdateManagerTest.UpdateLatestWhenResending

10 FAILED TESTS
  YOU HAVE 205 DISABLED TESTS

```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64648/logs/mesos-tests-stderr.log):

```
I1215 14:11:51.022387  6896 slave.cpp:3401] Shutting down framework 1f3c1bf2-6079-4fd6-937f-984bfd86e2ce-0000
I1215 14:11:51.022387  6896 slave.cpp:6109] Shutting down executor '891703b9-fd6f-4f81-b804-5d47fc675f0c' of framework 1f3c1bf2-6079-4fd6-937f-984bfd86e2ce-0000 at executor(1)@10.3.1.11:64964
I1215 14:11:50.339382  7592 exec.cpp:162] Version: 1.5.0
I1215 14:11:50.363382  3216 exec.cpp:237] Executor registered on agent 1f3c1bf2-6079-4fd6-937f-984bfd86e2ce-S0
I1215 14:11:50.366389  1476 executor.cpp:171] Received SUBSCRIBED event
I1215 14:11:50.371387  1476 executor.cpp:175] Subscribed executor on build-srv-03.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I1215 14:11:50.371387  1476 executor.cpp:171] Received LAUNCH event
I1215 14:11:50.375386  1476 executor.cpp:638] Starting task 891703b9-fd6f-4f81-b804-5d47fc675f0c
I1215 14:11:50.454388  1476 executor.cpp:478] Running 'D:\DCOS\mesos\src\mesos-containerizer.exe launch <POSSIBLY-SENSITIVE-DATA>'
I1215 14:11:50.997388  1476 executor.cpp:651] Forked command at 7272
I1215 14:11:51.025388  9772 exec.cpp:435] Executor asked to shutdown
I1215 14:11:51.025388  1476 executor.cpp:171] Received SHUTDOWN event
I1215 14:11:51.025388  1476 executor.cpp:748] Shutting down
I1215 14:11:51.025388  1476 executor.cpp:855] Sending SIGTERM to process tree at pid 7
I1215 14:11:51.023658  4796 master.cpp:10156] Updating the state of task 891703b9-fd6f-4f81-b804-5d47fc675f0c of framework 1f3c1bf2-6079-4fd6-937f-984bfd86e2ce-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I1215 14:11:51.024386  6896 slave.cpp:909] Agent terminating
W1215 14:11:51.024386  6896 slave.cpp:3397] Ignoring shutdown framework 1f3c1bf2-6079-4fd6-937f-984bfd86e2ce-0000 because it is terminating
I1215 14:11:51.025388  4796 master.cpp:10262] Removing task 891703b9-fd6f-4f81-b804-5d47fc675f0c with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: *):[31000-32000] of framework 1f3c1bf2-6079-4fd6-937f-984bfd86e2ce-0000 on agent 1f3c1bf2-6079-4fd6-937f-984bfd86e2ce-S0 at slave(327)@10.3.1.11:64942 (build-srv-03.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1215 14:11:51.026387  8100 containerizer.cpp:2337] Destroying container 99967f28-5cd5-483e-b317-47b4dfbf6fb7 in RUNNING state
I1215 14:11:51.027398  8100 containerizer.cpp:2939] Transitioning the state of container 99967f28-5cd5-483e-b317-47b4dfbf6fb7 from RUNNING to DESTROYING
I1215 14:11:51.028406  4796 master.cpp:1305] Agent 1f3c1bf2-6079-4fd6-937f-984bfd86e2ce-S0 at slave(327)@10.3.1.11:64942 (build-srv-03.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I1215 14:11:51.028406  4796 master.cpp:3364] Disconnecting agent 1f3c1bf2-6079-4fd6-937f-984bfd86e2ce-S0 at slave(327)@10.3.1.11:64942 (build-srv-03.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1215 14:11:51.028406  4796 master.cpp:3383] Deactivating agent 1f3c1bf2-6079-4fd6-937f-984bfd86e2ce-S0 at slave(327)@10.3.1.11:64942 (build-srv-03.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1215 14:11:51.028406  8100 launcher.cpp:156] Asked to destroy container 99967f28-5cd5-483e-b317-47b4dfbf6fb7
I1215 14:11:51.029387  1120 hierarchical.cpp:344] Removed framework 1f3c1bf2-6079-4fd6-937f-984bfd86e2ce-0000
I1215 14:11:51.029387  1120 hierarchical.cpp:766] Agent 1f3c1bf2-6079-4fd6-937f-984bfd86e2ce-S0 deactivated
I1215 14:11:51.038398  8100 containerizer.cpp:2788] Container 99967f28-5cd5-483e-b317-47b4dfbf6fb7 has exited
I1215 14:11:51.068393  8880 master.cpp:1147] Master terminating
I1215 14:11:51.070389  4660 hierarchical.cpp:609] Removed agent 1f3c1bf2-6079-4fd6-937f-984bfd86e2ce-S0
I1215 14:11:51.386394  8640 process.cpp:887] Failed to accept socket: future discarded
```

- Mesos Reviewbot Windows

