Posted to reviews@mesos.apache.org by Greg Mann <gr...@mesosphere.io> on 2017/12/08 23:46:15 UTC

Review Request 64464: Made master reconcile known offer operations with agent.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64464/
-----------------------------------------------------------

Review request for mesos, Benjamin Bannier, Gaston Kleiman, and Jie Yu.


Bugs: MESOS-8195
    https://issues.apache.org/jira/browse/MESOS-8195


Repository: mesos


Description
-------

In cases where the agent fails over or where an `UpdateSlaveMessage`
races with an `ApplyOfferOperationMessage`, it's possible that the
master knows about an offer operation which is not contained in an
`UpdateSlaveMessage`. In such cases, the master should send a
`ReconcileOfferOperations` message to the agent. The agent will
then respond by sending OFFER_OPERATION_DROPPED status updates for
any operations which it does not know about.
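
The exchange described above can be sketched roughly as follows. This is only an illustration, not the code in the patch: `OperationId`, `StatusUpdate`, `operationsToReconcile`, and `handleReconcile` are hypothetical stand-ins, and plain strings and sets replace the protobuf messages that Mesos actually uses.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an offer operation identifier.
using OperationId = std::string;

// Hypothetical stand-in for an offer operation status update.
struct StatusUpdate {
  OperationId operationId;
  std::string state;
};

// Master side (assumed shape): collect the operations the master tracks
// for an agent that are absent from the agent's latest update. These are
// the IDs that would go into a reconciliation request.
std::vector<OperationId> operationsToReconcile(
    const std::set<OperationId>& knownToMaster,
    const std::set<OperationId>& reportedByAgent)
{
  std::vector<OperationId> missing;
  // std::set iterates in sorted order, which std::set_difference requires.
  std::set_difference(
      knownToMaster.begin(), knownToMaster.end(),
      reportedByAgent.begin(), reportedByAgent.end(),
      std::back_inserter(missing));
  return missing;
}

// Agent side (assumed shape): answer a reconciliation request with a
// dropped status for every operation the agent has no record of.
std::vector<StatusUpdate> handleReconcile(
    const std::vector<OperationId>& requested,
    const std::set<OperationId>& knownToAgent)
{
  std::vector<StatusUpdate> updates;
  for (const OperationId& id : requested) {
    if (knownToAgent.count(id) == 0) {
      updates.push_back(StatusUpdate{id, "OFFER_OPERATION_DROPPED"});
    }
  }
  return updates;
}
```

In the race case, the agent may already know about an operation that was missing from its earlier update, so in this sketch no dropped update is produced for it; only operations that are truly unknown to the agent are reported as dropped.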


Diffs
-----

  src/master/master.cpp b3e074cfe86600793310deb87932fa145e95055d 


Diff: https://reviews.apache.org/r/64464/diff/1/


Testing
-------

make check


Thanks,

Greg Mann


Re: Review Request 64464: Made master reconcile known offer operations with agent.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64464/#review193408
-----------------------------------------------------------



FAIL: Some Mesos tests failed.

Reviews applied: `['64457', '64458', '64462', '64463', '64464']`

Failed command: `D:\DCOS\mesos\src\mesos-tests.exe --verbose`

All the build artifacts are available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64464

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64464/logs/mesos-tests-stdout.log):

```

[----------] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN      ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[       OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2318 ms)
[----------] 1 test from IsolationFlag/CpuIsolatorTest (2340 ms total)

[----------] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN      ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[       OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2267 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (2289 ms total)

[----------] Global test environment tear-down
[==========] 825 tests from 84 test cases ran. (306961 ms total)
[  PASSED  ] 815 tests.
[  FAILED  ] 10 tests, listed below:
[  FAILED  ] OfferOperationStatusUpdateManagerTest.UpdateAndAckNonTerminalUpdate
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RecoverCheckpointedStream
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RecoverEmptyFile
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RecoverTerminatedStream
[  FAILED  ] OfferOperationStatusUpdateManagerTest.IgnoreDuplicateUpdate
[  FAILED  ] OfferOperationStatusUpdateManagerTest.IgnoreDuplicateUpdateAfterRecover
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RejectDuplicateAck
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RejectDuplicateAckAfterRecover
[  FAILED  ] OfferOperationStatusUpdateManagerTest.NonStrictRecoveryCorruptedFile
[  FAILED  ] SlaveTest.ResourceProviderPublishAll

10 FAILED TESTS
  YOU HAVE 204 DISABLED TESTS

```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64464/logs/mesos-tests-stderr.log):

```
I1211 17:58:04.948421  4308 executor.cpp:171] Received SUBSCRIBED event
I1211 17:58:04.952181  4308 executor.cpp:175] Subscribed executor on build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I1211 17:58:04.953182  4308 executor.cpp:171] Received LAUNCH event
I1211 17:58:04.956182  4308 executor.cpp:637] Starting task 61c1e9db-979d-4f24-a6e8-f4b778638371
I1211 17:58:05.028184  4308 executor.cpp:477] Running 'D:\DCOS\mesos\src\mesos-containerizer.exe launch <POSSIBLY-SENSITIVE-DATA>'
I1211 17:58:05.531175  4308 executor.cpp:650] Forked command at 6512
I1211 17:58:05.557174  2020 exec.cpp:435] Executor asked to shutdown
I1211 17:58:05.558176  4308 executor.cpp:171] Received SHUTDOWN event
I1211 17:58:05.558176  4308 executor.cpp:747] Shutting down
I1211 17:58:05.558176  4308 executor.cpp:854] Sending SIGTERM to process tree at pid 629-71bb7e019a9f@10.3.1.5:59670
I1211 17:58:05.556175  1640 hierarchical.cpp:405] Deactivated framework b24356ce-7f3c-4f21-b78c-6f012fc8a020-0000
I1211 17:58:05.556175  8680 master.cpp:10115] Updating the state of task 61c1e9db-979d-4f24-a6e8-f4b778638371 of framework b24356ce-7f3c-4f21-b78c-6f012fc8a020-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I1211 17:58:05.556175  8984 slave.cpp:3400] Shutting down framework b24356ce-7f3c-4f21-b78c-6f012fc8a020-0000
I1211 17:58:05.556175  8984 slave.cpp:6091] Shutting down executor '61c1e9db-979d-4f24-a6e8-f4b778638371' of framework b24356ce-7f3c-4f21-b78c-6f012fc8a020-0000 at executor(1)@10.3.1.5:59691
I1211 17:58:05.557174  8984 slave.cpp:909] Agent terminating
W1211 17:58:05.557174  8984 slave.cpp:3396] Ignoring shutdown framework b24356ce-7f3c-4f21-b78c-6f012fc8a020-0000 because it is terminating
I1211 17:58:05.558176  8680 master.cpp:10221] Removing task 61c1e9db-979d-4f24-a6e8-f4b778638371 with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: *):[31000-32000] of framework b24356ce-7f3c-4f21-b78c-6f012fc8a020-0000 on agent b24356ce-7f3c-4f21-b78c-6f012fc8a020-S0 at slave(326)@10.3.1.5:59670 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1211 17:58:05.560175  6484 containerizer.cpp:2328] Destroying container 7f665881-5c1f-4b11-b437-f0f3c42c187c in RUNNING state
I1211 17:58:05.560175  6484 containerizer.cpp:2930] Transitioning the state of container 7f665881-5c1f-4b11-b437-f0f3c42c187c from RUNNING to DESTROYING
I1211 17:58:05.561177  8680 master.cpp:1310] Agent b24356ce-7f3c-4f21-b78c-6f012fc8a020-S0 at slave(326)@10.3.1.5:59670 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I1211 17:58:05.561177  8680 master.cpp:3369] Disconnecting agent b24356ce-7f3c-4f21-b78c-6f012fc8a020-S0 at slave(326)@10.3.1.5:59670 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1211 17:58:05.561177  6484 launcher.cpp:156] Asked to destroy container 7f665881-5c1f-4b11-b437-f0f3c42c187c
I1211 17:58:05.561177  8680 master.cpp:3388] Deactivating agent b24356ce-7f3c-4f21-b78c-6f012fc8a020-S0 at slave(326)@10.3.1.5:59670 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1211 17:58:05.561177  8984 hierarchical.cpp:344] Removed framework b24356ce-7f3c-4f21-b78c-6f012fc8a020-0000
I1211 17:58:05.561177  8984 hierarchical.cpp:762] Agent b24356ce-7f3c-4f21-b78c-6f012fc8a020-S0 deactivated
I1211 17:58:05.653242  8680 containerizer.cpp:2779] Container 7f665881-5c1f-4b11-b437-f0f3c42c187c has exited
I1211 17:58:05.681249  8160 master.cpp:1152] Master terminating
I1211 17:58:05.683250  8680 hierarchical.cpp:605] Removed agent b24356ce-7f3c-4f21-b78c-6f012fc8a020-S0
I1211 17:58:05.957247   576 process.cpp:887] Failed to accept socket: future discarded
```

- Mesos Reviewbot Windows




Re: Review Request 64464: Made master reconcile known offer operations with agent.

Posted by Gaston Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64464/#review193306
-----------------------------------------------------------


Ship it!

- Gaston Kleiman




Re: Review Request 64464: Made master reconcile known offer operations with agent.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64464/#review193473
-----------------------------------------------------------


Ship it!

- Jie Yu

