You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Jie Yu <yu...@gmail.com> on 2017/12/11 04:59:16 UTC

Review Request 64494: Sent resource version uuid only for agent default resources.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64494/
-----------------------------------------------------------

Review request for mesos, Benjamin Bannier and Greg Mann.


Repository: mesos


Description
-------

It's not correct to send resource version uuids for local resources
providers during agent re(registration) because the total resources from
those local resource providers are not sent in the same message.

Consider the following sequence of events:
(1) Agent disconnects
(2) Speculative operation fails in an RP, the RP bumps the version uuid
(3) Agent updates the RP’s resource version uuid
(4) Agent reregisters
(5) Master is informed about the new resource version uuid of that RP
(6) Master still has the old total of the RP
(7) Framework launch an operation assuming the old total, but with the
    new resource version uuid

This patch updated the `RegisterSlaveMessage` and
`ReregisterSlaveMessage` to only send resource version uuids for the
agent default resources.


Diffs
-----

  src/master/master.hpp 5c26f2066aae31223ffdd76ed058d5a4e63a2603 
  src/master/master.cpp b3e074cfe86600793310deb87932fa145e95055d 
  src/messages/messages.proto a13a6410891a45876b4458369c282aa4d502db08 
  src/slave/slave.cpp 373e393ca1e7c0c30c3474cc9e630e25ad92f235 
  src/tests/slave_tests.cpp 0fb2a63a5c9e7bd353c4f1b8f8f32e58e3e12e94 


Diff: https://reviews.apache.org/r/64494/diff/1/


Testing
-------

make check


Thanks,

Jie Yu


Re: Review Request 64494: Sent resource version uuid only for agent default resources.

Posted by Jie Yu <yu...@gmail.com>.

> On Dec. 11, 2017, 12:38 p.m., Benjamin Bannier wrote:
> > src/master/master.cpp
> > Lines 6300 (patched)
> > <https://reviews.apache.org/r/64494/diff/1/?file=1912395#file1912395line6300>
> >
> >     Let's init this with `None()` for consistency.

oh? I never initialize an Option to None() explicitly in other places as the default executor will initialize it to None.


> On Dec. 11, 2017, 12:38 p.m., Benjamin Bannier wrote:
> > src/messages/messages.proto
> > Line 521 (original), 522 (patched)
> > <https://reviews.apache.org/r/64494/diff/1/?file=1912396#file1912396line522>
> >
> >     `s/  / /`

What's this?


> On Dec. 11, 2017, 12:38 p.m., Benjamin Bannier wrote:
> > src/slave/slave.cpp
> > Lines 1546 (patched)
> > <https://reviews.apache.org/r/64494/diff/1/?file=1912397#file1912397line1555>
> >
> >     Could this be unconditional?

Yeah, this is probably fine. The reason I did that is because i want to mimic the old agent behavior when the capability is not set so that we can validate in the master that if resource version uuid is set, it has resource provider capability. But if we don't have that check, it's ok to always set it.


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64494/#review193384
-----------------------------------------------------------


On Dec. 11, 2017, 4:59 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64494/
> -----------------------------------------------------------
> 
> (Updated Dec. 11, 2017, 4:59 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Greg Mann.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> It's not correct to send resource version uuids for local resources
> providers during agent re(registration) because the total resources from
> those local resource providers are not sent in the same message.
> 
> Consider the following sequence of events:
> (1) Agent disconnects
> (2) Speculative operation fails in an RP, the RP bumps the version uuid
> (3) Agent updates the RP’s resource version uuid
> (4) Agent reregisters
> (5) Master is informed about the new resource version uuid of that RP
> (6) Master still has the old total of the RP
> (7) Framework launch an operation assuming the old total, but with the
>     new resource version uuid
> 
> This patch updated the `RegisterSlaveMessage` and
> `ReregisterSlaveMessage` to only send resource version uuids for the
> agent default resources.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 5c26f2066aae31223ffdd76ed058d5a4e63a2603 
>   src/master/master.cpp b3e074cfe86600793310deb87932fa145e95055d 
>   src/messages/messages.proto a13a6410891a45876b4458369c282aa4d502db08 
>   src/slave/slave.cpp 373e393ca1e7c0c30c3474cc9e630e25ad92f235 
>   src/tests/slave_tests.cpp 0fb2a63a5c9e7bd353c4f1b8f8f32e58e3e12e94 
> 
> 
> Diff: https://reviews.apache.org/r/64494/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 64494: Sent resource version uuid only for agent default resources.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64494/#review193384
-----------------------------------------------------------


Fix it, then Ship it!





src/master/master.cpp
Lines 6300 (patched)
<https://reviews.apache.org/r/64494/#comment271937>

    Let's init this with `None()` for consistency.



src/master/master.cpp
Lines 6974 (patched)
<https://reviews.apache.org/r/64494/#comment271938>

    Keep breaking on assignment like before?



src/messages/messages.proto
Line 517 (original), 517 (patched)
<https://reviews.apache.org/r/64494/#comment271939>

    `s/uuid/UUID/`



src/messages/messages.proto
Line 521 (original), 522 (patched)
<https://reviews.apache.org/r/64494/#comment271940>

    `s/  / /`



src/messages/messages.proto
Line 575 (original), 576 (patched)
<https://reviews.apache.org/r/64494/#comment271941>

    `s/Resoruce/Resource/`
    `s/uuid/UIID/`



src/messages/messages.proto
Line 579 (original), 581 (patched)
<https://reviews.apache.org/r/64494/#comment271942>

    `s/  / /`



src/slave/slave.cpp
Lines 1546 (patched)
<https://reviews.apache.org/r/64494/#comment271943>

    Could this be unconditional?



src/slave/slave.cpp
Lines 1562 (patched)
<https://reviews.apache.org/r/64494/#comment271944>

    Ditto.



src/tests/slave_tests.cpp
Lines 8967 (patched)
<https://reviews.apache.org/r/64494/#comment271945>

    This is not needed if we always send the `optional` agent resource version.


- Benjamin Bannier


On Dec. 11, 2017, 5:59 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64494/
> -----------------------------------------------------------
> 
> (Updated Dec. 11, 2017, 5:59 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Greg Mann.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> It's not correct to send resource version uuids for local resources
> providers during agent re(registration) because the total resources from
> those local resource providers are not sent in the same message.
> 
> Consider the following sequence of events:
> (1) Agent disconnects
> (2) Speculative operation fails in an RP, the RP bumps the version uuid
> (3) Agent updates the RP’s resource version uuid
> (4) Agent reregisters
> (5) Master is informed about the new resource version uuid of that RP
> (6) Master still has the old total of the RP
> (7) Framework launch an operation assuming the old total, but with the
>     new resource version uuid
> 
> This patch updated the `RegisterSlaveMessage` and
> `ReregisterSlaveMessage` to only send resource version uuids for the
> agent default resources.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 5c26f2066aae31223ffdd76ed058d5a4e63a2603 
>   src/master/master.cpp b3e074cfe86600793310deb87932fa145e95055d 
>   src/messages/messages.proto a13a6410891a45876b4458369c282aa4d502db08 
>   src/slave/slave.cpp 373e393ca1e7c0c30c3474cc9e630e25ad92f235 
>   src/tests/slave_tests.cpp 0fb2a63a5c9e7bd353c4f1b8f8f32e58e3e12e94 
> 
> 
> Diff: https://reviews.apache.org/r/64494/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 64494: Sent resource version uuid only for agent default resources.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64494/#review193465
-----------------------------------------------------------



FAIL: Some Mesos tests failed.

Failed command: `D:\DCOS\mesos\src\mesos-tests.exe --verbose`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64494

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64494/logs/mesos-tests-stdout.log):

```

[----------] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN      ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[       OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2305 ms)
[----------] 1 test from IsolationFlag/CpuIsolatorTest (2328 ms total)

[----------] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN      ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[       OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2263 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (2286 ms total)

[----------] Global test environment tear-down
[==========] 825 tests from 84 test cases ran. (305141 ms total)
[  PASSED  ] 815 tests.
[  FAILED  ] 10 tests, listed below:
[  FAILED  ] OfferOperationStatusUpdateManagerTest.UpdateAndAckNonTerminalUpdate
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RecoverCheckpointedStream
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RecoverEmptyFile
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RecoverTerminatedStream
[  FAILED  ] OfferOperationStatusUpdateManagerTest.IgnoreDuplicateUpdate
[  FAILED  ] OfferOperationStatusUpdateManagerTest.IgnoreDuplicateUpdateAfterRecover
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RejectDuplicateAck
[  FAILED  ] OfferOperationStatusUpdateManagerTest.RejectDuplicateAckAfterRecover
[  FAILED  ] OfferOperationStatusUpdateManagerTest.NonStrictRecoveryCorruptedFile
[  FAILED  ] SlaveTest.ResourceProviderPublishAll

10 FAILED TESTS
  YOU HAVE 204 DISABLED TESTS

```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64494/logs/mesos-tests-stderr.log):

```
I1211 23:15:11.661341  1136 exec.cpp:237] Executor registered on agent 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-S0
I1211 23:15:11.664343  1220 executor.cpp:171] Received SUBSCRIBED event
I1211 23:15:11.668361  1220 executor.cpp:175] Subscribed executor on build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I1211 23:15:11.669363  1220 executor.cpp:171] Received LAUNCH event
I1211 23:15:11.672340  1220 executor.cpp:637] Starting task da7be4e2-b07c-4fcc-a757-17d712a96fda
I1211 23:15:11.744361  1220 executor.cpp:477] Running 'D:\DCOS\mesos\src\mesos-containerizer.exe launch <POSSIBLY-SENSITIVE-DATA>'
I1211 23:15:12.259343  1220 executor.cpp:650] Forked command at 4304
I1211 23:15:12.284334  7916 exec.cpp:435] Executor asked to shutdown
I1211 23:15:12.284334  1240 executor.cpp:171] Received SHUTDOWN event
I1211 23:15:12.284334  1240 executor.cpp:747] Shutting down
I1211 23:15:12.285333  1240 executor.cpp:854] Sending SIGTERM to process tree at pid 4pp:3400] Shutting down framework 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-0000
I1211 23:15:12.282335  5800 hierarchical.cpp:405] Deactivated framework 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-0000
I1211 23:15:12.282335  3848 master.cpp:10114] Updating the state of task da7be4e2-b07c-4fcc-a757-17d712a96fda of framework 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I1211 23:15:12.283334  2044 slave.cpp:6091] Shutting down executor 'da7be4e2-b07c-4fcc-a757-17d712a96fda' of framework 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-0000 at executor(1)@10.3.1.5:65256
I1211 23:15:12.284334  7784 slave.cpp:909] Agent terminating
W1211 23:15:12.284334  7784 slave.cpp:3396] Ignoring shutdown framework 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-0000 because it is terminating
I1211 23:15:12.285333  3848 master.cpp:10220] Removing task da7be4e2-b07c-4fcc-a757-17d712a96fda with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: *):[31000-32000] of framework 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-0000 on agent 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-S0 at slave(326)@10.3.1.5:65235 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1211 23:15:12.286334  5800 containerizer.cpp:2328] Destroying container a658b9b2-7377-4da7-b6ec-cb3fbc75f326 in RUNNING state
I1211 23:15:12.287334  5800 containerizer.cpp:2930] Transitioning the state of container a658b9b2-7377-4da7-b6ec-cb3fbc75f326 from RUNNING to DESTROYING
I1211 23:15:12.287334  3848 master.cpp:1305] Agent 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-S0 at slave(326)@10.3.1.5:65235 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I1211 23:15:12.287334  3848 master.cpp:3364] Disconnecting agent 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-S0 at slave(326)@10.3.1.5:65235 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1211 23:15:12.287334  5800 launcher.cpp:156] Asked to destroy container a658b9b2-7377-4da7-b6ec-cb3fbc75f326
I1211 23:15:12.288336  3848 master.cpp:3383] Deactivating agent 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-S0 at slave(326)@10.3.1.5:65235 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1211 23:15:12.288336  6760 hierarchical.cpp:344] Removed framework 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-0000
I1211 23:15:12.288336  2044 hierarchical.cpp:762] Agent 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-S0 deactivated
I1211 23:15:12.328429  5800 containerizer.cpp:2779] Container a658b9b2-7377-4da7-b6ec-cb3fbc75f326 has exited
I1211 23:15:12.355486  7784 master.cpp:1147] Master terminating
I1211 23:15:12.357473  2044 hierarchical.cpp:605] Removed agent 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-S0
I1211 23:15:12.643465  6016 process.cpp:887] Failed to accept socket: future discarded
```

- Mesos Reviewbot Windows


On Dec. 11, 2017, 8:31 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64494/
> -----------------------------------------------------------
> 
> (Updated Dec. 11, 2017, 8:31 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Greg Mann.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> It's not correct to send resource version uuids for local resources
> providers during agent re(registration) because the total resources from
> those local resource providers are not sent in the same message.
> 
> Consider the following sequence of events:
> (1) Agent disconnects
> (2) Speculative operation fails in an RP, the RP bumps the version uuid
> (3) Agent updates the RP’s resource version uuid
> (4) Agent reregisters
> (5) Master is informed about the new resource version uuid of that RP
> (6) Master still has the old total of the RP
> (7) Framework launch an operation assuming the old total, but with the
>     new resource version uuid
> 
> This patch updated the `RegisterSlaveMessage` and
> `ReregisterSlaveMessage` to only send resource version uuids for the
> agent default resources.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 5c26f2066aae31223ffdd76ed058d5a4e63a2603 
>   src/master/master.cpp 55e919504b14f35415015ffb97c455f789733c6a 
>   src/messages/messages.proto a13a6410891a45876b4458369c282aa4d502db08 
>   src/slave/slave.cpp 373e393ca1e7c0c30c3474cc9e630e25ad92f235 
>   src/tests/slave_tests.cpp 0fb2a63a5c9e7bd353c4f1b8f8f32e58e3e12e94 
> 
> 
> Diff: https://reviews.apache.org/r/64494/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 64494: Sent resource version uuid only for agent default resources.

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64494/
-----------------------------------------------------------

(Updated Dec. 11, 2017, 8:31 p.m.)


Review request for mesos, Benjamin Bannier and Greg Mann.


Changes
-------

Addressed comments.


Repository: mesos


Description
-------

It's not correct to send resource version uuids for local resources
providers during agent re(registration) because the total resources from
those local resource providers are not sent in the same message.

Consider the following sequence of events:
(1) Agent disconnects
(2) Speculative operation fails in an RP, the RP bumps the version uuid
(3) Agent updates the RP’s resource version uuid
(4) Agent reregisters
(5) Master is informed about the new resource version uuid of that RP
(6) Master still has the old total of the RP
(7) Framework launch an operation assuming the old total, but with the
    new resource version uuid

This patch updated the `RegisterSlaveMessage` and
`ReregisterSlaveMessage` to only send resource version uuids for the
agent default resources.


Diffs (updated)
-----

  src/master/master.hpp 5c26f2066aae31223ffdd76ed058d5a4e63a2603 
  src/master/master.cpp 55e919504b14f35415015ffb97c455f789733c6a 
  src/messages/messages.proto a13a6410891a45876b4458369c282aa4d502db08 
  src/slave/slave.cpp 373e393ca1e7c0c30c3474cc9e630e25ad92f235 
  src/tests/slave_tests.cpp 0fb2a63a5c9e7bd353c4f1b8f8f32e58e3e12e94 


Diff: https://reviews.apache.org/r/64494/diff/2/

Changes: https://reviews.apache.org/r/64494/diff/1-2/


Testing
-------

make check


Thanks,

Jie Yu


Re: Review Request 64494: Sent resource version uuid only for agent default resources.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64494/#review193415
-----------------------------------------------------------



FAIL: Failed to apply the current review.

Failed command: `python.exe .\support\apply-reviews.py -n -r 64494`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64494

Relevant logs:

- [apply-review-64494-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64494/logs/apply-review-64494-stderr.log):

```
Traceback (most recent call last):
  File ".\support\apply-reviews.py", line 434, in <module>
    main()
  File ".\support\apply-reviews.py", line 429, in main
    reviewboard(options)
  File ".\support\apply-reviews.py", line 419, in reviewboard
    apply_review(options)
  File ".\support\apply-reviews.py", line 160, in apply_review
    commit_patch(options)
  File ".\support\apply-reviews.py", line 271, in commit_patch
    message.write(data['message'])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 430: ordinal not in range(128)
```

- Mesos Reviewbot Windows


On Dec. 11, 2017, 4:59 a.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64494/
> -----------------------------------------------------------
> 
> (Updated Dec. 11, 2017, 4:59 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier and Greg Mann.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> It's not correct to send resource version uuids for local resources
> providers during agent re(registration) because the total resources from
> those local resource providers are not sent in the same message.
> 
> Consider the following sequence of events:
> (1) Agent disconnects
> (2) Speculative operation fails in an RP, the RP bumps the version uuid
> (3) Agent updates the RP’s resource version uuid
> (4) Agent reregisters
> (5) Master is informed about the new resource version uuid of that RP
> (6) Master still has the old total of the RP
> (7) Framework launch an operation assuming the old total, but with the
>     new resource version uuid
> 
> This patch updated the `RegisterSlaveMessage` and
> `ReregisterSlaveMessage` to only send resource version uuids for the
> agent default resources.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp 5c26f2066aae31223ffdd76ed058d5a4e63a2603 
>   src/master/master.cpp b3e074cfe86600793310deb87932fa145e95055d 
>   src/messages/messages.proto a13a6410891a45876b4458369c282aa4d502db08 
>   src/slave/slave.cpp 373e393ca1e7c0c30c3474cc9e630e25ad92f235 
>   src/tests/slave_tests.cpp 0fb2a63a5c9e7bd353c4f1b8f8f32e58e3e12e94 
> 
> 
> Diff: https://reviews.apache.org/r/64494/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>