You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Jie Yu <yu...@gmail.com> on 2017/12/11 04:59:16 UTC
Review Request 64494: Sent resource version uuid only for agent
default resources.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64494/
-----------------------------------------------------------
Review request for mesos, Benjamin Bannier and Greg Mann.
Repository: mesos
Description
-------
It's not correct to send resource version uuids for local resources
providers during agent re(registration) because the total resources from
those local resource providers are not sent in the same message.
Consider the following sequence of events:
(1) Agent disconnects
(2) Speculative operation fails in an RP, the RP bumps the version uuid
(3) Agent updates the RP’s resource version uuid
(4) Agent reregisters
(5) Master is informed about the new resource version uuid of that RP
(6) Master still has the old total of the RP
(7) Framework launch an operation assuming the old total, but with the
new resource version uuid
This patch updated the `RegisterSlaveMessage` and
`ReregisterSlaveMessage` to only send resource version uuids for the
agent default resources.
Diffs
-----
src/master/master.hpp 5c26f2066aae31223ffdd76ed058d5a4e63a2603
src/master/master.cpp b3e074cfe86600793310deb87932fa145e95055d
src/messages/messages.proto a13a6410891a45876b4458369c282aa4d502db08
src/slave/slave.cpp 373e393ca1e7c0c30c3474cc9e630e25ad92f235
src/tests/slave_tests.cpp 0fb2a63a5c9e7bd353c4f1b8f8f32e58e3e12e94
Diff: https://reviews.apache.org/r/64494/diff/1/
Testing
-------
make check
Thanks,
Jie Yu
Re: Review Request 64494: Sent resource version uuid only for agent
default resources.
Posted by Jie Yu <yu...@gmail.com>.
> On Dec. 11, 2017, 12:38 p.m., Benjamin Bannier wrote:
> > src/master/master.cpp
> > Lines 6300 (patched)
> > <https://reviews.apache.org/r/64494/diff/1/?file=1912395#file1912395line6300>
> >
> > Let's init this with `None()` for consistency.
oh? I never initialize an Option to None() explicitly in other places as the default executor will initialize it to None.
> On Dec. 11, 2017, 12:38 p.m., Benjamin Bannier wrote:
> > src/messages/messages.proto
> > Line 521 (original), 522 (patched)
> > <https://reviews.apache.org/r/64494/diff/1/?file=1912396#file1912396line522>
> >
> > `s/ / /`
What's this?
> On Dec. 11, 2017, 12:38 p.m., Benjamin Bannier wrote:
> > src/slave/slave.cpp
> > Lines 1546 (patched)
> > <https://reviews.apache.org/r/64494/diff/1/?file=1912397#file1912397line1555>
> >
> > Could this be unconditional?
Yeah, this is probably fine. The reason I did that is because i want to mimic the old agent behavior when the capability is not set so that we can validate in the master that if resource version uuid is set, it has resource provider capability. But if we don't have that check, it's ok to always set it.
- Jie
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64494/#review193384
-----------------------------------------------------------
On Dec. 11, 2017, 4:59 a.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64494/
> -----------------------------------------------------------
>
> (Updated Dec. 11, 2017, 4:59 a.m.)
>
>
> Review request for mesos, Benjamin Bannier and Greg Mann.
>
>
> Repository: mesos
>
>
> Description
> -------
>
> It's not correct to send resource version uuids for local resources
> providers during agent re(registration) because the total resources from
> those local resource providers are not sent in the same message.
>
> Consider the following sequence of events:
> (1) Agent disconnects
> (2) Speculative operation fails in an RP, the RP bumps the version uuid
> (3) Agent updates the RP’s resource version uuid
> (4) Agent reregisters
> (5) Master is informed about the new resource version uuid of that RP
> (6) Master still has the old total of the RP
> (7) Framework launch an operation assuming the old total, but with the
> new resource version uuid
>
> This patch updated the `RegisterSlaveMessage` and
> `ReregisterSlaveMessage` to only send resource version uuids for the
> agent default resources.
>
>
> Diffs
> -----
>
> src/master/master.hpp 5c26f2066aae31223ffdd76ed058d5a4e63a2603
> src/master/master.cpp b3e074cfe86600793310deb87932fa145e95055d
> src/messages/messages.proto a13a6410891a45876b4458369c282aa4d502db08
> src/slave/slave.cpp 373e393ca1e7c0c30c3474cc9e630e25ad92f235
> src/tests/slave_tests.cpp 0fb2a63a5c9e7bd353c4f1b8f8f32e58e3e12e94
>
>
> Diff: https://reviews.apache.org/r/64494/diff/1/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Jie Yu
>
>
Re: Review Request 64494: Sent resource version uuid only for agent
default resources.
Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64494/#review193384
-----------------------------------------------------------
Fix it, then Ship it!
src/master/master.cpp
Lines 6300 (patched)
<https://reviews.apache.org/r/64494/#comment271937>
Let's init this with `None()` for consistency.
src/master/master.cpp
Lines 6974 (patched)
<https://reviews.apache.org/r/64494/#comment271938>
Keep breaking on assignment like before?
src/messages/messages.proto
Line 517 (original), 517 (patched)
<https://reviews.apache.org/r/64494/#comment271939>
`s/uuid/UUID/`
src/messages/messages.proto
Line 521 (original), 522 (patched)
<https://reviews.apache.org/r/64494/#comment271940>
`s/ / /`
src/messages/messages.proto
Line 575 (original), 576 (patched)
<https://reviews.apache.org/r/64494/#comment271941>
`s/Resoruce/Resource/`
`s/uuid/UIID/`
src/messages/messages.proto
Line 579 (original), 581 (patched)
<https://reviews.apache.org/r/64494/#comment271942>
`s/ / /`
src/slave/slave.cpp
Lines 1546 (patched)
<https://reviews.apache.org/r/64494/#comment271943>
Could this be unconditional?
src/slave/slave.cpp
Lines 1562 (patched)
<https://reviews.apache.org/r/64494/#comment271944>
Ditto.
src/tests/slave_tests.cpp
Lines 8967 (patched)
<https://reviews.apache.org/r/64494/#comment271945>
This is not needed if we always send the `optional` agent resource version.
- Benjamin Bannier
On Dec. 11, 2017, 5:59 a.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64494/
> -----------------------------------------------------------
>
> (Updated Dec. 11, 2017, 5:59 a.m.)
>
>
> Review request for mesos, Benjamin Bannier and Greg Mann.
>
>
> Repository: mesos
>
>
> Description
> -------
>
> It's not correct to send resource version uuids for local resources
> providers during agent re(registration) because the total resources from
> those local resource providers are not sent in the same message.
>
> Consider the following sequence of events:
> (1) Agent disconnects
> (2) Speculative operation fails in an RP, the RP bumps the version uuid
> (3) Agent updates the RP’s resource version uuid
> (4) Agent reregisters
> (5) Master is informed about the new resource version uuid of that RP
> (6) Master still has the old total of the RP
> (7) Framework launch an operation assuming the old total, but with the
> new resource version uuid
>
> This patch updated the `RegisterSlaveMessage` and
> `ReregisterSlaveMessage` to only send resource version uuids for the
> agent default resources.
>
>
> Diffs
> -----
>
> src/master/master.hpp 5c26f2066aae31223ffdd76ed058d5a4e63a2603
> src/master/master.cpp b3e074cfe86600793310deb87932fa145e95055d
> src/messages/messages.proto a13a6410891a45876b4458369c282aa4d502db08
> src/slave/slave.cpp 373e393ca1e7c0c30c3474cc9e630e25ad92f235
> src/tests/slave_tests.cpp 0fb2a63a5c9e7bd353c4f1b8f8f32e58e3e12e94
>
>
> Diff: https://reviews.apache.org/r/64494/diff/1/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Jie Yu
>
>
Re: Review Request 64494: Sent resource version uuid only for agent
default resources.
Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64494/#review193465
-----------------------------------------------------------
FAIL: Some Mesos tests failed.
Failed command: `D:\DCOS\mesos\src\mesos-tests.exe --verbose`
All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64494
Relevant logs:
- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64494/logs/mesos-tests-stdout.log):
```
[----------] 1 test from IsolationFlag/CpuIsolatorTest
[ RUN ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0
[ OK ] IsolationFlag/CpuIsolatorTest.ROOT_UserCpuUsage/0 (2305 ms)
[----------] 1 test from IsolationFlag/CpuIsolatorTest (2328 ms total)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest
[ RUN ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0
[ OK ] IsolationFlag/MemoryIsolatorTest.ROOT_MemUsage/0 (2263 ms)
[----------] 1 test from IsolationFlag/MemoryIsolatorTest (2286 ms total)
[----------] Global test environment tear-down
[==========] 825 tests from 84 test cases ran. (305141 ms total)
[ PASSED ] 815 tests.
[ FAILED ] 10 tests, listed below:
[ FAILED ] OfferOperationStatusUpdateManagerTest.UpdateAndAckNonTerminalUpdate
[ FAILED ] OfferOperationStatusUpdateManagerTest.RecoverCheckpointedStream
[ FAILED ] OfferOperationStatusUpdateManagerTest.RecoverEmptyFile
[ FAILED ] OfferOperationStatusUpdateManagerTest.RecoverTerminatedStream
[ FAILED ] OfferOperationStatusUpdateManagerTest.IgnoreDuplicateUpdate
[ FAILED ] OfferOperationStatusUpdateManagerTest.IgnoreDuplicateUpdateAfterRecover
[ FAILED ] OfferOperationStatusUpdateManagerTest.RejectDuplicateAck
[ FAILED ] OfferOperationStatusUpdateManagerTest.RejectDuplicateAckAfterRecover
[ FAILED ] OfferOperationStatusUpdateManagerTest.NonStrictRecoveryCorruptedFile
[ FAILED ] SlaveTest.ResourceProviderPublishAll
10 FAILED TESTS
YOU HAVE 204 DISABLED TESTS
```
- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64494/logs/mesos-tests-stderr.log):
```
I1211 23:15:11.661341 1136 exec.cpp:237] Executor registered on agent 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-S0
I1211 23:15:11.664343 1220 executor.cpp:171] Received SUBSCRIBED event
I1211 23:15:11.668361 1220 executor.cpp:175] Subscribed executor on build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net
I1211 23:15:11.669363 1220 executor.cpp:171] Received LAUNCH event
I1211 23:15:11.672340 1220 executor.cpp:637] Starting task da7be4e2-b07c-4fcc-a757-17d712a96fda
I1211 23:15:11.744361 1220 executor.cpp:477] Running 'D:\DCOS\mesos\src\mesos-containerizer.exe launch <POSSIBLY-SENSITIVE-DATA>'
I1211 23:15:12.259343 1220 executor.cpp:650] Forked command at 4304
I1211 23:15:12.284334 7916 exec.cpp:435] Executor asked to shutdown
I1211 23:15:12.284334 1240 executor.cpp:171] Received SHUTDOWN event
I1211 23:15:12.284334 1240 executor.cpp:747] Shutting down
I1211 23:15:12.285333 1240 executor.cpp:854] Sending SIGTERM to process tree at pid 4pp:3400] Shutting down framework 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-0000
I1211 23:15:12.282335 5800 hierarchical.cpp:405] Deactivated framework 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-0000
I1211 23:15:12.282335 3848 master.cpp:10114] Updating the state of task da7be4e2-b07c-4fcc-a757-17d712a96fda of framework 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-0000 (latest state: TASK_KILLED, status update state: TASK_KILLED)
I1211 23:15:12.283334 2044 slave.cpp:6091] Shutting down executor 'da7be4e2-b07c-4fcc-a757-17d712a96fda' of framework 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-0000 at executor(1)@10.3.1.5:65256
I1211 23:15:12.284334 7784 slave.cpp:909] Agent terminating
W1211 23:15:12.284334 7784 slave.cpp:3396] Ignoring shutdown framework 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-0000 because it is terminating
I1211 23:15:12.285333 3848 master.cpp:10220] Removing task da7be4e2-b07c-4fcc-a757-17d712a96fda with resources cpus(allocated: *):4; mem(allocated: *):2048; disk(allocated: *):1024; ports(allocated: *):[31000-32000] of framework 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-0000 on agent 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-S0 at slave(326)@10.3.1.5:65235 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1211 23:15:12.286334 5800 containerizer.cpp:2328] Destroying container a658b9b2-7377-4da7-b6ec-cb3fbc75f326 in RUNNING state
I1211 23:15:12.287334 5800 containerizer.cpp:2930] Transitioning the state of container a658b9b2-7377-4da7-b6ec-cb3fbc75f326 from RUNNING to DESTROYING
I1211 23:15:12.287334 3848 master.cpp:1305] Agent 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-S0 at slave(326)@10.3.1.5:65235 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net) disconnected
I1211 23:15:12.287334 3848 master.cpp:3364] Disconnecting agent 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-S0 at slave(326)@10.3.1.5:65235 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1211 23:15:12.287334 5800 launcher.cpp:156] Asked to destroy container a658b9b2-7377-4da7-b6ec-cb3fbc75f326
I1211 23:15:12.288336 3848 master.cpp:3383] Deactivating agent 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-S0 at slave(326)@10.3.1.5:65235 (build-srv-04.zq4gs31qjdiunm1ryi1452nvnh.dx.internal.cloudapp.net)
I1211 23:15:12.288336 6760 hierarchical.cpp:344] Removed framework 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-0000
I1211 23:15:12.288336 2044 hierarchical.cpp:762] Agent 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-S0 deactivated
I1211 23:15:12.328429 5800 containerizer.cpp:2779] Container a658b9b2-7377-4da7-b6ec-cb3fbc75f326 has exited
I1211 23:15:12.355486 7784 master.cpp:1147] Master terminating
I1211 23:15:12.357473 2044 hierarchical.cpp:605] Removed agent 9f03c4cc-a0ff-41dd-8c03-3f9eeb2e9489-S0
I1211 23:15:12.643465 6016 process.cpp:887] Failed to accept socket: future discarded
```
- Mesos Reviewbot Windows
On Dec. 11, 2017, 8:31 p.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64494/
> -----------------------------------------------------------
>
> (Updated Dec. 11, 2017, 8:31 p.m.)
>
>
> Review request for mesos, Benjamin Bannier and Greg Mann.
>
>
> Repository: mesos
>
>
> Description
> -------
>
> It's not correct to send resource version uuids for local resources
> providers during agent re(registration) because the total resources from
> those local resource providers are not sent in the same message.
>
> Consider the following sequence of events:
> (1) Agent disconnects
> (2) Speculative operation fails in an RP, the RP bumps the version uuid
> (3) Agent updates the RP’s resource version uuid
> (4) Agent reregisters
> (5) Master is informed about the new resource version uuid of that RP
> (6) Master still has the old total of the RP
> (7) Framework launch an operation assuming the old total, but with the
> new resource version uuid
>
> This patch updated the `RegisterSlaveMessage` and
> `ReregisterSlaveMessage` to only send resource version uuids for the
> agent default resources.
>
>
> Diffs
> -----
>
> src/master/master.hpp 5c26f2066aae31223ffdd76ed058d5a4e63a2603
> src/master/master.cpp 55e919504b14f35415015ffb97c455f789733c6a
> src/messages/messages.proto a13a6410891a45876b4458369c282aa4d502db08
> src/slave/slave.cpp 373e393ca1e7c0c30c3474cc9e630e25ad92f235
> src/tests/slave_tests.cpp 0fb2a63a5c9e7bd353c4f1b8f8f32e58e3e12e94
>
>
> Diff: https://reviews.apache.org/r/64494/diff/2/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Jie Yu
>
>
Re: Review Request 64494: Sent resource version uuid only for agent
default resources.
Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64494/
-----------------------------------------------------------
(Updated Dec. 11, 2017, 8:31 p.m.)
Review request for mesos, Benjamin Bannier and Greg Mann.
Changes
-------
Addressed comments.
Repository: mesos
Description
-------
It's not correct to send resource version uuids for local resources
providers during agent re(registration) because the total resources from
those local resource providers are not sent in the same message.
Consider the following sequence of events:
(1) Agent disconnects
(2) Speculative operation fails in an RP, the RP bumps the version uuid
(3) Agent updates the RP’s resource version uuid
(4) Agent reregisters
(5) Master is informed about the new resource version uuid of that RP
(6) Master still has the old total of the RP
(7) Framework launch an operation assuming the old total, but with the
new resource version uuid
This patch updated the `RegisterSlaveMessage` and
`ReregisterSlaveMessage` to only send resource version uuids for the
agent default resources.
Diffs (updated)
-----
src/master/master.hpp 5c26f2066aae31223ffdd76ed058d5a4e63a2603
src/master/master.cpp 55e919504b14f35415015ffb97c455f789733c6a
src/messages/messages.proto a13a6410891a45876b4458369c282aa4d502db08
src/slave/slave.cpp 373e393ca1e7c0c30c3474cc9e630e25ad92f235
src/tests/slave_tests.cpp 0fb2a63a5c9e7bd353c4f1b8f8f32e58e3e12e94
Diff: https://reviews.apache.org/r/64494/diff/2/
Changes: https://reviews.apache.org/r/64494/diff/1-2/
Testing
-------
make check
Thanks,
Jie Yu
Re: Review Request 64494: Sent resource version uuid only for agent
default resources.
Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64494/#review193415
-----------------------------------------------------------
FAIL: Failed to apply the current review.
Failed command: `python.exe .\support\apply-reviews.py -n -r 64494`
All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64494
Relevant logs:
- [apply-review-64494-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64494/logs/apply-review-64494-stderr.log):
```
Traceback (most recent call last):
File ".\support\apply-reviews.py", line 434, in <module>
main()
File ".\support\apply-reviews.py", line 429, in main
reviewboard(options)
File ".\support\apply-reviews.py", line 419, in reviewboard
apply_review(options)
File ".\support\apply-reviews.py", line 160, in apply_review
commit_patch(options)
File ".\support\apply-reviews.py", line 271, in commit_patch
message.write(data['message'])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 430: ordinal not in range(128)
```
- Mesos Reviewbot Windows
On Dec. 11, 2017, 4:59 a.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64494/
> -----------------------------------------------------------
>
> (Updated Dec. 11, 2017, 4:59 a.m.)
>
>
> Review request for mesos, Benjamin Bannier and Greg Mann.
>
>
> Repository: mesos
>
>
> Description
> -------
>
> It's not correct to send resource version uuids for local resources
> providers during agent re(registration) because the total resources from
> those local resource providers are not sent in the same message.
>
> Consider the following sequence of events:
> (1) Agent disconnects
> (2) Speculative operation fails in an RP, the RP bumps the version uuid
> (3) Agent updates the RP’s resource version uuid
> (4) Agent reregisters
> (5) Master is informed about the new resource version uuid of that RP
> (6) Master still has the old total of the RP
> (7) Framework launch an operation assuming the old total, but with the
> new resource version uuid
>
> This patch updated the `RegisterSlaveMessage` and
> `ReregisterSlaveMessage` to only send resource version uuids for the
> agent default resources.
>
>
> Diffs
> -----
>
> src/master/master.hpp 5c26f2066aae31223ffdd76ed058d5a4e63a2603
> src/master/master.cpp b3e074cfe86600793310deb87932fa145e95055d
> src/messages/messages.proto a13a6410891a45876b4458369c282aa4d502db08
> src/slave/slave.cpp 373e393ca1e7c0c30c3474cc9e630e25ad92f235
> src/tests/slave_tests.cpp 0fb2a63a5c9e7bd353c4f1b8f8f32e58e3e12e94
>
>
> Diff: https://reviews.apache.org/r/64494/diff/1/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Jie Yu
>
>