You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Jie Yu <yu...@gmail.com> on 2017/12/12 22:54:09 UTC
Review Request 64557: Handled the RP disconnection case in the agent.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64557/
-----------------------------------------------------------
Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Greg Mann.
Repository: mesos
Description
-------
If an RP is disconnected, we'll shrink its total resources to zero so
that no offer will be made on this RP until it reconnects. This prevents
frameworks from sending operations to the disconnected RP.
Diffs
-----
src/resource_provider/manager.cpp fd138b9914d925b5be7a11255dd632921c107dba
src/resource_provider/message.hpp eab90cffd6aab9e38207dcf109cc737171ed3953
src/slave/slave.cpp 5869e73ca1c14c99e580da9d7375181da2073ec5
src/tests/resource_provider_manager_tests.cpp e37a53ac6a03e2ea58dd6580fc8a399a1398d950
Diff: https://reviews.apache.org/r/64557/diff/1/
Testing
-------
make check
Thanks,
Jie Yu
Re: Review Request 64557: Handled the RP disconnection case in the
agent.
Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64557/#review193733
-----------------------------------------------------------
Fix it, then Ship it!
src/resource_provider/manager.cpp
Lines 613-614 (patched)
<https://reviews.apache.org/r/64557/#comment272361>
We could use aggregate initialization here,
ResourceProviderMessage::Disconnect disconnect{resourceProviderId};
src/tests/resource_provider_manager_tests.cpp
Lines 1246 (patched)
<https://reviews.apache.org/r/64557/#comment272358>
I would suggest to add an assertion that `disk` is contained in the total resources of the resource provider in this update (i.e., a check similar to the one on the second update with inverted contains expectation).
This makes sure that we actually detect a change when checking below expectations.
src/tests/resource_provider_manager_tests.cpp
Lines 1255 (patched)
<https://reviews.apache.org/r/64557/#comment272354>
Replace `1u` with just `1` here.
Protobuf tries to be nice and safe us from integer conversion surprises by using a signed integer for sizes.
src/tests/resource_provider_manager_tests.cpp
Lines 1257 (patched)
<https://reviews.apache.org/r/64557/#comment272355>
`const Resources&`
src/tests/resource_provider_manager_tests.cpp
Lines 1260 (patched)
<https://reviews.apache.org/r/64557/#comment272357>
This expecation is always met as `Resource`s in `totalResources` will have a provider ID set, but `disk` has not. We need to create an updated `disk` after the resource provider has subscribed, e.g., after l.1245 add
ASSERT_TRUE(resourceProvider->info.has_id());
disk.mutable_provider_id()->CopyFrom(resourceProvider->info.id());
- Benjamin Bannier
On Dec. 12, 2017, 11:54 p.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64557/
> -----------------------------------------------------------
>
> (Updated Dec. 12, 2017, 11:54 p.m.)
>
>
> Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Greg Mann.
>
>
> Repository: mesos
>
>
> Description
> -------
>
> If an RP is disconnected, we'll shrink its total resources to zero so
> that no offer will be made on this RP until it reconnects. This prevents
> frameworks from sending operations to the disconnected RP.
>
>
> Diffs
> -----
>
> src/resource_provider/manager.cpp fd138b9914d925b5be7a11255dd632921c107dba
> src/resource_provider/message.hpp eab90cffd6aab9e38207dcf109cc737171ed3953
> src/slave/slave.cpp 5869e73ca1c14c99e580da9d7375181da2073ec5
> src/tests/resource_provider_manager_tests.cpp e37a53ac6a03e2ea58dd6580fc8a399a1398d950
>
>
> Diff: https://reviews.apache.org/r/64557/diff/1/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Jie Yu
>
>
Re: Review Request 64557: Handled the RP disconnection case in the
agent.
Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64557/#review193630
-----------------------------------------------------------
FAIL: Some Mesos tests failed.
Reviews applied: `['64557']`
Failed command: `D:\DCOS\mesos\src\mesos-tests.exe --verbose`
All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64557
Relevant logs:
- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64557/logs/mesos-tests-stdout.log):
```
[ RUN ] SlaveTest.ShutdownV0ExecutorIfItReregistersWithoutReconnect
[ OK ] SlaveTest.ShutdownV0ExecutorIfItReregistersWithoutReconnect (244 ms)
[ RUN ] SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect
[ OK ] SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect (250 ms)
[ RUN ] SlaveTest.BrowseExecutorSandboxByVirtualPath
[ OK ] SlaveTest.BrowseExecutorSandboxByVirtualPath (289 ms)
[ RUN ] SlaveTest.DisconnectedExecutorDropsMessages
[ OK ] SlaveTest.DisconnectedExecutorDropsMessages (276 ms)
[ RUN ] SlaveTest.ResourceProviderSubscribe
[ OK ] SlaveTest.ResourceProviderSubscribe (209 ms)
[ RUN ] SlaveTest.ResourceProviderPublishAll
D:\DCOS\mesos\mesos\src\tests\slave_tests.cpp(8915): error: Mock function called more times than expected - returning directly.
Function call: statusUpdate(000001BC8156B210, @00000034280FE440 136-byte object <40-88 12-83 F7-7F 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 00-00 C8-08 9E-84 F7-7F 00-00 D0-62 EE-81 BC-01 00-00 10-76 EE-81 BC-01 00-00 80-B2 A6-81 BC-01 00-00 80-AF A6-81 BC-01 00-00 ... 00-00 00-00 00-00 00-00 60-52 80-81 BC-01 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 84-10 90-D3 1D-8C D6-41 00-00 00-00 02-00 00-00 00-00 00-00 03-00 00-00>)
Expected: to be called twice
Actual: called 3 times - over-saturated and active
D:\DCOS\mesos\mesos\src\tests\slave_tests.cpp(8891): error: Mock function called more times than expected - returning directly.
Function call: resourceOffers(000001BC8156B210, @00000034283FEE50 { 160-byte object <00-7D 12-83 F7-7F 00-00 00-00 00-00 00-00 00-00 5F-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 05-00 00-00 05-00 00-00 40-26 70-81 BC-01 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 ... 60-3A 70-81 BC-01 00-00 80-4D 77-82 BC-01 00-00 40-4B 77-82 BC-01 00-00 60-58 77-82 BC-01 00-00 20-52 EC-81 BC-01 00-00 00-00 00-00 00-00 00-00 00-4F 77-82 BC-01 00-00 00-00 00-00 00-00 00-00> })
Expected: to be called once
Actual: called twice - over-saturated and active
D:\DCOS\mesos\mesos\src\tests\slave_tests.cpp(8934): error: Expected: i + 1
Which is: 2
To be equal to: v1::Resources(publish->resources()).reservations().size()
Which is: 1
[ FAILED ] SlaveTest.ResourceProviderPublishAll (8113 ms)
[ RUN ] SlaveTest.ResourceVersions
[ OK ] SlaveTest.ResourceVersions (165 ms)
[ RUN ] SlaveTest.ReconfigurationPolicy
[ OK ] SlaveTest.ReconfigurationPolicy (246 ms)
[ RUN ] SlaveTest.ResourceProviderReconciliation
```
- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64557/logs/mesos-tests-stderr.log):
```
@ 00007FF77F7C41B0 google::LogMessage::SendToLog
@ 00007FF77F7C3997 google::LogMessage::Flush
@ 00007FF77F7C54D1 google::LogMessageFatal::~LogMessageFatal
@ 00007FF77D6E09A4 mesos::internal::slave::Slave::handleResourceProviderMessage
@ 00007FF77D88BDD5 ??
@ 00007FF77D779218 std::_Invoker_functor::_Call<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,process::ProcessBase * __ptr64>
@ 00007FF77D7FEEE8 std::invoke<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,process::ProcessBase * __ptr64>
@ 00007FF77D80F0BB lambda::internal::Partial<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,std::_Ph<1> >::invoke_expand<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,std::tuple<process::Future<mesos::internal::ResourceProvi
@ 00007FF77D7553DA )<process::ProcessBase * __ptr64
@ 00007FF77D77F7AC std::_Invoker_functor::_Call<lambda::internal::Partial<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,std::_Ph<1> >,process::ProcessBase * __ptr64>
@ 00007FF77D80528C std::invoke<lambda::internal::Partial<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,std::_Ph<1> >,process::ProcessBase * __ptr64>
@ 00007FF77D75D311 )<lambda::internal::Partial<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,std::_Ph<1> >,process::ProcessBase * __ptr64
@ 00007FF77D899726 process::ProcessBase * __ptr64)>::CallableFn<lambda::internal::Partial<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,std::_Ph<1> > >::operator(
@ 00007FF77F2B3BED process::ProcessBase * __ptr64)>::operator(
@ 00007FF77F18CCC9 process::ProcessBase::consume
@ 00007FF77F307D4A process::DispatchEvent::consume
@ 00007FF77B7633A7 process::ProcessBase::serve
@ 00007FF77F19A9AB process::ProcessManager::resume
@ 00007FF77F2A4411 ??
@ 00007FF77F1E2EF0 std::_Invoker_functor::_Call<<lambda_124422ac022fa041208b80c1460630d7> >
@ 00007FF77F238890 std::invoke<<lambda_124422ac022fa041208b80c1460630d7> >
@ 00007FF77F1F1CAC std::_LaunchPad<std::unique_ptr<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> >,std::default_delete<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> > > > >::_Execute<0>
@ 00007FF77F2EFD4A std::_LaunchPad<std::unique_ptr<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> >,std::default_delete<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> > > > >::_Run
@ 00007FF77F2DC7D8 std::_LaunchPad<std::unique_ptr<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> >,std::default_delete<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> > > > >::_Go
@ 00007FF77F2C44BD std::_Pad::_Call_func
@ 00007FF780462F78 invoke_thread_procedure
@ 00007FF780462A21 __cdecl*)(void * __ptr64)
@ 00007FFD4B841FE4 BaseThreadInitThunk
@ 00007FFD4E3CEF91 RtlUserThreadStart
```
- Mesos Reviewbot Windows
On Dec. 12, 2017, 2:54 p.m., Jie Yu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64557/
> -----------------------------------------------------------
>
> (Updated Dec. 12, 2017, 2:54 p.m.)
>
>
> Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Greg Mann.
>
>
> Repository: mesos
>
>
> Description
> -------
>
> If an RP is disconnected, we'll shrink its total resources to zero so
> that no offer will be made on this RP until it reconnects. This prevents
> frameworks from sending operations to the disconnected RP.
>
>
> Diffs
> -----
>
> src/resource_provider/manager.cpp fd138b9914d925b5be7a11255dd632921c107dba
> src/resource_provider/message.hpp eab90cffd6aab9e38207dcf109cc737171ed3953
> src/slave/slave.cpp 5869e73ca1c14c99e580da9d7375181da2073ec5
> src/tests/resource_provider_manager_tests.cpp e37a53ac6a03e2ea58dd6580fc8a399a1398d950
>
>
> Diff: https://reviews.apache.org/r/64557/diff/1/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Jie Yu
>
>