You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Jie Yu <yu...@gmail.com> on 2017/12/12 22:54:09 UTC

Review Request 64557: Handled the RP disconnection case in the agent.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64557/
-----------------------------------------------------------

Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Greg Mann.


Repository: mesos


Description
-------

If an RP is disconnected, we'll shrink its total resources to zero so
that no offer will be made on this RP until it reconnects. This prevents
frameworks from sending operations to the disconnected RP.


Diffs
-----

  src/resource_provider/manager.cpp fd138b9914d925b5be7a11255dd632921c107dba 
  src/resource_provider/message.hpp eab90cffd6aab9e38207dcf109cc737171ed3953 
  src/slave/slave.cpp 5869e73ca1c14c99e580da9d7375181da2073ec5 
  src/tests/resource_provider_manager_tests.cpp e37a53ac6a03e2ea58dd6580fc8a399a1398d950 


Diff: https://reviews.apache.org/r/64557/diff/1/


Testing
-------

make check


Thanks,

Jie Yu


Re: Review Request 64557: Handled the RP disconnection case in the agent.

Posted by Benjamin Bannier <be...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64557/#review193733
-----------------------------------------------------------


Fix it, then Ship it!





src/resource_provider/manager.cpp
Lines 613-614 (patched)
<https://reviews.apache.org/r/64557/#comment272361>

    We could use aggregate initialization here,
    
        ResourceProviderMessage::Disconnect disconnect{resourceProviderId};



src/tests/resource_provider_manager_tests.cpp
Lines 1246 (patched)
<https://reviews.apache.org/r/64557/#comment272358>

    I would suggest to add an assertion that `disk` is contained in the total resources of the resource provider in this update (i.e., a check similar to the one on the second update with inverted contains expectation).
    
    This makes sure that we actually detect a change when checking below expectations.



src/tests/resource_provider_manager_tests.cpp
Lines 1255 (patched)
<https://reviews.apache.org/r/64557/#comment272354>

    Replace `1u` with just `1` here.
    
    Protobuf tries to be nice and safe us from integer conversion surprises by using a signed integer for sizes.



src/tests/resource_provider_manager_tests.cpp
Lines 1257 (patched)
<https://reviews.apache.org/r/64557/#comment272355>

    `const Resources&`



src/tests/resource_provider_manager_tests.cpp
Lines 1260 (patched)
<https://reviews.apache.org/r/64557/#comment272357>

    This expecation is always met as `Resource`s in `totalResources` will have a provider ID set, but `disk` has not. We need to create an updated `disk` after the resource provider has subscribed, e.g., after l.1245 add
    
        ASSERT_TRUE(resourceProvider->info.has_id());
        disk.mutable_provider_id()->CopyFrom(resourceProvider->info.id());


- Benjamin Bannier


On Dec. 12, 2017, 11:54 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64557/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2017, 11:54 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Greg Mann.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If an RP is disconnected, we'll shrink its total resources to zero so
> that no offer will be made on this RP until it reconnects. This prevents
> frameworks from sending operations to the disconnected RP.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/manager.cpp fd138b9914d925b5be7a11255dd632921c107dba 
>   src/resource_provider/message.hpp eab90cffd6aab9e38207dcf109cc737171ed3953 
>   src/slave/slave.cpp 5869e73ca1c14c99e580da9d7375181da2073ec5 
>   src/tests/resource_provider_manager_tests.cpp e37a53ac6a03e2ea58dd6580fc8a399a1398d950 
> 
> 
> Diff: https://reviews.apache.org/r/64557/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>


Re: Review Request 64557: Handled the RP disconnection case in the agent.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64557/#review193630
-----------------------------------------------------------



FAIL: Some Mesos tests failed.

Reviews applied: `['64557']`

Failed command: `D:\DCOS\mesos\src\mesos-tests.exe --verbose`

All the build artifacts available at: http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64557

Relevant logs:

- [mesos-tests-stdout.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64557/logs/mesos-tests-stdout.log):

```
[ RUN      ] SlaveTest.ShutdownV0ExecutorIfItReregistersWithoutReconnect
[       OK ] SlaveTest.ShutdownV0ExecutorIfItReregistersWithoutReconnect (244 ms)
[ RUN      ] SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect
[       OK ] SlaveTest.IgnoreV0ExecutorIfItReregistersWithoutReconnect (250 ms)
[ RUN      ] SlaveTest.BrowseExecutorSandboxByVirtualPath
[       OK ] SlaveTest.BrowseExecutorSandboxByVirtualPath (289 ms)
[ RUN      ] SlaveTest.DisconnectedExecutorDropsMessages
[       OK ] SlaveTest.DisconnectedExecutorDropsMessages (276 ms)
[ RUN      ] SlaveTest.ResourceProviderSubscribe
[       OK ] SlaveTest.ResourceProviderSubscribe (209 ms)
[ RUN      ] SlaveTest.ResourceProviderPublishAll
D:\DCOS\mesos\mesos\src\tests\slave_tests.cpp(8915): error: Mock function called more times than expected - returning directly.
    Function call: statusUpdate(000001BC8156B210, @00000034280FE440 136-byte object <40-88 12-83 F7-7F 00-00 00-00 00-00 00-00 00-00 BE-A8 00-00 00-00 00-00 C8-08 9E-84 F7-7F 00-00 D0-62 EE-81 BC-01 00-00 10-76 EE-81 BC-01 00-00 80-B2 A6-81 BC-01 00-00 80-AF A6-81 BC-01 00-00 ... 00-00 00-00 00-00 00-00 60-52 80-81 BC-01 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 84-10 90-D3 1D-8C D6-41 00-00 00-00 02-00 00-00 00-00 00-00 03-00 00-00>)
         Expected: to be called twice
           Actual: called 3 times - over-saturated and active
D:\DCOS\mesos\mesos\src\tests\slave_tests.cpp(8891): error: Mock function called more times than expected - returning directly.
    Function call: resourceOffers(000001BC8156B210, @00000034283FEE50 { 160-byte object <00-7D 12-83 F7-7F 00-00 00-00 00-00 00-00 00-00 5F-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 05-00 00-00 05-00 00-00 40-26 70-81 BC-01 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 ... 60-3A 70-81 BC-01 00-00 80-4D 77-82 BC-01 00-00 40-4B 77-82 BC-01 00-00 60-58 77-82 BC-01 00-00 20-52 EC-81 BC-01 00-00 00-00 00-00 00-00 00-00 00-4F 77-82 BC-01 00-00 00-00 00-00 00-00 00-00> })
         Expected: to be called once
           Actual: called twice - over-saturated and active
D:\DCOS\mesos\mesos\src\tests\slave_tests.cpp(8934): error:       Expected: i + 1
      Which is: 2
To be equal to: v1::Resources(publish->resources()).reservations().size()
      Which is: 1
[  FAILED  ] SlaveTest.ResourceProviderPublishAll (8113 ms)
[ RUN      ] SlaveTest.ResourceVersions
[       OK ] SlaveTest.ResourceVersions (165 ms)
[ RUN      ] SlaveTest.ReconfigurationPolicy
[       OK ] SlaveTest.ReconfigurationPolicy (246 ms)
[ RUN      ] SlaveTest.ResourceProviderReconciliation
```

- [mesos-tests-stderr.log](http://dcos-win.westus.cloudapp.azure.com/mesos-build/review/64557/logs/mesos-tests-stderr.log):

```
    @   00007FF77F7C41B0  google::LogMessage::SendToLog
    @   00007FF77F7C3997  google::LogMessage::Flush
    @   00007FF77F7C54D1  google::LogMessageFatal::~LogMessageFatal
    @   00007FF77D6E09A4  mesos::internal::slave::Slave::handleResourceProviderMessage
    @   00007FF77D88BDD5   ?? 
    @   00007FF77D779218  std::_Invoker_functor::_Call<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,process::ProcessBase * __ptr64>
    @   00007FF77D7FEEE8  std::invoke<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,process::ProcessBase * __ptr64>
    @   00007FF77D80F0BB  lambda::internal::Partial<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,std::_Ph<1> >::invoke_expand<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,std::tuple<process::Future<mesos::internal::ResourceProvi
    @   00007FF77D7553DA  )<process::ProcessBase * __ptr64
    @   00007FF77D77F7AC  std::_Invoker_functor::_Call<lambda::internal::Partial<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,std::_Ph<1> >,process::ProcessBase * __ptr64>
    @   00007FF77D80528C  std::invoke<lambda::internal::Partial<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,std::_Ph<1> >,process::ProcessBase * __ptr64>
    @   00007FF77D75D311  )<lambda::internal::Partial<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,std::_Ph<1> >,process::ProcessBase * __ptr64
    @   00007FF77D899726  process::ProcessBase * __ptr64)>::CallableFn<lambda::internal::Partial<<lambda_2c5dd0fb32c5d47ebd14bcab173f55d7>,process::Future<mesos::internal::ResourceProviderMessage>,std::_Ph<1> > >::operator(
    @   00007FF77F2B3BED  process::ProcessBase * __ptr64)>::operator(
    @   00007FF77F18CCC9  process::ProcessBase::consume
    @   00007FF77F307D4A  process::DispatchEvent::consume
    @   00007FF77B7633A7  process::ProcessBase::serve
    @   00007FF77F19A9AB  process::ProcessManager::resume
    @   00007FF77F2A4411   ?? 
    @   00007FF77F1E2EF0  std::_Invoker_functor::_Call<<lambda_124422ac022fa041208b80c1460630d7> >
    @   00007FF77F238890  std::invoke<<lambda_124422ac022fa041208b80c1460630d7> >
    @   00007FF77F1F1CAC  std::_LaunchPad<std::unique_ptr<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> >,std::default_delete<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> > > > >::_Execute<0>
    @   00007FF77F2EFD4A  std::_LaunchPad<std::unique_ptr<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> >,std::default_delete<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> > > > >::_Run
    @   00007FF77F2DC7D8  std::_LaunchPad<std::unique_ptr<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> >,std::default_delete<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> > > > >::_Go
    @   00007FF77F2C44BD  std::_Pad::_Call_func
    @   00007FF780462F78  invoke_thread_procedure
    @   00007FF780462A21  __cdecl*)(void * __ptr64)
    @   00007FFD4B841FE4  BaseThreadInitThunk
    @   00007FFD4E3CEF91  RtlUserThreadStart
```

- Mesos Reviewbot Windows


On Dec. 12, 2017, 2:54 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64557/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2017, 2:54 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Chun-Hung Hsiao, and Greg Mann.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If an RP is disconnected, we'll shrink its total resources to zero so
> that no offer will be made on this RP until it reconnects. This prevents
> frameworks from sending operations to the disconnected RP.
> 
> 
> Diffs
> -----
> 
>   src/resource_provider/manager.cpp fd138b9914d925b5be7a11255dd632921c107dba 
>   src/resource_provider/message.hpp eab90cffd6aab9e38207dcf109cc737171ed3953 
>   src/slave/slave.cpp 5869e73ca1c14c99e580da9d7375181da2073ec5 
>   src/tests/resource_provider_manager_tests.cpp e37a53ac6a03e2ea58dd6580fc8a399a1398d950 
> 
> 
> Diff: https://reviews.apache.org/r/64557/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Jie Yu
> 
>