Posted to reviews@mesos.apache.org by Joseph Wu <jo...@mesosphere.io> on 2019/02/13 02:19:42 UTC

Review Request 69967: Added a recovery path for orphan operation.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69967/
-----------------------------------------------------------

Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.


Bugs: MESOS-9542
    https://issues.apache.org/jira/browse/MESOS-9542


Repository: mesos


Description
-------

An orphan operation can be recovered if the originating framework reregisters
with the master.  When this happens, the resource accounting is
reversed and resources are added back to the agent's total
and the allocator.
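
As a rough illustration of the accounting described above, here is a minimal sketch with toy types. Everything in it is an assumption made for illustration: `ToyMaster`, `ToyOperation`, `orphanOperation`, and `reregisterFramework` are hypothetical names, resources are reduced to a CPU count, and the orphaning step (subtracting from the totals) is only a guess at what gets "reversed". It is not the code in src/master/master.cpp.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-ins only; these are NOT the real Mesos master classes.
struct ToyOperation {
  std::string frameworkId;  // framework that originated the operation
  int cpus = 0;             // resources the operation consumes (CPUs only)
  bool orphan = false;
};

struct ToyMaster {
  int agentTotalCpus = 0;               // stands in for the agent's total
  int allocatorTotalCpus = 0;           // what the allocator knows about
  std::map<std::string, int> usedCpus;  // per-framework used resources
  std::map<std::string, ToyOperation> operations;

  // Assumed orphaning step: hide the operation's resources from the
  // agent's total and from the allocator.
  void orphanOperation(const std::string& operationId) {
    ToyOperation& op = operations.at(operationId);
    assert(!op.orphan);
    op.orphan = true;
    agentTotalCpus -= op.cpus;
    allocatorTotalCpus -= op.cpus;
  }

  // Recovery path: when the originating framework reregisters, reverse
  // the accounting for each of its orphans. Returns how many operations
  // were recovered.
  int reregisterFramework(const std::string& frameworkId) {
    int recovered = 0;
    for (auto& entry : operations) {
      ToyOperation& op = entry.second;
      if (!op.orphan || op.frameworkId != frameworkId) continue;
      op.orphan = false;
      agentTotalCpus += op.cpus;         // back into the agent's total
      allocatorTotalCpus += op.cpus;     // back into the allocator
      usedCpus[frameworkId] += op.cpus;  // assumed: charged to the framework
      ++recovered;
    }
    return recovered;
  }
};
```

In this toy model, orphaning a 1-CPU operation on a 4-CPU agent drops both totals to 3, and reregistering the originating framework restores them to 4 and charges 1 CPU to that framework. Reregistering an unrelated framework leaves the orphan untouched.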


Diffs
-----

  src/master/master.cpp 014e0e053cdf5c53a5ef8d63300205a121bed319 


Diff: https://reviews.apache.org/r/69967/diff/1/


Testing
-------

TODO: One more patch to go, which adds a delay between orphaning an operation and the master adopting it.


Thanks,

Joseph Wu


Re: Review Request 69967: Added a recovery path for orphan operation.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69967/#review212836
-----------------------------------------------------------


Fix it, then Ship it!





src/master/master.cpp
Lines 10264-10266 (patched)
<https://reviews.apache.org/r/69967/#comment298727>

    I found this comment a bit confusing at first glance - maybe you could note here that the terminal state implies that the scheduler has not yet acknowledged the state, and the operation will be removed when that occurs?
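
    The relationship between a terminal state and acknowledgement can be sketched with a toy model. This is only an illustration under assumptions: `ToyState`, `ToyOperations`, `update`, and `acknowledge` are hypothetical names, not the types or functions used in src/master/master.cpp.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy operation states; the real set of states is larger.
enum class ToyState { PENDING, FINISHED, FAILED };

inline bool isTerminal(ToyState state) {
  return state != ToyState::PENDING;
}

struct ToyOperations {
  std::map<std::string, ToyState> tracked;

  // Record a new state. A terminal state alone does not remove the
  // operation: it stays tracked until the scheduler acknowledges it.
  void update(const std::string& operationId, ToyState state) {
    auto it = tracked.find(operationId);
    assert(it != tracked.end());
    it->second = state;
  }

  // The scheduler's acknowledgement of a terminal state is what removes
  // the operation. Returns true if the operation was removed.
  bool acknowledge(const std::string& operationId) {
    auto it = tracked.find(operationId);
    if (it == tracked.end() || !isTerminal(it->second)) {
      return false;
    }
    tracked.erase(it);
    return true;
  }
};
```

    In this model, an operation that is still tracked while in a terminal state is by construction one the scheduler has not yet acknowledged.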


- Greg Mann


On Feb. 13, 2019, 2:19 a.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69967/
> -----------------------------------------------------------
> 
> (Updated Feb. 13, 2019, 2:19 a.m.)
> 
> 
> Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-9542
>     https://issues.apache.org/jira/browse/MESOS-9542
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> An orphan can be recovered if the originating framework reregisters
> with the master.  When this happens, the resource accounting is
> reversed and resources are added back to the agent's total
> and the allocator.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 014e0e053cdf5c53a5ef8d63300205a121bed319 
> 
> 
> Diff: https://reviews.apache.org/r/69967/diff/2/
> 
> 
> Testing
> -------
> 
> See last patch in chain.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 69967: Added a recovery path for orphan operation.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69967/
-----------------------------------------------------------

(Updated Feb. 21, 2019, 4:07 p.m.)


Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.


Changes
-------

* Added a log message when an operation is un-orphaned.


Bugs: MESOS-9542
    https://issues.apache.org/jira/browse/MESOS-9542


Repository: mesos


Description
-------

An orphan can be recovered if the originating framework reregisters
with the master.  When this happens, the resource accounting is
reversed and resources are added back to the agent's total
and the allocator.


Diffs (updated)
-----

  src/master/master.cpp 106d924bf16231b3bda3fb719db68c01d73644ee 


Diff: https://reviews.apache.org/r/69967/diff/4/

Changes: https://reviews.apache.org/r/69967/diff/3-4/


Testing
-------

See last patch in chain.


Thanks,

Joseph Wu


Re: Review Request 69967: Added a recovery path for orphan operation.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69967/#review213004
-----------------------------------------------------------


Ship it!





- Greg Mann


On Feb. 20, 2019, 12:46 a.m., Joseph Wu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69967/
> -----------------------------------------------------------
> 
> (Updated Feb. 20, 2019, 12:46 a.m.)
> 
> 
> Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.
> 
> 
> Bugs: MESOS-9542
>     https://issues.apache.org/jira/browse/MESOS-9542
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> An orphan can be recovered if the originating framework reregisters
> with the master.  When this happens, the resource accounting is
> reversed and resources are added back to the agent's total
> and the allocator.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 106d924bf16231b3bda3fb719db68c01d73644ee 
> 
> 
> Diff: https://reviews.apache.org/r/69967/diff/3/
> 
> 
> Testing
> -------
> 
> See last patch in chain.
> 
> 
> Thanks,
> 
> Joseph Wu
> 
>


Re: Review Request 69967: Added a recovery path for orphan operation.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69967/
-----------------------------------------------------------

(Updated Feb. 19, 2019, 4:46 p.m.)


Review request for mesos, Benno Evers, Gastón Kleiman, and Greg Mann.


Changes
-------

A modification to `slave->usedResources` was missing when transitioning out of the orphan state; it has been added.
Also modified a comment per suggestion.


Bugs: MESOS-9542
    https://issues.apache.org/jira/browse/MESOS-9542


Repository: mesos


Description
-------

An orphan can be recovered if the originating framework reregisters
with the master.  When this happens, the resource accounting is
reversed and resources are added back to the agent's total
and the allocator.


Diffs (updated)
-----

  src/master/master.cpp 106d924bf16231b3bda3fb719db68c01d73644ee 


Diff: https://reviews.apache.org/r/69967/diff/3/

Changes: https://reviews.apache.org/r/69967/diff/2-3/


Testing
-------

See last patch in chain.


Thanks,

Joseph Wu


Re: Review Request 69967: Added a recovery path for orphan operation.

Posted by Mesos Reviewbot Windows <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69967/#review212786
-----------------------------------------------------------



FAIL: Some of the unit tests failed. Please check the relevant logs.

Reviews applied: `['69968', '69960', '69961', '69962', '69963', '69967']`

Failed command: `Start-MesosCITesting`

All the build artifacts are available at: http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2882/mesos-review-69967

Relevant logs:

- [mesos-tests.log](http://dcos-win.westus2.cloudapp.azure.com/artifacts/mesos-reviewbot-testing/2882/mesos-review-69967/logs/mesos-tests.log):

```
    @   00007FF61C9F5C90  mesos::internal::master::Master::teardown
    @   00007FF61C9EFFE0  mesos::internal::master::Master::receive
    @   00007FF61CB59A76  ProtobufProcess<mesos::internal::master::Master>::handlerMutM<mesos::scheduler::Call>
    @   00007FF61CAA0509  __cdecl*&)(mesos::internal::master::Master *,void (__cdecl mesos::internal::master::Master::*)(process::UPID const &,mesos::scheduler::Call &&),process::UPID const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &),mesos::int
    @   00007FF61CB62860  __cdecl*&)(mesos::internal::master::Master *,void (__cdecl mesos::internal::master::Master::*)(process::UPID const &,mesos::scheduler::Call &&),process::UPID const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &),mesos::int
    @   00007FF61CAA0430  __cdecl*&)(mesos::internal::master::Master *,void (__cdecl mesos::internal::master::Master::*)(process::UPID const &,mesos::scheduler::Call &&),process::UPID const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &),mesos::int
    @   00007FF61CAB65C3  __cdecl*)(mesos::internal::master::Master *,void (__cdecl mesos::internal::master::Master::*)(process::UPID const &,mesos::scheduler::Call &&),process::UPID const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &),std::tuple<
    @   00007FF61CA81E4B  __cdecl*)(mesos::internal::master::Master *,void (__cdecl mesos::internal::master::Master::*)(process::UPID const &,mesos::scheduler::Call &&),process::UPID const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &),mesos::inte
    @   00007FF61CAA9318  __cdecl*)(mesos::internal::master::Master *,void (__cdecl mesos::internal::master::Master::*)(process::UPID const &,mesos::scheduler::Call &&),process::UPID const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &),mesos::inte
    @   00007FF61CB67028  __cdecl*)(mesos::internal::master::Master *,void (__cdecl mesos::internal::master::Master::*)(process::UPID const &,mesos::scheduler::Call &&),process::UPID const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &),mesos::inte
    @   00007FF61CAA9288  __cdecl*)(mesos::internal::master::Master *,void (__cdecl mesos::internal::master::Master::*)(process::UPID const &,mesos::scheduler::Call &&),process::UPID const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &),mesos::inte
    @   00007FF61CC83E2A  __cdecl*)(mesos::internal::master::Master *,void (__cdecl mesos::internal::master::Master::*)(process::UPID const &,mesos::scheduler::Call &&),process::UPID const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &),mesos::inte
    @   00007FF61AF8A64B   ?? 
    @   00007FF61CD133C2  ProtobufProcess<mesos::internal::master::Master>::consume
    @   00007FF61C9CACFC  mesos::internal::master::Master::_consume
    @   00007FF61C9C8C0D  mesos::internal::master::Master::consume
    @   00007FF61CD137CA  process::MessageEvent::consume
    @   00007FF61ABCA057  process::ProcessBase::serve
    @   00007FF61EA37D90  process::ProcessManager::resume
    @   00007FF61EB5B721   ?? 
    @   00007FF61EA86B20  std::_Invoker_functor::_Call<<lambda_124422ac022fa041208b80c1460630d7> >
    @   00007FF61EAE2C60  std::invoke<<lambda_124422ac022fa041208b80c1460630d7> >
    @   00007FF61EA9975C  std::_LaunchPad<std::unique_ptr<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> >,std::default_delete<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> > > > >::_Execute<0>
    @   00007FF61EBAF25A  std::_LaunchPad<std::unique_ptr<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> >,std::default_delete<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> > > > >::_Run
    @   00007FF61EB9A848  std::_LaunchPad<std::unique_ptr<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> >,std::default_delete<std::tuple<<lambda_124422ac022fa041208b80c1460630d7> > > > >::_Go
    @   00007FF61EB8032D  std::_Pad::_Call_func
d029d-3afc-4d61-a2aa-643e8c45526f'
I0213 04:54:36.666474 47356 slave.cpp:8182] Forwarding new total resources cpus:4; mem:2048; disk:1024; ports:[31000-32000]
I0213 04:54:36.667424 45968 slave.cpp:917] Agent terminating
```

- Mesos Reviewbot Windows

