You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Zhitao Li <zh...@gmail.com> on 2018/03/06 02:44:43 UTC

Re: Review Request 61262: Added 'heartbeat' event for the operator API.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61262/#review198678
-----------------------------------------------------------




src/master/http.cpp
Lines 845-857 (original), 845-861 (patched)
<https://reviews.apache.org/r/61262/#comment278922>

    I believe this is a behavior change.
    
    Previous, `master->subscribe(...)` call will not trigger additional event being sent, so `SUBSCRIBED` is still guaranteed to be the first event.
    
    However, we start the `heartbeater` process immediately without wait for `SUBSCRIBED` to be sent, so this could mean that subscriber can receive the `SUBSCRIBED` event after heartbeater sends something.
    
    Should reorder the call so we only call `master->subscribe(http);` after the `http.send` on `SUBSCRIBED`?


- Zhitao Li


On Aug. 18, 2017, 6:54 p.m., Quinn Leng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61262/
> -----------------------------------------------------------
> 
> (Updated Aug. 18, 2017, 6:54 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Greg Mann.
> 
> 
> Bugs: MESOS-7695
>     https://issues.apache.org/jira/browse/MESOS-7695
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added the 'HEARTBEAT' event for the operator API, modified other
> related test cases to accept heartbeats.
> 
> 
> Diffs
> -----
> 
>   include/mesos/master/master.proto fc5bd894ce55fe8e946d4c5b4b33d3c0505f3c2b 
>   include/mesos/v1/master/master.proto c3fb31de2509adcdec8204f8bbe46b46a31540e8 
>   src/master/http.cpp 959091c8ec03b6ac7bcb5d21b04d2f7d5aff7d54 
>   src/master/master.hpp b802fd153a10f6012cea381f153c28cc78cae995 
>   src/tests/api_tests.cpp 3ab4740bcac29ecb89585da6adb1f563d6fc1f5f 
> 
> 
> Diff: https://reviews.apache.org/r/61262/diff/8/
> 
> 
> Testing
> -------
> 
> make check -j48
> 
> 
> Thanks,
> 
> Quinn Leng
> 
>


Re: Review Request 61262: Added 'heartbeat' event for the operator API.

Posted by Greg Mann <gr...@mesosphere.io>.

> On March 6, 2018, 2:44 a.m., Zhitao Li wrote:
> > src/master/http.cpp
> > Lines 845-857 (original), 845-861 (patched)
> > <https://reviews.apache.org/r/61262/diff/8/?file=1800242#file1800242line845>
> >
> >     I believe this is a behavior change.
> >     
> >     Previous, `master->subscribe(...)` call will not trigger additional event being sent, so `SUBSCRIBED` is still guaranteed to be the first event.
> >     
> >     However, we start the `heartbeater` process immediately without wait for `SUBSCRIBED` to be sent, so this could mean that subscriber can receive the `SUBSCRIBED` event after heartbeater sends something.
> >     
> >     Should reorder the call so we only call `master->subscribe(http);` after the `http.send` on `SUBSCRIBED`?

Reordering is an option, but I wonder if this will really be an issue in practice? We delay the first heartbeat event sent by the Heartbeater by DEFAULT_HEARTBEAT_INTERVAL: https://github.com/apache/mesos/blob/0d247c3887ea08b6273992218cd5899010d89fed/src/master/master.hpp#L1986
so the Heartbeater sends its first heartbeat event after that interval elapses. So, the heartbeat event could only arrive first in the presence of some extreme CPU contention, it doesn't seem likely/possible to me? WDYT?


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61262/#review198678
-----------------------------------------------------------


On Aug. 18, 2017, 6:54 p.m., Quinn Leng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61262/
> -----------------------------------------------------------
> 
> (Updated Aug. 18, 2017, 6:54 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Greg Mann.
> 
> 
> Bugs: MESOS-7695
>     https://issues.apache.org/jira/browse/MESOS-7695
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added the 'HEARTBEAT' event for the operator API, modified other
> related test cases to accept heartbeats.
> 
> 
> Diffs
> -----
> 
>   include/mesos/master/master.proto fc5bd894ce55fe8e946d4c5b4b33d3c0505f3c2b 
>   include/mesos/v1/master/master.proto c3fb31de2509adcdec8204f8bbe46b46a31540e8 
>   src/master/http.cpp 959091c8ec03b6ac7bcb5d21b04d2f7d5aff7d54 
>   src/master/master.hpp b802fd153a10f6012cea381f153c28cc78cae995 
>   src/tests/api_tests.cpp 3ab4740bcac29ecb89585da6adb1f563d6fc1f5f 
> 
> 
> Diff: https://reviews.apache.org/r/61262/diff/8/
> 
> 
> Testing
> -------
> 
> make check -j48
> 
> 
> Thanks,
> 
> Quinn Leng
> 
>


Re: Review Request 61262: Added 'heartbeat' event for the operator API.

Posted by Zhitao Li <zh...@gmail.com>.

> On March 6, 2018, 2:44 a.m., Zhitao Li wrote:
> > src/master/http.cpp
> > Lines 845-857 (original), 845-861 (patched)
> > <https://reviews.apache.org/r/61262/diff/8/?file=1800242#file1800242line845>
> >
> >     I believe this is a behavior change.
> >     
> >     Previous, `master->subscribe(...)` call will not trigger additional event being sent, so `SUBSCRIBED` is still guaranteed to be the first event.
> >     
> >     However, we start the `heartbeater` process immediately without wait for `SUBSCRIBED` to be sent, so this could mean that subscriber can receive the `SUBSCRIBED` event after heartbeater sends something.
> >     
> >     Should reorder the call so we only call `master->subscribe(http);` after the `http.send` on `SUBSCRIBED`?
> 
> Greg Mann wrote:
>     Reordering is an option, but I wonder if this will really be an issue in practice? We delay the first heartbeat event sent by the Heartbeater by DEFAULT_HEARTBEAT_INTERVAL: https://github.com/apache/mesos/blob/0d247c3887ea08b6273992218cd5899010d89fed/src/master/master.hpp#L1986
>     so the Heartbeater sends its first heartbeat event after that interval elapses. So, the heartbeat event could only arrive first in the presence of some extreme CPU contention, it doesn't seem likely/possible to me? WDYT?

I think in large cluster, the first SUBSCRIBED message could take longer than `DEFAULT_HEARTBEAT_INTERVAL` under extreme cases. In practice this has hit our cluster maybe less than 10% of master failovers.

Still, I would argue that we should ensure `SUBSCRIBED` message is sent before any `HEARTBEAT`, also because subscriber cannot event process the `HEARTBEAT` without the `heartbeat_interval_seconds` value from `SUBSCRIBED` message.


- Zhitao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61262/#review198678
-----------------------------------------------------------


On Aug. 18, 2017, 6:54 p.m., Quinn Leng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61262/
> -----------------------------------------------------------
> 
> (Updated Aug. 18, 2017, 6:54 p.m.)
> 
> 
> Review request for mesos, Anand Mazumdar and Greg Mann.
> 
> 
> Bugs: MESOS-7695
>     https://issues.apache.org/jira/browse/MESOS-7695
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added the 'HEARTBEAT' event for the operator API, modified other
> related test cases to accept heartbeats.
> 
> 
> Diffs
> -----
> 
>   include/mesos/master/master.proto fc5bd894ce55fe8e946d4c5b4b33d3c0505f3c2b 
>   include/mesos/v1/master/master.proto c3fb31de2509adcdec8204f8bbe46b46a31540e8 
>   src/master/http.cpp 959091c8ec03b6ac7bcb5d21b04d2f7d5aff7d54 
>   src/master/master.hpp b802fd153a10f6012cea381f153c28cc78cae995 
>   src/tests/api_tests.cpp 3ab4740bcac29ecb89585da6adb1f563d6fc1f5f 
> 
> 
> Diff: https://reviews.apache.org/r/61262/diff/8/
> 
> 
> Testing
> -------
> 
> make check -j48
> 
> 
> Thanks,
> 
> Quinn Leng
> 
>