You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Benjamin Mahler <bm...@apache.org> on 2019/11/26 21:31:04 UTC

Review Request 71827: Improved operator api subscribe initial payload performance.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71827/
-----------------------------------------------------------

Review request for mesos, Andrei Sekretenko and Greg Mann.


Bugs: MESOS-10026
    https://issues.apache.org/jira/browse/MESOS-10026


Repository: mesos


Description
-------

This uses the same approach for other GET_ calls in MESOS-10026
of directly serializing from in-memory state, rather than building
up the temporary object and evolving it.

There is currently no benchmark but the improvement should closely
resemble that of the GET_STATE call, for example:

    Before:
    v0 '/state' response took 6.55 secs
    v1 'GetState' application/x-protobuf response took 24.08 secs
    v1 'GetState' application/json response took 22.76 secs

    After:
    v0 '/state' response took 8.00 secs
    v1 'GetState' application/x-protobuf response took 5.73 secs
    v1 'GetState' application/json response took 9.62 secs


Diffs
-----

  src/master/http.cpp e03655863ea2d4e4464b3d14b359de3d7f059778 
  src/master/master.hpp 93630421d58e6fd26566e81a23cd910957795665 


Diff: https://reviews.apache.org/r/71827/diff/1/


Testing
-------

make check


Thanks,

Benjamin Mahler


Re: Review Request 71827: Improved operator api subscribe initial payload performance.

Posted by Andrei Sekretenko <as...@mesosphere.io>.

> On Dec. 2, 2019, 1:23 p.m., Andrei Sekretenko wrote:
> > src/master/http.cpp
> > Lines 444 (patched)
> > <https://reviews.apache.org/r/71827/diff/1/?file=2179402#file2179402line444>
> >
> >     Shouldn't we delete `_getState`, `_getFrameworks` and so on altogether? 
> >     
> >     They are dead code now: not used anywhere, not covered by tests and, essentialy, not maintained. I don't think their correctness can be relied upon after this patch.
> >     
> >     Probably they should be removed in a separate next patch so that if we need them back someday, the first step to getting them back will be reverting that patch.
> >     
> >     And the comments will need to be adjusted to not refer to them as well.
> 
> Benjamin Mahler wrote:
>     Good observation!
>     
>     When writing the code, these were useful for debugging to compare the serialized bytes against the expected serialized bytes (turns out the only bug was my lack of calling Trim() which left trailing garbage bytes at the end of the string).
>     
>     We should consider the case where we have a serialization bug, and we need to investigate it. Having the expected vs actual will be helpful, but we won't have that if we remove the object creation functions. If it's just as simple as a field being missing, then that's pretty easy (probably just forgot to write it). But if it's more like the data has some garbage in it, that may be harder to debug especially without the actual vs expected bytes handy. Thoughts on this?
>     
>     Another thing I was considering (but couldn't see a clean way to do) was to have our test suite somehow assert that the two paths produce identical results, e.g.:
>     
>      - If running within tests, have the endpoint handler run both paths and CHECK they produce the same serialized bytes.
>      - Or, introduce some test visible function in the master that could get called at some point in any test lifecycle (e.g. cleanup, or periodically) to assert that both produce equal results.
>     
>     They both seemed pretty heavy to leave around in the code, and I became less paranoid about serialization bugs, so I decided not to do these.

Hmm.. if we want to make use of old `_getState`, I would propose two options:

1. Add a flag to the GET_STATE call which would switch it to the old serialization path, and add a local test that will compare results of two GET_STATEs, old path vs new.
This might be a good aid in case someone needs to debug protobuf serialization again. Can imagine this happending when updating the bundled protobuf. 
Also, this will somewhat alleviate the concern about the unmaintained `_getState` eventually becoming buggy and non-usable.
The coverage of such test will be rather poor, though.

2. Add a master flag that, as you suggested, will make it run both paths and crash the master if the results are different. 
Executed on a testing cluster, this will give a much better coverage. The price will be introducing more complexity into GET_STATE.


- Andrei


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71827/#review218878
-----------------------------------------------------------


On Nov. 26, 2019, 9:31 p.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71827/
> -----------------------------------------------------------
> 
> (Updated Nov. 26, 2019, 9:31 p.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko and Greg Mann.
> 
> 
> Bugs: MESOS-10026
>     https://issues.apache.org/jira/browse/MESOS-10026
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This uses the same approach for other GET_ calls in MESOS-10026
> of directly serializing from in-memory state, rather than building
> up the temporary object and evolving it.
> 
> There is currently no benchmark but the improvement should closely
> resemble that of the GET_STATE call, for example:
> 
>     Before:
>     v0 '/state' response took 6.55 secs
>     v1 'GetState' application/x-protobuf response took 24.08 secs
>     v1 'GetState' application/json response took 22.76 secs
> 
>     After:
>     v0 '/state' response took 8.00 secs
>     v1 'GetState' application/x-protobuf response took 5.73 secs
>     v1 'GetState' application/json response took 9.62 secs
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp e03655863ea2d4e4464b3d14b359de3d7f059778 
>   src/master/master.hpp 93630421d58e6fd26566e81a23cd910957795665 
> 
> 
> Diff: https://reviews.apache.org/r/71827/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>


Re: Review Request 71827: Improved operator api subscribe initial payload performance.

Posted by Benjamin Mahler <bm...@apache.org>.

> On Dec. 2, 2019, 1:23 p.m., Andrei Sekretenko wrote:
> > src/master/http.cpp
> > Lines 444 (patched)
> > <https://reviews.apache.org/r/71827/diff/1/?file=2179402#file2179402line444>
> >
> >     Shouldn't we delete `_getState`, `_getFrameworks` and so on altogether? 
> >     
> >     They are dead code now: not used anywhere, not covered by tests and, essentialy, not maintained. I don't think their correctness can be relied upon after this patch.
> >     
> >     Probably they should be removed in a separate next patch so that if we need them back someday, the first step to getting them back will be reverting that patch.
> >     
> >     And the comments will need to be adjusted to not refer to them as well.

Good observation!

When writing the code, these were useful for debugging to compare the serialized bytes against the expected serialized bytes (turns out the only bug was my lack of calling Trim() which left trailing garbage bytes at the end of the string).

We should consider the case where we have a serialization bug, and we need to investigate it. Having the expected vs actual will be helpful, but we won't have that if we remove the object creation functions. If it's just as simple as a field being missing, then that's pretty easy (probably just forgot to write it). But if it's more like the data has some garbage in it, that may be harder to debug especially without the actual vs expected bytes handy. Thoughts on this?

Another thing I was considering (but couldn't see a clean way to do) was to have our test suite somehow assert that the two paths produce identical results, e.g.:

 - If running within tests, have the endpoint handler run both paths and CHECK they produce the same serialized bytes.
 - Or, introduce some test visible function in the master that could get called at some point in any test lifecycle (e.g. cleanup, or periodically) to assert that both produce equal results.

They both seemed pretty heavy to leave around in the code, and I became less paranoid about serialization bugs, so I decided not to do these.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71827/#review218878
-----------------------------------------------------------


On Nov. 26, 2019, 9:31 p.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71827/
> -----------------------------------------------------------
> 
> (Updated Nov. 26, 2019, 9:31 p.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko and Greg Mann.
> 
> 
> Bugs: MESOS-10026
>     https://issues.apache.org/jira/browse/MESOS-10026
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This uses the same approach for other GET_ calls in MESOS-10026
> of directly serializing from in-memory state, rather than building
> up the temporary object and evolving it.
> 
> There is currently no benchmark but the improvement should closely
> resemble that of the GET_STATE call, for example:
> 
>     Before:
>     v0 '/state' response took 6.55 secs
>     v1 'GetState' application/x-protobuf response took 24.08 secs
>     v1 'GetState' application/json response took 22.76 secs
> 
>     After:
>     v0 '/state' response took 8.00 secs
>     v1 'GetState' application/x-protobuf response took 5.73 secs
>     v1 'GetState' application/json response took 9.62 secs
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp e03655863ea2d4e4464b3d14b359de3d7f059778 
>   src/master/master.hpp 93630421d58e6fd26566e81a23cd910957795665 
> 
> 
> Diff: https://reviews.apache.org/r/71827/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>


Re: Review Request 71827: Improved operator api subscribe initial payload performance.

Posted by Andrei Sekretenko <as...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71827/#review218878
-----------------------------------------------------------


Fix it, then Ship it!





src/master/http.cpp
Lines 444 (patched)
<https://reviews.apache.org/r/71827/#comment306790>

    Shouldn't we delete `_getState`, `_getFrameworks` and so on altogether? 
    
    They are dead code now: not used anywhere, not covered by tests and, essentialy, not maintained. I don't think their correctness can be relied upon after this patch.
    
    Probably they should be removed in a separate next patch so that if we need them back someday, the first step to getting them back will be reverting that patch.
    
    And the comments will need to be adjusted to not refer to them as well.


- Andrei Sekretenko


On Nov. 26, 2019, 9:31 p.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71827/
> -----------------------------------------------------------
> 
> (Updated Nov. 26, 2019, 9:31 p.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko and Greg Mann.
> 
> 
> Bugs: MESOS-10026
>     https://issues.apache.org/jira/browse/MESOS-10026
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This uses the same approach for other GET_ calls in MESOS-10026
> of directly serializing from in-memory state, rather than building
> up the temporary object and evolving it.
> 
> There is currently no benchmark but the improvement should closely
> resemble that of the GET_STATE call, for example:
> 
>     Before:
>     v0 '/state' response took 6.55 secs
>     v1 'GetState' application/x-protobuf response took 24.08 secs
>     v1 'GetState' application/json response took 22.76 secs
> 
>     After:
>     v0 '/state' response took 8.00 secs
>     v1 'GetState' application/x-protobuf response took 5.73 secs
>     v1 'GetState' application/json response took 9.62 secs
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp e03655863ea2d4e4464b3d14b359de3d7f059778 
>   src/master/master.hpp 93630421d58e6fd26566e81a23cd910957795665 
> 
> 
> Diff: https://reviews.apache.org/r/71827/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>


Re: Review Request 71827: Improved operator api subscribe initial payload performance.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71827/#review218811
-----------------------------------------------------------



Patch looks great!

Reviews applied: [71823, 71824, 71825, 71826, 71827]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers --disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Nov. 26, 2019, 9:31 p.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71827/
> -----------------------------------------------------------
> 
> (Updated Nov. 26, 2019, 9:31 p.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko and Greg Mann.
> 
> 
> Bugs: MESOS-10026
>     https://issues.apache.org/jira/browse/MESOS-10026
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This uses the same approach for other GET_ calls in MESOS-10026
> of directly serializing from in-memory state, rather than building
> up the temporary object and evolving it.
> 
> There is currently no benchmark but the improvement should closely
> resemble that of the GET_STATE call, for example:
> 
>     Before:
>     v0 '/state' response took 6.55 secs
>     v1 'GetState' application/x-protobuf response took 24.08 secs
>     v1 'GetState' application/json response took 22.76 secs
> 
>     After:
>     v0 '/state' response took 8.00 secs
>     v1 'GetState' application/x-protobuf response took 5.73 secs
>     v1 'GetState' application/json response took 9.62 secs
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp e03655863ea2d4e4464b3d14b359de3d7f059778 
>   src/master/master.hpp 93630421d58e6fd26566e81a23cd910957795665 
> 
> 
> Diff: https://reviews.apache.org/r/71827/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>


Re: Review Request 71827: Improved operator api subscribe initial payload performance.

Posted by Mesos Reviewbot <re...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71827/#review218887
-----------------------------------------------------------



Patch looks great!

Reviews applied: [71823, 71824, 71825, 71826, 71827]

Passed command: export OS='ubuntu:14.04' BUILDTOOL='autotools' COMPILER='gcc' CONFIGURATION='--verbose --disable-libtool-wrappers --disable-parallel-test-execution' ENVIRONMENT='GLOG_v=1 MESOS_VERBOSE=1'; ./support/docker-build.sh

- Mesos Reviewbot


On Nov. 26, 2019, 9:31 p.m., Benjamin Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71827/
> -----------------------------------------------------------
> 
> (Updated Nov. 26, 2019, 9:31 p.m.)
> 
> 
> Review request for mesos, Andrei Sekretenko and Greg Mann.
> 
> 
> Bugs: MESOS-10026
>     https://issues.apache.org/jira/browse/MESOS-10026
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This uses the same approach for other GET_ calls in MESOS-10026
> of directly serializing from in-memory state, rather than building
> up the temporary object and evolving it.
> 
> There is currently no benchmark but the improvement should closely
> resemble that of the GET_STATE call, for example:
> 
>     Before:
>     v0 '/state' response took 6.55 secs
>     v1 'GetState' application/x-protobuf response took 24.08 secs
>     v1 'GetState' application/json response took 22.76 secs
> 
>     After:
>     v0 '/state' response took 8.00 secs
>     v1 'GetState' application/x-protobuf response took 5.73 secs
>     v1 'GetState' application/json response took 9.62 secs
> 
> 
> Diffs
> -----
> 
>   src/master/http.cpp e03655863ea2d4e4464b3d14b359de3d7f059778 
>   src/master/master.hpp 93630421d58e6fd26566e81a23cd910957795665 
> 
> 
> Diff: https://reviews.apache.org/r/71827/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Mahler
> 
>