You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@mesos.apache.org by Meng Zhu <mz...@mesosphere.io> on 2018/08/23 22:14:00 UTC

Review Request 68490: Optimized `class Resources` with copy-on-write.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/
-----------------------------------------------------------

Review request for mesos and Benjamin Mahler.


Bugs: MESOS-6765
    https://issues.apache.org/jira/browse/MESOS-6765


Repository: mesos


Description
-------

This patch lets `class Resources` only store shared
pointers to the underlying resource objects, so that
read-only filter operations such as `reserved()`,
`unreserved()` and etc. can avoid making copies of
the whole resource objects. Instead, only shared pointers
are copied.

In write operations, we check if there are more than one
references to the resource object. If so, a copy is made
for safe mutation without affecting owners.

To maintain the usage abstraction that `class Resources`
still holds resource objects, we utilize
`boost::indirect_iterator` iterator adapter to deference
the shared pointers as we iterate.


Diffs
-----

  include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
  include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
  src/common/resources.cpp 0110c0ee3e810ad1c29dfa5507b13ebd5d0222a2 
  src/v1/resources.cpp 228a7327ffe7934d37b56ee67b8be9ae1e119ca8 


Diff: https://reviews.apache.org/r/68490/diff/1/


Testing
-------

make check

Did a quick test on Mac with an optimized build, running benchmark `HierarchicalAllocator_BENCHMARK_Test.ResourceLabels`, here are the results of comparing "before" and "after". Note, DVFS is not fixed. And we only did a partial run to verify the validity of the patch, full evaluation coming soon.

**Overall, 33% performance improvement for the 1st steup (1000 agents and 1 frameworks) and 32% improvement for the first 50 iterarions of the 2nd setup (1000 agents and 50 frameworks).**

Before:

[ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/0

Using 1000 agents and 1 frameworks
Added 1 frameworks in 1.220022ms
Added 1000 agents in 465.045382ms
round 0 allocate() took 54.803773ms to make 0 offers after filtering 1000 offers
round 1 allocate() took 66.690456ms to make 0 offers after filtering 1000 offers

Using 1000 agents and 50 frameworks
Added 50 frameworks in 1.40774ms
Added 1000 agents in 460.562488ms
round 0 allocate() took 155.500548ms to make 1000 offers after filtering 1000 offers
round 1 allocate() took 155.194986ms to make 1000 offers after filtering 2000 offers
round 2 allocate() took 160.706712ms to make 1000 offers after filtering 3000 offers
round 3 allocate() took 156.701883ms to make 1000 offers after filtering 4000 offers
round 4 allocate() took 164.246987ms to make 1000 offers after filtering 5000 offers
round 5 allocate() took 160.791809ms to make 1000 offers after filtering 6000 offers
round 6 allocate() took 166.064652ms to make 1000 offers after filtering 7000 offers
round 7 allocate() took 162.177213ms to make 1000 offers after filtering 8000 offers
round 8 allocate() took 166.620163ms to make 1000 offers after filtering 9000 offers
round 9 allocate() took 167.970045ms to make 1000 offers after filtering 10000 offers
round 10 allocate() took 166.189463ms to make 1000 offers after filtering 11000 offers
round 11 allocate() took 170.863462ms to make 1000 offers after filtering 12000 offers
round 12 allocate() took 172.260473ms to make 1000 offers after filtering 13000 offers
round 13 allocate() took 170.553911ms to make 1000 offers after filtering 14000 offers
round 14 allocate() took 235.764593ms to make 1000 offers after filtering 15000 offers
round 15 allocate() took 247.250433ms to make 1000 offers after filtering 16000 offers
round 16 allocate() took 276.932824ms to make 1000 offers after filtering 17000 offers
round 17 allocate() took 248.888469ms to make 1000 offers after filtering 18000 offers
round 18 allocate() took 193.890556ms to make 1000 offers after filtering 19000 offers
round 19 allocate() took 195.105346ms to make 1000 offers after filtering 20000 offers
round 20 allocate() took 194.447428ms to make 1000 offers after filtering 21000 offers
round 21 allocate() took 205.486287ms to make 1000 offers after filtering 22000 offers
round 22 allocate() took 199.241922ms to make 1000 offers after filtering 23000 offers
round 23 allocate() took 200.885488ms to make 1000 offers after filtering 24000 offers
round 24 allocate() took 216.132361ms to make 1000 offers after filtering 25000 offers
round 25 allocate() took 210.638273ms to make 1000 offers after filtering 26000 offers
round 26 allocate() took 232.397778ms to make 1000 offers after filtering 27000 offers
round 27 allocate() took 239.633708ms to make 1000 offers after filtering 28000 offers
round 28 allocate() took 225.829677ms to make 1000 offers after filtering 29000 offers
round 29 allocate() took 245.143272ms to make 1000 offers after filtering 30000 offers
round 30 allocate() took 248.630295ms to make 1000 offers after filtering 31000 offers
round 31 allocate() took 262.147804ms to make 1000 offers after filtering 32000 offers
round 32 allocate() took 251.109969ms to make 1000 offers after filtering 33000 offers
round 33 allocate() took 263.92273ms to make 1000 offers after filtering 34000 offers
round 34 allocate() took 268.422275ms to make 1000 offers after filtering 35000 offers
round 35 allocate() took 273.830163ms to make 1000 offers after filtering 36000 offers
round 36 allocate() took 283.25481ms to make 1000 offers after filtering 37000 offers
round 37 allocate() took 366.400781ms to make 1000 offers after filtering 38000 offers
round 38 allocate() took 382.202755ms to make 1000 offers after filtering 39000 offers
round 39 allocate() took 337.266121ms to make 1000 offers after filtering 40000 offers
round 40 allocate() took 381.033696ms to make 1000 offers after filtering 41000 offers
round 41 allocate() took 365.941946ms to make 1000 offers after filtering 42000 offers
round 42 allocate() took 379.527886ms to make 1000 offers after filtering 43000 offers
round 43 allocate() took 425.661181ms to make 1000 offers after filtering 44000 offers
round 44 allocate() took 455.86657ms to make 1000 offers after filtering 45000 offers
round 45 allocate() took 506.943117ms to make 1000 offers after filtering 46000 offers
round 46 allocate() took 681.880233ms to make 1000 offers after filtering 47000 offers
round 47 allocate() took 860.932974ms to make 1000 offers after filtering 48000 offers
round 48 allocate() took 960.272209ms to make 1000 offers after filtering 49000 offers
round 49 allocate() took 1.907427057secs to make 0 offers after filtering 50000 offers
round 50 allocate() took 2.157864654secs to make 0 offers after filtering 50000 offers


After:

Using 1000 agents and 1 frameworks
Added 1 frameworks in 404709ns
Added 1000 agents in 245.162138ms
round 0 allocate() took 38.498883ms to make 0 offers after filtering 1000 offers
round 1 allocate() took 42.234898ms to make 0 offers after filtering 1000 offers

Using 1000 agents and 50 frameworks
Added 50 frameworks in 1.441281ms
Added 1000 agents in 261.923711ms
round 0 allocate() took 121.706074ms to make 1000 offers after filtering 1000 offers
round 1 allocate() took 150.465925ms to make 1000 offers after filtering 2000 offers
round 2 allocate() took 273.549893ms to make 1000 offers after filtering 3000 offers
round 3 allocate() took 199.248799ms to make 1000 offers after filtering 4000 offers
round 4 allocate() took 133.865752ms to make 1000 offers after filtering 5000 offers
round 5 allocate() took 130.828599ms to make 1000 offers after filtering 6000 offers
round 6 allocate() took 134.385597ms to make 1000 offers after filtering 7000 offers
round 7 allocate() took 135.810198ms to make 1000 offers after filtering 8000 offers
round 8 allocate() took 126.128592ms to make 1000 offers after filtering 9000 offers
round 9 allocate() took 144.794829ms to make 1000 offers after filtering 10000 offers
round 10 allocate() took 162.533506ms to make 1000 offers after filtering 11000 offers
round 11 allocate() took 159.22141ms to make 1000 offers after filtering 12000 offers
round 12 allocate() took 174.739823ms to make 1000 offers after filtering 13000 offers
round 13 allocate() took 171.095423ms to make 1000 offers after filtering 14000 offers
round 14 allocate() took 186.876661ms to make 1000 offers after filtering 15000 offers
round 15 allocate() took 177.021603ms to make 1000 offers after filtering 16000 offers
round 16 allocate() took 165.970722ms to make 1000 offers after filtering 17000 offers
round 17 allocate() took 162.016338ms to make 1000 offers after filtering 18000 offers
round 18 allocate() took 138.698917ms to make 1000 offers after filtering 19000 offers
round 19 allocate() took 144.556913ms to make 1000 offers after filtering 20000 offers
round 20 allocate() took 155.689926ms to make 1000 offers after filtering 21000 offers
round 21 allocate() took 149.952025ms to make 1000 offers after filtering 22000 offers
round 22 allocate() took 135.98823ms to make 1000 offers after filtering 23000 offers
round 23 allocate() took 132.520992ms to make 1000 offers after filtering 24000 offers
round 24 allocate() took 143.325635ms to make 1000 offers after filtering 25000 offers
round 25 allocate() took 153.313423ms to make 1000 offers after filtering 26000 offers
round 26 allocate() took 169.889066ms to make 1000 offers after filtering 27000 offers
round 27 allocate() took 188.969694ms to make 1000 offers after filtering 28000 offers
round 28 allocate() took 176.132259ms to make 1000 offers after filtering 29000 offers
round 29 allocate() took 186.754676ms to make 1000 offers after filtering 30000 offers
round 30 allocate() took 166.346508ms to make 1000 offers after filtering 31000 offers
round 31 allocate() took 172.557665ms to make 1000 offers after filtering 32000 offers
round 32 allocate() took 169.874406ms to make 1000 offers after filtering 33000 offers
round 33 allocate() took 190.470692ms to make 1000 offers after filtering 34000 offers
round 34 allocate() took 184.328221ms to make 1000 offers after filtering 35000 offers
round 35 allocate() took 222.081892ms to make 1000 offers after filtering 36000 offers
round 36 allocate() took 203.134216ms to make 1000 offers after filtering 37000 offers
round 37 allocate() took 217.490016ms to make 1000 offers after filtering 38000 offers
round 38 allocate() took 257.449904ms to make 1000 offers after filtering 39000 offers
round 39 allocate() took 252.468529ms to make 1000 offers after filtering 40000 offers
round 40 allocate() took 229.433398ms to make 1000 offers after filtering 41000 offers
round 41 allocate() took 251.920859ms to make 1000 offers after filtering 42000 offers
round 42 allocate() took 273.529747ms to make 1000 offers after filtering 43000 offers
round 43 allocate() took 315.08445ms to make 1000 offers after filtering 44000 offers
round 44 allocate() took 354.758003ms to make 1000 offers after filtering 45000 offers
round 45 allocate() took 350.378463ms to make 1000 offers after filtering 46000 offers
round 46 allocate() took 415.070355ms to make 1000 offers after filtering 47000 offers
round 47 allocate() took 519.922944ms to make 1000 offers after filtering 48000 offers
round 48 allocate() took 710.598546ms to make 1000 offers after filtering 49000 offers
round 49 allocate() took 1.267395251secs to make 0 offers after filtering 50000 offers
round 50 allocate() took 1.210188235secs to make 0 offers after filtering 50000 offers


Thanks,

Meng Zhu

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Meng Zhu <mz...@mesosphere.io>.


> On Sept. 4, 2018, 2:36 a.m., Benjamin Bannier wrote:
> > Have you benchmarked against a dedicated copy-on-write abstraction? I understand that the current implementation which only copies when needed might be more performant, but it also appears to carry a huge mental overhead to me as users of `Resources` need to always pay a lot of attention to how used objects could be shared. Using a slightly safer copy-on-write abstraction might introduce some overhead, but could IMO be much easier to use and reduced the likelihood of subtle bugs. It would also help encapsulate low-level issues like `use_count`, see e.g., https://wg21.cmeerw.net/lwg/issue2776, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0521r0.html.
> > 
> > I imagine this could be useful elsewhere as well in the future.
> > 
> > Note that implementations like the one presented in https://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Copy-on-write which allow getting references to wrapped value are prone to subtle issues in user code. Not directly giving users a reference requires a little more determination to introduce bugs.
> > 
> > Dummy:
> > 
> >     // UNTESTED AND BUGGY. FOR DEMONSTRATION PURPOSES ONLY ;)
> >     template <typename T> struct cow {
> >       // We can only construct from a value we need to wrap. All other constructors
> >       // should be deleted as needed.
> >       //
> >       // Verify with e.g., SFINAE that
> >       //
> >       //     static_assert(is_convertible_v<remove_cvref_t<U>, T>);
> >       template <typename U> explicit cow(U);
> >     
> >       // Allow assignments from a `cow` we own. All other assignment operators
> >       // should be deleted as needed. These operators need to internally synchronize
> >       // concurrent access to the wrapped value.
> >       cow &operator=(cow);
> >     
> >       // Non-mutable access to wrapped value.
> >       const T &operator->() const;
> >       const T& operator*() const;
> >     
> >       // Mutating access to the wrapped value. `F` is callable as `void(T&)` and
> >       // applied to the wrapped value immediately. Synchronizes internally.
> >       template <typename F> cow &mut(F) { return *this; }
> >     };
> >     
> >     void example() {
> >         cow<int> i(47);
> >         assert(*i == 47);
> >     
> >         i.mut([](int &x) { x = 11; });
> >         assert(*i == 11);
> >     }

Thanks a lot for the suggestion, Benjamin! There is certainly room for improvement. As Ben mentioned above, the boilerplate code regarding use_count is brittle. Hide setters and only expose mutation in a controlled fashion is the way to go.

But I need some time to think through and evaluate the options. For now,  only code inside the realm of resource.cpp (`class Resources` specifically) needs to carry that mental burden. Hopefully, this part of the code is only touched sparingly. Added a todo and I will make sure to follow up on this.


- Meng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/#review208295
-----------------------------------------------------------


On Sept. 5, 2018, 12:17 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68490/
> -----------------------------------------------------------
> 
> (Updated Sept. 5, 2018, 12:17 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Gastón Kleiman.
> 
> 
> Bugs: MESOS-6765
>     https://issues.apache.org/jira/browse/MESOS-6765
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch lets `class Resources` only store shared
> pointers to the underlying resource objects, so that
> read-only filter operations such as `reserved()`,
> `unreserved()` and etc. can avoid making copies of
> the whole resource objects. Instead, only shared pointers
> are copied.
> 
> In write operations, we check if there are more than one
> references to the resource object. If so, a copy is made
> for safe mutation without affecting owners.
> 
> To maintain the usage abstraction that `class Resources`
> still holds resource objects, we utilize
> `boost::indirect_iterator` iterator adapter to deference
> the shared pointers as we iterate.
> 
> 
> Diffs
> -----
> 
>   include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
>   include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
>   src/common/resources.cpp 3e63cdedb9261970dbeb9bb9f97eed65819f68a7 
>   src/v1/resources.cpp 3683a331e0859cd6f2ad061db6ba67112ecfcb0d 
> 
> 
> Diff: https://reviews.apache.org/r/68490/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> Did a quick test on Mac with an optimized build, running benchmark `HierarchicalAllocator_BENCHMARK_Test.ResourceLabels`, here are the results of comparing "before" and "after". Note, DVFS is not fixed. And we only did a partial run to verify the validity of the patch, full evaluation coming soon.
> 
> **Overall, 33% performance improvement for the 1st steup (1000 agents and 1 frameworks) and 32% improvement for the first 50 iterarions of the 2nd setup (1000 agents and 50 frameworks).**
> 
> Before:
> 
> [ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/0
> 
> Using 1000 agents and 1 frameworks
> Added 1 frameworks in 1.220022ms
> Added 1000 agents in 465.045382ms
> round 0 allocate() took 54.803773ms to make 0 offers after filtering 1000 offers
> round 1 allocate() took 66.690456ms to make 0 offers after filtering 1000 offers
> 
> Using 1000 agents and 50 frameworks
> Added 50 frameworks in 1.40774ms
> Added 1000 agents in 460.562488ms
> round 0 allocate() took 155.500548ms to make 1000 offers after filtering 1000 offers
> round 1 allocate() took 155.194986ms to make 1000 offers after filtering 2000 offers
> round 2 allocate() took 160.706712ms to make 1000 offers after filtering 3000 offers
> round 3 allocate() took 156.701883ms to make 1000 offers after filtering 4000 offers
> round 4 allocate() took 164.246987ms to make 1000 offers after filtering 5000 offers
> round 5 allocate() took 160.791809ms to make 1000 offers after filtering 6000 offers
> round 6 allocate() took 166.064652ms to make 1000 offers after filtering 7000 offers
> round 7 allocate() took 162.177213ms to make 1000 offers after filtering 8000 offers
> round 8 allocate() took 166.620163ms to make 1000 offers after filtering 9000 offers
> round 9 allocate() took 167.970045ms to make 1000 offers after filtering 10000 offers
> round 10 allocate() took 166.189463ms to make 1000 offers after filtering 11000 offers
> round 11 allocate() took 170.863462ms to make 1000 offers after filtering 12000 offers
> round 12 allocate() took 172.260473ms to make 1000 offers after filtering 13000 offers
> round 13 allocate() took 170.553911ms to make 1000 offers after filtering 14000 offers
> round 14 allocate() took 235.764593ms to make 1000 offers after filtering 15000 offers
> round 15 allocate() took 247.250433ms to make 1000 offers after filtering 16000 offers
> round 16 allocate() took 276.932824ms to make 1000 offers after filtering 17000 offers
> round 17 allocate() took 248.888469ms to make 1000 offers after filtering 18000 offers
> round 18 allocate() took 193.890556ms to make 1000 offers after filtering 19000 offers
> round 19 allocate() took 195.105346ms to make 1000 offers after filtering 20000 offers
> round 20 allocate() took 194.447428ms to make 1000 offers after filtering 21000 offers
> round 21 allocate() took 205.486287ms to make 1000 offers after filtering 22000 offers
> round 22 allocate() took 199.241922ms to make 1000 offers after filtering 23000 offers
> round 23 allocate() took 200.885488ms to make 1000 offers after filtering 24000 offers
> round 24 allocate() took 216.132361ms to make 1000 offers after filtering 25000 offers
> round 25 allocate() took 210.638273ms to make 1000 offers after filtering 26000 offers
> round 26 allocate() took 232.397778ms to make 1000 offers after filtering 27000 offers
> round 27 allocate() took 239.633708ms to make 1000 offers after filtering 28000 offers
> round 28 allocate() took 225.829677ms to make 1000 offers after filtering 29000 offers
> round 29 allocate() took 245.143272ms to make 1000 offers after filtering 30000 offers
> round 30 allocate() took 248.630295ms to make 1000 offers after filtering 31000 offers
> round 31 allocate() took 262.147804ms to make 1000 offers after filtering 32000 offers
> round 32 allocate() took 251.109969ms to make 1000 offers after filtering 33000 offers
> round 33 allocate() took 263.92273ms to make 1000 offers after filtering 34000 offers
> round 34 allocate() took 268.422275ms to make 1000 offers after filtering 35000 offers
> round 35 allocate() took 273.830163ms to make 1000 offers after filtering 36000 offers
> round 36 allocate() took 283.25481ms to make 1000 offers after filtering 37000 offers
> round 37 allocate() took 366.400781ms to make 1000 offers after filtering 38000 offers
> round 38 allocate() took 382.202755ms to make 1000 offers after filtering 39000 offers
> round 39 allocate() took 337.266121ms to make 1000 offers after filtering 40000 offers
> round 40 allocate() took 381.033696ms to make 1000 offers after filtering 41000 offers
> round 41 allocate() took 365.941946ms to make 1000 offers after filtering 42000 offers
> round 42 allocate() took 379.527886ms to make 1000 offers after filtering 43000 offers
> round 43 allocate() took 425.661181ms to make 1000 offers after filtering 44000 offers
> round 44 allocate() took 455.86657ms to make 1000 offers after filtering 45000 offers
> round 45 allocate() took 506.943117ms to make 1000 offers after filtering 46000 offers
> round 46 allocate() took 681.880233ms to make 1000 offers after filtering 47000 offers
> round 47 allocate() took 860.932974ms to make 1000 offers after filtering 48000 offers
> round 48 allocate() took 960.272209ms to make 1000 offers after filtering 49000 offers
> round 49 allocate() took 1.907427057secs to make 0 offers after filtering 50000 offers
> round 50 allocate() took 2.157864654secs to make 0 offers after filtering 50000 offers
> 
> 
> After:
> 
> Using 1000 agents and 1 frameworks
> Added 1 frameworks in 404709ns
> Added 1000 agents in 245.162138ms
> round 0 allocate() took 38.498883ms to make 0 offers after filtering 1000 offers
> round 1 allocate() took 42.234898ms to make 0 offers after filtering 1000 offers
> 
> Using 1000 agents and 50 frameworks
> Added 50 frameworks in 1.441281ms
> Added 1000 agents in 261.923711ms
> round 0 allocate() took 121.706074ms to make 1000 offers after filtering 1000 offers
> round 1 allocate() took 150.465925ms to make 1000 offers after filtering 2000 offers
> round 2 allocate() took 273.549893ms to make 1000 offers after filtering 3000 offers
> round 3 allocate() took 199.248799ms to make 1000 offers after filtering 4000 offers
> round 4 allocate() took 133.865752ms to make 1000 offers after filtering 5000 offers
> round 5 allocate() took 130.828599ms to make 1000 offers after filtering 6000 offers
> round 6 allocate() took 134.385597ms to make 1000 offers after filtering 7000 offers
> round 7 allocate() took 135.810198ms to make 1000 offers after filtering 8000 offers
> round 8 allocate() took 126.128592ms to make 1000 offers after filtering 9000 offers
> round 9 allocate() took 144.794829ms to make 1000 offers after filtering 10000 offers
> round 10 allocate() took 162.533506ms to make 1000 offers after filtering 11000 offers
> round 11 allocate() took 159.22141ms to make 1000 offers after filtering 12000 offers
> round 12 allocate() took 174.739823ms to make 1000 offers after filtering 13000 offers
> round 13 allocate() took 171.095423ms to make 1000 offers after filtering 14000 offers
> round 14 allocate() took 186.876661ms to make 1000 offers after filtering 15000 offers
> round 15 allocate() took 177.021603ms to make 1000 offers after filtering 16000 offers
> round 16 allocate() took 165.970722ms to make 1000 offers after filtering 17000 offers
> round 17 allocate() took 162.016338ms to make 1000 offers after filtering 18000 offers
> round 18 allocate() took 138.698917ms to make 1000 offers after filtering 19000 offers
> round 19 allocate() took 144.556913ms to make 1000 offers after filtering 20000 offers
> round 20 allocate() took 155.689926ms to make 1000 offers after filtering 21000 offers
> round 21 allocate() took 149.952025ms to make 1000 offers after filtering 22000 offers
> round 22 allocate() took 135.98823ms to make 1000 offers after filtering 23000 offers
> round 23 allocate() took 132.520992ms to make 1000 offers after filtering 24000 offers
> round 24 allocate() took 143.325635ms to make 1000 offers after filtering 25000 offers
> round 25 allocate() took 153.313423ms to make 1000 offers after filtering 26000 offers
> round 26 allocate() took 169.889066ms to make 1000 offers after filtering 27000 offers
> round 27 allocate() took 188.969694ms to make 1000 offers after filtering 28000 offers
> round 28 allocate() took 176.132259ms to make 1000 offers after filtering 29000 offers
> round 29 allocate() took 186.754676ms to make 1000 offers after filtering 30000 offers
> round 30 allocate() took 166.346508ms to make 1000 offers after filtering 31000 offers
> round 31 allocate() took 172.557665ms to make 1000 offers after filtering 32000 offers
> round 32 allocate() took 169.874406ms to make 1000 offers after filtering 33000 offers
> round 33 allocate() took 190.470692ms to make 1000 offers after filtering 34000 offers
> round 34 allocate() took 184.328221ms to make 1000 offers after filtering 35000 offers
> round 35 allocate() took 222.081892ms to make 1000 offers after filtering 36000 offers
> round 36 allocate() took 203.134216ms to make 1000 offers after filtering 37000 offers
> round 37 allocate() took 217.490016ms to make 1000 offers after filtering 38000 offers
> round 38 allocate() took 257.449904ms to make 1000 offers after filtering 39000 offers
> round 39 allocate() took 252.468529ms to make 1000 offers after filtering 40000 offers
> round 40 allocate() took 229.433398ms to make 1000 offers after filtering 41000 offers
> round 41 allocate() took 251.920859ms to make 1000 offers after filtering 42000 offers
> round 42 allocate() took 273.529747ms to make 1000 offers after filtering 43000 offers
> round 43 allocate() took 315.08445ms to make 1000 offers after filtering 44000 offers
> round 44 allocate() took 354.758003ms to make 1000 offers after filtering 45000 offers
> round 45 allocate() took 350.378463ms to make 1000 offers after filtering 46000 offers
> round 46 allocate() took 415.070355ms to make 1000 offers after filtering 47000 offers
> round 47 allocate() took 519.922944ms to make 1000 offers after filtering 48000 offers
> round 48 allocate() took 710.598546ms to make 1000 offers after filtering 49000 offers
> round 49 allocate() took 1.267395251secs to make 0 offers after filtering 50000 offers
> round 50 allocate() took 1.210188235secs to make 0 offers after filtering 50000 offers
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Benjamin Bannier <be...@mesosphere.io>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/#review208295
-----------------------------------------------------------



Have you benchmarked against a dedicated copy-on-write abstraction? I understand that the current implementation which only copies when needed might be more performant, but it also appears to carry a huge mental overhead to me as users of `Resources` need to always pay a lot of attention to how used objects could be shared. Using a slightly safer copy-on-write abstraction might introduce some overhead, but could IMO be much easier to use and reduced the likelihood of subtle bugs. It would also help encapsulate low-level issues like `use_count`, see e.g., https://wg21.cmeerw.net/lwg/issue2776, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0521r0.html.

I imagine this could be useful elsewhere as well in the future.

Note that implementations like the one presented in https://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Copy-on-write which allow getting references to wrapped value are prone to subtle issues in user code. Not directly giving users a reference requires a little more determination to introduce bugs.

Dummy:

    // UNTESTED AND BUGGY. FOR DEMONSTRATION PURPOSES ONLY ;)
    template <typename T> struct cow {
      // We can only construct from a value we need to wrap. All other constructors
      // should be deleted as needed.
      //
      // Verify with e.g., SFINAE that
      //
      //     static_assert(is_convertible_v<remove_cvref_t<U>, T>);
      template <typename U> explicit cow(U);
    
      // Allow assignments from a `cow` we own. All other assignment operators
      // should be deleted as needed. These operators need to internally synchronize
      // concurrent access to the wrapped value.
      cow &operator=(cow);
    
      // Non-mutable access to wrapped value.
      const T &operator->() const;
      const T& operator*() const;
    
      // Mutating access to the wrapped value. `F` is callable as `void(T&)` and
      // applied to the wrapped value immediately. Synchronizes internally.
      template <typename F> cow &mut(F) { return *this; }
    };
    
    void example() {
        cow<int> i(47);
        assert(*i == 47);
    
        i.mut([](int &x) { x = 11; });
        assert(*i == 11);
    }

- Benjamin Bannier


On Aug. 24, 2018, 12:13 a.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68490/
> -----------------------------------------------------------
> 
> (Updated Aug. 24, 2018, 12:13 a.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-6765
>     https://issues.apache.org/jira/browse/MESOS-6765
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch lets `class Resources` only store shared
> pointers to the underlying resource objects, so that
> read-only filter operations such as `reserved()`,
> `unreserved()` and etc. can avoid making copies of
> the whole resource objects. Instead, only shared pointers
> are copied.
> 
> In write operations, we check if there are more than one
> references to the resource object. If so, a copy is made
> for safe mutation without affecting owners.
> 
> To maintain the usage abstraction that `class Resources`
> still holds resource objects, we utilize
> `boost::indirect_iterator` iterator adapter to deference
> the shared pointers as we iterate.
> 
> 
> Diffs
> -----
> 
>   include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
>   include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
>   src/common/resources.cpp 0110c0ee3e810ad1c29dfa5507b13ebd5d0222a2 
>   src/v1/resources.cpp 228a7327ffe7934d37b56ee67b8be9ae1e119ca8 
> 
> 
> Diff: https://reviews.apache.org/r/68490/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> Did a quick test on Mac with an optimized build, running benchmark `HierarchicalAllocator_BENCHMARK_Test.ResourceLabels`, here are the results of comparing "before" and "after". Note, DVFS is not fixed. And we only did a partial run to verify the validity of the patch, full evaluation coming soon.
> 
> **Overall, 33% performance improvement for the 1st steup (1000 agents and 1 frameworks) and 32% improvement for the first 50 iterarions of the 2nd setup (1000 agents and 50 frameworks).**
> 
> Before:
> 
> [ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/0
> 
> Using 1000 agents and 1 frameworks
> Added 1 frameworks in 1.220022ms
> Added 1000 agents in 465.045382ms
> round 0 allocate() took 54.803773ms to make 0 offers after filtering 1000 offers
> round 1 allocate() took 66.690456ms to make 0 offers after filtering 1000 offers
> 
> Using 1000 agents and 50 frameworks
> Added 50 frameworks in 1.40774ms
> Added 1000 agents in 460.562488ms
> round 0 allocate() took 155.500548ms to make 1000 offers after filtering 1000 offers
> round 1 allocate() took 155.194986ms to make 1000 offers after filtering 2000 offers
> round 2 allocate() took 160.706712ms to make 1000 offers after filtering 3000 offers
> round 3 allocate() took 156.701883ms to make 1000 offers after filtering 4000 offers
> round 4 allocate() took 164.246987ms to make 1000 offers after filtering 5000 offers
> round 5 allocate() took 160.791809ms to make 1000 offers after filtering 6000 offers
> round 6 allocate() took 166.064652ms to make 1000 offers after filtering 7000 offers
> round 7 allocate() took 162.177213ms to make 1000 offers after filtering 8000 offers
> round 8 allocate() took 166.620163ms to make 1000 offers after filtering 9000 offers
> round 9 allocate() took 167.970045ms to make 1000 offers after filtering 10000 offers
> round 10 allocate() took 166.189463ms to make 1000 offers after filtering 11000 offers
> round 11 allocate() took 170.863462ms to make 1000 offers after filtering 12000 offers
> round 12 allocate() took 172.260473ms to make 1000 offers after filtering 13000 offers
> round 13 allocate() took 170.553911ms to make 1000 offers after filtering 14000 offers
> round 14 allocate() took 235.764593ms to make 1000 offers after filtering 15000 offers
> round 15 allocate() took 247.250433ms to make 1000 offers after filtering 16000 offers
> round 16 allocate() took 276.932824ms to make 1000 offers after filtering 17000 offers
> round 17 allocate() took 248.888469ms to make 1000 offers after filtering 18000 offers
> round 18 allocate() took 193.890556ms to make 1000 offers after filtering 19000 offers
> round 19 allocate() took 195.105346ms to make 1000 offers after filtering 20000 offers
> round 20 allocate() took 194.447428ms to make 1000 offers after filtering 21000 offers
> round 21 allocate() took 205.486287ms to make 1000 offers after filtering 22000 offers
> round 22 allocate() took 199.241922ms to make 1000 offers after filtering 23000 offers
> round 23 allocate() took 200.885488ms to make 1000 offers after filtering 24000 offers
> round 24 allocate() took 216.132361ms to make 1000 offers after filtering 25000 offers
> round 25 allocate() took 210.638273ms to make 1000 offers after filtering 26000 offers
> round 26 allocate() took 232.397778ms to make 1000 offers after filtering 27000 offers
> round 27 allocate() took 239.633708ms to make 1000 offers after filtering 28000 offers
> round 28 allocate() took 225.829677ms to make 1000 offers after filtering 29000 offers
> round 29 allocate() took 245.143272ms to make 1000 offers after filtering 30000 offers
> round 30 allocate() took 248.630295ms to make 1000 offers after filtering 31000 offers
> round 31 allocate() took 262.147804ms to make 1000 offers after filtering 32000 offers
> round 32 allocate() took 251.109969ms to make 1000 offers after filtering 33000 offers
> round 33 allocate() took 263.92273ms to make 1000 offers after filtering 34000 offers
> round 34 allocate() took 268.422275ms to make 1000 offers after filtering 35000 offers
> round 35 allocate() took 273.830163ms to make 1000 offers after filtering 36000 offers
> round 36 allocate() took 283.25481ms to make 1000 offers after filtering 37000 offers
> round 37 allocate() took 366.400781ms to make 1000 offers after filtering 38000 offers
> round 38 allocate() took 382.202755ms to make 1000 offers after filtering 39000 offers
> round 39 allocate() took 337.266121ms to make 1000 offers after filtering 40000 offers
> round 40 allocate() took 381.033696ms to make 1000 offers after filtering 41000 offers
> round 41 allocate() took 365.941946ms to make 1000 offers after filtering 42000 offers
> round 42 allocate() took 379.527886ms to make 1000 offers after filtering 43000 offers
> round 43 allocate() took 425.661181ms to make 1000 offers after filtering 44000 offers
> round 44 allocate() took 455.86657ms to make 1000 offers after filtering 45000 offers
> round 45 allocate() took 506.943117ms to make 1000 offers after filtering 46000 offers
> round 46 allocate() took 681.880233ms to make 1000 offers after filtering 47000 offers
> round 47 allocate() took 860.932974ms to make 1000 offers after filtering 48000 offers
> round 48 allocate() took 960.272209ms to make 1000 offers after filtering 49000 offers
> round 49 allocate() took 1.907427057secs to make 0 offers after filtering 50000 offers
> round 50 allocate() took 2.157864654secs to make 0 offers after filtering 50000 offers
> 
> 
> After:
> 
> Using 1000 agents and 1 frameworks
> Added 1 frameworks in 404709ns
> Added 1000 agents in 245.162138ms
> round 0 allocate() took 38.498883ms to make 0 offers after filtering 1000 offers
> round 1 allocate() took 42.234898ms to make 0 offers after filtering 1000 offers
> 
> Using 1000 agents and 50 frameworks
> Added 50 frameworks in 1.441281ms
> Added 1000 agents in 261.923711ms
> round 0 allocate() took 121.706074ms to make 1000 offers after filtering 1000 offers
> round 1 allocate() took 150.465925ms to make 1000 offers after filtering 2000 offers
> round 2 allocate() took 273.549893ms to make 1000 offers after filtering 3000 offers
> round 3 allocate() took 199.248799ms to make 1000 offers after filtering 4000 offers
> round 4 allocate() took 133.865752ms to make 1000 offers after filtering 5000 offers
> round 5 allocate() took 130.828599ms to make 1000 offers after filtering 6000 offers
> round 6 allocate() took 134.385597ms to make 1000 offers after filtering 7000 offers
> round 7 allocate() took 135.810198ms to make 1000 offers after filtering 8000 offers
> round 8 allocate() took 126.128592ms to make 1000 offers after filtering 9000 offers
> round 9 allocate() took 144.794829ms to make 1000 offers after filtering 10000 offers
> round 10 allocate() took 162.533506ms to make 1000 offers after filtering 11000 offers
> round 11 allocate() took 159.22141ms to make 1000 offers after filtering 12000 offers
> round 12 allocate() took 174.739823ms to make 1000 offers after filtering 13000 offers
> round 13 allocate() took 171.095423ms to make 1000 offers after filtering 14000 offers
> round 14 allocate() took 186.876661ms to make 1000 offers after filtering 15000 offers
> round 15 allocate() took 177.021603ms to make 1000 offers after filtering 16000 offers
> round 16 allocate() took 165.970722ms to make 1000 offers after filtering 17000 offers
> round 17 allocate() took 162.016338ms to make 1000 offers after filtering 18000 offers
> round 18 allocate() took 138.698917ms to make 1000 offers after filtering 19000 offers
> round 19 allocate() took 144.556913ms to make 1000 offers after filtering 20000 offers
> round 20 allocate() took 155.689926ms to make 1000 offers after filtering 21000 offers
> round 21 allocate() took 149.952025ms to make 1000 offers after filtering 22000 offers
> round 22 allocate() took 135.98823ms to make 1000 offers after filtering 23000 offers
> round 23 allocate() took 132.520992ms to make 1000 offers after filtering 24000 offers
> round 24 allocate() took 143.325635ms to make 1000 offers after filtering 25000 offers
> round 25 allocate() took 153.313423ms to make 1000 offers after filtering 26000 offers
> round 26 allocate() took 169.889066ms to make 1000 offers after filtering 27000 offers
> round 27 allocate() took 188.969694ms to make 1000 offers after filtering 28000 offers
> round 28 allocate() took 176.132259ms to make 1000 offers after filtering 29000 offers
> round 29 allocate() took 186.754676ms to make 1000 offers after filtering 30000 offers
> round 30 allocate() took 166.346508ms to make 1000 offers after filtering 31000 offers
> round 31 allocate() took 172.557665ms to make 1000 offers after filtering 32000 offers
> round 32 allocate() took 169.874406ms to make 1000 offers after filtering 33000 offers
> round 33 allocate() took 190.470692ms to make 1000 offers after filtering 34000 offers
> round 34 allocate() took 184.328221ms to make 1000 offers after filtering 35000 offers
> round 35 allocate() took 222.081892ms to make 1000 offers after filtering 36000 offers
> round 36 allocate() took 203.134216ms to make 1000 offers after filtering 37000 offers
> round 37 allocate() took 217.490016ms to make 1000 offers after filtering 38000 offers
> round 38 allocate() took 257.449904ms to make 1000 offers after filtering 39000 offers
> round 39 allocate() took 252.468529ms to make 1000 offers after filtering 40000 offers
> round 40 allocate() took 229.433398ms to make 1000 offers after filtering 41000 offers
> round 41 allocate() took 251.920859ms to make 1000 offers after filtering 42000 offers
> round 42 allocate() took 273.529747ms to make 1000 offers after filtering 43000 offers
> round 43 allocate() took 315.08445ms to make 1000 offers after filtering 44000 offers
> round 44 allocate() took 354.758003ms to make 1000 offers after filtering 45000 offers
> round 45 allocate() took 350.378463ms to make 1000 offers after filtering 46000 offers
> round 46 allocate() took 415.070355ms to make 1000 offers after filtering 47000 offers
> round 47 allocate() took 519.922944ms to make 1000 offers after filtering 48000 offers
> round 48 allocate() took 710.598546ms to make 1000 offers after filtering 49000 offers
> round 49 allocate() took 1.267395251secs to make 0 offers after filtering 50000 offers
> round 50 allocate() took 1.210188235secs to make 0 offers after filtering 50000 offers
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Meng Zhu <mz...@mesosphere.io>.


> On Sept. 3, 2018, 9:59 p.m., Benjamin Mahler wrote:
> > Thanks! Can you also include some commentary on the rest of the allocation benchmarks as well as the overhead within the Resources micro-benchmarks?

Posted the updated patches. Will summarize and post more comprehensive results soon.


> On Sept. 3, 2018, 9:59 p.m., Benjamin Mahler wrote:
> > include/mesos/resources.hpp
> > Lines 678-679 (patched)
> > <https://reviews.apache.org/r/68490/diff/1/?file=2076881#file2076881line680>
> >
> >     What's this referring to? Reading this I see it already has an rvalue reference?

Referring to `void add(std::shared_ptr<Resource_>&& that);`. Moved the comment down to clarifiy it.


> On Sept. 3, 2018, 9:59 p.m., Benjamin Mahler wrote:
> > include/mesos/resources.hpp
> > Lines 682 (patched)
> > <https://reviews.apache.org/r/68490/diff/1/?file=2076881#file2076881line684>
> >
> >     Why do we need this? It seems like the callers should just be de-referencing their shared_ptr prior to calling?

Consider adding two non-mergeble `Resources`(e.g. 1 cpu + 1 mem), if we only have `Resources::add(const Resource_& that)`, we will always need to `make_shared(Resource_)`. If we directly pass `shared_ptr`, we can then just `push_back`. For any non-mergable `Resource_`, we will be able to save one `make_shared`.

One rule of thumb I figured is that, we should avoid `make_shared` when possible (because it makes a copy). This means if we already have a `shared_ptr` we should just use it and pass it around. This also explains why I use `shared_ptr` for foreach loops.


> On Sept. 3, 2018, 9:59 p.m., Benjamin Mahler wrote:
> > src/common/resources.cpp
> > Line 1471 (original), 1473 (patched)
> > <https://reviews.apache.org/r/68490/diff/1/?file=2076883#file2076883line1473>
> >
> >     I'm a little puzzled about these `const shared_ptr<...>` foreach loops. Why don't the const ones do `const Resource_&` loops?
> >     
> >     ```
> >     foreach (const Resource_& resource_, that) {
> >     ```
> >     
> >     Something I'm missing?

See my comment above regarding `add(const std::shared_ptr<Resource_>& that)`.

Actual benefits aside, I also want to make it consistent that inside the `class Resources`, we are speaking `shared_ptr` unless we want to make an implicit copy as in `push/popReservation`.


> On Sept. 3, 2018, 9:59 p.m., Benjamin Mahler wrote:
> > src/common/resources.cpp
> > Line 1503 (original), 1505 (patched)
> > <https://reviews.apache.org/r/68490/diff/1/?file=2076883#file2076883line1505>
> >
> >     It seems ok to slip this in here, but generally please send unrelated cleanups as separate patches. :)

Got it.


> On Sept. 3, 2018, 9:59 p.m., Benjamin Mahler wrote:
> > src/common/resources.cpp
> > Line 1514 (original), 1516-1518 (patched)
> > <https://reviews.apache.org/r/68490/diff/1/?file=2076883#file2076883line1516>
> >
> >     I'm not sure how obvious these will be to readers, maybe a comment on all of these?
> >     
> >     ```
> >     // Copy-on-write (if more than 1 reference).
> >     if (resource_.use_count() > 1) {
> >       resource_ = make_shared<Resource_>(*resource_);
> >     }
> >     ```
> >     
> >     Maybe something like (since we can't use `mutable`):
> >     
> >     ```
> >     make_mutable(resource_);
> >     ```
> >     
> >     Maybe also make it composable?
> >     
> >     ```
> >     make_mutable(resource_)->resource.mutable_allocation_info()->set_role(role);
> >     ```
> >     
> >     Still, seems rather error prone and easy to forget to check for copies prior to mutating? Wonder if there's a way to make it less brittle.

Added the comment. Added a TODO regarding introducing a more controlled mutation interface.


> On Sept. 3, 2018, 9:59 p.m., Benjamin Mahler wrote:
> > src/common/resources.cpp
> > Lines 1667-1670 (patched)
> > <https://reviews.apache.org/r/68490/diff/1/?file=2076883#file2076883line1667>
> >
> >     Seems odd, why don't we use `Resource` directly here?
> >     
> >     ```
> >           Resource r = *resource_;
> >           r.clear_reservations();
> >     
> >           result.add(std::move(r));
> >     ```
> >     
> >     (I left a comment above about removing the add that takes a shared_ptr, since it doesn't seem needed?)

See my comments regarding `add` shared_ptr above.


> On Sept. 3, 2018, 9:59 p.m., Benjamin Mahler wrote:
> > src/common/resources.cpp
> > Line 1676 (original), 1692 (patched)
> > <https://reviews.apache.org/r/68490/diff/1/?file=2076883#file2076883line1692>
> >
> >     Feel free to pull some of these unrelated improvements out in front of this, it would help make the diff easier for reviewers.

Got it.


> On Sept. 3, 2018, 9:59 p.m., Benjamin Mahler wrote:
> > src/common/resources.cpp
> > Line 1954 (original), 1972-1979 (patched)
> > <https://reviews.apache.org/r/68490/diff/1/?file=2076883#file2076883line1972>
> >
> >     Why is this needed? Isn't the point of this change that toUnreserved avoids copying now?

You are right, we can just do

```
Resources unreserved;
unreserved.add(resource_);
unreserved = unreserved.toUnreserved();
```


> On Sept. 3, 2018, 9:59 p.m., Benjamin Mahler wrote:
> > src/common/resources.cpp
> > Lines 1958-1963 (original), 1983-1993 (patched)
> > <https://reviews.apache.org/r/68490/diff/1/?file=2076883#file2076883line1983>
> >
> >     Hm.. is there a way to simplify this? This seems to be just adding in the reserved resources? Could we call pushReservation or something instead of touching the resources field directly?
> >     
> >     Right now `find()` is not performance critical, so it doesn't need to be optimal if it's simpler not to be.

Done. We will just make a copy then.


- Meng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/#review208286
-----------------------------------------------------------


On Sept. 5, 2018, 12:17 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68490/
> -----------------------------------------------------------
> 
> (Updated Sept. 5, 2018, 12:17 p.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Gastón Kleiman.
> 
> 
> Bugs: MESOS-6765
>     https://issues.apache.org/jira/browse/MESOS-6765
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch lets `class Resources` only store shared
> pointers to the underlying resource objects, so that
> read-only filter operations such as `reserved()`,
> `unreserved()` and etc. can avoid making copies of
> the whole resource objects. Instead, only shared pointers
> are copied.
> 
> In write operations, we check if there are more than one
> references to the resource object. If so, a copy is made
> for safe mutation without affecting owners.
> 
> To maintain the usage abstraction that `class Resources`
> still holds resource objects, we utilize
> `boost::indirect_iterator` iterator adapter to deference
> the shared pointers as we iterate.
> 
> 
> Diffs
> -----
> 
>   include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
>   include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
>   src/common/resources.cpp 3e63cdedb9261970dbeb9bb9f97eed65819f68a7 
>   src/v1/resources.cpp 3683a331e0859cd6f2ad061db6ba67112ecfcb0d 
> 
> 
> Diff: https://reviews.apache.org/r/68490/diff/2/
> 
> 
> Testing
> -------
> 
> make check
> 
> Did a quick test on Mac with an optimized build, running benchmark `HierarchicalAllocator_BENCHMARK_Test.ResourceLabels`, here are the results of comparing "before" and "after". Note, DVFS is not fixed. And we only did a partial run to verify the validity of the patch, full evaluation coming soon.
> 
> **Overall, 33% performance improvement for the 1st steup (1000 agents and 1 frameworks) and 32% improvement for the first 50 iterarions of the 2nd setup (1000 agents and 50 frameworks).**
> 
> Before:
> 
> [ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/0
> 
> Using 1000 agents and 1 frameworks
> Added 1 frameworks in 1.220022ms
> Added 1000 agents in 465.045382ms
> round 0 allocate() took 54.803773ms to make 0 offers after filtering 1000 offers
> round 1 allocate() took 66.690456ms to make 0 offers after filtering 1000 offers
> 
> Using 1000 agents and 50 frameworks
> Added 50 frameworks in 1.40774ms
> Added 1000 agents in 460.562488ms
> round 0 allocate() took 155.500548ms to make 1000 offers after filtering 1000 offers
> round 1 allocate() took 155.194986ms to make 1000 offers after filtering 2000 offers
> round 2 allocate() took 160.706712ms to make 1000 offers after filtering 3000 offers
> round 3 allocate() took 156.701883ms to make 1000 offers after filtering 4000 offers
> round 4 allocate() took 164.246987ms to make 1000 offers after filtering 5000 offers
> round 5 allocate() took 160.791809ms to make 1000 offers after filtering 6000 offers
> round 6 allocate() took 166.064652ms to make 1000 offers after filtering 7000 offers
> round 7 allocate() took 162.177213ms to make 1000 offers after filtering 8000 offers
> round 8 allocate() took 166.620163ms to make 1000 offers after filtering 9000 offers
> round 9 allocate() took 167.970045ms to make 1000 offers after filtering 10000 offers
> round 10 allocate() took 166.189463ms to make 1000 offers after filtering 11000 offers
> round 11 allocate() took 170.863462ms to make 1000 offers after filtering 12000 offers
> round 12 allocate() took 172.260473ms to make 1000 offers after filtering 13000 offers
> round 13 allocate() took 170.553911ms to make 1000 offers after filtering 14000 offers
> round 14 allocate() took 235.764593ms to make 1000 offers after filtering 15000 offers
> round 15 allocate() took 247.250433ms to make 1000 offers after filtering 16000 offers
> round 16 allocate() took 276.932824ms to make 1000 offers after filtering 17000 offers
> round 17 allocate() took 248.888469ms to make 1000 offers after filtering 18000 offers
> round 18 allocate() took 193.890556ms to make 1000 offers after filtering 19000 offers
> round 19 allocate() took 195.105346ms to make 1000 offers after filtering 20000 offers
> round 20 allocate() took 194.447428ms to make 1000 offers after filtering 21000 offers
> round 21 allocate() took 205.486287ms to make 1000 offers after filtering 22000 offers
> round 22 allocate() took 199.241922ms to make 1000 offers after filtering 23000 offers
> round 23 allocate() took 200.885488ms to make 1000 offers after filtering 24000 offers
> round 24 allocate() took 216.132361ms to make 1000 offers after filtering 25000 offers
> round 25 allocate() took 210.638273ms to make 1000 offers after filtering 26000 offers
> round 26 allocate() took 232.397778ms to make 1000 offers after filtering 27000 offers
> round 27 allocate() took 239.633708ms to make 1000 offers after filtering 28000 offers
> round 28 allocate() took 225.829677ms to make 1000 offers after filtering 29000 offers
> round 29 allocate() took 245.143272ms to make 1000 offers after filtering 30000 offers
> round 30 allocate() took 248.630295ms to make 1000 offers after filtering 31000 offers
> round 31 allocate() took 262.147804ms to make 1000 offers after filtering 32000 offers
> round 32 allocate() took 251.109969ms to make 1000 offers after filtering 33000 offers
> round 33 allocate() took 263.92273ms to make 1000 offers after filtering 34000 offers
> round 34 allocate() took 268.422275ms to make 1000 offers after filtering 35000 offers
> round 35 allocate() took 273.830163ms to make 1000 offers after filtering 36000 offers
> round 36 allocate() took 283.25481ms to make 1000 offers after filtering 37000 offers
> round 37 allocate() took 366.400781ms to make 1000 offers after filtering 38000 offers
> round 38 allocate() took 382.202755ms to make 1000 offers after filtering 39000 offers
> round 39 allocate() took 337.266121ms to make 1000 offers after filtering 40000 offers
> round 40 allocate() took 381.033696ms to make 1000 offers after filtering 41000 offers
> round 41 allocate() took 365.941946ms to make 1000 offers after filtering 42000 offers
> round 42 allocate() took 379.527886ms to make 1000 offers after filtering 43000 offers
> round 43 allocate() took 425.661181ms to make 1000 offers after filtering 44000 offers
> round 44 allocate() took 455.86657ms to make 1000 offers after filtering 45000 offers
> round 45 allocate() took 506.943117ms to make 1000 offers after filtering 46000 offers
> round 46 allocate() took 681.880233ms to make 1000 offers after filtering 47000 offers
> round 47 allocate() took 860.932974ms to make 1000 offers after filtering 48000 offers
> round 48 allocate() took 960.272209ms to make 1000 offers after filtering 49000 offers
> round 49 allocate() took 1.907427057secs to make 0 offers after filtering 50000 offers
> round 50 allocate() took 2.157864654secs to make 0 offers after filtering 50000 offers
> 
> 
> After:
> 
> Using 1000 agents and 1 frameworks
> Added 1 frameworks in 404709ns
> Added 1000 agents in 245.162138ms
> round 0 allocate() took 38.498883ms to make 0 offers after filtering 1000 offers
> round 1 allocate() took 42.234898ms to make 0 offers after filtering 1000 offers
> 
> Using 1000 agents and 50 frameworks
> Added 50 frameworks in 1.441281ms
> Added 1000 agents in 261.923711ms
> round 0 allocate() took 121.706074ms to make 1000 offers after filtering 1000 offers
> round 1 allocate() took 150.465925ms to make 1000 offers after filtering 2000 offers
> round 2 allocate() took 273.549893ms to make 1000 offers after filtering 3000 offers
> round 3 allocate() took 199.248799ms to make 1000 offers after filtering 4000 offers
> round 4 allocate() took 133.865752ms to make 1000 offers after filtering 5000 offers
> round 5 allocate() took 130.828599ms to make 1000 offers after filtering 6000 offers
> round 6 allocate() took 134.385597ms to make 1000 offers after filtering 7000 offers
> round 7 allocate() took 135.810198ms to make 1000 offers after filtering 8000 offers
> round 8 allocate() took 126.128592ms to make 1000 offers after filtering 9000 offers
> round 9 allocate() took 144.794829ms to make 1000 offers after filtering 10000 offers
> round 10 allocate() took 162.533506ms to make 1000 offers after filtering 11000 offers
> round 11 allocate() took 159.22141ms to make 1000 offers after filtering 12000 offers
> round 12 allocate() took 174.739823ms to make 1000 offers after filtering 13000 offers
> round 13 allocate() took 171.095423ms to make 1000 offers after filtering 14000 offers
> round 14 allocate() took 186.876661ms to make 1000 offers after filtering 15000 offers
> round 15 allocate() took 177.021603ms to make 1000 offers after filtering 16000 offers
> round 16 allocate() took 165.970722ms to make 1000 offers after filtering 17000 offers
> round 17 allocate() took 162.016338ms to make 1000 offers after filtering 18000 offers
> round 18 allocate() took 138.698917ms to make 1000 offers after filtering 19000 offers
> round 19 allocate() took 144.556913ms to make 1000 offers after filtering 20000 offers
> round 20 allocate() took 155.689926ms to make 1000 offers after filtering 21000 offers
> round 21 allocate() took 149.952025ms to make 1000 offers after filtering 22000 offers
> round 22 allocate() took 135.98823ms to make 1000 offers after filtering 23000 offers
> round 23 allocate() took 132.520992ms to make 1000 offers after filtering 24000 offers
> round 24 allocate() took 143.325635ms to make 1000 offers after filtering 25000 offers
> round 25 allocate() took 153.313423ms to make 1000 offers after filtering 26000 offers
> round 26 allocate() took 169.889066ms to make 1000 offers after filtering 27000 offers
> round 27 allocate() took 188.969694ms to make 1000 offers after filtering 28000 offers
> round 28 allocate() took 176.132259ms to make 1000 offers after filtering 29000 offers
> round 29 allocate() took 186.754676ms to make 1000 offers after filtering 30000 offers
> round 30 allocate() took 166.346508ms to make 1000 offers after filtering 31000 offers
> round 31 allocate() took 172.557665ms to make 1000 offers after filtering 32000 offers
> round 32 allocate() took 169.874406ms to make 1000 offers after filtering 33000 offers
> round 33 allocate() took 190.470692ms to make 1000 offers after filtering 34000 offers
> round 34 allocate() took 184.328221ms to make 1000 offers after filtering 35000 offers
> round 35 allocate() took 222.081892ms to make 1000 offers after filtering 36000 offers
> round 36 allocate() took 203.134216ms to make 1000 offers after filtering 37000 offers
> round 37 allocate() took 217.490016ms to make 1000 offers after filtering 38000 offers
> round 38 allocate() took 257.449904ms to make 1000 offers after filtering 39000 offers
> round 39 allocate() took 252.468529ms to make 1000 offers after filtering 40000 offers
> round 40 allocate() took 229.433398ms to make 1000 offers after filtering 41000 offers
> round 41 allocate() took 251.920859ms to make 1000 offers after filtering 42000 offers
> round 42 allocate() took 273.529747ms to make 1000 offers after filtering 43000 offers
> round 43 allocate() took 315.08445ms to make 1000 offers after filtering 44000 offers
> round 44 allocate() took 354.758003ms to make 1000 offers after filtering 45000 offers
> round 45 allocate() took 350.378463ms to make 1000 offers after filtering 46000 offers
> round 46 allocate() took 415.070355ms to make 1000 offers after filtering 47000 offers
> round 47 allocate() took 519.922944ms to make 1000 offers after filtering 48000 offers
> round 48 allocate() took 710.598546ms to make 1000 offers after filtering 49000 offers
> round 49 allocate() took 1.267395251secs to make 0 offers after filtering 50000 offers
> round 50 allocate() took 1.210188235secs to make 0 offers after filtering 50000 offers
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Benjamin Mahler <bm...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/#review208286
-----------------------------------------------------------



Thanks! Can you also include some commentary on the rest of the allocation benchmarks as well as the overhead within the Resources micro-benchmarks?


include/mesos/resources.hpp
Lines 590-591 (patched)
<https://reviews.apache.org/r/68490/#comment292011>

    ```
      // We use `boost::indirect_iterator` to expose `Resource_` iteration,
      // while actually storing `shared_ptr<Resource_>`.
    ```



include/mesos/resources.hpp
Lines 592 (patched)
<https://reviews.apache.org/r/68490/#comment292010>

    Remove the newline?



include/mesos/resources.hpp
Lines 588-590 (original), 597-599 (patched)
<https://reviews.apache.org/r/68490/#comment292009>

    This comment now looks a little weird, how about:
    
    ```
      // NOTE: Non-const `begin()` and `end()` intentionally return const
      // iterators to prevent mutable access to the `Resource` objects.
    ```



include/mesos/resources.hpp
Lines 678-679 (patched)
<https://reviews.apache.org/r/68490/#comment292012>

    What's this referring to? Reading this I see it already has an rvalue reference?



include/mesos/resources.hpp
Lines 682 (patched)
<https://reviews.apache.org/r/68490/#comment292017>

    Why do we need this? It seems like the callers should just be de-referencing their shared_ptr prior to calling?



include/mesos/resources.hpp
Line 676 (original), 691-693 (patched)
<https://reviews.apache.org/r/68490/#comment292013>

    A comment would be helpful for the reader:
    
    ```
    // Resources are stored using copy-on-write:
    //
    //   (1) Copies are done by copying the `shared_ptr`. This
    //       makes read-only filtering (e.g. `unreserved()`)
    //       inexpensive as we do not have to perform copies
    //       of the resource objects.
    //
    //   (2) When a write occurs:
    //      (a) If there's a single reference to the resource
    //          object, we mutate directly.
    //      (b) If there's more than a single reference to the
    //          resource object, we copy first, then mutate the
    //          copy.
    ```



src/common/resources.cpp
Line 1471 (original), 1473 (patched)
<https://reviews.apache.org/r/68490/#comment292020>

    I'm a little puzzled about these `const shared_ptr<...>` foreach loops. Why don't the const ones do `const Resource_&` loops?
    
    ```
    foreach (const Resource_& resource_, that) {
    ```
    
    Something I'm missing?



src/common/resources.cpp
Line 1503 (original), 1505 (patched)
<https://reviews.apache.org/r/68490/#comment292014>

    It seems ok to slip this in here, but generally please send unrelated cleanups as separate patches. :)



src/common/resources.cpp
Line 1514 (original), 1516-1518 (patched)
<https://reviews.apache.org/r/68490/#comment292016>

    I'm not sure how obvious these will be to readers, maybe a comment on all of these?
    
    ```
    // Copy-on-write (if more than 1 reference).
    if (resource_.use_count() > 1) {
      resource_ = make_shared<Resource_>(*resource_);
    }
    ```
    
    Maybe something like (since we can't use `mutable`):
    
    ```
    make_mutable(resource_);
    ```
    
    Maybe also make it composable?
    
    ```
    make_mutable(resource_)->resource.mutable_allocation_info()->set_role(role);
    ```
    
    Still, seems rather error prone and easy to forget to check for copies prior to mutating? Wonder if there's a way to make it less brittle.



src/common/resources.cpp
Lines 1543-1544 (patched)
<https://reviews.apache.org/r/68490/#comment292015>

    Interesting suggestion!



src/common/resources.cpp
Lines 1667-1670 (patched)
<https://reviews.apache.org/r/68490/#comment292018>

    Seems odd, why don't we use `Resource` directly here?
    
    ```
          Resource r = *resource_;
          r.clear_reservations();
    
          result.add(std::move(r));
    ```
    
    (I left a comment above about removing the add that takes a shared_ptr, since it doesn't seem needed?)



src/common/resources.cpp
Line 1676 (original), 1692 (patched)
<https://reviews.apache.org/r/68490/#comment292019>

    Feel free to pull some of these unrelated improvements out in front of this, it would help make the diff easier for reviewers.



src/common/resources.cpp
Lines 1924-1925 (original), 1940-1941 (patched)
<https://reviews.apache.org/r/68490/#comment292021>

    Ditto here and elsewhere for the question above about const shared pointer loops vs `const Resource_&` loops



src/common/resources.cpp
Line 1954 (original), 1972-1979 (patched)
<https://reviews.apache.org/r/68490/#comment292022>

    Why is this needed? Isn't the point of this change that toUnreserved avoids copying now?



src/common/resources.cpp
Lines 1958-1963 (original), 1983-1993 (patched)
<https://reviews.apache.org/r/68490/#comment292023>

    Hm.. is there a way to simplify this? This seems to be just adding in the reserved resources? Could we call pushReservation or something instead of touching the resources field directly?
    
    Right now `find()` is not performance critical, so it doesn't need to be optimal if it's simpler not to be.


- Benjamin Mahler


On Aug. 23, 2018, 10:13 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68490/
> -----------------------------------------------------------
> 
> (Updated Aug. 23, 2018, 10:13 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-6765
>     https://issues.apache.org/jira/browse/MESOS-6765
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch lets `class Resources` only store shared
> pointers to the underlying resource objects, so that
> read-only filter operations such as `reserved()`,
> `unreserved()` and etc. can avoid making copies of
> the whole resource objects. Instead, only shared pointers
> are copied.
> 
> In write operations, we check if there are more than one
> references to the resource object. If so, a copy is made
> for safe mutation without affecting owners.
> 
> To maintain the usage abstraction that `class Resources`
> still holds resource objects, we utilize
> `boost::indirect_iterator` iterator adapter to deference
> the shared pointers as we iterate.
> 
> 
> Diffs
> -----
> 
>   include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
>   include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
>   src/common/resources.cpp 0110c0ee3e810ad1c29dfa5507b13ebd5d0222a2 
>   src/v1/resources.cpp 228a7327ffe7934d37b56ee67b8be9ae1e119ca8 
> 
> 
> Diff: https://reviews.apache.org/r/68490/diff/1/
> 
> 
> Testing
> -------
> 
> make check
> 
> Did a quick test on Mac with an optimized build, running benchmark `HierarchicalAllocator_BENCHMARK_Test.ResourceLabels`, here are the results of comparing "before" and "after". Note, DVFS is not fixed. And we only did a partial run to verify the validity of the patch, full evaluation coming soon.
> 
> **Overall, 33% performance improvement for the 1st steup (1000 agents and 1 frameworks) and 32% improvement for the first 50 iterarions of the 2nd setup (1000 agents and 50 frameworks).**
> 
> Before:
> 
> [ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/0
> 
> Using 1000 agents and 1 frameworks
> Added 1 frameworks in 1.220022ms
> Added 1000 agents in 465.045382ms
> round 0 allocate() took 54.803773ms to make 0 offers after filtering 1000 offers
> round 1 allocate() took 66.690456ms to make 0 offers after filtering 1000 offers
> 
> Using 1000 agents and 50 frameworks
> Added 50 frameworks in 1.40774ms
> Added 1000 agents in 460.562488ms
> round 0 allocate() took 155.500548ms to make 1000 offers after filtering 1000 offers
> round 1 allocate() took 155.194986ms to make 1000 offers after filtering 2000 offers
> round 2 allocate() took 160.706712ms to make 1000 offers after filtering 3000 offers
> round 3 allocate() took 156.701883ms to make 1000 offers after filtering 4000 offers
> round 4 allocate() took 164.246987ms to make 1000 offers after filtering 5000 offers
> round 5 allocate() took 160.791809ms to make 1000 offers after filtering 6000 offers
> round 6 allocate() took 166.064652ms to make 1000 offers after filtering 7000 offers
> round 7 allocate() took 162.177213ms to make 1000 offers after filtering 8000 offers
> round 8 allocate() took 166.620163ms to make 1000 offers after filtering 9000 offers
> round 9 allocate() took 167.970045ms to make 1000 offers after filtering 10000 offers
> round 10 allocate() took 166.189463ms to make 1000 offers after filtering 11000 offers
> round 11 allocate() took 170.863462ms to make 1000 offers after filtering 12000 offers
> round 12 allocate() took 172.260473ms to make 1000 offers after filtering 13000 offers
> round 13 allocate() took 170.553911ms to make 1000 offers after filtering 14000 offers
> round 14 allocate() took 235.764593ms to make 1000 offers after filtering 15000 offers
> round 15 allocate() took 247.250433ms to make 1000 offers after filtering 16000 offers
> round 16 allocate() took 276.932824ms to make 1000 offers after filtering 17000 offers
> round 17 allocate() took 248.888469ms to make 1000 offers after filtering 18000 offers
> round 18 allocate() took 193.890556ms to make 1000 offers after filtering 19000 offers
> round 19 allocate() took 195.105346ms to make 1000 offers after filtering 20000 offers
> round 20 allocate() took 194.447428ms to make 1000 offers after filtering 21000 offers
> round 21 allocate() took 205.486287ms to make 1000 offers after filtering 22000 offers
> round 22 allocate() took 199.241922ms to make 1000 offers after filtering 23000 offers
> round 23 allocate() took 200.885488ms to make 1000 offers after filtering 24000 offers
> round 24 allocate() took 216.132361ms to make 1000 offers after filtering 25000 offers
> round 25 allocate() took 210.638273ms to make 1000 offers after filtering 26000 offers
> round 26 allocate() took 232.397778ms to make 1000 offers after filtering 27000 offers
> round 27 allocate() took 239.633708ms to make 1000 offers after filtering 28000 offers
> round 28 allocate() took 225.829677ms to make 1000 offers after filtering 29000 offers
> round 29 allocate() took 245.143272ms to make 1000 offers after filtering 30000 offers
> round 30 allocate() took 248.630295ms to make 1000 offers after filtering 31000 offers
> round 31 allocate() took 262.147804ms to make 1000 offers after filtering 32000 offers
> round 32 allocate() took 251.109969ms to make 1000 offers after filtering 33000 offers
> round 33 allocate() took 263.92273ms to make 1000 offers after filtering 34000 offers
> round 34 allocate() took 268.422275ms to make 1000 offers after filtering 35000 offers
> round 35 allocate() took 273.830163ms to make 1000 offers after filtering 36000 offers
> round 36 allocate() took 283.25481ms to make 1000 offers after filtering 37000 offers
> round 37 allocate() took 366.400781ms to make 1000 offers after filtering 38000 offers
> round 38 allocate() took 382.202755ms to make 1000 offers after filtering 39000 offers
> round 39 allocate() took 337.266121ms to make 1000 offers after filtering 40000 offers
> round 40 allocate() took 381.033696ms to make 1000 offers after filtering 41000 offers
> round 41 allocate() took 365.941946ms to make 1000 offers after filtering 42000 offers
> round 42 allocate() took 379.527886ms to make 1000 offers after filtering 43000 offers
> round 43 allocate() took 425.661181ms to make 1000 offers after filtering 44000 offers
> round 44 allocate() took 455.86657ms to make 1000 offers after filtering 45000 offers
> round 45 allocate() took 506.943117ms to make 1000 offers after filtering 46000 offers
> round 46 allocate() took 681.880233ms to make 1000 offers after filtering 47000 offers
> round 47 allocate() took 860.932974ms to make 1000 offers after filtering 48000 offers
> round 48 allocate() took 960.272209ms to make 1000 offers after filtering 49000 offers
> round 49 allocate() took 1.907427057secs to make 0 offers after filtering 50000 offers
> round 50 allocate() took 2.157864654secs to make 0 offers after filtering 50000 offers
> 
> 
> After:
> 
> Using 1000 agents and 1 frameworks
> Added 1 frameworks in 404709ns
> Added 1000 agents in 245.162138ms
> round 0 allocate() took 38.498883ms to make 0 offers after filtering 1000 offers
> round 1 allocate() took 42.234898ms to make 0 offers after filtering 1000 offers
> 
> Using 1000 agents and 50 frameworks
> Added 50 frameworks in 1.441281ms
> Added 1000 agents in 261.923711ms
> round 0 allocate() took 121.706074ms to make 1000 offers after filtering 1000 offers
> round 1 allocate() took 150.465925ms to make 1000 offers after filtering 2000 offers
> round 2 allocate() took 273.549893ms to make 1000 offers after filtering 3000 offers
> round 3 allocate() took 199.248799ms to make 1000 offers after filtering 4000 offers
> round 4 allocate() took 133.865752ms to make 1000 offers after filtering 5000 offers
> round 5 allocate() took 130.828599ms to make 1000 offers after filtering 6000 offers
> round 6 allocate() took 134.385597ms to make 1000 offers after filtering 7000 offers
> round 7 allocate() took 135.810198ms to make 1000 offers after filtering 8000 offers
> round 8 allocate() took 126.128592ms to make 1000 offers after filtering 9000 offers
> round 9 allocate() took 144.794829ms to make 1000 offers after filtering 10000 offers
> round 10 allocate() took 162.533506ms to make 1000 offers after filtering 11000 offers
> round 11 allocate() took 159.22141ms to make 1000 offers after filtering 12000 offers
> round 12 allocate() took 174.739823ms to make 1000 offers after filtering 13000 offers
> round 13 allocate() took 171.095423ms to make 1000 offers after filtering 14000 offers
> round 14 allocate() took 186.876661ms to make 1000 offers after filtering 15000 offers
> round 15 allocate() took 177.021603ms to make 1000 offers after filtering 16000 offers
> round 16 allocate() took 165.970722ms to make 1000 offers after filtering 17000 offers
> round 17 allocate() took 162.016338ms to make 1000 offers after filtering 18000 offers
> round 18 allocate() took 138.698917ms to make 1000 offers after filtering 19000 offers
> round 19 allocate() took 144.556913ms to make 1000 offers after filtering 20000 offers
> round 20 allocate() took 155.689926ms to make 1000 offers after filtering 21000 offers
> round 21 allocate() took 149.952025ms to make 1000 offers after filtering 22000 offers
> round 22 allocate() took 135.98823ms to make 1000 offers after filtering 23000 offers
> round 23 allocate() took 132.520992ms to make 1000 offers after filtering 24000 offers
> round 24 allocate() took 143.325635ms to make 1000 offers after filtering 25000 offers
> round 25 allocate() took 153.313423ms to make 1000 offers after filtering 26000 offers
> round 26 allocate() took 169.889066ms to make 1000 offers after filtering 27000 offers
> round 27 allocate() took 188.969694ms to make 1000 offers after filtering 28000 offers
> round 28 allocate() took 176.132259ms to make 1000 offers after filtering 29000 offers
> round 29 allocate() took 186.754676ms to make 1000 offers after filtering 30000 offers
> round 30 allocate() took 166.346508ms to make 1000 offers after filtering 31000 offers
> round 31 allocate() took 172.557665ms to make 1000 offers after filtering 32000 offers
> round 32 allocate() took 169.874406ms to make 1000 offers after filtering 33000 offers
> round 33 allocate() took 190.470692ms to make 1000 offers after filtering 34000 offers
> round 34 allocate() took 184.328221ms to make 1000 offers after filtering 35000 offers
> round 35 allocate() took 222.081892ms to make 1000 offers after filtering 36000 offers
> round 36 allocate() took 203.134216ms to make 1000 offers after filtering 37000 offers
> round 37 allocate() took 217.490016ms to make 1000 offers after filtering 38000 offers
> round 38 allocate() took 257.449904ms to make 1000 offers after filtering 39000 offers
> round 39 allocate() took 252.468529ms to make 1000 offers after filtering 40000 offers
> round 40 allocate() took 229.433398ms to make 1000 offers after filtering 41000 offers
> round 41 allocate() took 251.920859ms to make 1000 offers after filtering 42000 offers
> round 42 allocate() took 273.529747ms to make 1000 offers after filtering 43000 offers
> round 43 allocate() took 315.08445ms to make 1000 offers after filtering 44000 offers
> round 44 allocate() took 354.758003ms to make 1000 offers after filtering 45000 offers
> round 45 allocate() took 350.378463ms to make 1000 offers after filtering 46000 offers
> round 46 allocate() took 415.070355ms to make 1000 offers after filtering 47000 offers
> round 47 allocate() took 519.922944ms to make 1000 offers after filtering 48000 offers
> round 48 allocate() took 710.598546ms to make 1000 offers after filtering 49000 offers
> round 49 allocate() took 1.267395251secs to make 0 offers after filtering 50000 offers
> round 50 allocate() took 1.210188235secs to make 0 offers after filtering 50000 offers
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Meng Zhu <mz...@mesosphere.io>.


> On Sept. 6, 2018, 3:27 a.m., Benjamin Bannier wrote:
> > include/mesos/resources.hpp
> > Line 588 (original), 590 (patched)
> > <https://reviews.apache.org/r/68490/diff/3/?file=2082742#file2082742line590>
> >
> >     `s/Resource/Resource_/`?

`Resource_` is not exposed outside, it is implicitly converted to `Resource`. Will comment on this.


> On Sept. 6, 2018, 3:27 a.m., Benjamin Bannier wrote:
> > src/v1/resources.cpp
> > Lines 2117-2118 (original), 2154-2178 (patched)
> > <https://reviews.apache.org/r/68490/diff/3/?file=2082745#file2082745line2155>
> >
> >     This function looks redundant and identical to `Resources::add(const Resource&)` to me. It also introduces another place where we leak implementation details into the interface.
> >     
> >     Suggest to remove and instead let callers perform the deref. We could always add something similar back once we want to support `move`'ing a `that`.
> 
> Benjamin Mahler wrote:
>     I think Meng forgot to publish a reply here, but you can see the reply on my similar comment above. If we don't have this and the caller has to de-ref, then we have to copy the resource object, whereas this interface allows us to copy the shared_ptr.
>     
>     Also note that this is not part of the public interface, it's private.

The difference lies in:

```
 // Cannot be combined with any existing Resource object.
  if (!found) {
    resources.push_back(that);
  }
```

I will copy my earlier reply to BenM:

Consider adding two non-mergeble Resources(e.g. 1 cpu + 1 mem), if we only have Resources::add(const Resource_& that), we will always need to make_shared(Resource_). If we directly pass shared_ptr, we can then just push_back. For any non-mergable Resource_, we will be able to save one make_shared.


One rule of thumb I figured is that, we should avoid make_shared when possible (because it makes a copy). This means if we already have a shared_ptr we should just use it and pass it around. This also explains why I use shared_ptr for foreach loops.


- Meng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/#review208402
-----------------------------------------------------------


On Sept. 6, 2018, 2:56 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68490/
> -----------------------------------------------------------
> 
> (Updated Sept. 6, 2018, 2:56 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Gastón Kleiman.
> 
> 
> Bugs: MESOS-6765
>     https://issues.apache.org/jira/browse/MESOS-6765
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch lets `class Resources` only store shared
> pointers to the underlying resource objects, so that
> read-only filter operations such as `reserved()`,
> `unreserved()` and etc. can avoid making copies of
> the whole resource objects. Instead, only shared pointers
> are copied.
> 
> In write operations, we check if there are more than one
> references to the resource object. If so, a copy is made
> for safe mutation without affecting owners.
> 
> To maintain the usage abstraction that `class Resources`
> still holds resource objects, we utilize
> `boost::indirect_iterator` iterator adapter to deference
> the shared pointers as we iterate.
> 
> 
> Diffs
> -----
> 
>   include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
>   include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
>   src/common/resources.cpp 3e63cdedb9261970dbeb9bb9f97eed65819f68a7 
>   src/v1/resources.cpp 3683a331e0859cd6f2ad061db6ba67112ecfcb0d 
> 
> 
> Diff: https://reviews.apache.org/r/68490/diff/4/
> 
> 
> Testing
> -------
> 
> make check
> 
> Benchmarked on a Ubuntu box using clang -O2 optimized build with an i5 processor, frequency fixed at 1.6GHz.
> 
> Resources benchmarks (Measure geo-mean of various operations):
> 
> Normalized Geomean latency
> 		
> 	        Master	Copy-on-write
> Arithmetic 	1	1.09
> Filter	        1	0.28
> Contains	1	0.27
> Parse	        1	1.01
> 
> Sample allocator benchmark (Measure first allocation latency):
> 
> HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/1 (1k agents, 50 frameworks):
> 
> Master 1
> Copy-on-write 0.72
> 
> HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/1 (1k agents, 50 frameworks):
> 
> Master 1
> Copy-on-write 0.72
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Benjamin Mahler <bm...@apache.org>.


> On Sept. 6, 2018, 10:27 a.m., Benjamin Bannier wrote:
> > src/v1/resources.cpp
> > Lines 2117-2118 (original), 2154-2178 (patched)
> > <https://reviews.apache.org/r/68490/diff/3/?file=2082745#file2082745line2155>
> >
> >     This function looks redundant and identical to `Resources::add(const Resource&)` to me. It also introduces another place where we leak implementation details into the interface.
> >     
> >     Suggest to remove and instead let callers perform the deref. We could always add something similar back once we want to support `move`'ing a `that`.

I think Meng forgot to publish a reply here, but you can see the reply on my similar comment above. If we don't have this and the caller has to de-ref, then we have to copy the resource object, whereas this interface allows us to copy the shared_ptr.

Also note that this is not part of the public interface, it's private.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/#review208402
-----------------------------------------------------------


On Sept. 6, 2018, 9:56 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68490/
> -----------------------------------------------------------
> 
> (Updated Sept. 6, 2018, 9:56 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Gastón Kleiman.
> 
> 
> Bugs: MESOS-6765
>     https://issues.apache.org/jira/browse/MESOS-6765
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch lets `class Resources` only store shared
> pointers to the underlying resource objects, so that
> read-only filter operations such as `reserved()`,
> `unreserved()` and etc. can avoid making copies of
> the whole resource objects. Instead, only shared pointers
> are copied.
> 
> In write operations, we check if there are more than one
> references to the resource object. If so, a copy is made
> for safe mutation without affecting owners.
> 
> To maintain the usage abstraction that `class Resources`
> still holds resource objects, we utilize
> `boost::indirect_iterator` iterator adapter to deference
> the shared pointers as we iterate.
> 
> 
> Diffs
> -----
> 
>   include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
>   include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
>   src/common/resources.cpp 3e63cdedb9261970dbeb9bb9f97eed65819f68a7 
>   src/v1/resources.cpp 3683a331e0859cd6f2ad061db6ba67112ecfcb0d 
> 
> 
> Diff: https://reviews.apache.org/r/68490/diff/4/
> 
> 
> Testing
> -------
> 
> make check
> 
> Benchmarked on a Ubuntu box using clang -O2 optimized build with an i5 processor, frequency fixed at 1.6GHz.
> 
> Resources benchmarks (Measure geo-mean of various operations):
> 
> Normalized Geomean latency
> 		
> 	        Master	Copy-on-write
> Arithmetic 	1	1.09
> Filter	        1	0.28
> Contains	1	0.27
> Parse	        1	1.01
> 
> Sample allocator benchmark (Measure first allocation latency):
> 
> HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/1 (1k agents, 50 frameworks):
> 
> Master 1
> Copy-on-write 0.72
> 
> HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/1 (1k agents, 50 frameworks):
> 
> Master 1
> Copy-on-write 0.72
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Benjamin Bannier <be...@mesosphere.io>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/#review208402
-----------------------------------------------------------




include/mesos/resources.hpp
Line 588 (original), 590 (patched)
<https://reviews.apache.org/r/68490/#comment292306>

    `s/Resource/Resource_/`?



include/mesos/resources.hpp
Lines 596-597 (patched)
<https://reviews.apache.org/r/68490/#comment292307>

    I'd leave this above above the iterator types as documentation.



include/mesos/resources.hpp
Lines 691-694 (patched)
<https://reviews.apache.org/r/68490/#comment292308>

    nit: While this is likely true, copying a shared pointer can be suprisingly non-inexpensive due to copying of its atomic ref count (likely protected with memory barries, and copies triggering cache invalidations across threads holding ref counts).
    
    Your benchmarks strongly suggest that we still benefit from this patch.



src/v1/resources.cpp
Lines 2117-2118 (original), 2154-2178 (patched)
<https://reviews.apache.org/r/68490/#comment292305>

    This function looks redundant and identical to `Resources::add(const Resource&)` to me. It also introduces another place where we leak implementation details into the interface.
    
    Suggest to remove and instead let callers perform the deref. We could always add something similar back once we want to support `move`'ing a `that`.


- Benjamin Bannier


On Sept. 5, 2018, 11:47 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68490/
> -----------------------------------------------------------
> 
> (Updated Sept. 5, 2018, 11:47 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Gastón Kleiman.
> 
> 
> Bugs: MESOS-6765
>     https://issues.apache.org/jira/browse/MESOS-6765
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch lets `class Resources` only store shared
> pointers to the underlying resource objects, so that
> read-only filter operations such as `reserved()`,
> `unreserved()` and etc. can avoid making copies of
> the whole resource objects. Instead, only shared pointers
> are copied.
> 
> In write operations, we check if there are more than one
> references to the resource object. If so, a copy is made
> for safe mutation without affecting owners.
> 
> To maintain the usage abstraction that `class Resources`
> still holds resource objects, we utilize
> `boost::indirect_iterator` iterator adapter to deference
> the shared pointers as we iterate.
> 
> 
> Diffs
> -----
> 
>   include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
>   include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
>   src/common/resources.cpp 3e63cdedb9261970dbeb9bb9f97eed65819f68a7 
>   src/v1/resources.cpp 3683a331e0859cd6f2ad061db6ba67112ecfcb0d 
> 
> 
> Diff: https://reviews.apache.org/r/68490/diff/3/
> 
> 
> Testing
> -------
> 
> make check
> 
> Benchmarked on a Ubuntu box using clang -O2 optimized build with an i5 processor, frequency fixed at 1.6GHz.
> 
> Resources benchmarks (Measure geo-mean of various operations):
> 
> Normalized Geomean latency
> 		
> 	        Master	Copy-on-write
> Arithmetic 	1	1.09
> Filter	        1	0.28
> Contains	1	0.27
> Parse	        1	1.01
> 
> Sample allocator benchmark (Measure first allocation latency):
> 
> HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/1 (1k agents, 50 frameworks):
> 
> Master 1
> Copy-on-write 0.72
> 
> HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/1 (1k agents, 50 frameworks):
> 
> Master 1
> Copy-on-write 0.72
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Benjamin Mahler <bm...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/#review208810
-----------------------------------------------------------


Fix it, then Ship it!





src/v1/resources.cpp
Lines 1657-1675 (original), 1657-1677 (patched)
<https://reviews.apache.org/r/68490/#comment293013>

    Why copy the shared pointers in these loops? Ditto above.


- Benjamin Mahler


On Sept. 14, 2018, 10:11 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68490/
> -----------------------------------------------------------
> 
> (Updated Sept. 14, 2018, 10:11 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Gastón Kleiman.
> 
> 
> Bugs: MESOS-6765
>     https://issues.apache.org/jira/browse/MESOS-6765
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch lets `class Resources` only store shared
> pointers to the underlying resource objects, so that
> read-only filter operations such as `reserved()`,
> `unreserved()` and etc. can avoid making copies of
> the whole resource objects. Instead, only shared pointers
> are copied.
> 
> In write operations, we check if there are more than one
> references to the resource object. If so, a copy is made
> for safe mutation without affecting owners.
> 
> To maintain the usage abstraction that `class Resources`
> still holds resource objects, we utilize
> `boost::indirect_iterator` iterator adapter to deference
> the shared pointers as we iterate.
> 
> 
> Diffs
> -----
> 
>   include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
>   include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
>   src/common/resources.cpp 3e63cdedb9261970dbeb9bb9f97eed65819f68a7 
>   src/v1/resources.cpp 3683a331e0859cd6f2ad061db6ba67112ecfcb0d 
> 
> 
> Diff: https://reviews.apache.org/r/68490/diff/5/
> 
> 
> Testing
> -------
> 
> make check
> 
> Benchmarked on a Ubuntu box using clang -O2 optimized build with an i5 processor, frequency fixed at 1.6GHz.
> 
> ## Resources benchmarks (Measure geo-mean of various operations):
> 
> Normalized Geomean latency
> 		
> 	        Master	Copy-on-write
> Arithmetic 	1	1.09
> Filter	        1	0.28
> Contains	1	0.27
> Parse	        1	1.01
> 
> ## Sample allocator benchmark (Measure first allocation latency):
> 
> HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/1 (1k agents, 50 frameworks):
> 
> Master 1
> Copy-on-write 0.72
> 
> HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/1 (1k agents, 50 frameworks):
> 
> Master 1
> Copy-on-write 0.72
> 
> ## Sorter benchmark
> 
> ### BENCHMARK_FullSort
> 
> Master:
> 
> Using 30000 agents and 500 clients
> Added 500 clients in 2.261373ms
> Added 30000 agents in 322.252441ms
> Added allocations for 30000 agents in 823.631826ms
> Full sort of 500 clients took 8662ns
> No-op sort of 500 clients took 809ns
> Removed allocations for 30000 agents in 8.482659084secs
> Removed 30000 agents in 417.988004ms
> Removed 500 clients in 17.855575ms
> 
> 
> Copy-on-write:
> 
> Using 30000 agents and 500 clients (**normalized to the master above**)
> Added 500 clients in 2.464647ms (**1.09**)
> Added 30000 agents in 205.071423ms (**0.64**)
> Added allocations for 30000 agents in 156.340883ms (**0.19**)
> Full sort of 500 clients took 1007ns (**0.12**)
> No-op sort of 500 clients took 204ns (**0.02**)
> Removed allocations for 30000 agents in 7.215175016secs (**0.85**)
> Removed 30000 agents in 304.998116ms (**0.73**)
> Removed 500 clients in 3.195079ms (**0.18**)
> 
> 
> ### BENCHMARK_HierarchyFullSort
> 
> Master:
> 
> Added 1056 clients in 36.571505ms
> Added 10000 agents in 114.165111ms
> Added allocations for 10000 agents in 569.144686ms
> Full sort of 1056 clients took 43485ns
> No-op sort of 1056 clients took 676ns
> Removed allocations for 10000 agents in 5.656517904secs
> Removed 10000 agents in 153.016733ms
> Removed 1056 clients in 7.169549ms
> 
> Copy-on-write:
> 
> Added 1056 clients in 36.468822ms (**0.997**)
> Added 10000 agents in 67.733365ms (**0.59**)
> Added allocations for 10000 agents in 99.695723ms (**0.175**)
> Full sort of 1056 clients took 29729ns (**0.68**)
> No-op sort of 1056 clients took 704ns (**1.04**)
> Removed allocations for 10000 agents in 4.824389788secs (**0.85**)
> Removed 10000 agents in 101.319295ms (**0.66**)
> Removed 1056 clients in 2.480967ms (**0.35**)
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Meng Zhu <mz...@mesosphere.io>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/
-----------------------------------------------------------

(Updated Sept. 14, 2018, 3:11 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Gastón Kleiman.


Bugs: MESOS-6765
    https://issues.apache.org/jira/browse/MESOS-6765


Repository: mesos


Description
-------

This patch lets `class Resources` only store shared
pointers to the underlying resource objects, so that
read-only filter operations such as `reserved()`,
`unreserved()` and etc. can avoid making copies of
the whole resource objects. Instead, only shared pointers
are copied.

In write operations, we check if there are more than one
references to the resource object. If so, a copy is made
for safe mutation without affecting owners.

To maintain the usage abstraction that `class Resources`
still holds resource objects, we utilize
`boost::indirect_iterator` iterator adapter to deference
the shared pointers as we iterate.


Diffs (updated)
-----

  include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
  include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
  src/common/resources.cpp 3e63cdedb9261970dbeb9bb9f97eed65819f68a7 
  src/v1/resources.cpp 3683a331e0859cd6f2ad061db6ba67112ecfcb0d 


Diff: https://reviews.apache.org/r/68490/diff/5/

Changes: https://reviews.apache.org/r/68490/diff/4-5/


Testing
-------

make check

Benchmarked on a Ubuntu box using clang -O2 optimized build with an i5 processor, frequency fixed at 1.6GHz.

## Resources benchmarks (Measure geo-mean of various operations):

Normalized Geomean latency
		
	        Master	Copy-on-write
Arithmetic 	1	1.09
Filter	        1	0.28
Contains	1	0.27
Parse	        1	1.01

## Sample allocator benchmark (Measure first allocation latency):

HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/1 (1k agents, 50 frameworks):

Master 1
Copy-on-write 0.72

HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/1 (1k agents, 50 frameworks):

Master 1
Copy-on-write 0.72

## Sorter benchmark

### BENCHMARK_FullSort

Master:

Using 30000 agents and 500 clients
Added 500 clients in 2.261373ms
Added 30000 agents in 322.252441ms
Added allocations for 30000 agents in 823.631826ms
Full sort of 500 clients took 8662ns
No-op sort of 500 clients took 809ns
Removed allocations for 30000 agents in 8.482659084secs
Removed 30000 agents in 417.988004ms
Removed 500 clients in 17.855575ms


Copy-on-write:

Using 30000 agents and 500 clients (**normalized to the master above**)
Added 500 clients in 2.464647ms (**1.09**)
Added 30000 agents in 205.071423ms (**0.64**)
Added allocations for 30000 agents in 156.340883ms (**0.19**)
Full sort of 500 clients took 1007ns (**0.12**)
No-op sort of 500 clients took 204ns (**0.02**)
Removed allocations for 30000 agents in 7.215175016secs (**0.85**)
Removed 30000 agents in 304.998116ms (**0.73**)
Removed 500 clients in 3.195079ms (**0.18**)


### BENCHMARK_HierarchyFullSort

Master:

Added 1056 clients in 36.571505ms
Added 10000 agents in 114.165111ms
Added allocations for 10000 agents in 569.144686ms
Full sort of 1056 clients took 43485ns
No-op sort of 1056 clients took 676ns
Removed allocations for 10000 agents in 5.656517904secs
Removed 10000 agents in 153.016733ms
Removed 1056 clients in 7.169549ms

Copy-on-write:

Added 1056 clients in 36.468822ms (**0.997**)
Added 10000 agents in 67.733365ms (**0.59**)
Added allocations for 10000 agents in 99.695723ms (**0.175**)
Full sort of 1056 clients took 29729ns (**0.68**)
No-op sort of 1056 clients took 704ns (**1.04**)
Removed allocations for 10000 agents in 4.824389788secs (**0.85**)
Removed 10000 agents in 101.319295ms (**0.66**)
Removed 1056 clients in 2.480967ms (**0.35**)


Thanks,

Meng Zhu

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Meng Zhu <mz...@mesosphere.io>.


> On Sept. 11, 2018, 7:25 p.m., Benjamin Mahler wrote:
> > src/common/resources.cpp
> > Line 1641 (original), 1655 (patched)
> > <https://reviews.apache.org/r/68490/diff/4/?file=2086304#file2086304line1655>
> >
> >     Actually, I was thinking about this, isn't the casting to `Resource` broken anywhere we use it? We lose the shared count?

Good catch! Fixed. Also scanned other places and added TODO to fix in a separate patch.


- Meng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/#review208538
-----------------------------------------------------------


On Sept. 14, 2018, 3:11 p.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68490/
> -----------------------------------------------------------
> 
> (Updated Sept. 14, 2018, 3:11 p.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Gastón Kleiman.
> 
> 
> Bugs: MESOS-6765
>     https://issues.apache.org/jira/browse/MESOS-6765
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch lets `class Resources` only store shared
> pointers to the underlying resource objects, so that
> read-only filter operations such as `reserved()`,
> `unreserved()` and etc. can avoid making copies of
> the whole resource objects. Instead, only shared pointers
> are copied.
> 
> In write operations, we check if there are more than one
> references to the resource object. If so, a copy is made
> for safe mutation without affecting owners.
> 
> To maintain the usage abstraction that `class Resources`
> still holds resource objects, we utilize
> `boost::indirect_iterator` iterator adapter to deference
> the shared pointers as we iterate.
> 
> 
> Diffs
> -----
> 
>   include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
>   include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
>   src/common/resources.cpp 3e63cdedb9261970dbeb9bb9f97eed65819f68a7 
>   src/v1/resources.cpp 3683a331e0859cd6f2ad061db6ba67112ecfcb0d 
> 
> 
> Diff: https://reviews.apache.org/r/68490/diff/5/
> 
> 
> Testing
> -------
> 
> make check
> 
> Benchmarked on a Ubuntu box using clang -O2 optimized build with an i5 processor, frequency fixed at 1.6GHz.
> 
> ## Resources benchmarks (Measure geo-mean of various operations):
> 
> Normalized Geomean latency
> 		
> 	        Master	Copy-on-write
> Arithmetic 	1	1.09
> Filter	        1	0.28
> Contains	1	0.27
> Parse	        1	1.01
> 
> ## Sample allocator benchmark (Measure first allocation latency):
> 
> HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/1 (1k agents, 50 frameworks):
> 
> Master 1
> Copy-on-write 0.72
> 
> HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/1 (1k agents, 50 frameworks):
> 
> Master 1
> Copy-on-write 0.72
> 
> ## Sorter benchmark
> 
> ### BENCHMARK_FullSort
> 
> Master:
> 
> Using 30000 agents and 500 clients
> Added 500 clients in 2.261373ms
> Added 30000 agents in 322.252441ms
> Added allocations for 30000 agents in 823.631826ms
> Full sort of 500 clients took 8662ns
> No-op sort of 500 clients took 809ns
> Removed allocations for 30000 agents in 8.482659084secs
> Removed 30000 agents in 417.988004ms
> Removed 500 clients in 17.855575ms
> 
> 
> Copy-on-write:
> 
> Using 30000 agents and 500 clients (**normalized to the master above**)
> Added 500 clients in 2.464647ms (**1.09**)
> Added 30000 agents in 205.071423ms (**0.64**)
> Added allocations for 30000 agents in 156.340883ms (**0.19**)
> Full sort of 500 clients took 1007ns (**0.12**)
> No-op sort of 500 clients took 204ns (**0.02**)
> Removed allocations for 30000 agents in 7.215175016secs (**0.85**)
> Removed 30000 agents in 304.998116ms (**0.73**)
> Removed 500 clients in 3.195079ms (**0.18**)
> 
> 
> ### BENCHMARK_HierarchyFullSort
> 
> Master:
> 
> Added 1056 clients in 36.571505ms
> Added 10000 agents in 114.165111ms
> Added allocations for 10000 agents in 569.144686ms
> Full sort of 1056 clients took 43485ns
> No-op sort of 1056 clients took 676ns
> Removed allocations for 10000 agents in 5.656517904secs
> Removed 10000 agents in 153.016733ms
> Removed 1056 clients in 7.169549ms
> 
> Copy-on-write:
> 
> Added 1056 clients in 36.468822ms (**0.997**)
> Added 10000 agents in 67.733365ms (**0.59**)
> Added allocations for 10000 agents in 99.695723ms (**0.175**)
> Full sort of 1056 clients took 29729ns (**0.68**)
> No-op sort of 1056 clients took 704ns (**1.04**)
> Removed allocations for 10000 agents in 4.824389788secs (**0.85**)
> Removed 10000 agents in 101.319295ms (**0.66**)
> Removed 1056 clients in 2.480967ms (**0.35**)
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Benjamin Mahler <bm...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/#review208538
-----------------------------------------------------------



Almost there! I'm wondering about the casting to `Resource` in loops, seems broken since we lose the shared count? :O


src/common/resources.cpp
Line 1641 (original), 1655 (patched)
<https://reviews.apache.org/r/68490/#comment292565>

    Actually, I was thinking about this, isn't the casting to `Resource` broken anywhere we use it? We lose the shared count?



src/common/resources.cpp
Lines 1671-1674 (patched)
<https://reviews.apache.org/r/68490/#comment292563>

    The std::move in add doesn't do anything right now since we don't have an ravlue overload for shared_ptr?
    
    Wouldn't this be simpler?
    
    ```
        if (isReserved(resource_->resource)) {
          Resource_ r_ = *resource_;
          r_->resource.clear_reservations();
          result.add(std::move(r_));
        } else {
    ```



src/common/resources.cpp
Line 1969 (original), 1992 (patched)
<https://reviews.apache.org/r/68490/#comment292562>

    Ditto here.



src/v1/resources.cpp
Lines 1687-1690 (patched)
<https://reviews.apache.org/r/68490/#comment292564>

    Ditto here



src/v1/resources.cpp
Line 1976 (original), 1997 (patched)
<https://reviews.apache.org/r/68490/#comment292561>

    We don't have an rvalue overload for -=, can we just leave this as is for now by taking out the std::move?


- Benjamin Mahler


On Sept. 7, 2018, 3:29 a.m., Meng Zhu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68490/
> -----------------------------------------------------------
> 
> (Updated Sept. 7, 2018, 3:29 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Gastón Kleiman.
> 
> 
> Bugs: MESOS-6765
>     https://issues.apache.org/jira/browse/MESOS-6765
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This patch lets `class Resources` only store shared
> pointers to the underlying resource objects, so that
> read-only filter operations such as `reserved()`,
> `unreserved()` and etc. can avoid making copies of
> the whole resource objects. Instead, only shared pointers
> are copied.
> 
> In write operations, we check if there are more than one
> references to the resource object. If so, a copy is made
> for safe mutation without affecting owners.
> 
> To maintain the usage abstraction that `class Resources`
> still holds resource objects, we utilize
> `boost::indirect_iterator` iterator adapter to deference
> the shared pointers as we iterate.
> 
> 
> Diffs
> -----
> 
>   include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
>   include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
>   src/common/resources.cpp 3e63cdedb9261970dbeb9bb9f97eed65819f68a7 
>   src/v1/resources.cpp 3683a331e0859cd6f2ad061db6ba67112ecfcb0d 
> 
> 
> Diff: https://reviews.apache.org/r/68490/diff/4/
> 
> 
> Testing
> -------
> 
> make check
> 
> Benchmarked on a Ubuntu box using clang -O2 optimized build with an i5 processor, frequency fixed at 1.6GHz.
> 
> ## Resources benchmarks (Measure geo-mean of various operations):
> 
> Normalized Geomean latency
> 		
> 	        Master	Copy-on-write
> Arithmetic 	1	1.09
> Filter	        1	0.28
> Contains	1	0.27
> Parse	        1	1.01
> 
> ## Sample allocator benchmark (Measure first allocation latency):
> 
> HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/1 (1k agents, 50 frameworks):
> 
> Master 1
> Copy-on-write 0.72
> 
> HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/1 (1k agents, 50 frameworks):
> 
> Master 1
> Copy-on-write 0.72
> 
> ## Sorter benchmark
> 
> ### BENCHMARK_FullSort
> 
> Master:
> 
> Using 30000 agents and 500 clients
> Added 500 clients in 2.261373ms
> Added 30000 agents in 322.252441ms
> Added allocations for 30000 agents in 823.631826ms
> Full sort of 500 clients took 8662ns
> No-op sort of 500 clients took 809ns
> Removed allocations for 30000 agents in 8.482659084secs
> Removed 30000 agents in 417.988004ms
> Removed 500 clients in 17.855575ms
> 
> 
> Copy-on-write:
> 
> Using 30000 agents and 500 clients (**normalized to the master above**)
> Added 500 clients in 2.464647ms (**1.09**)
> Added 30000 agents in 205.071423ms (**0.64**)
> Added allocations for 30000 agents in 156.340883ms (**0.19**)
> Full sort of 500 clients took 1007ns (**0.12**)
> No-op sort of 500 clients took 204ns (**0.02**)
> Removed allocations for 30000 agents in 7.215175016secs (**0.85**)
> Removed 30000 agents in 304.998116ms (**0.73**)
> Removed 500 clients in 3.195079ms (**0.18**)
> 
> 
> ### BENCHMARK_HierarchyFullSort
> 
> Master:
> 
> Added 1056 clients in 36.571505ms
> Added 10000 agents in 114.165111ms
> Added allocations for 10000 agents in 569.144686ms
> Full sort of 1056 clients took 43485ns
> No-op sort of 1056 clients took 676ns
> Removed allocations for 10000 agents in 5.656517904secs
> Removed 10000 agents in 153.016733ms
> Removed 1056 clients in 7.169549ms
> 
> Copy-on-write:
> 
> Added 1056 clients in 36.468822ms (**0.997**)
> Added 10000 agents in 67.733365ms (**0.59**)
> Added allocations for 10000 agents in 99.695723ms (**0.175**)
> Full sort of 1056 clients took 29729ns (**0.68**)
> No-op sort of 1056 clients took 704ns (**1.04**)
> Removed allocations for 10000 agents in 4.824389788secs (**0.85**)
> Removed 10000 agents in 101.319295ms (**0.66**)
> Removed 1056 clients in 2.480967ms (**0.35**)
> 
> 
> Thanks,
> 
> Meng Zhu
> 
>

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Meng Zhu <mz...@mesosphere.io>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/
-----------------------------------------------------------

(Updated Sept. 6, 2018, 8:29 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Gastón Kleiman.


Changes
-------

Posted sorter benchmark results.


Bugs: MESOS-6765
    https://issues.apache.org/jira/browse/MESOS-6765


Repository: mesos


Description
-------

This patch lets `class Resources` only store shared
pointers to the underlying resource objects, so that
read-only filter operations such as `reserved()`,
`unreserved()` and etc. can avoid making copies of
the whole resource objects. Instead, only shared pointers
are copied.

In write operations, we check if there are more than one
references to the resource object. If so, a copy is made
for safe mutation without affecting owners.

To maintain the usage abstraction that `class Resources`
still holds resource objects, we utilize
`boost::indirect_iterator` iterator adapter to deference
the shared pointers as we iterate.


Diffs
-----

  include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
  include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
  src/common/resources.cpp 3e63cdedb9261970dbeb9bb9f97eed65819f68a7 
  src/v1/resources.cpp 3683a331e0859cd6f2ad061db6ba67112ecfcb0d 


Diff: https://reviews.apache.org/r/68490/diff/4/


Testing (updated)
-------

make check

Benchmarked on a Ubuntu box using clang -O2 optimized build with an i5 processor, frequency fixed at 1.6GHz.

## Resources benchmarks (Measure geo-mean of various operations):

Normalized Geomean latency
		
	        Master	Copy-on-write
Arithmetic 	1	1.09
Filter	        1	0.28
Contains	1	0.27
Parse	        1	1.01

## Sample allocator benchmark (Measure first allocation latency):

HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/1 (1k agents, 50 frameworks):

Master 1
Copy-on-write 0.72

HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/1 (1k agents, 50 frameworks):

Master 1
Copy-on-write 0.72

## Sorter benchmark

### BENCHMARK_FullSort

Master:

Using 30000 agents and 500 clients
Added 500 clients in 2.261373ms
Added 30000 agents in 322.252441ms
Added allocations for 30000 agents in 823.631826ms
Full sort of 500 clients took 8662ns
No-op sort of 500 clients took 809ns
Removed allocations for 30000 agents in 8.482659084secs
Removed 30000 agents in 417.988004ms
Removed 500 clients in 17.855575ms


Copy-on-write:

Using 30000 agents and 500 clients (**normalized to the master above**)
Added 500 clients in 2.464647ms (**1.09**)
Added 30000 agents in 205.071423ms (**0.64**)
Added allocations for 30000 agents in 156.340883ms (**0.19**)
Full sort of 500 clients took 1007ns (**0.12**)
No-op sort of 500 clients took 204ns (**0.02**)
Removed allocations for 30000 agents in 7.215175016secs (**0.85**)
Removed 30000 agents in 304.998116ms (**0.73**)
Removed 500 clients in 3.195079ms (**0.18**)


### BENCHMARK_HierarchyFullSort

Master:

Added 1056 clients in 36.571505ms
Added 10000 agents in 114.165111ms
Added allocations for 10000 agents in 569.144686ms
Full sort of 1056 clients took 43485ns
No-op sort of 1056 clients took 676ns
Removed allocations for 10000 agents in 5.656517904secs
Removed 10000 agents in 153.016733ms
Removed 1056 clients in 7.169549ms

Copy-on-write:

Added 1056 clients in 36.468822ms (**0.997**)
Added 10000 agents in 67.733365ms (**0.59**)
Added allocations for 10000 agents in 99.695723ms (**0.175**)
Full sort of 1056 clients took 29729ns (**0.68**)
No-op sort of 1056 clients took 704ns (**1.04**)
Removed allocations for 10000 agents in 4.824389788secs (**0.85**)
Removed 10000 agents in 101.319295ms (**0.66**)
Removed 1056 clients in 2.480967ms (**0.35**)


Thanks,

Meng Zhu

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Meng Zhu <mz...@mesosphere.io>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/
-----------------------------------------------------------

(Updated Sept. 6, 2018, 2:56 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Gastón Kleiman.


Changes
-------

Addressed Benjamin's comment.


Bugs: MESOS-6765
    https://issues.apache.org/jira/browse/MESOS-6765


Repository: mesos


Description
-------

This patch lets `class Resources` only store shared
pointers to the underlying resource objects, so that
read-only filter operations such as `reserved()`,
`unreserved()` and etc. can avoid making copies of
the whole resource objects. Instead, only shared pointers
are copied.

In write operations, we check if there are more than one
references to the resource object. If so, a copy is made
for safe mutation without affecting owners.

To maintain the usage abstraction that `class Resources`
still holds resource objects, we utilize
`boost::indirect_iterator` iterator adapter to deference
the shared pointers as we iterate.


Diffs (updated)
-----

  include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
  include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
  src/common/resources.cpp 3e63cdedb9261970dbeb9bb9f97eed65819f68a7 
  src/v1/resources.cpp 3683a331e0859cd6f2ad061db6ba67112ecfcb0d 


Diff: https://reviews.apache.org/r/68490/diff/4/

Changes: https://reviews.apache.org/r/68490/diff/3-4/


Testing
-------

make check

Benchmarked on a Ubuntu box using clang -O2 optimized build with an i5 processor, frequency fixed at 1.6GHz.

Resources benchmarks (Measure geo-mean of various operations):

Normalized Geomean latency
		
	        Master	Copy-on-write
Arithmetic 	1	1.09
Filter	        1	0.28
Contains	1	0.27
Parse	        1	1.01

Sample allocator benchmark (Measure first allocation latency):

HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/1 (1k agents, 50 frameworks):

Master 1
Copy-on-write 0.72

HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/1 (1k agents, 50 frameworks):

Master 1
Copy-on-write 0.72


Thanks,

Meng Zhu

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Meng Zhu <mz...@mesosphere.io>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/
-----------------------------------------------------------

(Updated Sept. 5, 2018, 2:47 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Gastón Kleiman.


Changes
-------

Added benchmark results.


Bugs: MESOS-6765
    https://issues.apache.org/jira/browse/MESOS-6765


Repository: mesos


Description
-------

This patch lets `class Resources` only store shared
pointers to the underlying resource objects, so that
read-only filter operations such as `reserved()`,
`unreserved()` and etc. can avoid making copies of
the whole resource objects. Instead, only shared pointers
are copied.

In write operations, we check if there are more than one
references to the resource object. If so, a copy is made
for safe mutation without affecting owners.

To maintain the usage abstraction that `class Resources`
still holds resource objects, we utilize
`boost::indirect_iterator` iterator adapter to deference
the shared pointers as we iterate.


Diffs
-----

  include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
  include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
  src/common/resources.cpp 3e63cdedb9261970dbeb9bb9f97eed65819f68a7 
  src/v1/resources.cpp 3683a331e0859cd6f2ad061db6ba67112ecfcb0d 


Diff: https://reviews.apache.org/r/68490/diff/3/


Testing (updated)
-------

make check

Benchmarked on a Ubuntu box using clang -O2 optimized build with an i5 processor, frequency fixed at 1.6GHz.

Resources benchmarks (Measure geo-mean of various operations):

Normalized Geomean latency
		
	        Master	Copy-on-write
Arithmetic 	1	1.09
Filter	        1	0.28
Contains	1	0.27
Parse	        1	1.01

Sample allocator benchmark (Measure first allocation latency):

HierarchicalAllocator_BENCHMARK_Test.DeclineOffers/1 (1k agents, 50 frameworks):

Master 1
Copy-on-write 0.72

HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/1 (1k agents, 50 frameworks):

Master 1
Copy-on-write 0.72


Thanks,

Meng Zhu

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Meng Zhu <mz...@mesosphere.io>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/
-----------------------------------------------------------

(Updated Sept. 5, 2018, 2:30 p.m.)


Review request for mesos, Benjamin Bannier, Benjamin Mahler, and Gastón Kleiman.


Changes
-------

Made foreach loops inside `class Resources` iterate over `shared_ptr<Resource_>` consistently.


Bugs: MESOS-6765
    https://issues.apache.org/jira/browse/MESOS-6765


Repository: mesos


Description
-------

This patch lets `class Resources` only store shared
pointers to the underlying resource objects, so that
read-only filter operations such as `reserved()`,
`unreserved()` and etc. can avoid making copies of
the whole resource objects. Instead, only shared pointers
are copied.

In write operations, we check if there are more than one
references to the resource object. If so, a copy is made
for safe mutation without affecting owners.

To maintain the usage abstraction that `class Resources`
still holds resource objects, we utilize
`boost::indirect_iterator` iterator adapter to deference
the shared pointers as we iterate.


Diffs (updated)
-----

  include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
  include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
  src/common/resources.cpp 3e63cdedb9261970dbeb9bb9f97eed65819f68a7 
  src/v1/resources.cpp 3683a331e0859cd6f2ad061db6ba67112ecfcb0d 


Diff: https://reviews.apache.org/r/68490/diff/3/

Changes: https://reviews.apache.org/r/68490/diff/2-3/


Testing
-------

make check

Did a quick test on Mac with an optimized build, running benchmark `HierarchicalAllocator_BENCHMARK_Test.ResourceLabels`, here are the results of comparing "before" and "after". Note, DVFS is not fixed. And we only did a partial run to verify the validity of the patch, full evaluation coming soon.

**Overall, 33% performance improvement for the 1st steup (1000 agents and 1 frameworks) and 32% improvement for the first 50 iterarions of the 2nd setup (1000 agents and 50 frameworks).**

Before:

[ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/0

Using 1000 agents and 1 frameworks
Added 1 frameworks in 1.220022ms
Added 1000 agents in 465.045382ms
round 0 allocate() took 54.803773ms to make 0 offers after filtering 1000 offers
round 1 allocate() took 66.690456ms to make 0 offers after filtering 1000 offers

Using 1000 agents and 50 frameworks
Added 50 frameworks in 1.40774ms
Added 1000 agents in 460.562488ms
round 0 allocate() took 155.500548ms to make 1000 offers after filtering 1000 offers
round 1 allocate() took 155.194986ms to make 1000 offers after filtering 2000 offers
round 2 allocate() took 160.706712ms to make 1000 offers after filtering 3000 offers
round 3 allocate() took 156.701883ms to make 1000 offers after filtering 4000 offers
round 4 allocate() took 164.246987ms to make 1000 offers after filtering 5000 offers
round 5 allocate() took 160.791809ms to make 1000 offers after filtering 6000 offers
round 6 allocate() took 166.064652ms to make 1000 offers after filtering 7000 offers
round 7 allocate() took 162.177213ms to make 1000 offers after filtering 8000 offers
round 8 allocate() took 166.620163ms to make 1000 offers after filtering 9000 offers
round 9 allocate() took 167.970045ms to make 1000 offers after filtering 10000 offers
round 10 allocate() took 166.189463ms to make 1000 offers after filtering 11000 offers
round 11 allocate() took 170.863462ms to make 1000 offers after filtering 12000 offers
round 12 allocate() took 172.260473ms to make 1000 offers after filtering 13000 offers
round 13 allocate() took 170.553911ms to make 1000 offers after filtering 14000 offers
round 14 allocate() took 235.764593ms to make 1000 offers after filtering 15000 offers
round 15 allocate() took 247.250433ms to make 1000 offers after filtering 16000 offers
round 16 allocate() took 276.932824ms to make 1000 offers after filtering 17000 offers
round 17 allocate() took 248.888469ms to make 1000 offers after filtering 18000 offers
round 18 allocate() took 193.890556ms to make 1000 offers after filtering 19000 offers
round 19 allocate() took 195.105346ms to make 1000 offers after filtering 20000 offers
round 20 allocate() took 194.447428ms to make 1000 offers after filtering 21000 offers
round 21 allocate() took 205.486287ms to make 1000 offers after filtering 22000 offers
round 22 allocate() took 199.241922ms to make 1000 offers after filtering 23000 offers
round 23 allocate() took 200.885488ms to make 1000 offers after filtering 24000 offers
round 24 allocate() took 216.132361ms to make 1000 offers after filtering 25000 offers
round 25 allocate() took 210.638273ms to make 1000 offers after filtering 26000 offers
round 26 allocate() took 232.397778ms to make 1000 offers after filtering 27000 offers
round 27 allocate() took 239.633708ms to make 1000 offers after filtering 28000 offers
round 28 allocate() took 225.829677ms to make 1000 offers after filtering 29000 offers
round 29 allocate() took 245.143272ms to make 1000 offers after filtering 30000 offers
round 30 allocate() took 248.630295ms to make 1000 offers after filtering 31000 offers
round 31 allocate() took 262.147804ms to make 1000 offers after filtering 32000 offers
round 32 allocate() took 251.109969ms to make 1000 offers after filtering 33000 offers
round 33 allocate() took 263.92273ms to make 1000 offers after filtering 34000 offers
round 34 allocate() took 268.422275ms to make 1000 offers after filtering 35000 offers
round 35 allocate() took 273.830163ms to make 1000 offers after filtering 36000 offers
round 36 allocate() took 283.25481ms to make 1000 offers after filtering 37000 offers
round 37 allocate() took 366.400781ms to make 1000 offers after filtering 38000 offers
round 38 allocate() took 382.202755ms to make 1000 offers after filtering 39000 offers
round 39 allocate() took 337.266121ms to make 1000 offers after filtering 40000 offers
round 40 allocate() took 381.033696ms to make 1000 offers after filtering 41000 offers
round 41 allocate() took 365.941946ms to make 1000 offers after filtering 42000 offers
round 42 allocate() took 379.527886ms to make 1000 offers after filtering 43000 offers
round 43 allocate() took 425.661181ms to make 1000 offers after filtering 44000 offers
round 44 allocate() took 455.86657ms to make 1000 offers after filtering 45000 offers
round 45 allocate() took 506.943117ms to make 1000 offers after filtering 46000 offers
round 46 allocate() took 681.880233ms to make 1000 offers after filtering 47000 offers
round 47 allocate() took 860.932974ms to make 1000 offers after filtering 48000 offers
round 48 allocate() took 960.272209ms to make 1000 offers after filtering 49000 offers
round 49 allocate() took 1.907427057secs to make 0 offers after filtering 50000 offers
round 50 allocate() took 2.157864654secs to make 0 offers after filtering 50000 offers


After:

Using 1000 agents and 1 frameworks
Added 1 frameworks in 404709ns
Added 1000 agents in 245.162138ms
round 0 allocate() took 38.498883ms to make 0 offers after filtering 1000 offers
round 1 allocate() took 42.234898ms to make 0 offers after filtering 1000 offers

Using 1000 agents and 50 frameworks
Added 50 frameworks in 1.441281ms
Added 1000 agents in 261.923711ms
round 0 allocate() took 121.706074ms to make 1000 offers after filtering 1000 offers
round 1 allocate() took 150.465925ms to make 1000 offers after filtering 2000 offers
round 2 allocate() took 273.549893ms to make 1000 offers after filtering 3000 offers
round 3 allocate() took 199.248799ms to make 1000 offers after filtering 4000 offers
round 4 allocate() took 133.865752ms to make 1000 offers after filtering 5000 offers
round 5 allocate() took 130.828599ms to make 1000 offers after filtering 6000 offers
round 6 allocate() took 134.385597ms to make 1000 offers after filtering 7000 offers
round 7 allocate() took 135.810198ms to make 1000 offers after filtering 8000 offers
round 8 allocate() took 126.128592ms to make 1000 offers after filtering 9000 offers
round 9 allocate() took 144.794829ms to make 1000 offers after filtering 10000 offers
round 10 allocate() took 162.533506ms to make 1000 offers after filtering 11000 offers
round 11 allocate() took 159.22141ms to make 1000 offers after filtering 12000 offers
round 12 allocate() took 174.739823ms to make 1000 offers after filtering 13000 offers
round 13 allocate() took 171.095423ms to make 1000 offers after filtering 14000 offers
round 14 allocate() took 186.876661ms to make 1000 offers after filtering 15000 offers
round 15 allocate() took 177.021603ms to make 1000 offers after filtering 16000 offers
round 16 allocate() took 165.970722ms to make 1000 offers after filtering 17000 offers
round 17 allocate() took 162.016338ms to make 1000 offers after filtering 18000 offers
round 18 allocate() took 138.698917ms to make 1000 offers after filtering 19000 offers
round 19 allocate() took 144.556913ms to make 1000 offers after filtering 20000 offers
round 20 allocate() took 155.689926ms to make 1000 offers after filtering 21000 offers
round 21 allocate() took 149.952025ms to make 1000 offers after filtering 22000 offers
round 22 allocate() took 135.98823ms to make 1000 offers after filtering 23000 offers
round 23 allocate() took 132.520992ms to make 1000 offers after filtering 24000 offers
round 24 allocate() took 143.325635ms to make 1000 offers after filtering 25000 offers
round 25 allocate() took 153.313423ms to make 1000 offers after filtering 26000 offers
round 26 allocate() took 169.889066ms to make 1000 offers after filtering 27000 offers
round 27 allocate() took 188.969694ms to make 1000 offers after filtering 28000 offers
round 28 allocate() took 176.132259ms to make 1000 offers after filtering 29000 offers
round 29 allocate() took 186.754676ms to make 1000 offers after filtering 30000 offers
round 30 allocate() took 166.346508ms to make 1000 offers after filtering 31000 offers
round 31 allocate() took 172.557665ms to make 1000 offers after filtering 32000 offers
round 32 allocate() took 169.874406ms to make 1000 offers after filtering 33000 offers
round 33 allocate() took 190.470692ms to make 1000 offers after filtering 34000 offers
round 34 allocate() took 184.328221ms to make 1000 offers after filtering 35000 offers
round 35 allocate() took 222.081892ms to make 1000 offers after filtering 36000 offers
round 36 allocate() took 203.134216ms to make 1000 offers after filtering 37000 offers
round 37 allocate() took 217.490016ms to make 1000 offers after filtering 38000 offers
round 38 allocate() took 257.449904ms to make 1000 offers after filtering 39000 offers
round 39 allocate() took 252.468529ms to make 1000 offers after filtering 40000 offers
round 40 allocate() took 229.433398ms to make 1000 offers after filtering 41000 offers
round 41 allocate() took 251.920859ms to make 1000 offers after filtering 42000 offers
round 42 allocate() took 273.529747ms to make 1000 offers after filtering 43000 offers
round 43 allocate() took 315.08445ms to make 1000 offers after filtering 44000 offers
round 44 allocate() took 354.758003ms to make 1000 offers after filtering 45000 offers
round 45 allocate() took 350.378463ms to make 1000 offers after filtering 46000 offers
round 46 allocate() took 415.070355ms to make 1000 offers after filtering 47000 offers
round 47 allocate() took 519.922944ms to make 1000 offers after filtering 48000 offers
round 48 allocate() took 710.598546ms to make 1000 offers after filtering 49000 offers
round 49 allocate() took 1.267395251secs to make 0 offers after filtering 50000 offers
round 50 allocate() took 1.210188235secs to make 0 offers after filtering 50000 offers


Thanks,

Meng Zhu

Re: Review Request 68490: Optimized `class Resources` with copy-on-write.

Posted by Meng Zhu <mz...@mesosphere.io>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68490/
-----------------------------------------------------------

(Updated Sept. 5, 2018, 12:17 p.m.)


Review request for mesos, Benjamin Mahler and Gastón Kleiman.


Bugs: MESOS-6765
    https://issues.apache.org/jira/browse/MESOS-6765


Repository: mesos


Description
-------

This patch lets `class Resources` only store shared
pointers to the underlying resource objects, so that
read-only filter operations such as `reserved()`,
`unreserved()` and etc. can avoid making copies of
the whole resource objects. Instead, only shared pointers
are copied.

In write operations, we check if there are more than one
references to the resource object. If so, a copy is made
for safe mutation without affecting owners.

To maintain the usage abstraction that `class Resources`
still holds resource objects, we utilize
`boost::indirect_iterator` iterator adapter to deference
the shared pointers as we iterate.


Diffs (updated)
-----

  include/mesos/resources.hpp 6f81b14f8bc090a144eeae8f15639c370366166d 
  include/mesos/v1/resources.hpp 09110530da16678abf6bf6b308906dd8ccc8180a 
  src/common/resources.cpp 3e63cdedb9261970dbeb9bb9f97eed65819f68a7 
  src/v1/resources.cpp 3683a331e0859cd6f2ad061db6ba67112ecfcb0d 


Diff: https://reviews.apache.org/r/68490/diff/2/

Changes: https://reviews.apache.org/r/68490/diff/1-2/


Testing
-------

make check

Did a quick test on Mac with an optimized build, running benchmark `HierarchicalAllocator_BENCHMARK_Test.ResourceLabels`, here are the results of comparing "before" and "after". Note, DVFS is not fixed. And we only did a partial run to verify the validity of the patch, full evaluation coming soon.

**Overall, 33% performance improvement for the 1st steup (1000 agents and 1 frameworks) and 32% improvement for the first 50 iterarions of the 2nd setup (1000 agents and 50 frameworks).**

Before:

[ RUN      ] SlaveAndFrameworkCount/HierarchicalAllocator_BENCHMARK_Test.ResourceLabels/0

Using 1000 agents and 1 frameworks
Added 1 frameworks in 1.220022ms
Added 1000 agents in 465.045382ms
round 0 allocate() took 54.803773ms to make 0 offers after filtering 1000 offers
round 1 allocate() took 66.690456ms to make 0 offers after filtering 1000 offers

Using 1000 agents and 50 frameworks
Added 50 frameworks in 1.40774ms
Added 1000 agents in 460.562488ms
round 0 allocate() took 155.500548ms to make 1000 offers after filtering 1000 offers
round 1 allocate() took 155.194986ms to make 1000 offers after filtering 2000 offers
round 2 allocate() took 160.706712ms to make 1000 offers after filtering 3000 offers
round 3 allocate() took 156.701883ms to make 1000 offers after filtering 4000 offers
round 4 allocate() took 164.246987ms to make 1000 offers after filtering 5000 offers
round 5 allocate() took 160.791809ms to make 1000 offers after filtering 6000 offers
round 6 allocate() took 166.064652ms to make 1000 offers after filtering 7000 offers
round 7 allocate() took 162.177213ms to make 1000 offers after filtering 8000 offers
round 8 allocate() took 166.620163ms to make 1000 offers after filtering 9000 offers
round 9 allocate() took 167.970045ms to make 1000 offers after filtering 10000 offers
round 10 allocate() took 166.189463ms to make 1000 offers after filtering 11000 offers
round 11 allocate() took 170.863462ms to make 1000 offers after filtering 12000 offers
round 12 allocate() took 172.260473ms to make 1000 offers after filtering 13000 offers
round 13 allocate() took 170.553911ms to make 1000 offers after filtering 14000 offers
round 14 allocate() took 235.764593ms to make 1000 offers after filtering 15000 offers
round 15 allocate() took 247.250433ms to make 1000 offers after filtering 16000 offers
round 16 allocate() took 276.932824ms to make 1000 offers after filtering 17000 offers
round 17 allocate() took 248.888469ms to make 1000 offers after filtering 18000 offers
round 18 allocate() took 193.890556ms to make 1000 offers after filtering 19000 offers
round 19 allocate() took 195.105346ms to make 1000 offers after filtering 20000 offers
round 20 allocate() took 194.447428ms to make 1000 offers after filtering 21000 offers
round 21 allocate() took 205.486287ms to make 1000 offers after filtering 22000 offers
round 22 allocate() took 199.241922ms to make 1000 offers after filtering 23000 offers
round 23 allocate() took 200.885488ms to make 1000 offers after filtering 24000 offers
round 24 allocate() took 216.132361ms to make 1000 offers after filtering 25000 offers
round 25 allocate() took 210.638273ms to make 1000 offers after filtering 26000 offers
round 26 allocate() took 232.397778ms to make 1000 offers after filtering 27000 offers
round 27 allocate() took 239.633708ms to make 1000 offers after filtering 28000 offers
round 28 allocate() took 225.829677ms to make 1000 offers after filtering 29000 offers
round 29 allocate() took 245.143272ms to make 1000 offers after filtering 30000 offers
round 30 allocate() took 248.630295ms to make 1000 offers after filtering 31000 offers
round 31 allocate() took 262.147804ms to make 1000 offers after filtering 32000 offers
round 32 allocate() took 251.109969ms to make 1000 offers after filtering 33000 offers
round 33 allocate() took 263.92273ms to make 1000 offers after filtering 34000 offers
round 34 allocate() took 268.422275ms to make 1000 offers after filtering 35000 offers
round 35 allocate() took 273.830163ms to make 1000 offers after filtering 36000 offers
round 36 allocate() took 283.25481ms to make 1000 offers after filtering 37000 offers
round 37 allocate() took 366.400781ms to make 1000 offers after filtering 38000 offers
round 38 allocate() took 382.202755ms to make 1000 offers after filtering 39000 offers
round 39 allocate() took 337.266121ms to make 1000 offers after filtering 40000 offers
round 40 allocate() took 381.033696ms to make 1000 offers after filtering 41000 offers
round 41 allocate() took 365.941946ms to make 1000 offers after filtering 42000 offers
round 42 allocate() took 379.527886ms to make 1000 offers after filtering 43000 offers
round 43 allocate() took 425.661181ms to make 1000 offers after filtering 44000 offers
round 44 allocate() took 455.86657ms to make 1000 offers after filtering 45000 offers
round 45 allocate() took 506.943117ms to make 1000 offers after filtering 46000 offers
round 46 allocate() took 681.880233ms to make 1000 offers after filtering 47000 offers
round 47 allocate() took 860.932974ms to make 1000 offers after filtering 48000 offers
round 48 allocate() took 960.272209ms to make 1000 offers after filtering 49000 offers
round 49 allocate() took 1.907427057secs to make 0 offers after filtering 50000 offers
round 50 allocate() took 2.157864654secs to make 0 offers after filtering 50000 offers


After:

Using 1000 agents and 1 frameworks
Added 1 frameworks in 404709ns
Added 1000 agents in 245.162138ms
round 0 allocate() took 38.498883ms to make 0 offers after filtering 1000 offers
round 1 allocate() took 42.234898ms to make 0 offers after filtering 1000 offers

Using 1000 agents and 50 frameworks
Added 50 frameworks in 1.441281ms
Added 1000 agents in 261.923711ms
round 0 allocate() took 121.706074ms to make 1000 offers after filtering 1000 offers
round 1 allocate() took 150.465925ms to make 1000 offers after filtering 2000 offers
round 2 allocate() took 273.549893ms to make 1000 offers after filtering 3000 offers
round 3 allocate() took 199.248799ms to make 1000 offers after filtering 4000 offers
round 4 allocate() took 133.865752ms to make 1000 offers after filtering 5000 offers
round 5 allocate() took 130.828599ms to make 1000 offers after filtering 6000 offers
round 6 allocate() took 134.385597ms to make 1000 offers after filtering 7000 offers
round 7 allocate() took 135.810198ms to make 1000 offers after filtering 8000 offers
round 8 allocate() took 126.128592ms to make 1000 offers after filtering 9000 offers
round 9 allocate() took 144.794829ms to make 1000 offers after filtering 10000 offers
round 10 allocate() took 162.533506ms to make 1000 offers after filtering 11000 offers
round 11 allocate() took 159.22141ms to make 1000 offers after filtering 12000 offers
round 12 allocate() took 174.739823ms to make 1000 offers after filtering 13000 offers
round 13 allocate() took 171.095423ms to make 1000 offers after filtering 14000 offers
round 14 allocate() took 186.876661ms to make 1000 offers after filtering 15000 offers
round 15 allocate() took 177.021603ms to make 1000 offers after filtering 16000 offers
round 16 allocate() took 165.970722ms to make 1000 offers after filtering 17000 offers
round 17 allocate() took 162.016338ms to make 1000 offers after filtering 18000 offers
round 18 allocate() took 138.698917ms to make 1000 offers after filtering 19000 offers
round 19 allocate() took 144.556913ms to make 1000 offers after filtering 20000 offers
round 20 allocate() took 155.689926ms to make 1000 offers after filtering 21000 offers
round 21 allocate() took 149.952025ms to make 1000 offers after filtering 22000 offers
round 22 allocate() took 135.98823ms to make 1000 offers after filtering 23000 offers
round 23 allocate() took 132.520992ms to make 1000 offers after filtering 24000 offers
round 24 allocate() took 143.325635ms to make 1000 offers after filtering 25000 offers
round 25 allocate() took 153.313423ms to make 1000 offers after filtering 26000 offers
round 26 allocate() took 169.889066ms to make 1000 offers after filtering 27000 offers
round 27 allocate() took 188.969694ms to make 1000 offers after filtering 28000 offers
round 28 allocate() took 176.132259ms to make 1000 offers after filtering 29000 offers
round 29 allocate() took 186.754676ms to make 1000 offers after filtering 30000 offers
round 30 allocate() took 166.346508ms to make 1000 offers after filtering 31000 offers
round 31 allocate() took 172.557665ms to make 1000 offers after filtering 32000 offers
round 32 allocate() took 169.874406ms to make 1000 offers after filtering 33000 offers
round 33 allocate() took 190.470692ms to make 1000 offers after filtering 34000 offers
round 34 allocate() took 184.328221ms to make 1000 offers after filtering 35000 offers
round 35 allocate() took 222.081892ms to make 1000 offers after filtering 36000 offers
round 36 allocate() took 203.134216ms to make 1000 offers after filtering 37000 offers
round 37 allocate() took 217.490016ms to make 1000 offers after filtering 38000 offers
round 38 allocate() took 257.449904ms to make 1000 offers after filtering 39000 offers
round 39 allocate() took 252.468529ms to make 1000 offers after filtering 40000 offers
round 40 allocate() took 229.433398ms to make 1000 offers after filtering 41000 offers
round 41 allocate() took 251.920859ms to make 1000 offers after filtering 42000 offers
round 42 allocate() took 273.529747ms to make 1000 offers after filtering 43000 offers
round 43 allocate() took 315.08445ms to make 1000 offers after filtering 44000 offers
round 44 allocate() took 354.758003ms to make 1000 offers after filtering 45000 offers
round 45 allocate() took 350.378463ms to make 1000 offers after filtering 46000 offers
round 46 allocate() took 415.070355ms to make 1000 offers after filtering 47000 offers
round 47 allocate() took 519.922944ms to make 1000 offers after filtering 48000 offers
round 48 allocate() took 710.598546ms to make 1000 offers after filtering 49000 offers
round 49 allocate() took 1.267395251secs to make 0 offers after filtering 50000 offers
round 50 allocate() took 1.210188235secs to make 0 offers after filtering 50000 offers


Thanks,

Meng Zhu