You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Benno Evers <be...@mesosphere.com> on 2019/03/04 17:01:13 UTC

Review Request 70116: Added metrics for offer operation feedback.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/
-----------------------------------------------------------

Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.


Bugs: MESOS-8241
    https://issues.apache.org/jira/browse/MESOS-8241


Repository: mesos


Description
-------

This commit adds the following additional metrics
to the master:

    - master/operations_pending
    - master/operations_recovering
    - master/operations_finished
    - master/operations_failed
    - master/operations_error
    - master/operations_dropped
    - master/operations_unreachable
    - master/operations_gone_by_operator

Unit tests are added in the subsequent commit.


Diffs
-----

  src/master/master.hpp aceab3455adfdf7ff6e168e033316997a4d36fb5 
  src/master/master.cpp 665c1c7979a5ae4ecec0d5a68e59c5419049a4d5 
  src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
  src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 


Diff: https://reviews.apache.org/r/70116/diff/1/


Testing
-------


Thanks,

Benno Evers


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Greg Mann <gr...@mesosphere.io>.

> On March 4, 2019, 10:20 p.m., Greg Mann wrote:
> > src/master/metrics.hpp
> > Lines 63-67 (patched)
> > <https://reviews.apache.org/r/70116/diff/1/?file=2128637#file2128637line63>
> >
> >     Is this comment accurate? Looks like there is indeed an `operations_unreachable` metric below.
> >     
> >     I think we could probably skip the metrics for such states as the comment suggests.

I think that `operations_recovering`, `operations_unreachable`, and `operations_gone_by_operator` could be removed, since those metrics would not be accurate after a master failover (i.e., master won't know how many operations were on an agent which was marked GONE after master failover). I think the other metrics still make sense.


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review213404
-----------------------------------------------------------


On March 4, 2019, 5:01 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> -----------------------------------------------------------
> 
> (Updated March 4, 2019, 5:01 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
>     https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This commit adds the following additional metrics
> to the master:
> 
>     - master/operations_pending
>     - master/operations_recovering
>     - master/operations_finished
>     - master/operations_failed
>     - master/operations_error
>     - master/operations_dropped
>     - master/operations_unreachable
>     - master/operations_gone_by_operator
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp aceab3455adfdf7ff6e168e033316997a4d36fb5 
>   src/master/master.cpp 665c1c7979a5ae4ecec0d5a68e59c5419049a4d5 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Benno Evers <be...@mesosphere.com>.

> On March 4, 2019, 10:20 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 2261 (patched)
> > <https://reviews.apache.org/r/70116/diff/1/?file=2128636#file2128636line2263>
> >
> >     Why not use `updateOperationMetrics(OPERATION_ERROR, 1);` here?

Why not, indeed.


> On March 4, 2019, 10:20 p.m., Greg Mann wrote:
> > src/master/master.cpp
> > Lines 9223-9224 (patched)
> > <https://reviews.apache.org/r/70116/diff/1/?file=2128636#file2128636line9225>
> >
> >     s/agent/master/
> >     
> >     Yea I think we can crash here.

I though about this some more, and I'm actually leaning towards the opposite conclusion: Left like this, the worst case outcome is that we'll miss a bug that result in incorrect counter values being displayed. When we crash, the worst case is that a bug in the metrics accounting code might lead to a unrecoverable, crash-looping master.

So I think avoiding the latter scenario might be worth departing from the precedence here, especially since we already have precedence for masters actually failing to start for "non-critical" reasons.

I'm not insisting, though, if you still think having a `CHECK()` would be better I'm happy to add it. (the same comment applies almost verbatim to the `CHECK()` suggested by Gaston below as well)


- Benno


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review213404
-----------------------------------------------------------


On March 8, 2019, 11:48 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> -----------------------------------------------------------
> 
> (Updated March 8, 2019, 11:48 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
>     https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This commit adds additional metrics counting the
> number of operations in each state.
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp dc68fc324de7242737123015fbac19a2129778ce 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review213404
-----------------------------------------------------------




src/master/master.cpp
Lines 2261 (patched)
<https://reviews.apache.org/r/70116/#comment299342>

    Why not use `updateOperationMetrics(OPERATION_ERROR, 1);` here?



src/master/master.cpp
Lines 9223-9224 (patched)
<https://reviews.apache.org/r/70116/#comment299343>

    s/agent/master/
    
    Yea I think we can crash here.



src/master/master.cpp
Lines 11672-11673 (patched)
<https://reviews.apache.org/r/70116/#comment299353>

    In the case of operation status update retries, this will be unnecessary work (i.e. we will be decrementing and then incrementing the same metric). Perhaps we should enclose this in a conditional which checks for that equality? It will look a bit strange next to the below conditional, but as the comment indicates we need to rethink this deduplication:
    ```
      // TODO(gkleiman): Revisit the de-duplication logic (MESOS-8441) - if two
      // different terminal statuses arrive, we could end up with different states
      // in `latest_status` and the front of statuses list.
      if (operation->statuses().empty() ||
          *(operation->statuses().rbegin()) != status) {
        operation->add_statuses()->CopyFrom(status);
      }
    ```



src/master/metrics.hpp
Lines 63-67 (patched)
<https://reviews.apache.org/r/70116/#comment299354>

    Is this comment accurate? Looks like there is indeed an `operations_unreachable` metric below.
    
    I think we could probably skip the metrics for such states as the comment suggests.


- Greg Mann


On March 4, 2019, 5:01 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> -----------------------------------------------------------
> 
> (Updated March 4, 2019, 5:01 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
>     https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This commit adds the following additional metrics
> to the master:
> 
>     - master/operations_pending
>     - master/operations_recovering
>     - master/operations_finished
>     - master/operations_failed
>     - master/operations_error
>     - master/operations_dropped
>     - master/operations_unreachable
>     - master/operations_gone_by_operator
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp aceab3455adfdf7ff6e168e033316997a4d36fb5 
>   src/master/master.cpp 665c1c7979a5ae4ecec0d5a68e59c5419049a4d5 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Benno Evers <be...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/
-----------------------------------------------------------

(Updated March 27, 2019, 4:39 p.m.)


Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.


Changes
-------

Clarified comment; clearly distinguished between gauges and counters in documentation.


Bugs: MESOS-8241
    https://issues.apache.org/jira/browse/MESOS-8241


Repository: mesos


Description
-------

This commit adds additional metrics counting the
number of operations in each state.

Unit tests are added in the subsequent commit.


Diffs (updated)
-----

  docs/monitoring.md 54b872f579dbc68ca5f67f4cc1ba34065a09aee2 
  src/master/master.cpp 9c4a9e83da94535873d72c902835f229c4f96320 
  src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
  src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 


Diff: https://reviews.apache.org/r/70116/diff/6/

Changes: https://reviews.apache.org/r/70116/diff/5-6/


Testing
-------


Thanks,

Benno Evers


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review214068
-----------------------------------------------------------


Fix it, then Ship it!




LGTM.

Two small-ish comments to tweak before commit.


docs/monitoring.md
Lines 684-685 (patched)
<https://reviews.apache.org/r/70116/#comment300239>

    You can mention that terminal states, like FINISHED, FAILED, ERROR, DROPPED, or GONE are counters.  Non-terminal states are gauges.
    
    Here and a couple lines below too.
    
    The distinction is (sometimes) important when consuming these metrics because Counters are usually consumed as differences over time (e.g. how many more finished operations are there since the last minute?) whereas gauges are consumed as-is.



src/master/metrics.hpp
Lines 82-84 (patched)
<https://reviews.apache.org/r/70116/#comment300240>

    The framework-wide metrics are not implemented in this review, so you could remove mention of them for now.


- Joseph Wu


On March 26, 2019, 10:56 a.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> -----------------------------------------------------------
> 
> (Updated March 26, 2019, 10:56 a.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
>     https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This commit adds additional metrics counting the
> number of operations in each state.
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -----
> 
>   docs/monitoring.md 54b872f579dbc68ca5f67f4cc1ba34065a09aee2 
>   src/master/master.cpp 9c4a9e83da94535873d72c902835f229c4f96320 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/5/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Benno Evers <be...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/
-----------------------------------------------------------

(Updated March 26, 2019, 5:56 p.m.)


Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.


Changes
-------

Changed accounting logic for gone/unreachable agents; documented over-counting behaviour


Bugs: MESOS-8241
    https://issues.apache.org/jira/browse/MESOS-8241


Repository: mesos


Description
-------

This commit adds additional metrics counting the
number of operations in each state.

Unit tests are added in the subsequent commit.


Diffs (updated)
-----

  docs/monitoring.md 54b872f579dbc68ca5f67f4cc1ba34065a09aee2 
  src/master/master.cpp 9c4a9e83da94535873d72c902835f229c4f96320 
  src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
  src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 


Diff: https://reviews.apache.org/r/70116/diff/5/

Changes: https://reviews.apache.org/r/70116/diff/4-5/


Testing
-------


Thanks,

Benno Evers


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Greg Mann <gr...@mesosphere.io>.

> On March 15, 2019, 12:03 a.m., Greg Mann wrote:
> > src/master/metrics.hpp
> > Lines 82-84 (patched)
> > <https://reviews.apache.org/r/70116/diff/4/?file=2130966#file2130966line82>
> >
> >     Hmm am I missing something, or are there no per-framework metrics added? I'm fine with not adding them, but this comment seems to be incorrect?
> 
> Benno Evers wrote:
>     They're not added in this review because they already exist :) See line 816-836 inside `FrameworkMetrics::FrameworkMetrics()`.

Those metrics are slightly different - they provide counters for the offer operations submitted by a particular framework, but don't track the operation states. It might be worth mentioning in this comment that offer operation counts are tracked on a per-framework basis.


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review213734
-----------------------------------------------------------


On March 26, 2019, 5:56 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> -----------------------------------------------------------
> 
> (Updated March 26, 2019, 5:56 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
>     https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This commit adds additional metrics counting the
> number of operations in each state.
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -----
> 
>   docs/monitoring.md 54b872f579dbc68ca5f67f4cc1ba34065a09aee2 
>   src/master/master.cpp 9c4a9e83da94535873d72c902835f229c4f96320 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/5/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Benno Evers <be...@mesosphere.com>.

> On March 15, 2019, 12:03 a.m., Greg Mann wrote:
> > src/master/metrics.hpp
> > Lines 82-84 (patched)
> > <https://reviews.apache.org/r/70116/diff/4/?file=2130966#file2130966line82>
> >
> >     Hmm am I missing something, or are there no per-framework metrics added? I'm fine with not adding them, but this comment seems to be incorrect?

They're not added in this review because they already exist :) See line 816-836 inside `FrameworkMetrics::FrameworkMetrics()`.


- Benno


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review213734
-----------------------------------------------------------


On March 26, 2019, 5:56 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> -----------------------------------------------------------
> 
> (Updated March 26, 2019, 5:56 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
>     https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This commit adds additional metrics counting the
> number of operations in each state.
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -----
> 
>   docs/monitoring.md 54b872f579dbc68ca5f67f4cc1ba34065a09aee2 
>   src/master/master.cpp 9c4a9e83da94535873d72c902835f229c4f96320 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/5/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Benno Evers <be...@mesosphere.com>.

> On March 15, 2019, 12:03 a.m., Greg Mann wrote:
> > src/master/metrics.hpp
> > Lines 82-84 (patched)
> > <https://reviews.apache.org/r/70116/diff/4/?file=2130966#file2130966line82>
> >
> >     Hmm am I missing something, or are there no per-framework metrics added? I'm fine with not adding them, but this comment seems to be incorrect?
> 
> Benno Evers wrote:
>     They're not added in this review because they already exist :) See line 816-836 inside `FrameworkMetrics::FrameworkMetrics()`.
> 
> Greg Mann wrote:
>     Those metrics are slightly different - they provide counters for the offer operations submitted by a particular framework, but don't track the operation states. It might be worth mentioning in this comment that offer operation counts are tracked on a per-framework basis.

Oh, right. Updated the comment.


- Benno


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review213734
-----------------------------------------------------------


On March 27, 2019, 4:39 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> -----------------------------------------------------------
> 
> (Updated March 27, 2019, 4:39 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
>     https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This commit adds additional metrics counting the
> number of operations in each state.
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -----
> 
>   docs/monitoring.md 54b872f579dbc68ca5f67f4cc1ba34065a09aee2 
>   src/master/master.cpp 9c4a9e83da94535873d72c902835f229c4f96320 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/6/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Greg Mann <gr...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review213734
-----------------------------------------------------------




src/master/metrics.hpp
Lines 82-84 (patched)
<https://reviews.apache.org/r/70116/#comment299715>

    Hmm am I missing something, or are there no per-framework metrics added? I'm fine with not adding them, but this comment seems to be incorrect?


- Greg Mann


On March 11, 2019, 7:14 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> -----------------------------------------------------------
> 
> (Updated March 11, 2019, 7:14 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
>     https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This commit adds additional metrics counting the
> number of operations in each state.
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp dc68fc324de7242737123015fbac19a2129778ce 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Greg Mann <gr...@mesosphere.io>.

> On March 12, 2019, 10:56 p.m., Joseph Wu wrote:
> > src/master/master.cpp
> > Lines 11361-11385 (original), 11378-11402 (patched)
> > <https://reviews.apache.org/r/70116/diff/4/?file=2130965#file2130965line11380>
> >
> >     I'm curious what would be the proper way to handle operation cleanup/removal.
> >     
> >     When an operation is transitioned into a terminal state, the master will usually `removeOperation(...)` shortly afterwards.  Since we don't decrement the metrics in this case, the number of terminal operations will continue to grow.  This seems like the proper behavior.
> >     
> >     However, in this code, it is possible to remove an agent with non-terminal operations.  This means the non-terminal metrics will never be decremented.  So you can have a cluster with 0 operations, but the metric for pending operations might be non-zero.
> 
> Benno Evers wrote:
>     Hm, good question. I think the only ways a slave gets removed while it still has operations pending is by either being marked gone, or becoming unreachable.
>     
>     In both cases we already transition the counters to the correct `OPERATION_GONE`/`OPERATION_UNREACHABLE` states. (although unfortunately in a somewhat non-local manner, that's what https://reviews.apache.org/r/70185/ is all about)
>     
>     For gone operations, this should be fine. However, the problem is that when an unreachable slave reregisters, we re-add all operations as new operations without decrementing the `operations_unreachable` metric, since at the time the `UpdateSlaveMessage` arrives the master already forgot that the slave was previously unreachable.
>     
>     So as far as I can see, the metrics for pending operations should always be correct, but it is possible to overcount unreachable operations.
>     
>     It's not clear if this can be fixed without quite far-reaching refactoring in the master. So I think the best course might be to either document this behaviour, or remove the `operations_unreachable` metric altogether.
>     
>     What do you think?
> 
> Joseph Wu wrote:
>     It may be worthwhile to add a CHECK to make sure we only ever remove terminal (including GONE) or UNREACHABLE operations.
>     
>     ---
>     
>     Per endless (possible) recounting of unreachable operations, I'd lean towards overcounting & documenting how/when this is possible.  Probably inside the operator document that lists/describes all the metrics.

"the only ways a slave gets removed while it still has operations pending is by either being marked gone, or becoming unreachable"

This isn't true - for example, an agent could send an `UnregisterSlaveMessage` when it has pending operations and it will be removed via `Master::_removeSlave()`. Unfortunately, agent removal in that method currently does not correspond to any well-defined agent states; see https://issues.apache.org/jira/browse/MESOS-9556.

I think that we need to decrement non-terminal operation states when operations are removed in `_removeSlave()`.


- Greg


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review213632
-----------------------------------------------------------


On March 11, 2019, 7:14 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> -----------------------------------------------------------
> 
> (Updated March 11, 2019, 7:14 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
>     https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This commit adds additional metrics counting the
> number of operations in each state.
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp dc68fc324de7242737123015fbac19a2129778ce 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Benno Evers <be...@mesosphere.com>.

> On March 12, 2019, 10:56 p.m., Joseph Wu wrote:
> > src/master/master.cpp
> > Lines 11361-11385 (original), 11378-11402 (patched)
> > <https://reviews.apache.org/r/70116/diff/4/?file=2130965#file2130965line11380>
> >
> >     I'm curious what would be the proper way to handle operation cleanup/removal.
> >     
> >     When an operation is transitioned into a terminal state, the master will usually `removeOperation(...)` shortly afterwards.  Since we don't decrement the metrics in this case, the number of terminal operations will continue to grow.  This seems like the proper behavior.
> >     
> >     However, in this code, it is possible to remove an agent with non-terminal operations.  This means the non-terminal metrics will never be decremented.  So you can have a cluster with 0 operations, but the metric for pending operations might be non-zero.

Hm, good question. I think the only ways a slave gets removed while it still has operations pending is by either being marked gone, or becoming unreachable.

In both cases we already transition the counters to the correct `OPERATION_GONE`/`OPERATION_UNREACHABLE` states. (although unfortunately in a somewhat non-local manner, that's what https://reviews.apache.org/r/70185/ is all about)

For gone operations, this should be fine. However, the problem is that when an unreachable slave reregisters, we re-add all operations as new operations without decrementing the `operations_unreachable` metric, since at the time the `UpdateSlaveMessage` arrives the master already forgot that the slave was previously unreachable.

So as far as I can see, the metrics for pending operations should always be correct, but it is possible to overcount unreachable operations.

It's not clear if this can be fixed without quite far-reaching refactoring in the master. So I think the best course might be to either document this behaviour, or remove the `operations_unreachable` metric altogether.

What do you think?


- Benno


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review213632
-----------------------------------------------------------


On March 11, 2019, 7:14 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> -----------------------------------------------------------
> 
> (Updated March 11, 2019, 7:14 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
>     https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This commit adds additional metrics counting the
> number of operations in each state.
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp dc68fc324de7242737123015fbac19a2129778ce 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Benno Evers <be...@mesosphere.com>.

> On March 12, 2019, 10:56 p.m., Joseph Wu wrote:
> > src/master/master.cpp
> > Lines 11361-11385 (original), 11378-11402 (patched)
> > <https://reviews.apache.org/r/70116/diff/4/?file=2130965#file2130965line11380>
> >
> >     I'm curious what would be the proper way to handle operation cleanup/removal.
> >     
> >     When an operation is transitioned into a terminal state, the master will usually `removeOperation(...)` shortly afterwards.  Since we don't decrement the metrics in this case, the number of terminal operations will continue to grow.  This seems like the proper behavior.
> >     
> >     However, in this code, it is possible to remove an agent with non-terminal operations.  This means the non-terminal metrics will never be decremented.  So you can have a cluster with 0 operations, but the metric for pending operations might be non-zero.
> 
> Benno Evers wrote:
>     Hm, good question. I think the only ways a slave gets removed while it still has operations pending is by either being marked gone, or becoming unreachable.
>     
>     In both cases we already transition the counters to the correct `OPERATION_GONE`/`OPERATION_UNREACHABLE` states. (although unfortunately in a somewhat non-local manner, that's what https://reviews.apache.org/r/70185/ is all about)
>     
>     For gone operations, this should be fine. However, the problem is that when an unreachable slave reregisters, we re-add all operations as new operations without decrementing the `operations_unreachable` metric, since at the time the `UpdateSlaveMessage` arrives the master already forgot that the slave was previously unreachable.
>     
>     So as far as I can see, the metrics for pending operations should always be correct, but it is possible to overcount unreachable operations.
>     
>     It's not clear if this can be fixed without quite far-reaching refactoring in the master. So I think the best course might be to either document this behaviour, or remove the `operations_unreachable` metric altogether.
>     
>     What do you think?
> 
> Joseph Wu wrote:
>     It may be worthwhile to add a CHECK to make sure we only ever remove terminal (including GONE) or UNREACHABLE operations.
>     
>     ---
>     
>     Per endless (possible) recounting of unreachable operations, I'd lean towards overcounting & documenting how/when this is possible.  Probably inside the operator document that lists/describes all the metrics.
> 
> Greg Mann wrote:
>     "the only ways a slave gets removed while it still has operations pending is by either being marked gone, or becoming unreachable"
>     
>     This isn't true - for example, an agent could send an `UnregisterSlaveMessage` when it has pending operations and it will be removed via `Master::_removeSlave()`. Unfortunately, agent removal in that method currently does not correspond to any well-defined agent states; see https://issues.apache.org/jira/browse/MESOS-9556.
>     
>     I think that we need to decrement non-terminal operation states when operations are removed in `_removeSlave()`.

I'm now decrementing non-terminal states (and completely removed https://reviews.apache.org/r/70185/ while re-working the accounting).


- Benno


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review213632
-----------------------------------------------------------


On March 26, 2019, 5:56 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> -----------------------------------------------------------
> 
> (Updated March 26, 2019, 5:56 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
>     https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This commit adds additional metrics counting the
> number of operations in each state.
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -----
> 
>   docs/monitoring.md 54b872f579dbc68ca5f67f4cc1ba34065a09aee2 
>   src/master/master.cpp 9c4a9e83da94535873d72c902835f229c4f96320 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/5/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Joseph Wu <jo...@mesosphere.io>.

> On March 12, 2019, 3:56 p.m., Joseph Wu wrote:
> > src/master/master.cpp
> > Lines 11361-11385 (original), 11378-11402 (patched)
> > <https://reviews.apache.org/r/70116/diff/4/?file=2130965#file2130965line11380>
> >
> >     I'm curious what would be the proper way to handle operation cleanup/removal.
> >     
> >     When an operation is transitioned into a terminal state, the master will usually `removeOperation(...)` shortly afterwards.  Since we don't decrement the metrics in this case, the number of terminal operations will continue to grow.  This seems like the proper behavior.
> >     
> >     However, in this code, it is possible to remove an agent with non-terminal operations.  This means the non-terminal metrics will never be decremented.  So you can have a cluster with 0 operations, but the metric for pending operations might be non-zero.
> 
> Benno Evers wrote:
>     Hm, good question. I think the only ways a slave gets removed while it still has operations pending is by either being marked gone, or becoming unreachable.
>     
>     In both cases we already transition the counters to the correct `OPERATION_GONE`/`OPERATION_UNREACHABLE` states. (although unfortunately in a somewhat non-local manner, that's what https://reviews.apache.org/r/70185/ is all about)
>     
>     For gone operations, this should be fine. However, the problem is that when an unreachable slave reregisters, we re-add all operations as new operations without decrementing the `operations_unreachable` metric, since at the time the `UpdateSlaveMessage` arrives the master already forgot that the slave was previously unreachable.
>     
>     So as far as I can see, the metrics for pending operations should always be correct, but it is possible to overcount unreachable operations.
>     
>     It's not clear if this can be fixed without quite far-reaching refactoring in the master. So I think the best course might be to either document this behaviour, or remove the `operations_unreachable` metric altogether.
>     
>     What do you think?

It may be worthwhile to add a CHECK to make sure we only ever remove terminal (including GONE) or UNREACHABLE operations.

---

Per endless (possible) recounting of unreachable operations, I'd lean towards overcounting & documenting how/when this is possible.  Probably inside the operator document that lists/describes all the metrics.


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review213632
-----------------------------------------------------------


On March 11, 2019, 12:14 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> -----------------------------------------------------------
> 
> (Updated March 11, 2019, 12:14 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
>     https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This commit adds additional metrics counting the
> number of operations in each state.
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp dc68fc324de7242737123015fbac19a2129778ce 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review213632
-----------------------------------------------------------



One discussion point, but otherwise LGTM.


src/master/master.cpp
Lines 11361-11385 (original), 11378-11402 (patched)
<https://reviews.apache.org/r/70116/#comment299630>

    I'm curious what would be the proper way to handle operation cleanup/removal.
    
    When an operation is transitioned into a terminal state, the master will usually `removeOperation(...)` shortly afterwards.  Since we don't decrement the metrics in this case, the number of terminal operations will continue to grow.  This seems like the proper behavior.
    
    However, in this code, it is possible to remove an agent with non-terminal operations.  This means the non-terminal metrics will never be decremented.  So you can have a cluster with 0 operations, but the metric for pending operations might be non-zero.


- Joseph Wu


On March 11, 2019, 12:14 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> -----------------------------------------------------------
> 
> (Updated March 11, 2019, 12:14 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
>     https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This commit adds additional metrics counting the
> number of operations in each state.
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp dc68fc324de7242737123015fbac19a2129778ce 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Benno Evers <be...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/
-----------------------------------------------------------

(Updated March 11, 2019, 7:14 p.m.)


Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.


Changes
-------

Correctly adjust counter for gone/unreachable operations; add API for decrement followed by increment


Bugs: MESOS-8241
    https://issues.apache.org/jira/browse/MESOS-8241


Repository: mesos


Description
-------

This commit adds additional metrics counting the
number of operations in each state.

Unit tests are added in the subsequent commit.


Diffs (updated)
-----

  src/master/master.cpp dc68fc324de7242737123015fbac19a2129778ce 
  src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
  src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 


Diff: https://reviews.apache.org/r/70116/diff/4/

Changes: https://reviews.apache.org/r/70116/diff/3-4/


Testing
-------


Thanks,

Benno Evers


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Benno Evers <be...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/
-----------------------------------------------------------

(Updated March 8, 2019, 11:48 p.m.)


Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.


Bugs: MESOS-8241
    https://issues.apache.org/jira/browse/MESOS-8241


Repository: mesos


Description
-------

This commit adds additional metrics counting the
number of operations in each state.

Unit tests are added in the subsequent commit.


Diffs (updated)
-----

  src/master/master.cpp dc68fc324de7242737123015fbac19a2129778ce 
  src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
  src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 


Diff: https://reviews.apache.org/r/70116/diff/3/

Changes: https://reviews.apache.org/r/70116/diff/2-3/


Testing
-------


Thanks,

Benno Evers


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Benno Evers <be...@mesosphere.com>.

> On March 8, 2019, 12:30 a.m., Gastón Kleiman wrote:
> > src/master/master.hpp
> > Lines 941-944 (patched)
> > <https://reviews.apache.org/r/70116/diff/2/?file=2129569#file2129569line941>
> >
> >     Is this used anywhere?

It is not. Removed!


> On March 8, 2019, 12:30 a.m., Gastón Kleiman wrote:
> > src/master/metrics.cpp
> > Lines 158-159 (original), 160-161 (patched)
> > <https://reviews.apache.org/r/70116/diff/2/?file=2129572#file2129572line160>
> >
> >     Should we add a similar metric for operation status updates?

Sounds reasonable.


> On March 8, 2019, 12:30 a.m., Gastón Kleiman wrote:
> > src/master/metrics.cpp
> > Lines 174-177 (original), 176-179 (patched)
> > <https://reviews.apache.org/r/70116/diff/2/?file=2129572#file2129572line176>
> >
> >     Should we add a similar metrics for operation status updates?

Sounds equally reasonable.


> On March 8, 2019, 12:30 a.m., Gastón Kleiman wrote:
> > src/master/metrics.cpp
> > Lines 386-388 (patched)
> > <https://reviews.apache.org/r/70116/diff/2/?file=2129572#file2129572line386>
> >
> >     This is weirdly formatted, I think the following follows the style guide and is more readable:
> >     
> >     ```
> >         if (type == Offer::Operation::UNKNOWN ||
> >             type == Offer::Operation::LAUNCH ||
> >             type == Offer::Operation::LAUNCH_GROUP) {
> >     ```

Both styles are explicitly permitted by the style guide, as long as the same style is used consistently within a given boolean expression.

But since you feel this one looks better I will change it.


> On March 8, 2019, 12:30 a.m., Gastón Kleiman wrote:
> > src/master/metrics.cpp
> > Lines 637-651 (patched)
> > <https://reviews.apache.org/r/70116/diff/2/?file=2129572#file2129572line637>
> >
> >     Can these counters be decremented? If not, should we add a CHECK to ensure that `delta` is always positive?

They can be decremented, and we expect delta to be sometimes negative in this function. I think the only functional difference between `Counter` and `PushGauge` is the existence of `operator=`.


> On March 8, 2019, 12:30 a.m., Gastón Kleiman wrote:
> > src/master/metrics.cpp
> > Lines 684 (patched)
> > <https://reviews.apache.org/r/70116/diff/2/?file=2129572#file2129572line684>
> >
> >     Can counters for terminal states be decremented? I guess not? If I'm correct, then we could add a `CHECK` here.

They can be decremented, but they should not be unless we have a bug in the master code. I'm still a bit hesitant to `CHECK` for this, please see my comment to a similar suggestion by Greg above.


- Benno


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review213541
-----------------------------------------------------------


On March 8, 2019, 11:48 p.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> -----------------------------------------------------------
> 
> (Updated March 8, 2019, 11:48 p.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
>     https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This commit adds additional metrics counting the
> number of operations in each state.
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp dc68fc324de7242737123015fbac19a2129778ce 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/3/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Gastón Kleiman <ga...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/#review213541
-----------------------------------------------------------




src/master/master.hpp
Lines 941-944 (patched)
<https://reviews.apache.org/r/70116/#comment299493>

    Is this used anywhere?



src/master/metrics.hpp
Lines 83 (patched)
<https://reviews.apache.org/r/70116/#comment299494>

    s/per operation/per operation type/



src/master/metrics.cpp
Lines 158-159 (original), 160-161 (patched)
<https://reviews.apache.org/r/70116/#comment299495>

    Should we add a similar metric for operation status updates?



src/master/metrics.cpp
Lines 174-177 (original), 176-179 (patched)
<https://reviews.apache.org/r/70116/#comment299496>

    Should we add a similar metrics for operation status updates?



src/master/metrics.cpp
Lines 386-388 (patched)
<https://reviews.apache.org/r/70116/#comment299497>

    This is weirdly formatted, I think the following follows the style guide and is more readable:
    
    ```
        if (type == Offer::Operation::UNKNOWN ||
            type == Offer::Operation::LAUNCH ||
            type == Offer::Operation::LAUNCH_GROUP) {
    ```



src/master/metrics.cpp
Lines 637-651 (patched)
<https://reviews.apache.org/r/70116/#comment299499>

    Can these counters be decremented? If not, should we add a CHECK to ensure that `delta` is always positive?



src/master/metrics.cpp
Lines 684 (patched)
<https://reviews.apache.org/r/70116/#comment299500>

    Can counters for terminal states be decremented? I guess not? If I'm correct, then we could add a `CHECK` here.


- Gastón Kleiman


On March 7, 2019, 9:01 a.m., Benno Evers wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70116/
> -----------------------------------------------------------
> 
> (Updated March 7, 2019, 9:01 a.m.)
> 
> 
> Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.
> 
> 
> Bugs: MESOS-8241
>     https://issues.apache.org/jira/browse/MESOS-8241
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This commit adds additional metrics counting the
> number of operations in each state.
> 
> Unit tests are added in the subsequent commit.
> 
> 
> Diffs
> -----
> 
>   src/master/master.hpp aceab3455adfdf7ff6e168e033316997a4d36fb5 
>   src/master/master.cpp 665c1c7979a5ae4ecec0d5a68e59c5419049a4d5 
>   src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
>   src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 
> 
> 
> Diff: https://reviews.apache.org/r/70116/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Benno Evers
> 
>


Re: Review Request 70116: Added metrics for offer operation feedback.

Posted by Benno Evers <be...@mesosphere.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70116/
-----------------------------------------------------------

(Updated March 7, 2019, 5:01 p.m.)


Review request for mesos, Gastón Kleiman, Greg Mann, and Joseph Wu.


Changes
-------

Added per-operation metrics; addressed comments.


Bugs: MESOS-8241
    https://issues.apache.org/jira/browse/MESOS-8241


Repository: mesos


Description (updated)
-------

This commit adds additional metrics counting the
number of operations in each state.

Unit tests are added in the subsequent commit.


Diffs (updated)
-----

  src/master/master.hpp aceab3455adfdf7ff6e168e033316997a4d36fb5 
  src/master/master.cpp 665c1c7979a5ae4ecec0d5a68e59c5419049a4d5 
  src/master/metrics.hpp 4495e65b6bb11f7236335a702c4f61e7c3f9b0aa 
  src/master/metrics.cpp 4dd73fb18a06ce8f75c4c1435dba84ade123bee9 


Diff: https://reviews.apache.org/r/70116/diff/2/

Changes: https://reviews.apache.org/r/70116/diff/1-2/


Testing
-------


Thanks,

Benno Evers