Posted to dev@beam.apache.org by Aviem Zur <av...@gmail.com> on 2017/01/19 09:41:51 UTC

Committed vs. attempted metrics results

Hi all,

While working on the implementation of the metrics API in the Spark runner,
the question of committed vs. attempted results has come up, sparking (no
pun intended) an interesting conversation.
(See API: MetricResult
<https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricResult.java>
and
discussion: PR #1750 <https://github.com/apache/beam/pull/1750>)

The separation of `attempted` and `committed` metric results seems a bit
unclear.

Seeing that the current implementations of aggregators in the different
runners do not guarantee correctness, one could assume that the metrics API
implementations will provide the same guarantees.

If this is correct, then you could assume that only `attempted()` metrics
results can be fulfilled.
Would it then be better to just have a single method such as `get()` in the
API, and have the guarantees of each runner explained in the capability
matrix / documentation?
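
For reference, the interface under discussion looks roughly like this
(paraphrased; see the linked MetricResult.java for the authoritative
definition):

    public interface MetricResult<T> {
      MetricName name();   // namespace and name of the metric
      String step();       // full name of the step that reported it
      T committed();       // value from successfully committed bundles only
      T attempted();       // value across all attempts, including retries
    }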

Re: Committed vs. attempted metrics results

Posted by Lukasz Cwik <lc...@google.com.INVALID>.
I think that some runners may support both, and a user may want the
differentiation.
Let's say a user had a function which failed occasionally due to some
external event. The user could calculate the efficiency of their function by
measuring the recorded processing time as a single metric and then
computing:
committed time / attempted time
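
As a minimal sketch of that calculation (using the `attempted()`/`committed()`
accessors on MetricResult and the query types from org.apache.beam.sdk.metrics;
the namespace and metric name here are illustrative):

    static void printEfficiency(PipelineResult result) {
      MetricsFilter filter = MetricsFilter.builder()
          .addNameFilter(MetricNameFilter.named("my-dofn", "processing-millis"))
          .build();
      for (MetricResult<Long> time : result.metrics().queryMetrics(filter).counters()) {
        // attempted() includes retried work; committed() covers only work on
        // the success path, so the ratio approximates the function's efficiency.
        double efficiency = (double) time.committed() / time.attempted();
        System.out.println(time.step() + " efficiency: " + efficiency);
      }
    }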

Re: Committed vs. attempted metrics results

Posted by Aviem Zur <av...@gmail.com>.
Sure.
I've opened a ticket which tries to summarize these final thoughts:
https://issues.apache.org/jira/browse/BEAM-1344

Re: Committed vs. attempted metrics results

Posted by Ben Chambers <bc...@google.com.INVALID>.
I think relaxing the query to not be an exact match is reasonable. I'm
wondering if it should be substring or regex. Either one preserves the
existing behavior of returning only the metrics for a specific step when
passed its full step path, but adds the ability to query knowing only the
approximate step name.

A) Util or not seems fine to me. If it seems likely to be reusable, let's do
so. Preferably in a separate PR or commit.

B) Yes, the direct runner should match whatever semantics we choose.

Re: Committed vs. attempted metrics results

Posted by Aviem Zur <av...@gmail.com>.
Ben - yes, there is still some ambiguity regarding the querying of the
metrics results.

You've discussed in this thread the notion that metrics step names should
be converted to unique names when aggregating metrics, so that each step
will aggregate its own metrics and not join with other steps by mistake.
The way you suggested querying this is that when querying by step, all
steps which contain the query string as a substring will show up in the
results.

This sounds fine; however, it is not how the direct runner is implemented
right now. It uses an exact match.

In my PR I copied the MetricsResults filtering methods directly from the
direct runner and suggested that, since I copied them verbatim, perhaps
they should be pulled up to a more central module that all runners could
use.

So, my remaining questions are:
A) Should we create a util for filtering MetricsResults based on a query,
using the filtering implementation suggested in this thread (substring
match)? (See the sketch below.)
B) Should the direct runner be changed to use such a util and conform to
the suggested results filtering?
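
A sketch of what such a util's matching could look like under the substring
semantics discussed in this thread (the class and method names here are
hypothetical, not existing Beam code):

    /** Hypothetical shared filtering helper. */
    public final class MetricFiltering {
      private MetricFiltering() {}

      /**
       * Substring semantics: a full step path such as
       * "My Composite/ParDo(MyDoFn)" still matches only that step, while a
       * partial query such as "ParDo(MyDoFn)" matches it as well.
       */
      public static boolean matchesStep(String actualStep, String queriedStep) {
        return actualStep.contains(queriedStep);
      }
    }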

Re: Committed vs. attempted metrics results

Posted by Ben Chambers <bc...@google.com.INVALID>.
For the short term, it seems like staying with the existing Query API and
allowing runners to throw exceptions if a user issues a query that is not
supported is reasonable. It shouldn't affect the ability to run a Beam
pipeline on other runners, since the Query API is only exposed *after* the
pipeline is run.

For the longer term, it seems like the Query API could merit some more
thought, especially if people have good use cases for accessing the value
of metrics programmatically from the same program that ran the original
pipeline.

Aviem -- is there anything specific that needs to be discussed/nailed down
to help with your PR?
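
On the caller side, that contract might look like the following sketch
(assuming `committed()` is the call an unsupporting runner rejects, as
discussed elsewhere in this thread):

    static long bestAvailable(MetricResult<Long> counter) {
      try {
        // Prefer committed results when the runner supports them.
        return counter.committed();
      } catch (UnsupportedOperationException e) {
        // Fall back to attempted results on runners that don't.
        return counter.attempted();
      }
    }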

Re: Committed vs. attempted metrics results

Posted by Ben Chambers <bc...@google.com.INVALID>.
On Thu, Jan 19, 2017 at 3:28 PM Amit Sela <am...@gmail.com> wrote:

> On Fri, Jan 20, 2017 at 1:18 AM Ben Chambers <bchambers@google.com.invalid>
> wrote:
>
> > Part of the problem here is that whether attempted or committed is "what
> > matters" depends on the metric. If you're counting the number of RPCs to an
> > external service, you may want all the attempts (to predict your bill). If
> > you're tracking how long those RPCs took, you may want it just to be the
> > committed (eg., what is the best-case time to execute your pipeline). This
> > is essentially Luke's case of wanting one or the other.
> >
> This sounds like Metrics should have a user-defined guarantee-level.. which
> might make more sense - Metrics.counter().attempted()/committed() - though
> this might prove more challenging for runners to implement.
>
>

We went away from that because of cases like Luke's, where a user might want
to compare the two. Or a user may not even realize there is a difference
up-front, so declaring ahead of time is difficult. If both are available and
can be looked at: if they're the same -- no problem. If they're different,
then it provides a good reason to investigate and figure out the difference.


> >
> > Regarding the step names -- the metrics are reported using the full step
> > name, which is also made unique during the Graph API. So "My
> > Composite/ParDo(MyDoFn)" or "My Composite/ParDo(MyDoFn)2" if there are
> > multiple instances within the same composite. Specifically -- the names are
> > made unique prior to recording metrics, so there are no double counts.
> >
> But how would the user know that ? I'm afraid this could be confusing as a
> user-facing query API, and I think most users would simply name metrics
> differently.
>

The query API can support querying based on a sub-string of the full name,
and return metrics from all steps that match. That would allow the user to
query metrics without knowing the exact unique name. Having unique names for
steps is important and useful for many other things (logging, associating
time spent executing, etc.).
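
For example (assuming MetricsFilter's addStep accepts a partial step name
under the substring semantics proposed here; `result` is the PipelineResult):

    // Matches "My Composite/ParDo(MyDoFn)" without spelling out the full path.
    MetricQueryResults results = result.metrics().queryMetrics(
        MetricsFilter.builder().addStep("ParDo(MyDoFn)").build());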


Re: Committed vs. attempted metrics results

Posted by Amit Sela <am...@gmail.com>.
On Fri, Jan 20, 2017 at 1:18 AM Ben Chambers <bc...@google.com.invalid> wrote:

> Part of the problem here is that whether attempted or committed is "what
> matters" depends on the metric. If you're counting the number of RPCs to an
> external service, you may want all the attempts (to predict your bill). If
> you're tracking how long those RPCs took, you may want it just to be the
> committed (eg., what is the best-case time to execute your pipeline). This
> is essentially Luke's case of wanting one or the other.
>
This sounds like Metrics should have a user-defined guarantee level, which
might make more sense - Metrics.counter().attempted()/committed() - though
this might prove more challenging for runners to implement.
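
For concreteness, the shape of that proposal might look like this
(hypothetical API, not actual Beam code; the reply elsewhere in this thread
explains why this direction was dropped):

    // Hypothetical declare-at-creation API (not actual Beam code):
    Counter committedRpcs = Metrics.counter("my-ns", "rpcs").committed();
    Counter attemptedRpcs = Metrics.counter("my-ns", "rpcs").attempted();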

>
> Regarding the step names -- the metrics are reported using the full step
> name, which is also made unique during the Graph API. So "My
> Composite/ParDo(MyDoFn)" or "My Composite/ParDo(MyDoFn)2" if there are
> multiple instances within the same composite. Specifically -- the names are
> made unique prior to recording metrics, so there are no double counts.
>
But how would the user know that? I'm afraid this could be confusing as a
user-facing query API, and I think most users would simply name metrics
differently.

>
> On Thu, Jan 19, 2017 at 1:57 PM Amit Sela <am...@gmail.com> wrote:
>
> > I think Luke's example is interesting, but I wonder how common it
> is/would
> > be ? I'd expect failures to happen but not in a rate that would be so
> > dramatic that it'd be interesting to follow applicatively (I'd expect the
> > runner/cluster to properly monitor up time of processes/nodes
> separately).
> > And even if it is useful, I can't think of other use cases.
> >
> > I thought the idea was to "declare" the Metrics guarantee level in the
> > query API, but the more I think about it the more I tend to let it go for
> > the following reasons:
> >
> >    - Setting aside Luke's example, I think users would prefer the best
> >    guarantee a runner can provide. And on that note, I'd expect a
> > "getMetrics"
> >    API and not have to figure-out guarantees.
> >    - Programmatic querying would "break" (UnsupportedOperationExecption)
> >    portability if a program that was running with a runner that supports
> >    committed() would try to execute on a runner that only supports
> > attempted()
> >    - I know that portability is for the Pipeline and this is
> post-execution
> >    but still, call it 25% portability issue ;-) .
> >    - According to the Capability Matrix, all runners fail to provide
> >    "commit" guarantee for Aggregators. I can only speak for Spark saying
> > that
> >    supporting the Metrics API relies on the same underlying mechanism and
> > so
> >    nothing will change. I wonder about other runners, anyone plans to
> > support
> >    "commit" guarantees for Metrics soon ? having said that, not sure this
> > is a
> >    good reason not to have this as a placeholder.
> >
> > Another question for querying Metrics - querying by step could be a bit
> > tricky since a runner is expected to keep unique naming/ids for steps,
> but
> > users are supposed to be aware of this here and I'd suspect users might
> not
> > follow and if they use the same ParDo in a couple of places they'll query
> > it and it might be confusing for them to see "double counts" if they
> didn't
> > mean for that.
> >
> > Amit.
> >
> > On Thu, Jan 19, 2017 at 7:36 PM Ben Chambers
> <bchambers@google.com.invalid
> > >
> > wrote:
> >
> > > Thanks for starting the discussion! I'm going to hold off saying what I
> > > think and instead just provide some background and additional
> questions,
> > > because I want to see where the discussion goes.
> > >
> > > When I first suggested the API for querying metrics I was adding it for
> > > parity with aggregators. A good first question might be does the
> pipeline
> > > result even need query methods? Runners could add them as necessary
> based
> > > on the levels of querying the support.
> > >
> > > The other desire was to make the accuracy clear. One implementation
> > > path was reporting metrics directly from the workers while attempting
> > > work. This can overcount when retrying and may undercount the actual
> > > attempts if the worker lost connectivity before reporting.
> > >
> > > Another implementation was something like a side output where the
> > > counts are committed as part of each bundle's results, and then
> > > aggregated. This committed value is more accurate and represents the
> > > value that occurred along the success path of the pipeline.
> > >
> > > I suspect there are other possible implementations so trying to make an
> > API
> > > that expresses all of them is difficult. So:
> > >
> > > 1. Does pipeline result need to support querying (which is useful for
> > > programmatic consumption) or are metrics intended only to get values
> out
> > of
> > > a pipeline and into some metrics store?
> > >
> > > 2. How should pipeline results indicate the different kinds of metrics?
> > > What if a runner supported multiple kinds (e.g., the runner reports both
> > > attempted and committed results)? As Luke mentions, it may be useful to
> > > look at both to understand how much retries affected the value.
> > > On Thu, Jan 19, 2017, 1:42 AM Aviem Zur <av...@gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > While working on the implementation of metrics API in Spark runner the
> > > question of committed vs. attempted results has come up, sparking (no
> pun
> > > intended) an interesting conversation.
> > > (See API: MetricResult
> > > <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricResult.java>
> > > and discussion: PR #1750 <https://github.com/apache/beam/pull/1750>)
> > >
> > > The separation of `attempted` and `committed` metric results seems a
> bit
> > > unclear.
> > >
> > > Seeing that current implementations of aggregators in the different
> > runners
> > > do not guarantee correctness, one could assume that the metrics API
> > > implementations will also follow the same guarantees.
> > >
> > > If this is correct, then you could assume that only `attempted()`
> metrics
> > > results can be fulfilled.
> > > Would it then be better to just have a single method such as `get()` in
> > the
> > > API, and have the guarantees of each runner explained in the capability
> > > matrix / documentation?
> > >
> >
>

Re: Committed vs. attempted metrics results

Posted by Ben Chambers <bc...@google.com.INVALID>.
Part of the problem here is that whether attempted or committed is "what
matters" depends on the metric. If you're counting the number of RPCs to an
external service, you may want all the attempts (to predict your bill). If
you're tracking how long those RPCs took, you may want just the committed
value (e.g., the best-case time to execute your pipeline). This is
essentially Luke's case of wanting one or the other.
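
To make that concrete, a minimal sketch of a caller that reads both sides,
given the PipelineResult returned by pipeline.run() and assuming the
attempted()/committed() accessors on MetricResult discussed here; the
"rpc"/"calls" counter name is hypothetical, and on runners without commit
support, committed() would presumably throw:

    MetricQueryResults results = pipelineResult.metrics().queryMetrics(
        MetricsFilter.builder()
            .addNameFilter(MetricNameFilter.named("rpc", "calls"))
            .build());
    for (MetricResult<Long> rpcCalls : results.counters()) {
      long attempted = rpcCalls.attempted();  // every try - predicts the bill
      long committed = rpcCalls.committed();  // successful path only
      System.out.println("efficiency = " + ((double) committed / attempted));
    }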

Regarding the step names -- the metrics are reported using the full step
name, which is also made unique during graph construction. So "My
Composite/ParDo(MyDoFn)" or "My Composite/ParDo(MyDoFn)2" if there are
multiple instances within the same composite. Specifically -- the names are
made unique prior to recording metrics, so there are no double counts.
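
For example, a query scoped to one of those uniquified steps might look
like this (a sketch - the step string has to match what graph construction
produced, which is exactly the discoverability concern raised below):

    MetricQueryResults byStep = pipelineResult.metrics().queryMetrics(
        MetricsFilter.builder()
            .addStep("My Composite/ParDo(MyDoFn)2")  // second instance only
            .build());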

On Thu, Jan 19, 2017 at 1:57 PM Amit Sela <am...@gmail.com> wrote:

> I think Luke's example is interesting, but I wonder how common it is or
> would be? I'd expect failures to happen, but not at a rate so dramatic
> that it'd be interesting to follow at the application level (I'd expect the
> runner/cluster to properly monitor uptime of processes/nodes separately).
> And even if it is useful, I can't think of other use cases.
>
> I thought the idea was to "declare" the Metrics guarantee level in the
> query API, but the more I think about it the more I tend to let it go for
> the following reasons:
>
>    - Setting aside Luke's example, I think users would prefer the best
>    guarantee a runner can provide. And on that note, I'd expect a
>    "getMetrics" API and not have to figure out guarantees.
>    - Programmatic querying would "break" portability
>    (UnsupportedOperationException) if a program that was running on a
>    runner that supports committed() tried to execute on a runner that only
>    supports attempted().
>    - I know that portability is for the Pipeline and this is
>    post-execution, but still, call it a 25% portability issue ;-) .
>    - According to the Capability Matrix, all runners fail to provide a
>    "commit" guarantee for Aggregators. I can only speak for Spark, saying
>    that supporting the Metrics API relies on the same underlying mechanism
>    and so nothing will change. I wonder about other runners - does anyone
>    plan to support "commit" guarantees for Metrics soon? Having said that,
>    I'm not sure this is a good reason not to have this as a placeholder.
>
> Another question for querying Metrics - querying by step could be a bit
> tricky, since a runner is expected to keep unique names/ids for steps, but
> users are supposed to be aware of this. I'd suspect many won't follow, and
> if they use the same ParDo in a couple of places and query it, it might be
> confusing for them to see "double counts" they didn't intend.
>
> Amit.
>
> On Thu, Jan 19, 2017 at 7:36 PM Ben Chambers <bchambers@google.com.invalid>
> wrote:
>
> > Thanks for starting the discussion! I'm going to hold off saying what I
> > think and instead just provide some background and additional questions,
> > because I want to see where the discussion goes.
> >
> > When I first suggested the API for querying metrics I was adding it for
> > parity with aggregators. A good first question might be: does the pipeline
> > result even need query methods? Runners could add them as necessary based
> > on the levels of querying they support.
> >
> > The other desire was to make the accuracy clear. One implementation path
> > was reporting metrics directly from the workers while attempting work.
> > This can overcount when retrying and may undercount the actual attempts
> > if the worker lost connectivity before reporting.
> >
> > Another implementation was something like a side output where the counts
> > are committed as part of each bundle's results, and then aggregated. This
> > committed value is more accurate and represents the value that occurred
> > along the success path of the pipeline.
> >
> > I suspect there are other possible implementations so trying to make an
> API
> > that expresses all of them is difficult. So:
> >
> > 1. Does pipeline result need to support querying (which is useful for
> > programmatic consumption) or are metrics intended only to get values out
> of
> > a pipeline and into some metrics store?
> >
> > 2. How should pipeline results indicate the different kinds of metrics?
> > What if a runner supported multiple kinds (e.g., the runner reports both
> > attempted and committed results)? As Luke mentions, it may be useful to
> > look at both to understand how much retries affected the value.
> > On Thu, Jan 19, 2017, 1:42 AM Aviem Zur <av...@gmail.com> wrote:
> >
> > Hi all,
> >
> > While working on the implementation of metrics API in Spark runner the
> > question of committed vs. attempted results has come up, sparking (no pun
> > intended) an interesting conversation.
> > (See API: MetricResult
> > <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricResult.java>
> > and discussion: PR #1750 <https://github.com/apache/beam/pull/1750>)
> >
> > The separation of `attempted` and `committed` metric results seems a bit
> > unclear.
> >
> > Seeing that current implementations of aggregators in the different
> runners
> > do not guarantee correctness, one could assume that the metrics API
> > implementations will also follow the same guarantees.
> >
> > If this is correct, then you could assume that only `attempted()` metrics
> > results can be fulfilled.
> > Would it then be better to just have a single method such as `get()` in
> the
> > API, and have the guarantees of each runner explained in the capability
> > matrix / documentation?
> >
>

Re: Committed vs. attempted metrics results

Posted by Amit Sela <am...@gmail.com>.
I think Luke's example is interesting, but I wonder how common it is or
would be? I'd expect failures to happen, but not at a rate so dramatic
that it'd be interesting to follow at the application level (I'd expect the
runner/cluster to properly monitor uptime of processes/nodes separately).
And even if it is useful, I can't think of other use cases.

I thought the idea was to "declare" the Metrics guarantee level in the
query API, but the more I think about it the more I tend to let it go for
the following reasons:

   - Setting aside Luke's example, I think users would prefer the best
   guarantee a runner can provide. And on that note, I'd expect a "getMetrics"
   API and not have to figure out guarantees.
   - Programmatic querying would "break" portability
   (UnsupportedOperationException) if a program that was running on a runner
   that supports committed() tried to execute on a runner that only supports
   attempted() (see the sketch after this list).
   - I know that portability is for the Pipeline and this is post-execution,
   but still, call it a 25% portability issue ;-) .
   - According to the Capability Matrix, all runners fail to provide a
   "commit" guarantee for Aggregators. I can only speak for Spark, saying that
   supporting the Metrics API relies on the same underlying mechanism and so
   nothing will change. I wonder about other runners - does anyone plan to
   support "commit" guarantees for Metrics soon? Having said that, I'm not
   sure this is a good reason not to have this as a placeholder.
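
To illustrate the second bullet, a minimal sketch of what portable user
code might have to do today - fall back from committed() to attempted(),
assuming lack of support is signaled via UnsupportedOperationException:

    // Assumes org.apache.beam.sdk.metrics.MetricResult.
    static long bestAvailable(MetricResult<Long> result) {
      try {
        return result.committed();   // preferred where the runner supports it
      } catch (UnsupportedOperationException e) {
        return result.attempted();   // best-effort fallback elsewhere
      }
    }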

Another question for querying Metrics - querying by step could be a bit
tricky, since a runner is expected to keep unique names/ids for steps, but
users are supposed to be aware of this. I'd suspect many won't follow, and
if they use the same ParDo in a couple of places and query it, it might be
confusing for them to see "double counts" they didn't intend.

Amit.

On Thu, Jan 19, 2017 at 7:36 PM Ben Chambers <bc...@google.com.invalid>
wrote:

> Thanks for starting the discussion! I'm going to hold off saying what I
> think and instead just provide some background and additional questions,
> because I want to see where the discussion goes.
>
> When I first suggested the API for querying metrics I was adding it for
> parity with aggregators. A good first question might be: does the pipeline
> result even need query methods? Runners could add them as necessary based
> on the levels of querying they support.
>
> The other desire was to make the accuracy clear. One implementation path
> was reporting metrics directly from the workers while attempting work. This
> can overcount when retrying and may undercount the actual attempts if the
> worker lost connectivity before reporting.
>
> Another implementation was something like a side output where the counts
> are committed as part of each bundle's results, and then aggregated. This
> committed value is more accurate and represents the value that occurred
> along the success path of the pipeline.
>
> I suspect there are other possible implementations so trying to make an API
> that expresses all of them is difficult. So:
>
> 1. Does pipeline result need to support querying (which is useful for
> programmatic consumption) or are metrics intended only to get values out of
> a pipeline and into some metrics store?
>
> 2. How should pipeline results indicate the different kinds of metrics?
> What if a runner supported multiple kinds (e.g., the runner reports both
> attempted and committed results)? As Luke mentions, it may be useful to look
> at both to understand how much retries affected the value.
> On Thu, Jan 19, 2017, 1:42 AM Aviem Zur <av...@gmail.com> wrote:
>
> Hi all,
>
> While working on the implementation of metrics API in Spark runner the
> question of committed vs. attempted results has come up, sparking (no pun
> intended) an interesting conversation.
> (See API: MetricResult
> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricResult.java>
> and discussion: PR #1750 <https://github.com/apache/beam/pull/1750>)
>
> The separation of `attempted` and `committed` metric results seems a bit
> unclear.
>
> Seeing that current implementations of aggregators in the different runners
> do not guarantee correctness, one could assume that the metrics API
> implementations will also follow the same guarantees.
>
> If this is correct, then you could assume that only `attempted()` metrics
> results can be fulfilled.
> Would it then be better to just have a single method such as `get()` in the
> API, and have the guarantees of each runner explained in the capability
> matrix / documentation?
>

Re: Committed vs. attempted metrics results

Posted by Ben Chambers <bc...@google.com.INVALID>.
Thanks for starting the discussion! I'm going to hold off saying what I
think and instead just provide some background and additional questions,
because I want to see where the discussion goes.

When I first suggested the API for querying metrics I was adding it for
parity with aggregators. A good first question might be: does the pipeline
result even need query methods? Runners could add them as necessary based
on the levels of querying they support.

The other desire was to make the accuracy clear. One implementation path
was reporting metrics directly from the workers while attempting work. This
can overcount when retrying and may undercount the actual attempts if the
worker lost connectivity before reporting.
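
As a sketch of why that first path overcounts - nothing runner-specific,
just the user-visible behavior (RpcFn and callExternalService are
hypothetical stand-ins):

    public class RpcFn extends DoFn<String, String> {
      private final Counter rpcs = Metrics.counter(RpcFn.class, "rpcs");

      @ProcessElement
      public void processElement(ProcessContext c) {
        // Reported while attempting work: if this bundle fails and is
        // retried, the attempted value sees this increment again.
        rpcs.inc();
        c.output(callExternalService(c.element()));
      }

      private String callExternalService(String request) {
        return request;  // stand-in for a real RPC
      }
    }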

Another implementation was something like a side output where the counts
are committed as part of each bundle's results, and then aggregated. This
committed value is more accurate and represents the value that occurred
along the success path of the pipeline.
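
A toy model of that second path (not any runner's actual code, just the
shape of the bookkeeping): per-bundle deltas feed the attempted total
eagerly, but fold into the committed total only when the bundle's results
commit:

    // Hypothetical runner-side bookkeeping for a single counter.
    class CounterAggregation {
      long attempted;  // updated on every bundle execution, retries included
      long committed;  // updated only when a bundle's results commit

      void onBundleProgress(long delta) {
        attempted += delta;
      }

      void onBundleCommitted(long delta) {
        committed += delta;  // tied to the bundle's outputs, so exactly-once
      }
    }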

I suspect there are other possible implementations so trying to make an API
that expresses all of them is difficult. So:

1. Does pipeline result need to support querying (which is useful for
programmatic consumption) or are metrics intended only to get values out of
a pipeline and into some metrics store?

2. How should pipeline results indicate the different kinds of metrics?
What if a runner supported multiple kinds (e.g., the runner reports both
attempted and committed results)? As Luke mentions, it may be useful to look
at both to understand how much retries affected the value.
On Thu, Jan 19, 2017, 1:42 AM Aviem Zur <av...@gmail.com> wrote:

Hi all,

While working on the implementation of metrics API in Spark runner the
question of committed vs. attempted results has come up, sparking (no pun
intended) an interesting conversation.
(See API: MetricResult
<https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/metrics/MetricResult.java>
and discussion: PR #1750 <https://github.com/apache/beam/pull/1750>)

The separation of `attempted` and `committed` metric results seems a bit
unclear.

Seeing that current implementations of aggregators in the different runners
do not guarantee correctness, one could assume that the metrics API
implementations will also follow the same guarantees.

If this is correct, then you could assume that only `attempted()` metrics
results can be fulfilled.
Would it then be better to just have a single method such as `get()` in the
API, and have the guarantees of each runner explained in the capability
matrix / documentation?