Posted to dev@storm.apache.org by Jungtaek Lim <ka...@gmail.com> on 2016/05/18 03:09:07 UTC

Regarding types of DataPoint value for Metrics Consumer

Hi devs,

Since IMetric#getValueAndReset doesn't restrict the return type, it gives
us flexibility, but a metrics consumer has to parse the value without
context (relying on just some assumptions).

I've looked into some open source metrics consumers, and many of them
support Number and Map<String, Number/String>, while one of them supports
nested Maps. In the Map case, each key is appended to the metric key and
the value is converted to 'double'. I think that would be enough, but I'm
not sure we can rely on all metrics consumers to handle it properly.
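
To make that convention concrete, here is a minimal sketch of the
flattening rule described above. It is plain Java written for
illustration; the class name, method names, and the '.' separator are my
assumptions, not any existing consumer's API.

    import java.util.LinkedHashMap;
    import java.util.Map;

    public final class DataPointFlattener {

        /** Flatten a DataPoint value into (metric key, double) pairs. */
        public static Map<String, Double> flatten(String key, Object value) {
            Map<String, Double> out = new LinkedHashMap<>();
            flattenInto(key, value, out);
            return out;
        }

        private static void flattenInto(String key, Object value,
                                        Map<String, Double> out) {
            if (value instanceof Number) {
                out.put(key, ((Number) value).doubleValue());
            } else if (value instanceof Map) {
                // Map keys are appended to the metric key; nested Maps recurse.
                for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                    flattenInto(key + "." + e.getKey(), e.getValue(), out);
                }
            }
            // Other types are dropped: without an agreed contract, the
            // consumer has no context to interpret them.
        }
    }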

I feel it would be great if we could recommend proper types of DataPoint
value for storing metrics to a time-series DB via a metrics consumer. That
recommendation could serve as a protocol between IMetric users and metrics
consumer developers.

What do you think?

Thanks,
Jungtaek Lim (HeartSaVioR)

ps. I'm not a heavy user of time-series DBs (I researched some, but they
don't document the type/size of values clearly), so it would be great if
someone could provide information on the value types/sizes that
time-series DBs support. Or we can just assume numbers are 'double' as
above and go forward.

Re: Regarding types of DataPoint value for Metrics Consumer

Posted by Jungtaek Lim <ka...@gmail.com>.
Abhishek,

The information about what your company is facing is in fact what I
really wanted to hear from the community. Thanks for providing it in
detail.

I haven't faced anything directly myself, but I've heard that attaching a
metrics consumer incurred a huge performance degradation across the whole
topology.

There are two things I suspect:
1. the metrics consumer can't keep up
2. a massive number of metrics tuples (due to having lots of tasks), some
of which need to be serialized/deserialized and transferred

I'm addressing 1. via an async metrics consumer bolt and a metrics filter
on the consumer, but as I said, IMetric makes it really hard to resolve 2.
I strongly agree that the metric tick doesn't make sense at all, so I
tried to fix it, but IMetric doesn't guarantee thread-safety, so metrics
can't be collected from other threads (which also means other places), and
that also prevents aggregation at the worker level.

I have some ideas in mind to help reduce the problem while keeping
backward compatibility, but I also agree that we should take a new
approach if we want to clear out all these problems. It seems we need to
start a thread for discussion or a vote.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Thu, May 19, 2016 at 2:55 AM, Abhishek Agarwal <ab...@gmail.com> wrote:

> I remember from a previous discussion that codahale metrics are shaded
> inside storm-core and that breaks compatibility with any existing
> plugins/reporters. Will it not be a problem here? And btw does it need to
> be shaded?
>
> @Jungtaek - Exactly what are the core issues you have run into w.r.t.
> metrics? At my company, we make heavy use of metrics, and there were a
> few major issues we faced:
>
>    - explosion of metrics as the number of tasks increases - This put a
>    lot of unnecessary load on the graphite servers even though we were
>    only interested in machine-level aggregated metrics. Aggregation is
>    difficult to solve while keeping backward compatibility intact.
>    - metric tick in the same queue as the message queue - If a bolt is
>    slow or blocked, metrics for that bolt will not be emitted since the
>    metric tick won't be consumed by the bolt. It can cause a lot of
>    confusion. [Refer STORM-972]
>    - Only averages are emitted for latency in many places while
>    histograms are more useful.
>
> I know you are trying to solve many problems with metric collection, but
> solving these problems independently of each other might not be the best
> approach. I would vote for implementing a backward incompatible solution
> if it solves all these problems in a clean way.
>
> On Wed, May 18, 2016 at 9:55 PM, P. Taylor Goetz <pt...@gmail.com>
> wrote:
>
> > +1 for standardizing on Dropwizard/Coda Hale’s metrics library. It’s a
> > solid library that’s widely used and understood.
> >
> > -Taylor
> >
> >
> > > On May 18, 2016, at 10:22 AM, Bobby Evans <evans@yahoo-inc.com.INVALID> wrote:
> > >
> > > There are a lot of things that I dislike about IMetric.  It provides
> > > too much flexibility and at the same time not enough
> > > information/conventions to be able to interpret the numbers it
> > > returns correctly.  We recently had a case where someone was trying
> > > to compute an average using a ReducedMetric and a MeanReducer (which
> > > by the way should be deprecated because it is fundamentally flawed).
> > > This hands the metric collector an average.  How is it supposed to
> > > combine one average with another when doing a roll up, either across
> > > components or across time ranges?  It just does not work
> > > mathematically unless you know that all of the averages had the exact
> > > same number of operations in them, which we cannot know.
> > > This is why dropwizard and other metrics systems have a specific set
> > > of supported metrics, not objects, that they know mathematically work
> > > out.  A gauge is different from a counter, which is different from a
> > > ratio, or a meter, or a timer, or a histogram.  Please let's not
> > > reinvent the wheel here; we already did it wrong once, let's not do
> > > it wrong again.  We are using dropwizard in other places in the code
> > > internally; I would prefer that we standardize on it, or a thin
> > > wrapper around it based on the same concepts.  Or if there is a
> > > different API that someone here would prefer that we use, that is
> > > fine with me too.  But let's not write it ourselves; let's take from
> > > the experts who have spent a long time building something that works.
> > >  - Bobby
> > >
> > >    On Tuesday, May 17, 2016 10:10 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> > >
> > >
> > > Hi devs,
> > >
> > > Since IMetric#getValueAndReset doesn't restrict the return type, it
> > > gives us flexibility, but a metrics consumer has to parse the value
> > > without context (relying on just some assumptions).
> > >
> > > I've looked into some open source metrics consumers, and many of them
> > > support Number and Map<String, Number/String>, while one of them
> > > supports nested Maps. In the Map case, each key is appended to the
> > > metric key and the value is converted to 'double'. I think that would
> > > be enough, but I'm not sure we can rely on all metrics consumers to
> > > handle it properly.
> > >
> > > I feel it would be great if we could recommend proper types of
> > > DataPoint value for storing metrics to a time-series DB via a metrics
> > > consumer. That recommendation could serve as a protocol between
> > > IMetric users and metrics consumer developers.
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Jungtaek Lim (HeartSaVioR)
> > >
> > > ps. I'm not a heavy user of time-series DBs (I researched some, but
> > > they don't document the type/size of values clearly), so it would be
> > > great if someone could provide information on the value types/sizes
> > > that time-series DBs support. Or we can just assume numbers are
> > > 'double' as above and go forward.
> > >
> > >
> >
> >
>
>
> --
> Regards,
> Abhishek Agarwal
>

Re: Regarding types of DataPoint value for Metrics Consumer

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
+1 I agree.

I think we have been considering that the clojure migration/JStorm merge would become 2.0. Maybe we should reconsider that in order to allow major version bumps and closer adherence to semantic versioning. I can also think of a few areas where I’d like to see some backward-incompatible version changes.

-Taylor

> On May 20, 2016, at 10:25 AM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
> 
> Yes, so what I would propose is...
> 
> If we add in a new dependency to the storm lib directory that is not
> shaded, or decide to stop shading a dependency, it should at least be a
> minor version bump, but I would prefer a major version bump (potentially
> binary incompatible change).
> If we bump a version number for an unshaded dependency, it should be
> followed by a corresponding version bump in storm.  Major for major,
> minor for minor, bugfix for bugfix (which can more or less be ignored).
> Except in cases like guava, where they more or less ignore this process
> and any change can break backwards compatibility, so just be careful.
> If we decide to remove a dependency, or start shading a dependency that
> was not shaded previously, it should be treated like adding a new one.  I
> would prefer a major version bump, but minor is acceptable.
> my 2 cents.
>  - Bobby
> 
>    On Thursday, May 19, 2016 3:39 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
> 
> 
> But if we shade a dependency, aren’t we effectively making it *private*, in which case a minor revision bump would be unnecessary? I would think using the shaded/private versions of storm’s dependencies should be done at one’s own risk, i.e. knowing they are subject to change.
> 
> -Taylor
> 
>> On May 19, 2016, at 2:41 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
>> 
>> We almost never add new dependencies to storm-core without shading first.  I don't think it will be an explosion in minor revisions, but that is just me.
>>   - Bobby
>> 
>>     On Wednesday, May 18, 2016 4:58 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
>> 
>> 
>> Good point.
>> 
>> I guess the problem is that we have no way of knowing what dependencies a user could potentially use in their topologies. Semantic versioning can't really help with that without a potential explosion of minor revisions. Maybe the best option is to document such dependency changes in release notes?
>> 
>> -Taylor
>> 
>>> On May 18, 2016, at 4:47 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
>>> 
>>> But adding something that was not there before is just the same thing.  If I have a topology that is using guava, and storm adds in a dependency with a different version to the classpath without shading it, I just broke that topology.
>>>   - Bobby
>>> 
>>>     On Wednesday, May 18, 2016 2:26 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
>>> 
>>> 
>>> 
>>> 
>>> On May 18, 2016, at 2:46 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
>>> If this is for 1.x then we might want to go to 1.1, not sure what the policy is for adding something new to the classpath.
>>> - Bobby
>>> 
>>> There was a discussion thread around this. My feeling is that for 1.x, since we are adding APIs we should bump the minor version (i.e. 1.1.x).
>>> I don’t think adding something to the classpath warrants a minor version bump unless it adds something to a *Storm* API.
>>> -Taylor
>>> 
>> 
> 
> 


Re: Regarding types of DataPoint value for Metrics Consumer

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
Yes, so what I would propose is... 

If we add in a new dependency to the storm lib directory that is not
shaded, or decide to stop shading a dependency, it should at least be a
minor version bump, but I would prefer a major version bump (potentially
binary incompatible change).
If we bump a version number for an unshaded dependency, it should be
followed by a corresponding version bump in storm.  Major for major, minor
for minor, bugfix for bugfix (which can more or less be ignored).  Except
in cases like guava, where they more or less ignore this process and any
change can break backwards compatibility, so just be careful.
If we decide to remove a dependency, or start shading a dependency that
was not shaded previously, it should be treated like adding a new one.  I
would prefer a major version bump, but minor is acceptable.
my 2 cents.
 - Bobby 

    On Thursday, May 19, 2016 3:39 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
 

But if we shade a dependency, aren’t we effectively making it *private*, in which case a minor revision bump would be unnecessary? I would think using the shaded/private versions of storm’s dependencies should be done at one’s own risk, i.e. knowing they are subject to change.

-Taylor

> On May 19, 2016, at 2:41 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
> 
> We almost never add new dependencies to storm-core without shading first.  I don't think it will be an explosion in minor revisions, but that is just me.
>  - Bobby
> 
>    On Wednesday, May 18, 2016 4:58 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
> 
> 
> Good point.
> 
> I guess the problem is that we have no way of knowing what dependencies a user could potentially use in their topologies. Semantic versioning can't really help with that without a potential explosion of minor revisions. Maybe the best option is to document such dependency changes in release notes?
> 
> -Taylor
> 
>> On May 18, 2016, at 4:47 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
>> 
>> But adding something that was not there before is just the same thing.  If I have a topology that is using guava, and storm adds in a dependency with a different version to the classpath without shading it, I just broke that topology.
>>  - Bobby
>> 
>>    On Wednesday, May 18, 2016 2:26 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
>> 
>> 
>> 
>> 
>> On May 18, 2016, at 2:46 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
>> If this is for 1.x then we might want to go to 1.1, not sure what the policy is for adding something new to the classpath.
>> - Bobby
>> 
>> There was a discussion thread around this. My feeling is that for 1.x, since we are adding APIs we should bump the minor version (i.e. 1.1.x).
>> I don’t think adding something to the classpath warrants a minor version bump unless it adds something to a *Storm* API.
>> -Taylor
>> 
> 


  

Re: Regarding types of DataPoint value for Metrics Consumer

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
But if we shade a dependency, aren’t we effectively making it *private*, in which case a minor revision bump would be unnecessary? I would think using the shaded/private versions of storm’s dependencies should be done at one’s own risk, i.e. knowing they are subject to change.

-Taylor

> On May 19, 2016, at 2:41 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
> 
> We almost never add new dependencies to storm-core without shading first.  I don't think it will be an explosion in minor revisions, but that is just me.
>  - Bobby
> 
>    On Wednesday, May 18, 2016 4:58 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
> 
> 
> Good point.
> 
> I guess the problem is that we have no way of knowing what dependencies a user could potentially use in their topologies. Semantic versioning can't really help with that without a potential explosion of minor revisions. Maybe the best option is to document such dependency changes in release notes?
> 
> -Taylor
> 
>> On May 18, 2016, at 4:47 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
>> 
>> But adding something that was not there before is just the same thing.  If I have a topology that is using guava, and storm adds in a dependency with a different version to the classpath without shading it, I just broke that topology.
>>   - Bobby
>> 
>>     On Wednesday, May 18, 2016 2:26 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
>> 
>> 
>> 
>> 
>> On May 18, 2016, at 2:46 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
>> If this is for 1.x then we might want to go to 1.1, not sure what the policy is for adding something new to the classpath.
>> - Bobby
>> 
>> There was a discussion thread around this. My feeling is that for 1.x, since we are adding APIs we should bump the minor version (i.e. 1.1.x).
>> I don’t think adding something to the classpath warrants a minor version bump unless it adds something to a *Storm* API.
>> -Taylor
>> 
> 


Re: Regarding types of DataPoint value for Metrics Consumer

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
We almost never add new dependencies to storm-core without shading first.  I don't think it will be an explosion in minor revisions, but that is just me.
 - Bobby 

    On Wednesday, May 18, 2016 4:58 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
 

 Good point.

I guess the problem is that we have no way of knowing what dependencies a user could potentially use in their topologies. Semantic versioning can't really help with that without a potential explosion of minor revisions. Maybe the best option is to document such dependency changes in release notes?

-Taylor

> On May 18, 2016, at 4:47 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
> 
> But adding something that was not there before is just the same thing.  If I have a topology that is using guava, and storm adds in a dependency with a different version to the classpath without shading it, I just broke that topology.
>  - Bobby 
> 
>    On Wednesday, May 18, 2016 2:26 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
> 
> 
> 
> 
> On May 18, 2016, at 2:46 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
> If this is for 1.x then we might want to go to 1.1, not sure what the policy is for adding something new to the classpath.
> - Bobby
> 
> There was a discussion thread around this. My feeling is that for 1.x, since we are adding APIs we should bump the minor version (i.e. 1.1.x).
> I don’t think adding something to the classpath warrants a minor version bump unless it adds something to a *Storm* API.
> -Taylor
> 

  

Re: Regarding types of DataPoint value for Metrics Consumer

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
Good point.

I guess the problem is that we have no way of knowing what dependencies a user could potentially use in their topologies. Semantic versioning can't really help with that without a potential explosion of minor revisions. Maybe the best option is to document such dependency changes in release notes?

-Taylor

> On May 18, 2016, at 4:47 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
> 
> But adding something that was not there before is just the same thing.  If I have a topology that is using guava, and storm adds in a dependency with a different version to the classpath without shading it, I just broke that topology.
>  - Bobby 
> 
>    On Wednesday, May 18, 2016 2:26 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
> 
> 
> 
> 
> On May 18, 2016, at 2:46 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
> If this is for 1.x then we might want to go to 1.1, not sure what the policy is for adding something new to the classpath.
> - Bobby
> 
> There was a discussion thread around this. My feeling is that for 1.x, since we are adding APIs we should bump the minor version (i.e. 1.1.x).
> I don’t think adding something to the classpath warrants a minor version bump unless it adds something to a *Storm* API.
> -Taylor
> 

Re: Regarding types of DataPoint value for Metrics Consumer

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
But adding something that was not there before is just the same thing.  If I have a topology that is using guava, and storm adds in a dependency with a different version to the classpath without shading it, I just broke that topology.
 - Bobby 
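
A hypothetical illustration of that breakage (the guava versions are
real, the scenario is assumed): this topology code compiles against guava
18, where MoreObjects was introduced. If storm later puts an older,
unshaded guava first on the worker classpath, the class below no longer
exists at runtime and the worker dies with NoClassDefFoundError even
though the topology jar is unchanged.

    // Compiled against guava 18+.
    import com.google.common.base.MoreObjects;

    public class GuavaClash {
        public static void main(String[] args) {
            // Fine on guava 18+; NoClassDefFoundError if an older unshaded
            // guava on storm's classpath shadows the topology's version.
            System.out.println(MoreObjects.firstNonNull(null, "fallback"));
        }
    }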

    On Wednesday, May 18, 2016 2:26 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
 

 

On May 18, 2016, at 2:46 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
If this is for 1.x then we might want to go to 1.1, not sure what the policy is for adding something new to the classpath.
- Bobby

There was a discussion thread around this. My feeling is that for 1.x, since we are adding APIs we should bump the minor version (i.e. 1.1.x).
I don’t think adding something to the classpath warrants a minor version bump unless it adds something to a *Storm* API.
-Taylor

  

Re: Regarding types of DataPoint value for Metrics Consumer

Posted by Bobby Evans <ev...@yahoo-inc.com>.
I disagree. If I have something that uses guava x.y and storm moves to guava y.z, now my topology will no longer run.  It is not that minor of a change.
 - Bobby 

    On Wednesday, May 18, 2016 2:01 PM, P. Taylor Goetz <pt...@gmail.com> wrote:
 

 

On May 18, 2016, at 2:46 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
If this is for 1.x then we might want to go to 1.1, not sure what the policy is for adding something new to the classpath.
- Bobby

There was a discussion thread around this. My feeling is that for 1.x, since we are adding APIs we should bump the minor version (i.e. 1.1.x).
I don’t think adding something to the classpath warrants a minor version bump unless it adds something to a *Storm* API.
-Taylor

  

Re: Regarding types of DataPoint value for Metrics Consumer

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
> On May 18, 2016, at 2:46 PM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
> 
> If this is for 1.x then we might want to go to 1.1, not sure what the policy is for adding something new to the classpath.
> - Bobby

There was a discussion thread around this. My feeling is that for 1.x, since we are adding APIs we should bump the minor version (i.e. 1.1.x).

I don’t think adding something to the classpath warrants a minor version bump unless it adds something to a *Storm* API.

-Taylor

Re: Regarding types of DataPoint value for Metrics Consumer

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
Yes it is shaded, but it doesn't have to be.  It was shaded because it is not user facing, and so adding it to the classpath could potentially cause some issues.  If we standardize on it then we would want to stop shading it, or we could put in a small shim layer in front of it so we could swap it out later if we wanted to.  Either would be fine.  But if we change the classpath we would need to bump a version number of some kind.  If this is for 2.0 then it is already bumped, so no worries.  If this is for 1.x then we might want to go to 1.1; I'm not sure what the policy is for adding something new to the classpath.
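
For what that shim could look like, here is a sketch; StormCounter and
DropwizardCounter are hypothetical names, not an existing Storm API, and
only the com.codahale.metrics calls are real.

    // Topology-facing interface owned by storm; dropwizard never leaks out.
    public interface StormCounter {
        void inc(long n);
        long getCount();
    }

    // Package-private adapter backing the shim with dropwizard, so the
    // underlying library can be swapped later without touching user code.
    final class DropwizardCounter implements StormCounter {
        private final com.codahale.metrics.Counter delegate;

        DropwizardCounter(com.codahale.metrics.Counter delegate) {
            this.delegate = delegate;
        }

        public void inc(long n) { delegate.inc(n); }

        public long getCount() { return delegate.getCount(); }
    }
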
- Bobby

    On Wednesday, May 18, 2016 12:56 PM, Abhishek Agarwal <ab...@gmail.com> wrote:
 

I remember from a previous discussion that codahale metrics are shaded inside storm-core and that breaks compatibility with any existing plugins/reporters. Will it not be a problem here? And btw does it need to be shaded?
@Jungtaek - Exactly what are the core issues you have run into w.r.t. metrics? At my company, we make heavy use of metrics, and there were a few major issues we faced:
   - explosion of metrics as the number of tasks increases - This put a lot of unnecessary load on the graphite servers even though we were only interested in machine-level aggregated metrics. Aggregation is difficult to solve while keeping backward compatibility intact.
   - metric tick in the same queue as the message queue - If a bolt is slow or blocked, metrics for that bolt will not be emitted since the metric tick won't be consumed by the bolt. It can cause a lot of confusion. [Refer STORM-972]
   - Only averages are emitted for latency in many places while histograms are more useful.
I know you are trying to solve many problems with metric collection, but solving these problems independently of each other might not be the best approach. I would vote for implementing a backward incompatible solution if it solves all these problems in a clean way.
On Wed, May 18, 2016 at 9:55 PM, P. Taylor Goetz <pt...@gmail.com> wrote:

+1 for standardizing on Dropwizard/Coda Hale’s metrics library. It’s a solid library that’s widely used and understood.

-Taylor


> On May 18, 2016, at 10:22 AM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
>
> There are a lot of things that I dislike about IMetric.  It provides too much flexibility and at the same time not enough information/conventions to be able to interpret the numbers it returns correctly.  We recently had a case where someone was trying to compute an average using a ReducedMetric and a MeanReducer (which by the way should be deprecated because it is fundamentally flawed).  This hands the metric collector an average.  How is it supposed to combine one average with another when doing a roll up, either across components or across time ranges?  It just does not work mathematically unless you know that all of the averages had the exact same number of operations in them, which we cannot know.
> This is why dropwizard and other metrics systems have a specific set of supported metrics, not objects, that they know mathematically work out.  A gauge is different from a counter, which is different from a ratio, or a meter, or a timer, or a histogram.  Please let's not reinvent the wheel here; we already did it wrong once, let's not do it wrong again.  We are using dropwizard in other places in the code internally; I would prefer that we standardize on it, or a thin wrapper around it based on the same concepts.  Or if there is a different API that someone here would prefer that we use, that is fine with me too.  But let's not write it ourselves; let's take from the experts who have spent a long time building something that works.
>  - Bobby
>
>    On Tuesday, May 17, 2016 10:10 PM, Jungtaek Lim <ka...@gmail.com> wrote:
>
>
> Hi devs,
>
> Since IMetric#getValueAndReset doesn't restrict the return type, it gives
> us flexibility, but a metrics consumer has to parse the value without
> context (relying on just some assumptions).
>
> I've looked into some open source metrics consumers, and many of them
> support Number and Map<String, Number/String>, while one of them supports
> nested Maps. In the Map case, each key is appended to the metric key and
> the value is converted to 'double'. I think that would be enough, but I'm
> not sure we can rely on all metrics consumers to handle it properly.
>
> I feel it would be great if we could recommend proper types of DataPoint
> value for storing metrics to a time-series DB via a metrics consumer.
> That recommendation could serve as a protocol between IMetric users and
> metrics consumer developers.
>
> What do you think?
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> ps. I'm not a heavy user of time-series DBs (I researched some, but they
> don't document the type/size of values clearly), so it would be great if
> someone could provide information on the value types/sizes that
> time-series DBs support. Or we can just assume numbers are 'double' as
> above and go forward.
>
>





-- 
Regards,
Abhishek Agarwal


  

Re: Regarding types of DataPoint value for Metrics Consumer

Posted by Abhishek Agarwal <ab...@gmail.com>.
I remember from a previous discussion that codahale metrics are shaded
inside storm-core and that breaks compatibility with any existing
plugins/reporters. Will it not be a problem here? And btw does it need to
be shaded?

@Jungtaek - Exactly what are the core issues you have run into w.r.t.
metrics? At my company, we make heavy use of metrics, and there were a few
major issues we faced:

   - explosion of metrics as the number of tasks increases - This put a
   lot of unnecessary load on the graphite servers even though we were only
   interested in machine-level aggregated metrics. Aggregation is difficult
   to solve while keeping backward compatibility intact (see the sketch
   after this list).
   - metric tick in the same queue as the message queue - If a bolt is slow
   or blocked, metrics for that bolt will not be emitted since the metric
   tick won't be consumed by the bolt. It can cause a lot of confusion.
   [Refer STORM-972]
   - Only averages are emitted for latency in many places while histograms
   are more useful.
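
As referenced in the first item, here is a hypothetical sketch of
machine-level pre-aggregation inside a consumer: sum task-level samples
per (host, metric) and forward one series per host instead of one per
task. It is plain Java with names of my choosing; wiring it into a real
IMetricsConsumer is omitted.

    import java.util.HashMap;
    import java.util.Map;

    public class HostLevelAggregator {
        private final Map<String, Double> sums = new HashMap<>();

        /** Fold one task-level sample into the per-host total. */
        public void add(String host, String metric, double value) {
            sums.merge(host + "/" + metric, value, Double::sum);
        }

        /** Drain the aggregated series, e.g. once per flush interval. */
        public Map<String, Double> flush() {
            Map<String, Double> out = new HashMap<>(sums);
            sums.clear();
            return out;
        }
    }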

I know you are trying to solve many problems with metric collection, but
solving these problems independently of each other might not be the best
approach. I would vote for implementing a backward incompatible solution
if it solves all these problems in a clean way.

On Wed, May 18, 2016 at 9:55 PM, P. Taylor Goetz <pt...@gmail.com> wrote:

> +1 for standardizing on Dropwizard/Coda Hale’s metrics library. It’s a
> solid library that’s widely used and understood.
>
> -Taylor
>
>
> > On May 18, 2016, at 10:22 AM, Bobby Evans <ev...@yahoo-inc.com.INVALID>
> wrote:
> >
> > There are a lot of things that I dislike about IMetric.  It provides
> > too much flexibility and at the same time not enough
> > information/conventions to be able to interpret the numbers it returns
> > correctly.  We recently had a case where someone was trying to compute
> > an average using a ReducedMetric and a MeanReducer (which by the way
> > should be deprecated because it is fundamentally flawed).  This hands
> > the metric collector an average.  How is it supposed to combine one
> > average with another when doing a roll up, either across components or
> > across time ranges?  It just does not work mathematically unless you
> > know that all of the averages had the exact same number of operations
> > in them, which we cannot know.
> > This is why dropwizard and other metrics systems have a specific set
> > of supported metrics, not objects, that they know mathematically work
> > out.  A gauge is different from a counter, which is different from a
> > ratio, or a meter, or a timer, or a histogram.  Please let's not
> > reinvent the wheel here; we already did it wrong once, let's not do it
> > wrong again.  We are using dropwizard in other places in the code
> > internally; I would prefer that we standardize on it, or a thin wrapper
> > around it based on the same concepts.  Or if there is a different API
> > that someone here would prefer that we use, that is fine with me too.
> > But let's not write it ourselves; let's take from the experts who have
> > spent a long time building something that works.
> >  - Bobby
> >
> >    On Tuesday, May 17, 2016 10:10 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> >
> >
> > Hi devs,
> >
> > Since IMetric#getValueAndReset doesn't restrict the return type, it
> > gives us flexibility, but a metrics consumer has to parse the value
> > without context (relying on just some assumptions).
> >
> > I've looked into some open source metrics consumers, and many of them
> > support Number and Map<String, Number/String>, while one of them
> > supports nested Maps. In the Map case, each key is appended to the
> > metric key and the value is converted to 'double'. I think that would
> > be enough, but I'm not sure we can rely on all metrics consumers to
> > handle it properly.
> >
> > I feel it would be great if we could recommend proper types of
> > DataPoint value for storing metrics to a time-series DB via a metrics
> > consumer. That recommendation could serve as a protocol between IMetric
> > users and metrics consumer developers.
> >
> > What do you think?
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> > ps. I'm not a heavy user of time-series DBs (I researched some, but
> > they don't document the type/size of values clearly), so it would be
> > great if someone could provide information on the value types/sizes
> > that time-series DBs support. Or we can just assume numbers are
> > 'double' as above and go forward.
> >
> >
>
>


-- 
Regards,
Abhishek Agarwal

Re: Regarding types of DataPoint value for Metrics Consumer

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
+1 for standardizing on Dropwizard/Coda Hale’s metrics library. It’s a solid library that’s widely used and understood.

-Taylor


> On May 18, 2016, at 10:22 AM, Bobby Evans <ev...@yahoo-inc.com.INVALID> wrote:
> 
> There are a lot of things that I dislike about IMetric.  It provides too much flexibility and at the same time not enough information/conventions to be able to interpret the numbers it returns correctly.  We recently had a case where someone was trying to compute an average using a ReducedMetric and a MeanReducer (which by the way should be deprecated because it is fundamentally flawed).  This hands the metric collector an average.  How is it supposed to combine one average with another when doing a roll up, either across components or across time ranges?  It just does not work mathematically unless you know that all of the averages had the exact same number of operations in them, which we cannot know.
> This is why dropwizard and other metrics systems have a specific set of supported metrics, not objects, that they know mathematically work out.  A gauge is different from a counter, which is different from a ratio, or a meter, or a timer, or a histogram.  Please let's not reinvent the wheel here; we already did it wrong once, let's not do it wrong again.  We are using dropwizard in other places in the code internally; I would prefer that we standardize on it, or a thin wrapper around it based on the same concepts.  Or if there is a different API that someone here would prefer that we use, that is fine with me too.  But let's not write it ourselves; let's take from the experts who have spent a long time building something that works.
>  - Bobby
> 
>    On Tuesday, May 17, 2016 10:10 PM, Jungtaek Lim <ka...@gmail.com> wrote:
> 
> 
> Hi devs,
> 
> Since IMetric#getValueAndReset doesn't restrict return type, it gives us
> flexibility but metrics consumer should parse the value without context
> (having just some assumptions).
> 
> I've look into some open source metrics consumers, and many of them support
> Number, Map<String, Number/String>, and one of them supports Nested Map.
> For the case of Map its key is appended to metric key and value is
> converted to 'double'. I think it would be enough, but I'm not sure we can
> rely on all metrics consumers to handle properly, too.
> 
> I feel if we can recommend proper types of DataPoint value for storing
> metrics to time-series DB via metrics consumer it would be great. It can be
> used to protocol between IMetric users and metrics consumer developers.
> 
> What do you think?
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR)
> 
> ps. I'm not heavy user of time-series DBs (I researched some, but they
> don't document type/size of value clearly) so if someone could give the
> information of supporting type/size of value of time-series DBs it should
> be great for me. Or we can just assume number as 'double' as above and go
> forward.
> 
> 


Re: Regarding types of DataPoint value for Metrics Consumer

Posted by Jungtaek Lim <ka...@gmail.com>.
Yes, I totally agree with you.

For me, getValueAndReset() itself is what makes it so painful to modify
anything regarding metrics.
I feel that getValueAndReset() was born with the precondition that the
caller should ensure thread-safety. At least the IMetric implementations
that Storm provides now are NOT thread-safe, and guaranteeing
thread-safety is not a requirement for a custom IMetric (nowhere claims
it is). Since metrics are handled per task, and normally the reader and
writer of an IMetric use the same thread, there's no issue at all. But
that precondition plus backward compatibility made me give up collecting
metrics at the worker level as an optimization.
(I also saw the complicated logic in CountStatAndMetric
and LatencyStatAndMetric, which handle both heartbeat messages and
built-in metrics.)
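
For illustration, here is a sketch of the guarantee that is missing: a
counter whose incr() may safely be called from any thread while the
executor thread calls getValueAndReset(). It assumes the storm-core 1.x
dependency for the IMetric interface; Storm's built-in CountMetric makes
no such promise, so this is an assumption about what a thread-safe
IMetric could look like.

    import java.util.concurrent.atomic.AtomicLong;

    import org.apache.storm.metric.api.IMetric;

    public class ThreadSafeCountMetric implements IMetric {
        private final AtomicLong count = new AtomicLong();

        public void incr() { count.incrementAndGet(); }

        @Override
        public Object getValueAndReset() {
            // Atomically hand back the total and restart from zero, so
            // increments racing with the reset are never lost.
            return count.getAndSet(0L);
        }
    }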

Btw, my concern is all about backward compatibility. As you may know,
what I've been addressing so far is done without breaking backward
compatibility. It's based on a precondition: work for Storm 2.0.0 is in
progress, and in phase 2 we are planning to evaluate the metrics feature
of JStorm and may adopt it. We don't know yet when we will complete the
work for Storm 2.0.0, and I need some improvements on the current
metrics, so I'm addressing some of them for 1.x; but if we want to start
the discussion for new metrics right now, that would also be good.


On Wed, May 18, 2016 at 11:22 PM, Bobby Evans <ev...@yahoo-inc.com.invalid> wrote:

> There are a lot of things that I dislike about IMetric.  It provides too
> much flexibility and at the same time not enough information/conventions
> to be able to interpret the numbers it returns correctly.  We recently
> had a case where someone was trying to compute an average using a
> ReducedMetric and a MeanReducer (which by the way should be deprecated
> because it is fundamentally flawed).  This hands the metric collector an
> average.  How is it supposed to combine one average with another when
> doing a roll up, either across components or across time ranges?  It just
> does not work mathematically unless you know that all of the averages had
> the exact same number of operations in them, which we cannot know.
> This is why dropwizard and other metrics systems have a specific set of
> supported metrics, not objects, that they know mathematically work out.
> A gauge is different from a counter, which is different from a ratio, or
> a meter, or a timer, or a histogram.  Please let's not reinvent the wheel
> here; we already did it wrong once, let's not do it wrong again.  We are
> using dropwizard in other places in the code internally; I would prefer
> that we standardize on it, or a thin wrapper around it based on the same
> concepts.  Or if there is a different API that someone here would prefer
> that we use, that is fine with me too.  But let's not write it ourselves;
> let's take from the experts who have spent a long time building something
> that works.
>  - Bobby
>
>     On Tuesday, May 17, 2016 10:10 PM, Jungtaek Lim <ka...@gmail.com>
> wrote:
>
>
>  Hi devs,
>
> Since IMetric#getValueAndReset doesn't restrict the return type, it gives
> us flexibility, but a metrics consumer has to parse the value without
> context (relying on just some assumptions).
>
> I've looked into some open source metrics consumers, and many of them
> support Number and Map<String, Number/String>, while one of them supports
> nested Maps. In the Map case, each key is appended to the metric key and
> the value is converted to 'double'. I think that would be enough, but I'm
> not sure we can rely on all metrics consumers to handle it properly.
>
> I feel it would be great if we could recommend proper types of DataPoint
> value for storing metrics to a time-series DB via a metrics consumer.
> That recommendation could serve as a protocol between IMetric users and
> metrics consumer developers.
>
> What do you think?
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> ps. I'm not a heavy user of time-series DBs (I researched some, but they
> don't document the type/size of values clearly), so it would be great if
> someone could provide information on the value types/sizes that
> time-series DBs support. Or we can just assume numbers are 'double' as
> above and go forward.
>
>
>

Re: Regarding types of DataPoint value for Metrics Consumer

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
There are a lot of things that I dislike about IMetric.  It provides too much flexibility and at the same time not enough information/conventions to be able to interpret the numbers it returns correctly.  We recently had a case where someone was trying to compute an average using a ReducedMetric and a MeanReducer (which by the way should be deprecated because it is fundamentally flawed).  This hands the metric collector an average.  How is it supposed to combine one average with another when doing a roll up, either across components or across time ranges?  It just does not work mathematically unless you know that all of the averages had the exact same number of operations in them, which we cannot know.
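
A tiny worked example of the math (the numbers are made up): task A
averaged one 1 ms measurement, task B averaged three 3 ms measurements,
so the true mean is 10 ms / 4 ops = 2.5 ms.

    public class AverageRollup {
        public static void main(String[] args) {
            double avgA = 1.0, avgB = 3.0;              // what a MeanReducer hands over
            System.out.println((avgA + avgB) / 2);      // 2.0 -- naive rollup, wrong
            long sumA = 1, cntA = 1, sumB = 9, cntB = 3; // ship (sum, count) instead
            System.out.println((double) (sumA + sumB) / (cntA + cntB)); // 2.5 -- exact
        }
    }
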
This is why dropwizard and other metrics systems have a specific set of supported metrics, not objects, that they know mathematically work out.  A gauge is different from a counter, which is different from a ratio, or a meter, or a timer, or a histogram.  Please let's not reinvent the wheel here; we already did it wrong once, let's not do it wrong again.  We are using dropwizard in other places in the code internally; I would prefer that we standardize on it, or a thin wrapper around it based on the same concepts.  Or if there is a different API that someone here would prefer that we use, that is fine with me too.  But let's not write it ourselves; let's take from the experts who have spent a long time building something that works.
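
A minimal sketch of that model (the com.codahale.metrics API is real; the
metric names are examples I picked): the registry hands out typed
instruments, so any reporter knows how to roll each one up. Counters sum
exactly; histograms keep the distribution instead of a lone average.

    import com.codahale.metrics.Counter;
    import com.codahale.metrics.Histogram;
    import com.codahale.metrics.MetricRegistry;

    public class TypedMetricsExample {
        public static void main(String[] args) {
            MetricRegistry registry = new MetricRegistry();
            Counter emitted = registry.counter("tuples.emitted");
            Histogram latency = registry.histogram("execute.latency.ms");
            emitted.inc();
            latency.update(42);
            System.out.println(emitted.getCount());                        // 1
            System.out.println(latency.getSnapshot().get99thPercentile()); // 42.0
        }
    }
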
 - Bobby 

    On Tuesday, May 17, 2016 10:10 PM, Jungtaek Lim <ka...@gmail.com> wrote:
 

 Hi devs,

Since IMetric#getValueAndReset doesn't restrict the return type, it gives
us flexibility, but a metrics consumer has to parse the value without
context (relying on just some assumptions).

I've looked into some open source metrics consumers, and many of them
support Number and Map<String, Number/String>, while one of them supports
nested Maps. In the Map case, each key is appended to the metric key and
the value is converted to 'double'. I think that would be enough, but I'm
not sure we can rely on all metrics consumers to handle it properly.

I feel it would be great if we could recommend proper types of DataPoint
value for storing metrics to a time-series DB via a metrics consumer. That
recommendation could serve as a protocol between IMetric users and metrics
consumer developers.

What do you think?

Thanks,
Jungtaek Lim (HeartSaVioR)

ps. I'm not a heavy user of time-series DBs (I researched some, but they
don't document the type/size of values clearly), so it would be great if
someone could provide information on the value types/sizes that
time-series DBs support. Or we can just assume numbers are 'double' as
above and go forward.