Posted to dev@heron.apache.org by Thomas Cooper <to...@gmail.com> on 2018/04/19 01:58:54 UTC

Adding source task id to default metrics

Hi All,

This started out as a quick Slack post, then became a reasonably sized email, and
now it has headings!

*Introduction*

I am working on a performance modeling system for Heron. Hopefully this
system will be useful for checking whether proposed plans will meet performance
targets, and also for checking whether currently running physical plans will hit
back-pressure issues at higher traffic rates.

To do this I need to know what proportion of tuples are routed from each
upstream instance to its downstream instances, which is a metric that Heron
does not provide by default.

*Proposal*

I have implemented a custom metric to do what I need in my test topologies.
It is a simple multi-count metric called "__receive-count", whose key
includes the "sourceTaskId" value (which you can get from the tuple
instance) as well as the source component name and the incoming stream name.

This is basically the same as the default "__execute-count" metric, but the
metric name format is
"__receive-count/<source-component>/<source-task-ID>/<incoming-stream>"
instead of "__execute-count/<source-component>/<incoming-stream>".
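
For concreteness, the registration and update logic in my test bolts looks
roughly like the sketch below. I have written it against the Storm-style
low-level metrics API (org.apache.storm imports); the package names, the
60-second bucket size and the `getSourceTask()` accessor are assumptions on my
part, so treat this as illustrative rather than the exact code:

    import java.util.Map;

    import org.apache.storm.metric.api.MultiCountMetric;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;

    public class ReceiveCountingBolt extends BaseRichBolt {

      private transient OutputCollector collector;
      private transient MultiCountMetric receiveCount;

      @Override
      @SuppressWarnings("rawtypes")
      public void prepare(Map conf, TopologyContext context,
                          OutputCollector outputCollector) {
        this.collector = outputCollector;
        // Register the multi-count metric. The 60s bucket size is an arbitrary
        // choice and should probably match the metrics collection interval.
        this.receiveCount = new MultiCountMetric();
        context.registerMetric("__receive-count", receiveCount, 60);
      }

      @Override
      public void execute(Tuple tuple) {
        // Key format: <source-component>/<source-task-ID>/<incoming-stream>
        String key = tuple.getSourceComponent() + "/"
            + tuple.getSourceTask() + "/"
            + tuple.getSourceStreamId();
        receiveCount.scope(key).incr();

        // ... normal tuple processing would go here ...
        collector.ack(tuple);
      }

      @Override
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // No output streams needed for this example.
      }
    }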

So I see two options:

   1. Create a new "__receive-count" metric and leave the "__execute-count"
   alone
   2. Alter "__execute-count" to include the source task ID.

*Questions*

My first question is whether the metric name is parsed anywhere further
down the line, for example when aggregating component metrics in the Metrics
Manager. If so, would changing the name break things?

My second is, if we do change "__execute-count", should we also add the
source task ID to other bolt metrics like "__execute-latency"? It would be
nice to see how latency changes by source instance --- this is a particular
issue for two consecutive fields-grouped components, as instances will
receive very different key distributions, which could lead to very different
processing latencies.

*Implementation*

Adding this to the default metrics (or changing "__execute-count") seems like
it would be reasonably straightforward (famous last words). We would need
to modify the `FullBoltMetric` class to include the new metrics (if
required) and edit the `FullBoltMetric.executeTuple` method to accept the
"sourceTaskId" (which is already available in the
"BoltInstance.readTuplesAndExecute" method) as a fourth argument.

Obviously, we will need to do the same with the Python implementation. Will
this also need to be changed in the Storm compatibility layer?

*Conclusion*

Having information on where tuples are flowing is really important if
we want to be able to do more intelligent routing and adaptive auto-scaling
in the future, and hopefully this one small change/extra metric won't add
any significant processing overhead.

I look forward to hearing what you think.

Cheers,

Tom Cooper
W: www.tomcooper.org.uk  | Twitter: @tomncooper
<https://twitter.com/tomncooper>

Re: Adding source task id to default metrics

Posted by Ning Wang <wa...@gmail.com>.
However, the feature may not be available in other metrics services/libraries,
so it is a valid concern, especially for heavy topologies.

The new metrics could be very useful though. I am wondering whether it might be
useful to have two sets of metrics: one exposed for external
observability, and another used for internal processes.

On Wed, Apr 18, 2018 at 9:17 PM, Karthik Ramasamy <ka...@streaml.io>
wrote:

> I like the idea of both the metrics and it might be great to include them.
>
> Prometheus can aggregate metrics downstream by component-id/source-task, etc.
> It is a nice tool.
>
> cheers
> /karthik
>
> On Wed, Apr 18, 2018 at 8:32 PM, Fu Maosong <ma...@gmail.com> wrote:
>
> > One concern is that it will significantly increase the number of metrics,
> > potentially leading to performance concerns.

Re: Adding source task id to default metrics

Posted by Karthik Ramasamy <ka...@streaml.io>.
I like the idea of both the metrics and it might be great to include them.

Prometheus can aggregate metrics downstream by component-id/source-task, etc.
It is a nice tool.

cheers
/karthik

On Wed, Apr 18, 2018 at 8:32 PM, Fu Maosong <ma...@gmail.com> wrote:

> One concern is that it will significantly increase the number of metrics,
> potentially leading to performance concerns.

Re: Adding source task id to default metrics

Posted by Fu Maosong <ma...@gmail.com>.
One concern is that it will significantly increase the number of metrics,
potentially leading to performance concerns.


-- 
With my best Regards
------------------
Fu Maosong
Twitter Inc.
Mobile: +001-415-244-7520