Posted to user@storm.apache.org by "Brunner, Bill" <bi...@baml.com> on 2014/10/27 12:29:16 UTC

Metrics question

What is the best way to capture start/end times of functions/aggregators in Trident?  I am interested in capturing elapsed times for each, but would prefer not having to pass the info around in tuples, or write incrementally to a database.  Wondering if anyone else has done this and how.  Thanks

----------------------------------------------------------------------
This message, and any attachments, is for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at http://www.bankofamerica.com/emaildisclaimer.   If you are not the intended recipient, please delete this message.

RE: Metrics question

Posted by "Brunner, Bill" <bi...@baml.com>.
Thanks, I’ll check it out.

----------------------------------------------------------------------

Re: Metrics question

Posted by Itai Frenkel <It...@forter.com>.
Writing to the database directly could be slow and would require bulk writes. We use Riemann to forward the data instead.

See https://github.com/forter/riemann-storm-monitor

We plan to modify the code to use the metrics framework, but currently it wraps the bolts.
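
For reference, pushing a single measurement to Riemann from Java looks roughly like this with riemann-java-client (a sketch based on that client's README; the host, port, service name, and metric value are illustrative, and riemann-storm-monitor wires this up for you):

```java
import com.aphyr.riemann.client.RiemannClient;

// Sketch: forward one elapsed-time measurement to a Riemann server.
// Host, port, and service name below are placeholders, not real values.
public class RiemannForward {
    public static void main(String[] args) throws Exception {
        RiemannClient client = RiemannClient.tcp("riemann.example.com", 5555);
        client.connect();
        client.event()
              .service("storm.my-function.latency-ms")  // illustrative name
              .metric(55.0)                             // elapsed time in ms
              .tags("storm")
              .send();
        // (disconnect/cleanup omitted in this sketch)
    }
}
```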



RE: Metrics question

Posted by "Brunner, Bill" <bi...@baml.com>.
Thanks Michael. My use case is to have visibility into the metrics within the topology at runtime. More specifically, the last step in my processing writes all the elapsed times, counts, etc. to a database. I know that the metrics framework is designed for this, but I was wondering whether there is another way to read/write data to some data structure that is in scope for all components of the topology (without creating a static class). Let me know. In the meantime I will take a closer look at the metrics framework.
Thanks again.


----------------------------------------------------------------------

Re: Metrics question

Posted by Michael Pershyn <mi...@gmail.com>.
Hi Bill,

There are two options known to me; I have tried only the first one.
Apart from using profiling tools, if you just want to measure one
function, you can time the call yourself in Java and measure how long
it takes; see this example
<http://stackoverflow.com/questions/180158/how-do-i-time-a-methods-execution-in-java>.

Which way you take heavily depends on your goals - would you like to
capture times for a short period (debugging), a long period (tuning),
or for the lifetime of the topology?

For capturing the times:

 1. You can write the measured time into the Storm logs, so you end
    up with a log entry like "my-function has taken 55 ms to execute"
    in the worker log. In simple cases you can just watch this log
    with shell tools or the Storm logviewer.
    These logs can also be processed by Logstash
    <http://logstash.net/>. Logstash can send the events into
    Elasticsearch, where, using Kibana
    <http://www.elasticsearch.org/overview/kibana/>, you can skim
    over them, count them, build nice graphs, and filter out
    irrelevant information. It is easy to set up.
    Logstash can also parse logs and send events to OpenTSDB (and
    other metrics backends), where you can save and analyze the data.
    Docker containers can be a great help
    <https://registry.hub.docker.com/u/dockerfile/elasticsearch/>.

 2. For long-term/continuous measurement I would recommend trying out
    the Storm metrics framework
    <https://storm.apache.org/documentation/Metrics.html> (if you use
    Storm 0.9+). It is also applicable to Trident. I would suggest
    using an AssignableMetric or a ReducedMetric to calculate the
    mean of your function call times.
    The metrics can then be processed by a metrics consumer (and may
    end up in a log, or in Graphite, OpenTSDB, or another database).
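
Option 1 can be sketched as a minimal, self-contained example (the class and method names are illustrative, and a real topology would write through the worker's logger rather than System.out so the line lands in the worker log):

```java
// Sketch of option 1: time a call with System.nanoTime() and emit a
// log-style line. Class/method names are illustrative placeholders.
public class TimingSketch {

    // Stand-in for the function/aggregator being measured.
    static long myFunction() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();               // capture start time
        long result = myFunction();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // In a topology this line would go through the worker's logger,
        // so Logstash can later pick it up from the worker log.
        System.out.println("my-function has taken " + elapsedMs
                + " ms to execute (result=" + result + ")");
    }
}
```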
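
Option 2 for a Trident function might look roughly like this (a sketch only, against the Storm 0.9.x package names; the metric name and the 60-second reporting bucket are illustrative choices, not anything the framework mandates):

```java
import java.util.Map;

import backtype.storm.metric.api.MeanReducer;
import backtype.storm.metric.api.ReducedMetric;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.operation.TridentOperationContext;
import storm.trident.tuple.TridentTuple;

// Sketch: a Trident function that reports its mean execute time
// through the Storm metrics framework. "execute-time-ms" and the
// 60-second bucket are illustrative choices.
public class TimedFunction extends BaseFunction {
    private transient ReducedMetric meanExecuteMs;

    @Override
    public void prepare(Map conf, TridentOperationContext context) {
        // Report the mean once per 60-second bucket.
        meanExecuteMs = context.registerMetric(
                "execute-time-ms", new MeanReducer(), 60);
    }

    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        long start = System.nanoTime();
        try {
            // ... the actual work of the function goes here ...
            collector.emit(tuple.getValues());
        } finally {
            meanExecuteMs.update((System.nanoTime() - start) / 1_000_000.0);
        }
    }
}
```

A metrics consumer then receives the bucketed values; for example, registering `conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);` on the topology Config routes them into the metrics log, and a custom consumer could forward them to Graphite or OpenTSDB instead.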

Best regards,
Michael Pershyn

On 10/27/2014 12:29 PM, Brunner, Bill wrote:

> What is the best way to capture start/end times of
> functions/aggregators in Trident?  I am interested in capturing
> elapsed times for each, but would prefer not having to pass the info
> around in tuples, or write incrementally to a database.  Wondering if
> anyone else has done this and how.  Thanks