Posted to user@storm.apache.org by "Brunner, Bill" <bi...@baml.com> on 2014/10/27 12:29:16 UTC
Metrics question
What is the best way to capture start/end times of functions/aggregators in Trident? I am interested in capturing elapsed times for each, but would prefer not having to pass the info around in tuples, or write incrementally to a database. Wondering if anyone else has done this and how. Thanks
----------------------------------------------------------------------
This message, and any attachments, is for the intended recipient(s) only, may contain information that is privileged, confidential and/or proprietary and subject to important terms and conditions available at http://www.bankofamerica.com/emaildisclaimer. If you are not the intended recipient, please delete this message.
RE: Metrics question
Posted by "Brunner, Bill" <bi...@baml.com>.
Thanks, I’ll check it out.
From: Itai Frenkel [mailto:Itai@forter.com]
Sent: Tuesday, October 28, 2014 8:43 AM
To: Michael Pershyn; user@storm.apache.org
Subject: Re: Metrics question
Writing to the database could be slow and would require bulk writes. We are using Riemann to forward the data.
See https://github.com/forter/riemann-storm-monitor
We plan to modify the code to use the metrics framework, but currently it wraps the bolts.
________________________________
From: Brunner, Bill <bi...@baml.com>
Sent: Tuesday, October 28, 2014 1:28 PM
To: Michael Pershyn; user@storm.apache.org
Subject: RE: Metrics question
Thanks Michael. My use case is to have visibility into the metrics within the topology at runtime. More specifically, the last step in my processing writes all the elapsed times, counts, etc. to a database. I know that the metrics framework is designed for this, but I was wondering whether there was another way to read/write data to some data structure that is in scope for all components of the topology (without creating a static class). Let me know. In the meantime I will take a closer look at the metrics framework.
Thanks again.
From: Michael Pershyn [mailto:michael.pershyn@gmail.com]
Sent: Tuesday, October 28, 2014 5:56 AM
To: Brunner, Bill; user@storm.apache.org
Subject: Re: Metrics question
Hi Bill,
There are two options known to me; I have tried only the first one.
Apart from using profiling tools, if you just want to measure one function, you can use Java's time functions (e.g. System.nanoTime()) to measure how long the call takes; see this example<http://stackoverflow.com/questions/180158/how-do-i-time-a-methods-execution-in-java>.
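A minimal, self-contained sketch of that approach (the doWork method and class name here are illustrative stand-ins for a Trident function or aggregator body, not anything from this thread):

```java
// Minimal sketch: timing one call with System.nanoTime(), which is
// monotonic and therefore suited to elapsed-time measurement.
public class FunctionTimer {

    // Stand-in for the function/aggregator work you actually want to time.
    static long doWork() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        doWork();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // In a topology you would write this line to the worker log instead.
        System.out.println("doWork has taken " + elapsedMs + " ms to execute");
    }
}
```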
I think the answer heavily depends on your goals: would you like to capture timings for a short period (debugging), a long period (tuning), or for the topology's lifetime?
For the timing approach:
1. You can write the result of this timing into Storm's logs, so you end up with a log entry like “my-function has taken 55 ms to execute” in the worker log. In simple cases you can just watch this log with shell tools or the Storm logviewer.
These logs can also be processed by Logstash<http://logstash.net/>.
Logstash can send the events into Elasticsearch, where, using Kibana<http://www.elasticsearch.org/overview/kibana/>, you can skim over them, count them, build nice graphs, and filter out irrelevant information. It is easy to set up.
Logstash can also parse logs and send events to OpenTSDB (or other metrics backends), where you can store and analyze the data. Docker containers can be a great help<https://registry.hub.docker.com/u/dockerfile/elasticsearch/>.
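If you go the Logstash route, a filter along these lines could extract the elapsed time from such log entries (a sketch only; the grok field names and the exact log-line format are assumptions):

```
filter {
  grok {
    # Matches lines like: "my-function has taken 55 ms to execute"
    match => { "message" => "%{NOTSPACE:function} has taken %{NUMBER:elapsed_ms:int} ms to execute" }
  }
}
```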
2. For long-term/continuous measurement I would recommend trying the Storm metrics framework<https://storm.apache.org/documentation/Metrics.html> (if you use Storm 0.9+). It is also applicable to Trident. I would suggest using AssignableMetric or ReducedMetric to calculate a mean for your function calls.
Metrics can then be processed by a metrics consumer and may end up in a log, Graphite, OpenTSDB, or another database.
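To illustrate the semantics: Storm's ReducedMetric wrapped around a MeanReducer accumulates values and reports the mean each time the metrics bucket flushes. Below is a standalone sketch of that behaviour only; in a real topology you would register backtype.storm.metric.api.ReducedMetric through the topology context rather than roll your own:

```java
// Standalone illustration of what ReducedMetric + MeanReducer do:
// accumulate values, then report the mean and reset when the metrics
// bucket flushes. This mirrors the idea only and has no Storm dependency.
public class MeanMetricSketch {
    private double sum = 0.0;
    private long count = 0;

    // Called once per timed function call, e.g. with an elapsed-ms value.
    public void update(double value) {
        sum += value;
        count++;
    }

    // Called when the metrics bucket flushes; returns the mean and resets,
    // as Storm's getValueAndReset() does. Returns null if nothing was seen.
    public Double getValueAndReset() {
        if (count == 0) {
            return null;
        }
        double mean = sum / count;
        sum = 0.0;
        count = 0;
        return mean;
    }
}
```

Each call to your function would update the metric with its elapsed time; the metrics consumer then receives the mean once per bucket interval.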
Best regards,
Michael Pershyn