Posted to dev@storm.apache.org by padma priya chitturi <pa...@gmail.com> on 2014/02/24 14:40:32 UTC

Storm Performance Benchmark

Hi All,

I've been using Storm metrics to visualize the performance of Storm (
http://www.bigdata-cookbook.com/post/72320512609/storm-metrics-how-to).

I have included the metrics initialization code in the ExclamationTopology
code and can see the metrics in metrics.log.

How can we summarize the metrics in the metrics.log file? What do the
different fields mean? How can we visualize the metrics using Graphite?

Can someone advise me on this?

Thanks,
Padma Ch.

Re: Storm Performance Benchmark

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
I ran into the same set of issues a little while back when I looked into writing a MetricsConsumer implementation that pushed metrics data to ganglia. Keeping it generic enough that it would be useful for user-defined metrics was enough of a struggle that I punted. And as you alluded to, mapping it to ganglia’s model was a challenge.

I ultimately cobbled together a one-off bridge that pushed Storm UI metrics to ganglia by polling Nimbus along with JVM metrics pulled from the JVM JMX API.

I’m also a big fan of Coda Hale’s Metrics library that Michael Noll pointed out. It’s solid, easy to use, and integrates well with many monitoring/metrics systems.

- Taylor

Re: Storm Performance Benchmark

Posted by "Michael G. Noll" <mi...@michael-noll.com>.
On 25.02.2014 00:12, Bobby Evans wrote:
> Then there are the latency metrics where it gets even more complex
> because to aggregate them you also need the corresponding event
> counts.  The metrics for these that you get from the UI/Nimbus
> handle this for you, but with this system you need to do some of
> the math yourself to compute a latency weighted by the event
> throughput.

Just as Bobby mentioned, there are some metrics (like latency or
rate/s) that you cannot aggregate correctly without having access to
the underlying "raw" counts that make up the metric.  For that reason
the current implementation of Storm's built-in metrics feature has
unfortunately been of only limited use for us.  Instead we have been
using Coda Hale's Metrics library [1] directly and then pushing its
metrics into a tool like Graphite.  See [2] for a concrete example.

Best,
Michael

[1] http://metrics.codahale.com/
[2] http://www.michael-noll.com/blog/2013/11/06/sending-metrics-from-storm-to-graphite/

Re: Storm Performance Benchmark

Posted by Bobby Evans <ev...@yahoo-inc.com>.
On that note I have really wanted to add more metrics to the perf test
using the Storm metrics subsystem, not just the metrics that are
uploaded to ZK and accessible through Nimbus, but I have not found time to
do that.  Writing a generic metrics aggregator is almost impossible
because the metrics system can send anything across the wire.  In many
cases the values are numbers, but there are also many cases where a value
is a map of a string to a number, and something even more complex is
possible with user-defined metrics.  And even in the cases where it is
just a number, most of the time you can take it as an incremental update to
a running count (e.g. number of events processed over the last N seconds),
but in some cases it may be a hard number (heap space used by the VM, or
number of events queued in the disruptor queue).
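
To make the distinction above concrete, here is a minimal Python sketch of
an aggregator that flattens number-or-map values and treats some names as
running counts and others as hard numbers. It is only an illustration under
stated assumptions: the metric names, the dotted naming scheme, and the
Aggregator class are hypothetical, and nothing here is part of Storm.

```python
from numbers import Number


def flatten_metric(name, value):
    """Yield (dotted_name, number) pairs for one reported data point.

    Handles plain numbers and (possibly nested) maps of string to number.
    Anything else is skipped rather than guessed at.
    """
    if isinstance(value, bool):
        return  # bool counts as a Number in Python, but is not a metric here
    if isinstance(value, Number):
        yield name, value
    elif isinstance(value, dict):
        for key, sub_value in value.items():
            yield from flatten_metric(f"{name}.{key}", sub_value)


class Aggregator:
    """Running totals for counters, last-seen values for gauges."""

    def __init__(self, gauge_names):
        # Which names are "hard numbers" has to be decided per metric
        # (assumption: the caller supplies that list by hand).
        self.gauge_names = set(gauge_names)
        self.values = {}

    def update(self, name, value):
        for flat_name, number in flatten_metric(name, value):
            if flat_name in self.gauge_names:
                # Hard number: the newest reading replaces the old one.
                self.values[flat_name] = number
            else:
                # Incremental update: add to the running count.
                self.values[flat_name] = self.values.get(flat_name, 0) + number
```

For example, two updates of a hypothetical "emit-count" map with
{"default": 20} and {"default": 40} accumulate to 60, while two readings of
a gauge keep only the latest one.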

You almost have to look at every metric that is printed out and decide if
you want to process it, and if so how to put it into your
monitoring/metrics system of choice.  The logging metrics collector is
simple, but not what most people will want to use.

Then there are the latency metrics where it gets even more complex because
to aggregate them you also need the corresponding event counts.  The
metrics for these that you get from the UI/Nimbus handle this for you, but
with this system you need to do some of the math yourself to compute a
latency weighted by the event throughput.
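
The math in question is a weighted average. A minimal Python sketch,
assuming each sample is an (average latency, event count) pair for one task
over one reporting interval (that pairing is an assumption about how you
collect the data, not something the metrics system hands you):

```python
def weighted_latency(samples):
    """Combine (average_latency_ms, event_count) pairs into one average.

    Weighting by the event count keeps a nearly idle task from counting
    as much as a busy one.
    """
    total_events = sum(count for _, count in samples)
    if total_events == 0:
        return None  # no events, so no meaningful latency
    total_time = sum(latency * count for latency, count in samples)
    return total_time / total_events


# One busy task at 10 ms and one quiet task at 100 ms:
samples = [(10.0, 900), (100.0, 100)]
print(weighted_latency(samples))  # 19.0, whereas the plain mean of 10 and 100 is 55
```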

- Bobby

Re: Storm Performance Benchmark

Posted by Otávio Carvalho <ot...@gmail.com>.
You can also take a look at storm-perf-test (
https://github.com/yahoo/storm-perf-test/) source code.
I'm currently trying to extract some metrics, in order to develop
benchmarks for storm and other stream processors, and I thought it was
really useful.

Thanks,

Otávio.

Undergraduate Student at Federal University of Rio Grande do Sul -
http://inf.ufrgs.br
Scholarship holder at Parallel and Distributed Processing Group -
http://gppd.inf.ufrgs.br
omcarvalho@inf.ufrgs.br / @otaviocarvalho

Re: Storm Performance Benchmark

Posted by Milinda Pathirage <mp...@umail.iu.edu>.
Hi Padma,

I think the answers to your questions are in the article you
mentioned. Anyway, I'll try to explain briefly what needs to be done.
Note that I don't have any experience with statsd or Graphite.

First, on summarizing the metrics in the metrics.log file: if you want to
summarize the metrics mentioned in the article, you will have to write
your own summarizer. How to do it depends on what data you collected and
how you want to summarize it. It looks like the fields in metrics.log
contain information such as the timestamp of the metrics publish event,
the Storm host name, the bolt identifier, the metric identifier and the
actual metric value. You should be able to understand the format by
reading [2].
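
As a starting point for such a summarizer, here is a minimal Python sketch
that parses one metrics.log line. The layout is an assumption: tab-separated
fields of timestamp, host:port, task:component, metric name and value, with
an optional logger prefix before the timestamp. Please check it against [2]
and your own log before relying on it. The sample line and the names in it
are made up.

```python
from collections import namedtuple

MetricRecord = namedtuple(
    "MetricRecord", "timestamp host port task_id component name value"
)


def parse_metrics_line(line):
    r"""Parse one line under the ASSUMED tab-separated layout:

    [logger prefix ]<timestamp>\t<host>:<port>\t<task>:<component>\t<name>\t<value>

    Returns None for lines that do not match.
    """
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    if len(fields) != 5:
        return None
    prefix_and_timestamp, worker, task, name, value = fields
    try:
        # The timestamp is the last token before the first tab, so any
        # logger prefix in front of it is ignored.
        timestamp = int(prefix_and_timestamp.split()[-1])
        host, port = worker.rsplit(":", 1)
        task_id, component = task.split(":", 1)
        # The value stays a string: it may be a number or a printed map.
        return MetricRecord(timestamp, host.strip(), int(port),
                            int(task_id), component.strip(), name, value)
    except (ValueError, IndexError):
        return None


# Hypothetical sample line, not copied from a real log:
sample = ("2014-02-24 14:40:32 INFO 1393252832\t      localhost:6703\t"
          " 12:exclaim1   \texecute-count\t{default=40}\n")
print(parse_metrics_line(sample))
```

Once lines are parsed into records, summarizing is a matter of grouping by
component and metric name and deciding, per metric, whether to sum or to
keep the latest value.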

  - It looks like people are using statsd to feed Graphite [1], and the
author of the article you mentioned is also planning to use the same
approach.
  - In this case you first need to write a metrics consumer which
publishes metrics to statsd.
  - Then connect statsd and Graphite according to [1].

I think it's possible to write a metrics consumer which feeds Graphite
directly, but I am not sure which approach is easier.
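
For either route, the wire formats themselves are simple. Below is a minimal
Python sketch of the statsd datagram format and Graphite's plaintext
protocol. It is not a Storm metrics consumer (that would be written against
Storm's Java API); it only shows what such a consumer would need to send.
The metric names are hypothetical, and the hosts and ports are just the
common defaults, so adjust them to your setup.

```python
import socket
import time


def statsd_line(name, value, kind="c"):
    """Format one statsd datagram: kind 'c' is a counter, 'g' a gauge."""
    return f"{name}:{value}|{kind}"


def graphite_line(path, value, timestamp=None):
    """Format one line of Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_to_statsd(line, host="localhost", port=8125):
    """Send one datagram over UDP (8125 is the usual statsd port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


def send_to_graphite(lines, host="localhost", port=2003):
    """Send lines over TCP (2003 is the usual carbon plaintext port)."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall("".join(lines).encode("ascii"))


# Formatting only, nothing is sent here:
print(statsd_line("storm.exclaim1.execute-count", 40))
print(graphite_line("storm.exclaim1.execute-count", 40, 1393252832), end="")
```

Going through statsd lets it do the per-interval aggregation of counters
for you, while writing to Graphite directly means your consumer has to
decide what value to report for each interval itself.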

Thanks
Milinda

[1] http://matt.aimonetti.net/posts/2013/06/26/practical-guide-to-graphite-monitoring/
[2] https://github.com/nathanmarz/storm/blob/master/storm-core/src/jvm/backtype/storm/metric/LoggingMetricsConsumer.java

-- 
Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org