Posted to user@storm.apache.org by "Thomas Cooper (PGR)" <t....@newcastle.ac.uk> on 2016/05/26 12:11:28 UTC

Clarification on metrics sampling rate

Hi,


I'm a Computer Science PhD student working on modelling the performance of distributed stream processing systems like Storm.


I am attempting to use Queueing Theory to model the performance of a running topology and then make predictions about performance under varying input loads. To do this accurately I need metrics for the latency, arrival and emission rates of each task (among other things), which Storm happily gives me.


However, I know that Storm samples the summary metrics for the UI using the Config.TOPOLOGY_STATS_SAMPLE_RATE value (default 0.05). Does this sampling also apply to the metrics on the "__metrics" stream, which, as I understand it, are sent to any bolt implementing IMetricConsumer that is registered with the topology?


Any hints would be greatly appreciated. As a last resort I can go digging in the source code, but I would like to avoid that if possible.


Also, let me know if this would be better posted on the dev mailing list. This is my first time using the mailing list; I am likely to have more questions in the future and I want to avoid spamming the wrong people.


Thanks in advance,


Tom Cooper
PhD Student
Newcastle University, School of Computer Science
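
To make the sampling question concrete, here is a minimal sketch of what a sample rate of 0.05 implies for a counter. It assumes that one event out of every 1/rate events is sampled and that each sampled event adds 1/rate to the count, which is how Storm's built-in stats are generally understood to work; the function name and the exact window logic are hypothetical and are not taken from Storm's source.

```python
import random


def sampled_count(n_events, sample_rate, seed=0):
    """Estimate an event count by sampling one event per window of 1/sample_rate events.

    Assumption (not Storm's actual code): each sampled event increments the
    counter by 1/sample_rate, so the estimate is always a multiple of that step.
    """
    rng = random.Random(seed)
    step = round(1 / sample_rate)      # e.g. 20 for a rate of 0.05
    target = rng.randrange(step)       # sampled position inside the first window
    estimate = 0
    for i in range(n_events):
        pos = i % step
        if pos == 0 and i > 0:
            # A new window starts: pick a new position to sample.
            target = rng.randrange(step)
        if pos == target:
            estimate += step
    return estimate
```

Under these assumptions the estimate is exact whenever the number of events is a multiple of 1/rate, and otherwise it is off by less than one step (20 events at a rate of 0.05). For low-throughput tasks that error can be a large fraction of the true count, which matters when the counts feed a queueing model.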

Re: Clarification on metrics sampling rate

Posted by Jungtaek Lim <ka...@gmail.com>.

Yes, you're right. They are measured for each executor.
Btw, metrics for the transfer queue are not exposed, since it is not part of
an executor but a separate queue.


Re: Clarification on metrics sampling rate

Posted by "Thomas Cooper (PGR)" <t....@newcastle.ac.uk>.

Thanks for the quick reply Jungtaek,


That clears things up for me.


One other question. For the metrics related to sendqueue and receive, am I right in thinking these relate to the Disruptor send queue and Disruptor receive queue of the executor running each task, and not to the send and receive threads of each worker process?


Thanks,


Thomas Cooper
PhD Student
Newcastle University, School of Computer Science




Re: Clarification on metrics sampling rate

Posted by Jungtaek Lim <ka...@gmail.com>.

Hi Tom,

First, the user mailing list is the more appropriate place, since the dev mailing list is
for Storm developers (committers / PMC members / contributors) discussing
improving and maintaining Storm.

Topology built-in metrics are sampled regardless of the consumer (the UI metrics,
which actually come from task heartbeats, or a metrics consumer).
While the metrics provided to the UI have time windows, the metrics provided to a metrics
consumer reset their values every period.

Hope this helps.

Thanks,
Jungtaek Lim (HeartSaVioR)
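
Because the values sent to a metrics consumer reset every period, each reported count can be turned into a rate by dividing by the period length. The sketch below shows that conversion, and contrasts it with a counter that is never reset, where successive totals must be differenced first. The function names and the period_secs parameter are hypothetical; in Storm the reporting period is, as far as I know, set by topology.builtin.metrics.bucket.size.secs, but treat that as an assumption to verify for your version.

```python
def rates_from_periods(counts, period_secs):
    """Convert per-period counts (counter reset after each report) to events/sec."""
    if period_secs <= 0:
        raise ValueError("period_secs must be positive")
    return [c / period_secs for c in counts]


def rates_from_cumulative(totals, period_secs):
    """Same conversion for a counter that is never reset.

    Successive totals are differenced to recover per-period counts first.
    """
    totals = list(totals)
    deltas = [b - a for a, b in zip([0] + totals, totals)]
    return rates_from_periods(deltas, period_secs)
```

For example, with a 60 second period, reported counts of 600, 1200 and 900 correspond to 10, 20 and 15 events per second. These per-period rates are the kind of arrival and emission rates a queueing model would take as input, subject to the sampling error discussed in the thread.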
