Posted to user@cassandra.apache.org by shalom sagges <sh...@gmail.com> on 2019/05/29 14:31:59 UTC

Collecting Latency Metrics

Hi All,

I'm creating a dashboard that should collect read/write latency metrics on
C* 3.x.
In older versions (e.g. 2.0) I used to divide the total read latency (in
microseconds) by the read count.

Is there a metric attribute that shows read/write latency without the need
to do the math, such as the "Local read latency" in nodetool tablestats
output? I saw there's a Mean attribute in
org.apache.cassandra.metrics.ReadLatency, but I'm not sure it's the right one.

I'd really appreciate your help on this one.
Thanks!

Re: Collecting Latency Metrics

Posted by Chris Lohfink <cl...@gmail.com>.
To answer your question:
org.apache.cassandra.metrics:type=Table,name=ReadTotalLatency gives you
the total local read latency in microseconds, and you can get the count
from the ReadLatency metric.

If you are going to do that, be sure to compute it on the delta from the
previous query (new - last) for both the total latency and the counter;
otherwise you will slowly converge to a global average that almost never
changes, as the sheer quantity of reads drowns out the outliers. The Mean
attribute of the Latency metric you mentioned actually gives you an
approximation of this, as it's taking the total/count of a decaying
histogram of the latencies. It will, however, be even less accurate than
using the deltas, since the bounds of the decay won't necessarily match up
with your reading intervals, and the histogram introduces a worst-case 20%
round-up. Even when using deltas, though, this will hide outliers: you
could end up with really bad queries that don't even show up as a tick on
your graph (although *generally* they will).
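A minimal Python sketch of the delta approach described above, assuming you poll the two cumulative counters (the ReadTotalLatency count, in microseconds, and the ReadLatency count of reads) once per interval. The sample values are invented for illustration, and the function names are hypothetical. It contrasts the per-interval average with the since-startup average, which barely moves once many reads have accumulated.

```python
from typing import List, Optional, Tuple

# One sample per poll: (total_latency_us, read_count), both cumulative
# counters since the node started. The numbers below are invented.
Sample = Tuple[int, int]


def interval_means(samples: List[Sample]) -> List[Optional[float]]:
    """Average latency (us) per polling interval, computed from deltas."""
    means: List[Optional[float]] = []
    for (prev_total, prev_count), (total, count) in zip(samples, samples[1:]):
        d_total = total - prev_total
        d_count = count - prev_count
        if d_count <= 0 or d_total < 0:
            # No reads in the interval, or a counter reset (e.g. a restart).
            means.append(None)
        else:
            means.append(d_total / d_count)
    return means


def lifetime_means(samples: List[Sample]) -> List[float]:
    """Average latency (us) since startup: total / count, no deltas."""
    return [total / count for total, count in samples if count > 0]


samples = [
    (200_000_000, 1_000_000),  # 1M reads so far, 200 us each on average
    (220_000_000, 1_100_000),  # +100k reads at 200 us each
    (225_000_000, 1_101_000),  # +1k slow reads at 5,000 us each
]
print(interval_means(samples))  # [200.0, 5000.0]
print(lifetime_means(samples))
```

With these numbers the interval averages are 200 us and then 5,000 us, while the since-startup average only creeps from 200 us to about 204 us.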

Chris

Re: Collecting Latency Metrics

Posted by shalom sagges <sh...@gmail.com>.
Sorry for the duplicate emails, but I just want to make sure I'm doing
it correctly.
To summarize, are both ways accurate, or is one better than the other?

divideSeries(averageSeries(keepLastValue(nonNegativeDerivative($env.path.to.host.$host.org_apache_cassandra_metrics.Table.$ks.$cf.ReadTotalLatency.Count))),averageSeries(keepLastValue(nonNegativeDerivative($env.path.to.host.$host.org_apache_cassandra_metrics.Table.$ks.$cf.ReadLatency.Count))))

OR

alias(scaleToSeconds(averageSeriesWithWildcards(nonNegativeDerivative($env.path.to.host.$host.org_apache_cassandra_metrics.Table.$ks.$cf.ReadTotalLatency.Count),7,8,9),1),'test')

WDYT?
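For the first option, it may help to see what the core Graphite functions do point by point. The sketch below is a simplified Python re-implementation for illustration, not Graphite's actual code, and it leaves out the keepLastValue and averageSeries wrappers: `non_negative_derivative` approximates `nonNegativeDerivative` (difference from the previous point, with `None` for the first point, after a gap, and whenever the counter goes backwards, e.g. after a node restart), and `divide_series` approximates `divideSeries` (pointwise division, `None` when either side is missing or the divisor is zero). The series values are invented.

```python
from typing import List, Optional

Series = List[Optional[float]]


def non_negative_derivative(values: Series) -> Series:
    """Simplified stand-in for Graphite's nonNegativeDerivative (no maxValue)."""
    out: Series = []
    prev: Optional[float] = None
    for v in values:
        if prev is None or v is None or v < prev:
            # First point, gap, or counter reset: no usable delta.
            out.append(None)
        else:
            out.append(v - prev)
        prev = v
    return out


def divide_series(dividend: Series, divisor: Series) -> Series:
    """Simplified stand-in for Graphite's divideSeries on aligned points."""
    return [
        None if a is None or b is None or b == 0 else a / b
        for a, b in zip(dividend, divisor)
    ]


# Invented per-minute samples for one table on one host; the node restarts
# between the third and fourth points, so both counters drop.
read_total_latency_us = [1_000_000, 1_300_000, 1_600_000, 50_000, 350_000]
read_count = [5_000, 6_000, 7_500, 200, 1_700]

per_read_us = divide_series(
    non_negative_derivative(read_total_latency_us),
    non_negative_derivative(read_count),
)
print(per_read_us)  # [None, 300.0, 200.0, None, 200.0]
```

Dividing the two deltas point by point is what turns the cumulative counters into an average latency per read for each interval.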


Re: Collecting Latency Metrics

Posted by shalom sagges <sh...@gmail.com>.
Thanks a lot for your comments.
This mailing list is truly *the* definitive guide to Cassandra.
The knowledge transferred here is invaluable.
So just wanted to give a big shout out to anyone who is helping out here.

Regards,

On Thu, May 30, 2019 at 6:10 PM Jon Haddad <jo...@jonhaddad.com> wrote:

> Yep. I would *never* use the mean to make any sort of decision when it
> comes to performance. I prefer to graph all the p99 latencies as well as
> the max.
>
> Some good reading on the topic:
> https://bravenewgeek.com/everything-you-know-about-latency-is-wrong/
>

Re: Collecting Latency Metrics

Posted by Jon Haddad <jo...@jonhaddad.com>.
Yep. I would *never* use the mean to make any sort of decision when it
comes to performance. I prefer to graph all the p99 latencies as well as
the max.

Some good reading on the topic:
https://bravenewgeek.com/everything-you-know-about-latency-is-wrong/
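To see why graphing the max alongside p99 can matter, here is a small Python sketch with synthetic data (9,990 reads at 400 us and 10 reads at 500 ms). The percentile uses the simple nearest-rank definition, which is an assumption for illustration; Cassandra derives its percentiles from histograms, so real numbers will differ. With this data the mean stays sub-millisecond and even p99 misses the slow reads; only the max exposes them.

```python
import math
from typing import List


def nearest_rank_percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with >= pct% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Synthetic local read latencies in microseconds: mostly fast, a few terrible.
latencies_us = [400.0] * 9_990 + [500_000.0] * 10

mean_us = sum(latencies_us) / len(latencies_us)
p99_us = nearest_rank_percentile(latencies_us, 99)
max_us = max(latencies_us)

print(f"mean={mean_us:.1f}us p99={p99_us:.1f}us max={max_us:.1f}us")
# mean=899.6us p99=400.0us max=500000.0us
```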

On Thu, May 30, 2019 at 7:35 AM Chris Lohfink <cl...@gmail.com> wrote:

> For what it's worth, I would generally recommend just using the Mean
> rather than calculating it yourself. It's a lot easier, and averages are
> meaningless for anything besides trending anyway (which is really what
> this is useful for: finding issues on the larger scale), especially with
> high-volume clusters, so the loss in accuracy is kind of moot. Your
> average for local reads/writes will almost always be sub-millisecond, but
> you might end up having 500 millisecond requests or worse that the mean
> will hide.
>
> Chris
>

Re: Collecting Latency Metrics

Posted by Chris Lohfink <cl...@gmail.com>.
For what it's worth, I would generally recommend just using the Mean
rather than calculating it yourself. It's a lot easier, and averages are
meaningless for anything besides trending anyway (which is really what
this is useful for: finding issues on the larger scale), especially with
high-volume clusters, so the loss in accuracy is kind of moot. Your
average for local reads/writes will almost always be sub-millisecond, but
you might end up having 500 millisecond requests or worse that the mean
will hide.
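Part of the Mean's loss in accuracy comes from bucketing latencies in a histogram (Chris mentions a worst-case 20% round-up earlier in the thread). The sketch below assumes bucket upper bounds that grow by roughly 1.2x per step and reports each value as its bucket's upper bound; it illustrates where a bound just under 20% comes from, but it is a hypothetical model, not Cassandra's actual histogram code, whose bucket layout and reporting may differ.

```python
import bisect
from typing import List


def bucket_bounds(count: int, factor: float = 1.2) -> List[int]:
    """Hypothetical bucket upper bounds growing by ~factor (at least +1)."""
    bounds = [1]
    while len(bounds) < count:
        nxt = round(bounds[-1] * factor)
        bounds.append(max(nxt, bounds[-1] + 1))
    return bounds


def bucketed(value_us: int, bounds: List[int]) -> int:
    """Report a value as the upper bound of the first bucket that holds it."""
    i = bisect.bisect_left(bounds, value_us)
    if i == len(bounds):
        raise ValueError(f"{value_us} exceeds the largest bucket")
    return bounds[i]


bounds = bucket_bounds(90)
for v in (150, 1_000, 12_345):
    b = bucketed(v, bounds)
    print(f"{v} us recorded as {b} us (+{(b - v) / v:.1%})")
```

Because each bound is at most about 1.2x the previous one, a recorded value can overstate the real latency by just under 20%, but never understate it.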

Chris

On Thu, May 30, 2019 at 6:30 AM shalom sagges <sh...@gmail.com>
wrote:

> Thanks for your replies, guys. I really appreciate it.
>
> @Alain, I use Graphite as the backend for Grafana, but the goal is to
> eventually move from Graphite to Prometheus.
>
> I tried to find a direct way of getting a specific average latency
> metric, but as Chris pointed out, the Mean value isn't that accurate.
> I don't want to use the percentile metrics either; I want a single latency
> metric like the *"Local read latency"* output in nodetool tablestats.
> Looking at the code of nodetool tablestats, it seems that C* also divides
> *ReadTotalLatency.Count* by *ReadLatency.Count* to get the latency
> result.
>
> So I guess I will have no choice but to run the calculation on my own via
> Graphite:
>
> divideSeries(averageSeries(keepLastValue(nonNegativeDerivative($env.path.to.host.$host.org_apache_cassandra_metrics.Table.$ks.$cf.ReadTotalLatency.Count))),averageSeries(keepLastValue(nonNegativeDerivative($env.path.to.host.$host.org_apache_cassandra_metrics.Table.$ks.$cf.ReadLatency.Count))))
>
> Does this seem right to you?
>
> Thanks!
>

Re: Collecting Latency Metrics

Posted by shalom sagges <sh...@gmail.com>.
Thanks for your replies, guys. I really appreciate it.

@Alain, I use Graphite as the backend for Grafana, but the goal is to
eventually move from Graphite to Prometheus.

I tried to find a direct way of getting a specific average latency
metric, but as Chris pointed out, the Mean value isn't that accurate.
I don't want to use the percentile metrics either; I want a single latency
metric like the *"Local read latency"* output in nodetool tablestats.
Looking at the code of nodetool tablestats, it seems that C* also divides
*ReadTotalLatency.Count* by *ReadLatency.Count* to get the latency
result.

So I guess I will have no choice but to run the calculation on my own via
Graphite:

divideSeries(averageSeries(keepLastValue(nonNegativeDerivative($env.path.to.host.$host.org_apache_cassandra_metrics.Table.$ks.$cf.ReadTotalLatency.Count))),averageSeries(keepLastValue(nonNegativeDerivative($env.path.to.host.$host.org_apache_cassandra_metrics.Table.$ks.$cf.ReadLatency.Count))))

Does this seem right to you?

Thanks!

Re: Collecting Latency Metrics

Posted by Chris Lohfink <cl...@gmail.com>.
>
> org.apache.cassandra.metrics.ClientRequest.Latency.Read; these measure the
> latency in milliseconds.
>

It's actually in microseconds, unless you call the values() operation,
which gives the histogram in nanoseconds.

On Wed, May 29, 2019 at 4:34 PM Paul Chandler <pa...@redshots.com> wrote:

> There are various attributes under
> org.apache.cassandra.metrics.ClientRequest.Latency.Read; these measure the
> latency in milliseconds.
>
> Thanks
>
> Paul
> www.redshots.com
>

Re: Collecting Latency Metrics

Posted by Paul Chandler <pa...@redshots.com>.
There are various attributes under org.apache.cassandra.metrics.ClientRequest.Latency.Read; these measure the latency in milliseconds.

Thanks 

Paul
www.redshots.com

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org


Re: Collecting Latency Metrics

Posted by shalom sagges <sh...@gmail.com>.
If I only send ReadTotalLatency to Graphite/Grafana, can I run an average
on it and use "scale to seconds = 1"?
Will that do the trick?
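One way to see what the total-latency counter alone gives you: its per-second rate (roughly what nonNegativeDerivative followed by scaleToSeconds(..., 1) would chart) is about throughput times average latency, so it moves with traffic even when each read is just as fast. A minimal Python sketch with made-up numbers; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    seconds: float              # length of the polling interval
    d_total_latency_us: float   # delta of ReadTotalLatency over the interval
    d_reads: int                # delta of the read count over the interval


def total_latency_per_second(iv: Interval) -> float:
    """Roughly what charting only the total-latency rate shows."""
    return iv.d_total_latency_us / iv.seconds


def latency_per_read_us(iv: Interval) -> float:
    """Average latency per read: needs the read count as well."""
    return iv.d_total_latency_us / iv.d_reads


# Same 250 us per read, but ten times the traffic in the busy minute.
quiet = Interval(seconds=60, d_total_latency_us=250 * 6_000, d_reads=6_000)
busy = Interval(seconds=60, d_total_latency_us=250 * 60_000, d_reads=60_000)

for name, iv in (("quiet", quiet), ("busy", busy)):
    print(name, total_latency_per_second(iv), latency_per_read_us(iv))
# quiet 25000.0 250.0
# busy 250000.0 250.0
```

Dividing by the read count cancels out the traffic; the rate of the total alone does not.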

Thanks!

Re: Collecting Latency Metrics

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hello,

This metric is available indeed:

Most of the metrics available are documented here:
http://cassandra.apache.org/doc/latest/operating/metrics.html

For client requests (coordinator perspective latency):
http://cassandra.apache.org/doc/latest/operating/metrics.html#client-request-metrics
For local requests (per table/host latency, locally, no network
communication included):
http://cassandra.apache.org/doc/latest/operating/metrics.html#table-metrics

> Latency: Special type that tracks latency (in microseconds) with a Timer
> plus a Counter that tracks the total latency accrued since starting. The
> former is useful if you track the change in total latency since the last
> check. Each metric name of this type will have ‘Latency’ and
> ‘TotalLatency’ appended to it.


You need 'Latency', not 'TotalLatency'. I would guess that's the issue,
because these latencies have been available for as long as I can remember
(including C* 2.0, and 1.2 for sure :)).

Also, be aware that quite a few things changed in the metric structure
between C* 2.1 and C* 2.2 (and C* 3.0 is similar to C* 2.2).

Examples of changes:
- ColumnFamily --> Table
- 99percentile --> p99
- 1MinuteRate --> m1_rate
- The metric name now comes before the keyspace and table names, along with
some other changes of this kind.
- Because of this, aggregations / aliases and node indexes changed, breaking
most of the charts (in my case at least).
- ‘.value’ is no longer appended to the metric name for gauges; nothing is
appended instead.

For example (Grafana / Graphite), from:
```aliasByNode(averageSeriesWithWildcards(cassandra.$env.$dc.$host.org.apache.cassandra.metrics.ColumnFamily.$ks.$table.ReadLatency.95percentile, 2, 3), 1, 7, 8, 9)```
to:
```aliasByNode(averageSeriesWithWildcards(cassandra.$env.$dc.$host.org.apache.cassandra.metrics.Table.ReadLatency.$ks.$table.p95, 2, 3), 1, 8, 9, 10)```
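The 2.1 to 2.2+ renames listed above can be sketched as a path rewrite when migrating dashboards. The helper below is hypothetical and covers only the changes mentioned here (ColumnFamily to Table, metric name moved before keyspace/table, 95percentile/99percentile to p95/p99, 1MinuteRate to m1_rate, dropping a trailing .value). It assumes a Graphite path of the form `...metrics.ColumnFamily.<ks>.<table>.<metric>[.<attribute>]`; real paths depend on your reporter configuration.

```python
from typing import List

# Attribute renames mentioned in the thread (2.1 name -> 2.2+/3.x name).
ATTRIBUTE_RENAMES = {
    "95percentile": "p95",
    "99percentile": "p99",
    "1MinuteRate": "m1_rate",
}


def rewrite_table_metric_path(old_path: str) -> str:
    """Rewrite a 2.1-style per-table Graphite path to the 2.2+/3.x layout.

    Hypothetical helper: expects '...metrics.ColumnFamily.<ks>.<table>.<metric>'
    optionally followed by one attribute segment (e.g. '95percentile').
    """
    parts: List[str] = old_path.split(".")
    i = parts.index("ColumnFamily")
    prefix, rest = parts[:i], parts[i + 1:]
    if len(rest) not in (3, 4):
        raise ValueError(f"unexpected layout: {old_path}")
    ks, table, metric = rest[:3]
    attribute = rest[3:]  # zero or one segment
    if attribute == ["value"]:
        # Gauges no longer get a '.value' suffix.
        attribute = []
    attribute = [ATTRIBUTE_RENAMES.get(a, a) for a in attribute]
    # 'ColumnFamily' becomes 'Table' and the metric name moves before ks/table.
    return ".".join(prefix + ["Table", metric, ks, table] + attribute)
```

The aliasByNode indexes in existing queries still have to be adjusted by hand, since the node positions shift.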


Another tip is to use ccm locally (https://github.com/riptano/ccm), for
example, together with 'jconsole $cassandra_pid'. I use this:
jconsole $(ccm node1 show | grep pid | awk -F= '{print $2}')
Once you're in, you can explore the available MBeans and find the metrics
under 'org.apache.cassandra.[...]'. It's not ideal, as you search
'manually', but it allowed me to find some metrics in the past, or to fix
issues in the doc above.

Out of curiosity, may I ask what backend you used for your monitoring?

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


