Posted to user@cassandra.apache.org by Brian Tarbox <ta...@cabotresearch.com> on 2013/01/22 14:59:29 UTC

Is this how to read the output of nodetool cfhistograms?

The output of this command seems to make no sense unless I think of it as 5
completely separate histograms that just happen to be displayed together.

Using this example output should I read it as: my reads all took either 1
or 2 SSTables.  And separately, I had write latencies of 3, 7, 19.  And
separately I had read latencies of 2, 8, 69, etc.?

In other words...each row isn't really a row...i.e. on those 16033 reads
from a single SSTable I didn't have 0 write latency, 0 read latency, 0 row
size and 0 column count.  Is that right?

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1          16033              0             0         0             0
2            303              0             0         0             1
3              0              0             0         0             0
4              0              0             0         0             0
5              0              0             0         0             0
6              0              0             0         0             0
7              0              0             0         0             0
8              0              0             2         0             0
10             0              0             0         0          6261
12             0              0             2         0           117
14             0              0             8         0             0
17             0              3            69         0           255
20             0              7           163         0             0
24             0             19          1369         0             0

Re: Is this how to read the output of nodetool cfhistograms?

Posted by Derek Williams <de...@fyrie.net>.
The histogram uses buckets, so it isn't exact (exact values would be much more
expensive to record). And you are reading it the wrong way round: you have
~3.2M reads that took ~1.9ms, not 1916 reads that took 3229500 usecs (just as
you don't have 1 read using 16k sstables, which would be a bit extreme).
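
As a rough illustration of the bucketing: the offsets quoted in this thread (1 to 8, then 10, 12, 14, 17, 20, 24, and later 1109, 1331, 1597, 1916, 2299, 2759) are consistent with each bucket boundary being about 20% larger than the previous one. The sketch below assumes that growth rule. The function name `bucket_offsets` is made up for this example, and it is not presented as Cassandra's actual implementation.

```python
def bucket_offsets(count):
    """Return `count` bucket offsets, each roughly 20% larger than the last.

    Assumption: a growth factor of 1.2 with rounding to the nearest integer,
    chosen only because it reproduces the offsets shown in this thread.
    """
    offsets = [1]
    while len(offsets) < count:
        last = offsets[-1]
        # Integer form of round(last * 1.2); avoids floating-point surprises.
        nxt = (last * 12 + 5) // 10
        # Small values would round back to themselves, so step by at least 1.
        if nxt == last:
            nxt = last + 1
        offsets.append(nxt)
    return offsets


if __name__ == "__main__":
    print(bucket_offsets(14))
```

Under this assumption a value is counted in whichever bucket covers it, which is why a count next to offset 1916 means "about 1.9ms", not "exactly 1916 usecs".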


On Wed, Jan 23, 2013 at 9:02 AM, Brian Tarbox <ta...@cabotresearch.com>wrote:

> Wei,
> Thank you for the explanation (Offset is always the x-axis, the other
> columns represent the y-axis (taken 5 independent times)).
>
> Part of this still doesn't make sense.  If I look at just read latencies
> for example...am I to believe that 1916 times I had a latency of exactly
> 3229500 usecs?  Is this just some weird 5-independent variable mushed
> together data bucketing???
>
> Offset  SSTables  Write Lat   Read Lat
> 1109           0        349     642406
> 1331           0        147    1335840
> 1597           0        121     640374
> *1916*         0        117  *3229500*
> 2299           0         91     683749
> 2759           0         77     202722
>
>


-- 
Derek Williams

Re: Is this how to read the output of nodetool cfhistograms?

Posted by Brian Tarbox <ta...@cabotresearch.com>.
Wei,
Thank you for the explanation (Offset is always the x-axis, the other
columns represent the y-axis (taken 5 independent times)).

Part of this still doesn't make sense.  If I look at just read latencies
for example...am I to believe that 1916 times I had a latency of exactly
3229500 usecs?  Is this just some weird 5-independent variable mushed
together data bucketing???

Offset  SSTables  Write Lat   Read Lat
1109           0        349     642406
1331           0        147    1335840
1597           0        121     640374
*1916*         0        117  *3229500*
2299           0         91     683749
2759           0         77     202722
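
One way to read the rows above: each line is a (bucket offset, count) pair for the Read Latency histogram, so 3229500 is a number of reads and 1916 is the latency bucket in usecs. Below is a minimal sketch that uses only the six quoted rows; the full output has more rows, so the totals are illustrative only. The helper name `percentile_bucket` is hypothetical.

```python
# (bucket offset in usecs, number of reads counted in that bucket),
# copied from the Read Lat column quoted above.
read_latency = [
    (1109, 642406),
    (1331, 1335840),
    (1597, 640374),
    (1916, 3229500),
    (2299, 683749),
    (2759, 202722),
]


def percentile_bucket(histogram, fraction):
    """Return the first bucket offset whose cumulative count reaches
    `fraction` of all recorded values."""
    total = sum(count for _, count in histogram)
    running = 0
    for offset, count in histogram:
        running += count
        if running >= fraction * total:
            return offset
    return histogram[-1][0]


total_reads = sum(count for _, count in read_latency)
busiest_offset, busiest_count = max(read_latency, key=lambda row: row[1])

if __name__ == "__main__":
    print("reads in quoted rows:", total_reads)
    print("busiest bucket:", busiest_offset, "with", busiest_count, "reads")
    print("median bucket:", percentile_bucket(read_latency, 0.5))
```

The answer is always a bucket offset, never an exact latency, because the histogram only stores one count per bucket.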


On Tue, Jan 22, 2013 at 12:11 PM, Wei Zhu <wz...@yahoo.com> wrote:

> I agree that Cassandra cfhistograms is probably the most bizarre metrics I
> have ever come across although it's extremely useful.
>
> I believe the offset is actually the metrics it has tracked (x-axis on the
> traditional histogram) and the number under each column is how many times
> that value has been recorded (y-axis on the traditional histogram). Your
> write latency are 17, 20, 24 (microseconds?). 3 writes took 17, 7 writes
> took 20 and 19 writes took 24
>
> Correct me if I am wrong.
>
> Thanks.
> -Wei
>

Re: Is this how to read the output of nodetool cfhistograms?

Posted by Wei Zhu <wz...@yahoo.com>.
I agree that Cassandra's cfhistograms output is probably the most bizarre metric display I have ever come across, although it's extremely useful.

I believe the offset is actually the value being tracked (the x-axis of a traditional histogram) and the number under each column is how many times that value has been recorded (the y-axis of a traditional histogram). Your write latencies are 17, 20 and 24 (microseconds?): 3 writes took 17, 7 writes took 20 and 19 writes took 24.

Correct me if I am wrong.

Thanks.
-Wei


________________________________
 From: Brian Tarbox <ta...@cabotresearch.com>
To: user@cassandra.apache.org 
Sent: Tuesday, January 22, 2013 7:27 AM
Subject: Re: Is this how to read the output of nodetool cfhistograms?
 

Indeed, but how many Cassandra users have the good fortune to stumble across that page?  Just saying that the explanation of the very powerful nodetool commands should be more front and center.

Brian




Re: Is this how to read the output of nodetool cfhistograms?

Posted by Brian Tarbox <ta...@cabotresearch.com>.
Indeed, but how many Cassandra users have the good fortune to stumble
across that page?  Just saying that the explanation of the very powerful
nodetool commands should be more front and center.

Brian


On Tue, Jan 22, 2013 at 10:03 AM, Edward Capriolo <ed...@gmail.com>wrote:

> This was described in good detail here:
>
> http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/
>

Re: Is this how to read the output of nodetool cfhistograms?

Posted by Edward Capriolo <ed...@gmail.com>.
This was described in good detail here:

http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

On Tue, Jan 22, 2013 at 9:41 AM, Brian Tarbox <ta...@cabotresearch.com>wrote:

> Thank you!   Since this is a very non-standard way to display data it
> might be worth a better explanation in the various online documentation
> sets.
>
> Thank you again.
>
> Brian
>
>

Re: Is this how to read the output of nodetool cfhistograms?

Posted by Brian Tarbox <ta...@cabotresearch.com>.
Thank you!   Since this is a very non-standard way to display data it might
be worth a better explanation in the various online documentation sets.

Thank you again.

Brian


On Tue, Jan 22, 2013 at 9:19 AM, Mina Naguib <mi...@adgear.com> wrote:

>
>
> On 2013-01-22, at 8:59 AM, Brian Tarbox <ta...@cabotresearch.com> wrote:
>
> > The output of this command seems to make no sense unless I think of it
> as 5 completely separate histograms that just happen to be displayed
> together.
> >
> > Using this example output should I read it as: my reads all took either
> 1 or 2 sstable.  And separately, I had write latencies of 3,7,19.  And
> separately I had read latencies of 2, 8,69, etc?
> >
> > In other words...each row isn't really a row...i.e. on those 16033 reads
> from a single SSTable I didn't have 0 write latency, 0 read latency, 0 row
> size and 0 column count.  Is that right?
>
> Correct.  A number in any of the metric columns is a count value bucketed
> in the offset on that row.  There are no relationships between other
> columns on the same row.
>
> So your first row says "16033 reads were satisfied by 1 sstable".  The
> other metrics (for example, latency of these reads) is reflected in the
> histogram under "Read Latency", under various other bucketed offsets.
>

Re: Is this how to read the output of nodetool cfhistograms?

Posted by Mina Naguib <mi...@adgear.com>.

On 2013-01-22, at 8:59 AM, Brian Tarbox <ta...@cabotresearch.com> wrote:

> The output of this command seems to make no sense unless I think of it as 5 completely separate histograms that just happen to be displayed together.
> 
> Using this example output should I read it as: my reads all took either 1 or 2 sstable.  And separately, I had write latencies of 3,7,19.  And separately I had read latencies of 2, 8,69, etc?
> 
> In other words...each row isn't really a row...i.e. on those 16033 reads from a single SSTable I didn't have 0 write latency, 0 read latency, 0 row size and 0 column count.  Is that right?

Correct.  A number in any of the metric columns is a count value bucketed in the offset on that row.  There are no relationships between other columns on the same row.

So your first row says "16033 reads were satisfied by 1 sstable".  The other metrics (for example, the latency of these reads) are reflected in their own histograms, such as the one under "Read Latency", at whichever bucketed offsets apply to them.

> 
> Offset      SSTables     Write Latency      Read Latency          Row Size      Column Count
> 1              16033             0                            0                            0                 0
> 2                303               0                            0                            0                 1
> 3                  0                 0                            0                            0                 0
> 4                  0                 0                            0                            0                 0
> 5                  0                 0                            0                            0                 0
> 6                  0                 0                            0                            0                 0
> 7                  0                 0                            0                            0                 0
> 8                  0                 0                            2                            0                 0
> 10                 0                 0                            0                            0              6261
> 12                 0                 0                            2                            0               117
> 14                 0                 0                            8                            0                 0
> 17                 0                 3                           69                            0               255
> 20                 0                 7                          163                            0                 0
> 24                 0                19                         1369                            0                 0
>
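
To make the "five separate histograms" reading concrete, here is a small sketch that splits the rows quoted above into one histogram per column, keyed by bucket offset. The rows are copied from the quoted output; the names `COLUMNS`, `ROWS` and `split_histograms` are made up for this example, and the parsing assumes whitespace-separated integer fields as shown in this thread.

```python
COLUMNS = ["SSTables", "Write Latency", "Read Latency", "Row Size", "Column Count"]

# Data rows copied from the output quoted in this thread (header omitted).
ROWS = """
1   16033   0     0  0     0
2     303   0     0  0     1
3       0   0     0  0     0
4       0   0     0  0     0
5       0   0     0  0     0
6       0   0     0  0     0
7       0   0     0  0     0
8       0   0     2  0     0
10      0   0     0  0  6261
12      0   0     2  0   117
14      0   0     8  0     0
17      0   3    69  0   255
20      0   7   163  0     0
24      0  19  1369  0     0
"""


def split_histograms(rows_text, columns):
    """Turn the combined table into {column name: {offset: count}}.

    Each column is treated independently; zero counts are dropped so that
    only the buckets that actually recorded something remain.
    """
    histograms = {name: {} for name in columns}
    for line in rows_text.strip().splitlines():
        fields = [int(field) for field in line.split()]
        offset, counts = fields[0], fields[1:]
        for name, count in zip(columns, counts):
            if count:
                histograms[name][offset] = count
    return histograms


histograms = split_histograms(ROWS, COLUMNS)

if __name__ == "__main__":
    for name, buckets in histograms.items():
        print(name, buckets)
```

Read this way, the SSTables histogram says 16033 reads touched 1 sstable and 303 touched 2, while the Write Latency histogram separately says 3, 7 and 19 writes fell into the 17, 20 and 24 buckets.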