Posted to user@cassandra.apache.org by Keith Wright <kw...@nanigans.com> on 2014/07/24 20:48:27 UTC
Hot, large row
Hi all,
We are seeing an issue where, basically daily, one of our nodes spikes in load and churns under CMS heap pressure. It appears that reads are backing up, and my guess is that our application is reading a large row repeatedly. Our write pattern can lend itself to wide rows very infrequently (<0.001%), and we do our best to detect and delete them, but obviously we're missing a case. Hoping for assistance on the following questions:
* How can I detect wide rows?
* Anyone know what debug level I can set so that I can see what reads the hot node is handling? I’m hoping to see the “bad” row
* Any way to get the server to blacklist these wide rows automatically?
We’re using C* 2.0.6 with Vnodes.
Thanks
Re: Hot, large row
Posted by DuyHai Doan <do...@gmail.com>.
"If I run nodetool tpstats, I see a high number of items in the Pending
phase for ReadStage. Other items mostly appear near empty. In addition, I
see dropped reads" --> Have a look at system I/O & CPU stats to check for
possible bottlenecks.
This symptom is not necessarily caused by wide rows. Sometimes your disk I/O
just can't keep up with an intense read burst.
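DuyHai's advice to check the I/O stats first can be mechanized. Below is a minimal sketch that flags busy devices, assuming `iostat -x` style output where %util is the last column; the sample lines and the 80% threshold are illustrative, not taken from this thread.

```python
# Minimal sketch: flag devices whose %util suggests an I/O bottleneck.
# Assumes "iostat -x" style lines with %util as the last column; sample
# data and the 80.0 threshold are illustrative.

def busy_devices(iostat_text, util_threshold=80.0):
    """Return device names whose %util exceeds the threshold."""
    busy = []
    for line in iostat_text.splitlines():
        parts = line.split()
        # Device rows start with a device name and end with the %util figure.
        if len(parts) >= 2 and parts[0].startswith(("sd", "nvme", "xvd")):
            try:
                util = float(parts[-1])
            except ValueError:
                continue
            if util > util_threshold:
                busy.append(parts[0])
    return busy

sample = """Device: r/s w/s rMB/s wMB/s %util
sda 720.50 0.00 5.51 0.00 41.35
sdb 6951.50 2052.50 65.82 8.02 89.60"""

print(busy_devices(sample))  # ['sdb']
```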
On Thu, Jul 24, 2014 at 10:00 PM, Keith Wright <kw...@nanigans.com> wrote:
> I appreciate the feedback, doing it on the client side is interesting and
> I will start looking into that. To be clear, here are the symptoms I am
> seeing:
>
>
> - A node will start showing high load and the CMS collection time
> jumps to 100+ ms/sec (ParNew is also up)
> - If I run nodetool tpstats, I see a high number of items in the
> Pending phase for ReadStage. Other items mostly appear near empty. In
> addition, I see dropped reads.
> - If I shut down the affected node, two other nodes show increased
> load. I am at RF=3 so this would again suggest slow reads against a single
> row
>
> Does it sound correct that my best course of action is to investigate a
> large row? If it were a small row being called repeatedly, I assume OS/key
> cache would make that a VERY fast operation.
>
> Thanks
>
> From: DuyHai Doan <do...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Date: Thursday, July 24, 2014 at 3:53 PM
>
> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Subject: Re: Hot, large row
>
> Your extract of cfhistograms shows that there are no particular "wide
> rows". The widest has 61214 cells, which is big but not huge enough to be
> a real concern.
>
> Turning on trace probability only gives you some "hints" about what
> kind of queries are run; it does not give the exact partition key or
> other statement values, especially when you are using prepared statements
> ...
>
>
> "I am considering reducing read_request_timeout_in_ms: 5000 in
> cassandra.yaml so that it reduces the impact when this occurs." --> Don't
> do that, you'll only sweep dust under the carpet. Find the real issue and
> fix it instead of changing a parameter to hide it.
>
> One solution would be, on the client side, to activate some logging to show
> the CQL3 statements the application is issuing that may overload the server.
> I know that's easier said than done, but I don't have any other idea for the
> moment.
>
> -------- Shameless self-promotion ------
>
> To support this kind of live prod debugging & investigation, I added a
> new dynamic query logging feature in Achilles:
> https://github.com/doanduyhai/Achilles/wiki/Statements-Logging-and-Tracing#dynamic-statements-logging
>
> Once you hit the issue, this kind of feature may save your day...
>
>
>
>
>
>
> On Thu, Jul 24, 2014 at 9:22 PM, Keith Wright <kw...@nanigans.com>
> wrote:
>
>> I can see from cfhistograms that I do have some wide rows (see below). I
>> set the trace probability as you suggested, but the output doesn't appear
>> to tell me what row was actually read, unless I missed something. I just
>> see "executing prepared statement". Any ideas how I can find the row in
>> question?
>>
>> I am considering reducing read_request_timeout_in_ms: 5000 in
>> cassandra.yaml so that it reduces the impact when this occurs.
>>
>> Any help in identifying my issue would be GREATLY appreciated
>>
>> Cell Count per Partition
>>
>> 1 cells: 50449950
>>
>> 2 cells: 14281828
>>
>> 3 cells: 8093366
>>
>> 4 cells: 5029200
>>
>> 5 cells: 3103023
>>
>> 6 cells: 3059903
>>
>> 7 cells: 1903018
>>
>> 8 cells: 1509297
>>
>> 10 cells: 2420359
>>
>> 12 cells: 1624895
>>
>> 14 cells: 1171678
>>
>> 17 cells: 1289391
>>
>> 20 cells: 909777
>>
>> 24 cells: 852081
>>
>> 29 cells: 722925
>>
>> 35 cells: 587067
>>
>> 42 cells: 459473
>>
>> 50 cells: 358744
>>
>> 60 cells: 304146
>>
>> 72 cells: 244682
>>
>> 86 cells: 191045
>>
>> 103 cells: 155337
>>
>> 124 cells: 127061
>>
>> 149 cells: 98913
>>
>> 179 cells: 77454
>>
>> 215 cells: 59849
>>
>> 258 cells: 46117
>>
>> 310 cells: 35321
>>
>> 372 cells: 26319
>>
>> 446 cells: 19379
>>
>> 535 cells: 13783
>>
>> 642 cells: 9993
>>
>> 770 cells: 6973
>>
>> 924 cells: 4713
>>
>> 1109 cells: 3229
>>
>> 1331 cells: 2062
>>
>> 1597 cells: 1338
>>
>> 1916 cells: 773
>>
>> 2299 cells: 495
>>
>> 2759 cells: 268
>>
>> 3311 cells: 150
>>
>> 3973 cells: 100
>>
>> 4768 cells: 42
>>
>> 5722 cells: 24
>>
>> 6866 cells: 12
>>
>> 8239 cells: 9
>>
>> 9887 cells: 3
>>
>> 11864 cells: 0
>>
>> 14237 cells: 5
>>
>> 17084 cells: 1
>>
>> 20501 cells: 0
>>
>> 24601 cells: 2
>>
>> 29521 cells: 0
>>
>> 35425 cells: 0
>>
>> 42510 cells: 0
>>
>> 51012 cells: 0
>>
>> 61214 cells: 2
>>
>> From: DuyHai Doan <do...@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>> Date: Thursday, July 24, 2014 at 3:01 PM
>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>> Subject: Re: Hot, large row
>>
>> "How can I detect wide rows?" -->
>>
>> nodetool cfhistograms <keyspace> <suspected column family>
>>
>> Look at the "Column count" column (the last one) and identify a line with
>> a very high "Offset" value. In a well-designed application you should see
>> a roughly Gaussian distribution where 80% of your rows have a similar
>> number of columns.
>>
>> "Anyone know what debug level I can set so that I can see what reads the
>> hot node is handling? " -->
>>
>> "nodetool settraceprobability <value>", where value is a small number
>> (0.001), on the node where you encounter the issue. Activate the tracing
>> for a while (5 mins) and then deactivate it (value = 0). Then look into
>> the system_traces tables "events" & "sessions". It may or may not help,
>> since only about one request in every 1000 is traced.
>>
>> "Any way to get the server to blacklist these wide rows automatically?"
>> --> No
>>
>>
>> On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kw...@nanigans.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> We are seeing an issue where, basically daily, one of our nodes spikes
>>> in load and churns under CMS heap pressure. It appears that reads are
>>> backing up, and my guess is that our application is reading a large row
>>> repeatedly. Our write pattern can lend itself to wide rows very
>>> infrequently (<0.001%), and we do our best to detect and delete them,
>>> but obviously we're missing a case. Hoping for assistance on the
>>> following questions:
>>>
>>> - How can I detect wide rows?
>>> - Anyone know what debug level I can set so that I can see what
>>> reads the hot node is handling? I’m hoping to see the “bad” row
>>> - Any way to get the server to blacklist these wide rows
>>> automatically?
>>>
>>> We’re using C* 2.0.6 with Vnodes.
>>>
>>> Thanks
>>>
>>
>>
>
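DuyHai's cfhistograms suggestion above can be mechanized: a minimal sketch that scans the "N cells: count" histogram lines (in the format quoted earlier in this thread) and reports offsets above a chosen width threshold. The 10000-cell cutoff is an illustrative assumption, not a Cassandra default.

```python
# Sketch: find wide partitions in "N cells: count" histogram output, as
# produced by 2.0-era nodetool cfhistograms. The 10000-cell threshold is
# an arbitrary illustrative cutoff.

def wide_partition_offsets(histogram_text, min_cells=10000):
    """Return (offset, partition_count) pairs for partitions wider than min_cells."""
    wide = []
    for line in histogram_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] == "cells:":
            offset, count = int(parts[0]), int(parts[2])
            if offset >= min_cells and count > 0:
                wide.append((offset, count))
    return wide

sample = """1 cells: 50449950
9887 cells: 3
14237 cells: 5
61214 cells: 2"""

print(wide_partition_offsets(sample))  # [(14237, 5), (61214, 2)]
```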
Re: Hot, large row
Posted by Keith Wright <kw...@nanigans.com>.
I don't know, but my guess is it would be without tombstones. I did more research this weekend (note that my Sunday was largely interrupted by again seeing a node go to high load/high CMS for ~3 hours) and came across this presentation: http://www.slideshare.net/mobile/planetcassandra/8-axel-liljencrantz-23204252
I definitely suggest you give this a look; it is very informative. The important takeaway is that they ran into the same issue as I did, from using the same model: I update the same row over time with a TTL, causing that row to fragment across SSTables, and once it spans 4+ tables, compaction can never actually remove its tombstones. As I see it, I have the following options and was hoping to get some advice:
1. Modify my write structure to include time within the key. Currently we want to read all of a row, but I can likely add a month component to the key, and it would be OK for the application to do two reads to get the most recent data (to deal with month boundaries). This would contain the fragmentation to one month.
2. Following off of item #1, it appears according to CASSANDRA-5514 that if I include time within my query, it will not bother going through older SSTables and will thus reduce the impact of the row fragmentation. The problem here is that my data footprint will likely still continue to grow over time, as tombstones will never be removed.
3. Move from LCS to STCS and run full compactions periodically to clean up tombstones.
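Option 1 above can be sketched in a few lines. The bucket format and helper names here are assumptions for illustration; the real change would move the bucket into the partition key, e.g. something like PRIMARY KEY ((user_id, month), app_id, type, name, sku) instead of the current PRIMARY KEY (user_id, app_id, type, name, sku).

```python
# Sketch of option 1: add a month bucket to the partition key so 30-day
# TTL'd data (and its tombstones) is confined to monthly partitions.
# Bucket format and helper names are illustrative assumptions.
from datetime import datetime

def month_bucket(ts):
    return f"{ts.year:04d}-{ts.month:02d}"

def read_buckets(ts):
    """Buckets to query for 'all SKUs in the last 30 days': the current
    month plus the previous one, covering month boundaries in two reads."""
    first = ts.replace(day=1)  # avoids invalid day-of-month when stepping back
    if first.month == 1:
        prev = first.replace(year=first.year - 1, month=12)
    else:
        prev = first.replace(month=first.month - 1)
    return [month_bucket(first), month_bucket(prev)]

print(read_buckets(datetime(2014, 7, 27)))  # ['2014-07', '2014-06']
print(read_buckets(datetime(2014, 1, 5)))   # ['2014-01', '2013-12']
```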
I appreciate the help!
From: Jack Krupansky <ja...@basetechnology.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Friday, July 25, 2014 at 11:15 AM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Hot, large row
Is it the accumulated tombstones on a row that make it act as if it were "wide"? Does cfhistograms count tombstones or subtract them when reporting the cell count for rows? (I don't know.)
-- Jack Krupansky
From: Keith Wright
Sent: Friday, July 25, 2014 10:24 AM
To: user@cassandra.apache.org
Cc: Don Jackson
Subject: Re: Hot, large row
Ha, check out who filed that ticket! Yes, I'm aware of it. My hope is that it was mostly addressed in CASSANDRA-6563, so I may upgrade from 2.0.6 to 2.0.9. I'm really just surprised that others are not doing something similar and thus experiencing similar issues.
To answer DuyHai’s questions:
How many nodes do you have? And roughly how many distinct user_ids are there?
- 14 nodes with approximately 250 million distinct user_ids
For GC activity, in general we see low GC pressure in both ParNew and CMS (we see the occasional CMS spike, but it's usually under 100 ms). When we see a node locked up in CMS GC, it's not that any one GC takes a long time; it's just that their constant recurrence causes the read latency to spike from the usual 3-5 ms up to 35 ms, which causes issues for our application.
Also, Jack Krupansky's question is interesting. Even though you limit a request to 5000, if each cell is a big blob or block of text, it may add up to a lot in the JVM heap …
- The column values are actually timestamps and thus not variable in length, and we cap the length of the other columns used in the primary key, so I find it VERY unlikely that this is the cause.
I will look into the paging option with the native client, but from the docs it appears that it's enabled by default, right?
I greatly appreciate all the help!
From: Ken Hancock <ke...@schange.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Friday, July 25, 2014 at 10:06 AM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Cc: Don Jackson <dj...@nanigans.com>
Subject: Re: Hot, large row
https://issues.apache.org/jira/browse/CASSANDRA-6654
Re: Hot, large row
Posted by Jack Krupansky <ja...@basetechnology.com>.
Is it the accumulated tombstones on a row that make it act as if it were "wide"? Does cfhistograms count tombstones or subtract them when reporting the cell count for rows? (I don't know.)
-- Jack Krupansky
From: Keith Wright
Sent: Friday, July 25, 2014 10:24 AM
To: user@cassandra.apache.org
Cc: Don Jackson
Subject: Re: Hot, large row
Ha, check out who filed that ticket! Yes, I'm aware of it. My hope is that it was mostly addressed in CASSANDRA-6563, so I may upgrade from 2.0.6 to 2.0.9. I'm really just surprised that others are not doing something similar and thus experiencing similar issues.
To answer DuyHai’s questions:
How many nodes do you have? And roughly how many distinct user_ids are there?
- 14 nodes with approximately 250 million distinct user_ids
For GC activity, in general we see low GC pressure in both ParNew and CMS (we see the occasional CMS spike, but it's usually under 100 ms). When we see a node locked up in CMS GC, it's not that any one GC takes a long time; it's just that their constant recurrence causes the read latency to spike from the usual 3-5 ms up to 35 ms, which causes issues for our application.
Also, Jack Krupansky's question is interesting. Even though you limit a request to 5000, if each cell is a big blob or block of text, it may add up to a lot in the JVM heap …
- The column values are actually timestamps and thus not variable in length, and we cap the length of the other columns used in the primary key, so I find it VERY unlikely that this is the cause.
I will look into the paging option with the native client, but from the docs it appears that it's enabled by default, right?
I greatly appreciate all the help!
From: Ken Hancock <ke...@schange.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Friday, July 25, 2014 at 10:06 AM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Cc: Don Jackson <dj...@nanigans.com>
Subject: Re: Hot, large row
https://issues.apache.org/jira/browse/CASSANDRA-6654
Re: Hot, large row
Posted by Keith Wright <kw...@nanigans.com>.
Ha, check out who filed that ticket! Yes, I'm aware of it. My hope is that it was mostly addressed in CASSANDRA-6563, so I may upgrade from 2.0.6 to 2.0.9. I'm really just surprised that others are not doing something similar and thus experiencing similar issues.
To answer DuyHai’s questions:
How many nodes do you have? And roughly how many distinct user_ids are there?
- 14 nodes with approximately 250 million distinct user_ids
For GC activity, in general we see low GC pressure in both ParNew and CMS (we see the occasional CMS spike, but it's usually under 100 ms). When we see a node locked up in CMS GC, it's not that any one GC takes a long time; it's just that their constant recurrence causes the read latency to spike from the usual 3-5 ms up to 35 ms, which causes issues for our application.
Also, Jack Krupansky's question is interesting. Even though you limit a request to 5000, if each cell is a big blob or block of text, it may add up to a lot in the JVM heap …
- The column values are actually timestamps and thus not variable in length, and we cap the length of the other columns used in the primary key, so I find it VERY unlikely that this is the cause.
I will look into the paging option with the native client, but from the docs it appears that it's enabled by default, right?
I greatly appreciate all the help!
From: Ken Hancock <ke...@schange.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Friday, July 25, 2014 at 10:06 AM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Cc: Don Jackson <dj...@nanigans.com>
Subject: Re: Hot, large row
https://issues.apache.org/jira/browse/CASSANDRA-6654
Re: Hot, large row
Posted by Ken Hancock <ke...@schange.com>.
Keith,
If I'm understanding your schema, it sounds like you have rows/partitions
with TTLs that continuously grow over time. This sounds a lot like
https://issues.apache.org/jira/browse/CASSANDRA-6654, which was marked as
working as designed, but is totally unintuitive. (Jonathan Ellis went off
and spent the time to reproduce it, which indicates to me it was equally
unintuitive to someone who is pretty knowledgeable about Cassandra.)
Ken
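The "working as designed" behaviour Ken references can be sketched with a toy model: during a compaction, a partition's expired tombstones may be dropped only if no SSTable outside that compaction also contains the partition. The names and set-based check below are an illustrative simplification (real Cassandra also consults bloom filters and timestamps), not the actual implementation.

```python
# Toy model of the CASSANDRA-6654 behaviour: tombstones of a partition
# fragmented across SSTables survive compaction unless every sstable
# holding that partition is part of the same compaction.

def can_purge_tombstones(partition, compacting, all_sstables):
    """compacting: set of sstable names in this compaction.
    all_sstables: dict of sstable name -> set of partition keys it holds."""
    outside = (parts for name, parts in all_sstables.items()
               if name not in compacting)
    # A tombstone may be dropped only if no non-compacting sstable could
    # still hold shadowed data for this partition.
    return not any(partition in parts for parts in outside)

# A TTL'd row updated over time fragments across many sstables:
tables = {"sst1": {"user42"}, "sst2": {"user42"},
          "sst3": {"user42"}, "sst4": {"user7"}}

# LCS compacts sst1+sst2, but user42 also lives in sst3: tombstones survive.
print(can_purge_tombstones("user42", {"sst1", "sst2"}, tables))          # False
# Only a compaction covering every sstable holding user42 can purge them.
print(can_purge_tombstones("user42", {"sst1", "sst2", "sst3"}, tables))  # True
```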
On Thu, Jul 24, 2014 at 10:10 PM, Keith Wright <kw...@nanigans.com> wrote:
> One last item to add to this thread: we have consistently experienced
> this behavior where performance degrades over time (previously we were
> unable to bootstrap nodes due to long GC pauses from existing nodes). I
> believe it's due to tombstone build-up (as I mentioned previously, one of
> the tables mentioned is showing a droppable tombstone ratio of > 30%). The
> sku table is used to hold SKUs that users recently viewed/purchased. When
> we write a new SKU we set the TTL to 30 days, where the row key is the user
> id; our read case is to fetch ALL SKUs the user has seen within the TTL.
> Since the user sees SKUs consistently over time, this can result in a row
> with many columns, much of which are likely tombstoned (see CASSANDRA-6654,
> which I filed for this and which shows that C* does not handle this case
> well).
>
> I guess I'm just surprised that others aren't using C* for similar use
> cases and thus having the same issue?
>
> I am hoping to upgrade to 2.0.9 which has improvements to remove
> tombstones.
>
> From: Keith Wright <kw...@nanigans.com>
> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Date: Thursday, July 24, 2014 at 4:50 PM
> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Cc: Don Jackson <dj...@nanigans.com>
>
> Subject: Re: Hot, large row
>
> When a node is showing the high CMS issue, I/O is actually low, likely
> because none is going on during CMS GC. On a node not showing the issue,
> iostat shows disk usage around 50% (these are SSDs) and load hovers around
> 10; for a dual octo-core machine this is fine.
>
> In addition, nodetool compactionstats does not show that we are falling
> behind in compactions.
>
> So I'm not sure what is going on here. We are running CentOS 6.5 with Java
> 1.7.0_51. It does seem like things are getting worse, and I'm considering
> dropping and rebuilding all the tables (as I have the data in Hadoop).
> This seems to be a repeated problem for us with Cassandra, and now that
> Aerospike has an open source version, we are very much considering
> switching.
>
> Thanks again for the help and any insight you might have!
>
> avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
>           23.43  12.40  11.40    6.20     0.00    46.57
>
> Device:  rrqm/s  wrqm/s  r/s      w/s      rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
> sda      0.00    0.00    720.50   0.00     5.51   0.00   15.66     1.05      1.54   0.57   41.35
> sdc      0.00    0.00    5930.50  2100.50  55.42  8.21   16.23     2.30      0.29   0.06   51.25
> sdb      0.00    0.00    6951.50  2052.50  65.82  8.02   16.80     4.31      0.48   0.07   59.60
>
> avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
>           9.48   14.72  5.60     3.67     0.00    66.52
>
> Device:  rrqm/s  wrqm/s  r/s      w/s      rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
> sda      0.00    0.00    327.00   19.50    2.55   0.08   15.53     0.04      0.13   0.12   4.25
> sdc      0.00    0.00    3974.50  1403.50  36.37  5.48   15.94     0.99      0.18   0.08   45.10
> sdb      0.00    0.00    4357.50  1535.00  40.83  6.00   16.28     1.10      0.19   0.08   47.45
>
> From: DuyHai Doan <do...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Date: Thursday, July 24, 2014 at 4:44 PM
> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Subject: Re: Hot, large row
>
> For global_user_event_skus_v2
>
> 1. The number of SSTables per read is quite high. Considering you're using
> LCS, it means that LCS cannot keep up with the write rate and is falling
> behind. AFAIK LCS uses SizeTieredCompaction at L0 to cope with extreme
> write bursts. Your high number of SSTables per read is quite fishy here.
>
> 2. Write latency is widespread, up to 73.457 millisecs, meaning that your
> node is falling behind on writes in some cases. Most writes are still below
> 1 millisec, but we don't care about those. What we care about here is the
> long tail of write latency climbing up to 73 millisecs.
>
> 3. Same remark for read latency, which is worse because the distribution
> is even "flatter", with worst cases going up to 100 ms.
>
> If I were you, I'd check disk I/O first and maybe CPU usage.
>
>
> On Thu, Jul 24, 2014 at 10:32 PM, Keith Wright <kw...@nanigans.com>
> wrote:
>
>> Cfhistograms for the tables I believe are most likely the issue are below,
>> from the node that most recently presented the issue. Any ideas? Note that
>> these tables are LCS and have droppable tombstone ratios of 27% for
>> global_user_event_skus_v2 and 2.7% for the other. Table definitions are
>> also below; you can see that they use compound keys.
>>
>> Thanks
>>
>> CREATE TABLE global_user_event_skus_v2 (
>>
>> user_id timeuuid,
>>
>> app_id int,
>>
>> type text,
>>
>> name text,
>>
>> sku text,
>>
>> creationtime timestamp,
>>
>> PRIMARY KEY (user_id, app_id, type, name, sku)
>>
>> ) WITH
>>
>> bloom_filter_fp_chance=0.100000 AND
>>
>> caching='KEYS_ONLY' AND
>>
>> comment='' AND
>>
>> dclocal_read_repair_chance=0.000000 AND
>>
>> gc_grace_seconds=86400 AND
>>
>> index_interval=128 AND
>>
>> read_repair_chance=0.100000 AND
>>
>> replicate_on_write='true' AND
>>
>> populate_io_cache_on_flush='false' AND
>>
>> default_time_to_live=2592000 AND
>>
>> speculative_retry='99.0PERCENTILE' AND
>>
>> memtable_flush_period_in_ms=0 AND
>>
>> compaction={'class': 'LeveledCompactionStrategy'} AND
>>
>> compression={'sstable_compression': 'LZ4Compressor'};
>>
>>
>> users/global_user_event_skus_v2 histograms
>>
>>
>> SSTables per Read
>>
>> 1 sstables: 1398122
>>
>> 2 sstables: 231100
>>
>> 3 sstables: 97334
>>
>> 4 sstables: 56496
>>
>> 5 sstables: 43046
>>
>> 6 sstables: 35471
>>
>> 7 sstables: 26888
>>
>> 8 sstables: 21102
>>
>> 10 sstables: 29522
>>
>> 12 sstables: 18314
>>
>> 14 sstables: 11368
>>
>> 17 sstables: 10283
>>
>> 20 sstables: 5766
>>
>> 24 sstables: 3554
>>
>> 29 sstables: 1681
>>
>> 35 sstables: 521
>>
>> 42 sstables: 102
>>
>> 50 sstables: 1
>>
>>
>> Write Latency (microseconds)
>>
>> 2 us: 532
>>
>> 3 us: 10728
>>
>> 4 us: 49608
>>
>> 5 us: 173718
>>
>> 6 us: 404131
>>
>> 7 us: 697468
>>
>> 8 us: 981048
>>
>> 10 us: 2643680
>>
>> 12 us: 3183727
>>
>> 14 us: 3150881
>>
>> 17 us: 3746333
>>
>> 20 us: 2298481
>>
>> 24 us: 1647604
>>
>> 29 us: 1070733
>>
>> 35 us: 693300
>>
>> 42 us: 442120
>>
>> 50 us: 288618
>>
>> 60 us: 205061
>>
>> 72 us: 141194
>>
>> 86 us: 98781
>>
>> 103 us: 73629
>>
>> 124 us: 54900
>>
>> 149 us: 40518
>>
>> 179 us: 30330
>>
>> 215 us: 23953
>>
>> 258 us: 18875
>>
>> 310 us: 14147
>>
>> 372 us: 10509
>>
>> 446 us: 7984
>>
>> 535 us: 6257
>>
>> 642 us: 4903
>>
>> 770 us: 4086
>>
>> 924 us: 3252
>>
>> 1109 us: 2676
>>
>> 1331 us: 2129
>>
>> 1597 us: 1753
>>
>> 1916 us: 1488
>>
>> 2299 us: 1237
>>
>> 2759 us: 1078
>>
>> 3311 us: 866
>>
>> 3973 us: 704
>>
>> 4768 us: 551
>>
>> 5722 us: 553
>>
>> 6866 us: 774
>>
>> 8239 us: 443
>>
>> 9887 us: 303
>>
>> 11864 us: 222
>>
>> 14237 us: 181
>>
>> 17084 us: 146
>>
>> 20501 us: 147
>>
>> 24601 us: 499
>>
>> 29521 us: 842
>>
>> 35425 us: 548
>>
>> 42510 us: 303
>>
>> 51012 us: 560
>>
>> 61214 us: 842
>>
>> 73457 us: 600
>>
>> 88148 us: 283
>>
>> 105778 us: 173
>>
>> 126934 us: 81
>>
>> 152321 us: 18
>>
>> 182785 us: 2
>>
>>
>> Read Latency (microseconds)
>>
>> 35 us: 2
>>
>> 42 us: 72
>>
>> 50 us: 737
>>
>> 60 us: 3555
>>
>> 72 us: 6926
>>
>> 86 us: 8373
>>
>> 103 us: 15287
>>
>> 124 us: 30054
>>
>> 149 us: 46164
>>
>> 179 us: 67828
>>
>> 215 us: 86741
>>
>> 258 us: 92178
>>
>> 310 us: 82205
>>
>> 372 us: 69641
>>
>> 446 us: 64710
>>
>> 535 us: 58295
>>
>> 642 us: 55247
>>
>> 770 us: 57918
>>
>> 924 us: 56928
>>
>> 1109 us: 75762
>>
>> 1331 us: 122306
>>
>> 1597 us: 141564
>>
>> 1916 us: 117301
>>
>> 2299 us: 102293
>>
>> 2759 us: 98789
>>
>> 3311 us: 86923
>>
>> 3973 us: 75304
>>
>> 4768 us: 64012
>>
>> 5722 us: 54973
>>
>> 6866 us: 47221
>>
>> 8239 us: 41483
>>
>> 9887 us: 35015
>>
>> 11864 us: 30249
>>
>> 14237 us: 23505
>>
>> 17084 us: 18364
>>
>> 20501 us: 12992
>>
>> 24601 us: 8837
>>
>> 29521 us: 6497
>>
>> 35425 us: 4840
>>
>> 42510 us: 3536
>>
>> 51012 us: 2471
>>
>> 61214 us: 2784
>>
>> 73457 us: 3664
>>
>> 88148 us: 3219
>>
>> 105778 us: 1980
>>
>> 126934 us: 1050
>>
>> 152321 us: 519
>>
>> 182785 us: 184
>>
>> 219342 us: 82
>>
>> 263210 us: 39
>>
>> 315852 us: 22
>>
>> 379022 us: 16
>>
>> 454826 us: 0
>>
>> 545791 us: 0
>>
>> 654949 us: 1
>>
>> 785939 us: 1
>>
>> 943127 us: 0
>>
>> 1131752 us: 0
>>
>> 1358102 us: 0
>>
>> 1629722 us: 3
>>
>> 1955666 us: 3
>>
>> 2346799 us: 1
>>
>> 2816159 us: 0
>>
>> 3379391 us: 0
>>
>> 4055269 us: 0
>>
>> 4866323 us: 0
>>
>> 5839588 us: 0
>>
>> 7007506 us: 4
>>
>>
>> Partition Size (bytes)
>>
>> 35 bytes: 31
>>
>> 42 bytes: 0
>>
>> 50 bytes: 0
>>
>> 60 bytes: 0
>>
>> 72 bytes: 0
>>
>> 86 bytes: 0
>>
>> 103 bytes: 761108
>>
>> 124 bytes: 23596164
>>
>> 149 bytes: 22963275
>>
>> 179 bytes: 3517971
>>
>> 215 bytes: 6395335
>>
>> 258 bytes: 6197982
>>
>> 310 bytes: 4662589
>>
>> 372 bytes: 5261934
>>
>> 446 bytes: 3032032
>>
>> 535 bytes: 3775310
>>
>> 642 bytes: 2879935
>>
>> 770 bytes: 3002743
>>
>> 924 bytes: 2279900
>>
>> 1109 bytes: 1976357
>>
>> 1331 bytes: 1643875
>>
>> 1597 bytes: 1422199
>>
>> 1916 bytes: 1142746
>>
>> 2299 bytes: 901584
>>
>> 2759 bytes: 760173
>>
>> 3311 bytes: 596874
>>
>> 3973 bytes: 480502
>>
>> 4768 bytes: 380185
>>
>> 5722 bytes: 303315
>>
>> 6866 bytes: 239013
>>
>> 8239 bytes: 187337
>>
>> 9887 bytes: 147218
>>
>> 11864 bytes: 115684
>>
>> 14237 bytes: 89718
>>
>> 17084 bytes: 69160
>>
>> 20501 bytes: 53070
>>
>> 24601 bytes: 40496
>>
>> 29521 bytes: 30344
>>
>> 35425 bytes: 22656
>>
>> 42510 bytes: 16419
>>
>> 51012 bytes: 11621
>>
>> 61214 bytes: 8020
>>
>> 73457 bytes: 5561
>>
>> 88148 bytes: 3708
>>
>> 105778 bytes: 2434
>>
>> 126934 bytes: 1561
>>
>> 152321 bytes: 1017
>>
>> 182785 bytes: 559
>>
>> 219342 bytes: 349
>>
>> 263210 bytes: 181
>>
>> 315852 bytes: 121
>>
>> 379022 bytes: 66
>>
>> 454826 bytes: 22
>>
>> 545791 bytes: 14
>>
>> 654949 bytes: 10
>>
>> 785939 bytes: 2
>>
>> 943127 bytes: 0
>>
>> 1131752 bytes: 4
>>
>> 1358102 bytes: 2
>>
>> 1629722 bytes: 1
>>
>> 1955666 bytes: 2
>>
>> 2346799 bytes: 0
>>
>> 2816159 bytes: 0
>>
>> 3379391 bytes: 0
>>
>> 4055269 bytes: 0
>>
>> 4866323 bytes: 1
>>
>>
>> Cell Count per Partition
>>
>> 1 cells: 50599458
>>
>> 2 cells: 13975363
>>
>> 3 cells: 7927898
>>
>> 4 cells: 4981142
>>
>> 5 cells: 3081166
>>
>> 6 cells: 3034895
>>
>> 7 cells: 1896720
>>
>> 8 cells: 1508849
>>
>> 10 cells: 2418845
>>
>> 12 cells: 1631289
>>
>> 14 cells: 1179269
>>
>> 17 cells: 1301647
>>
>> 20 cells: 919744
>>
>> 24 cells: 864313
>>
>> 29 cells: 735203
>>
>> 35 cells: 597749
>>
>> 42 cells: 469611
>>
>> 50 cells: 367422
>>
>> 60 cells: 312697
>>
>> 72 cells: 251782
>>
>> 86 cells: 197717
>>
>> 103 cells: 161244
>>
>> 124 cells: 132074
>>
>> 149 cells: 103711
>>
>> 179 cells: 81760
>>
>> 215 cells: 63360
>>
>> 258 cells: 48982
>>
>> 310 cells: 37932
>>
>> 372 cells: 28328
>>
>> 446 cells: 21347
>>
>> 535 cells: 15373
>>
>> 642 cells: 10959
>>
>> 770 cells: 7643
>>
>> 924 cells: 5346
>>
>> 1109 cells: 3613
>>
>> 1331 cells: 2291
>>
>> 1597 cells: 1558
>>
>> 1916 cells: 885
>>
>> 2299 cells: 577
>>
>> 2759 cells: 327
>>
>> 3311 cells: 169
>>
>> 3973 cells: 116
>>
>> 4768 cells: 57
>>
>> 5722 cells: 23
>>
>> 6866 cells: 15
>>
>> 8239 cells: 9
>>
>> 9887 cells: 3
>>
>> 11864 cells: 0
>>
>> 14237 cells: 5
>>
>> 17084 cells: 1
>>
>> 20501 cells: 0
>>
>> 24601 cells: 2
>>
>> 29521 cells: 0
>>
>> 35425 cells: 0
>>
>> 42510 cells: 0
>>
>> 51012 cells: 0
>>
>> 61214 cells: 1
>>
>>
>> CREATE TABLE global_user_events (
>>
>> user_id timeuuid,
>>
>> app_id int,
>>
>> type text,
>>
>> name text,
>>
>> extra_param map<text, text>,
>>
>> paid boolean,
>>
>> value float,
>>
>> PRIMARY KEY (user_id, app_id, type, name)
>>
>> ) WITH
>>
>> bloom_filter_fp_chance=0.100000 AND
>>
>> caching='KEYS_ONLY' AND
>>
>> comment='' AND
>>
>> dclocal_read_repair_chance=0.000000 AND
>>
>> gc_grace_seconds=86400 AND
>>
>> index_interval=128 AND
>>
>> read_repair_chance=0.100000 AND
>>
>> replicate_on_write='true' AND
>>
>> populate_io_cache_on_flush='false' AND
>>
>> default_time_to_live=0 AND
>>
>> speculative_retry='99.0PERCENTILE' AND
>>
>> memtable_flush_period_in_ms=0 AND
>>
>> compaction={'class': 'LeveledCompactionStrategy'} AND
>>
>> compression={'sstable_compression': 'LZ4Compressor'};
>>
>>
>> users/global_user_events histograms
>>
>>
>> SSTables per Read
>>
>> 1 sstables: 1024446
>>
>> 2 sstables: 632948
>>
>> 3 sstables: 230768
>>
>> 4 sstables: 112428
>>
>> 5 sstables: 82651
>>
>> 6 sstables: 61439
>>
>> 7 sstables: 47518
>>
>> 8 sstables: 38490
>>
>> 10 sstables: 58054
>>
>> 12 sstables: 41128
>>
>> 14 sstables: 29642
>>
>> 17 sstables: 31056
>>
>> 20 sstables: 20474
>>
>> 24 sstables: 17799
>>
>> 29 sstables: 13744
>>
>> 35 sstables: 9485
>>
>> 42 sstables: 5933
>>
>> 50 sstables: 3607
>>
>> 60 sstables: 1618
>>
>> 72 sstables: 263
>>
>> 86 sstables: 10
>>
>>
>> Write Latency (microseconds)
>>
>> 10 us: 10
>>
>> 12 us: 75
>>
>> 14 us: 319
>>
>> 17 us: 3550
>>
>> 20 us: 20617
>>
>> 24 us: 105254
>>
>> 29 us: 311290
>>
>> 35 us: 494404
>>
>> 42 us: 516301
>>
>> 50 us: 534061
>>
>> 60 us: 421438
>>
>> 72 us: 314377
>>
>> 86 us: 375240
>>
>> 103 us: 333402
>>
>> 124 us: 816891
>>
>> 149 us: 667832
>>
>> 179 us: 430659
>>
>> 215 us: 336063
>>
>> 258 us: 380606
>>
>> 310 us: 235592
>>
>> 372 us: 68679
>>
>> 446 us: 14988
>>
>> 535 us: 6169
>>
>> 642 us: 3763
>>
>> 770 us: 2591
>>
>> 924 us: 2270
>>
>> 1109 us: 1985
>>
>> 1331 us: 1744
>>
>> 1597 us: 1564
>>
>> 1916 us: 1489
>>
>> 2299 us: 1326
>>
>> 2759 us: 1114
>>
>> 3311 us: 1024
>>
>> 3973 us: 984
>>
>> 4768 us: 833
>>
>> 5722 us: 962
>>
>> 6866 us: 1076
>>
>> 8239 us: 891
>>
>> 9887 us: 611
>>
>> 11864 us: 489
>>
>> 14237 us: 324
>>
>> 17084 us: 306
>>
>> 20501 us: 222
>>
>> 24601 us: 426
>>
>> 29521 us: 791
>>
>> 35425 us: 820
>>
>> 42510 us: 534
>>
>> 51012 us: 530
>>
>> 61214 us: 946
>>
>> 73457 us: 1122
>>
>> 88148 us: 714
>>
>> 105778 us: 313
>>
>> 126934 us: 127
>>
>> 152321 us: 38
>>
>> 182785 us: 3
>>
>> 219342 us: 2
>>
>> 263210 us: 0
>>
>> 315852 us: 0
>>
>> 379022 us: 0
>>
>> 454826 us: 2
>>
>> 545791 us: 0
>>
>> 654949 us: 3
>>
>> 785939 us: 0
>>
>> 943127 us: 0
>>
>> 1131752 us: 0
>>
>> 1358102 us: 0
>>
>> 1629722 us: 0
>>
>> 1955666 us: 0
>>
>> 2346799 us: 1
>>
>> 2816159 us: 0
>>
>> 3379391 us: 0
>>
>> 4055269 us: 0
>>
>> 4866323 us: 0
>>
>> 5839588 us: 0
>>
>> 7007506 us: 1
>>
>>
>> Read Latency (microseconds)
>>
>> 42 us: 1
>>
>> 50 us: 2
>>
>> 60 us: 2
>>
>> 72 us: 4
>>
>> 86 us: 15
>>
>> 103 us: 23
>>
>> 124 us: 29
>>
>> 149 us: 48
>>
>> 179 us: 67
>>
>> 215 us: 80
>>
>> 258 us: 316
>>
>> 310 us: 1812
>>
>> 372 us: 4999
>>
>> 446 us: 8876
>>
>> 535 us: 14586
>>
>> 642 us: 24417
>>
>> 770 us: 31943
>>
>> 924 us: 37453
>>
>> 1109 us: 77449
>>
>> 1331 us: 139135
>>
>> 1597 us: 177538
>>
>> 1916 us: 169568
>>
>> 2299 us: 177617
>>
>> 2759 us: 202552
>>
>> 3311 us: 198101
>>
>> 3973 us: 179807
>>
>> 4768 us: 159149
>>
>> 5722 us: 142282
>>
>> 6866 us: 126124
>>
>> 8239 us: 111529
>>
>> 9887 us: 97658
>>
>> 11864 us: 84913
>>
>> 14237 us: 70905
>>
>> 17084 us: 54917
>>
>> 20501 us: 40623
>>
>> 24601 us: 29545
>>
>> 29521 us: 21448
>>
>> 35425 us: 15255
>>
>> 42510 us: 11222
>>
>> 51012 us: 8150
>>
>> 61214 us: 7473
>>
>> 73457 us: 10060
>>
>> 88148 us: 10530
>>
>> 105778 us: 7573
>>
>> 126934 us: 4331
>>
>> 152321 us: 2097
>>
>> 182785 us: 836
>>
>> 219342 us: 255
>>
>> 263210 us: 84
>>
>> 315852 us: 28
>>
>> 379022 us: 8
>>
>> 454826 us: 2
>>
>> 545791 us: 0
>>
>> 654949 us: 2
>>
>> 785939 us: 1
>>
>> 943127 us: 0
>>
>> 1131752 us: 0
>>
>> 1358102 us: 0
>>
>> 1629722 us: 2
>>
>> 1955666 us: 6
>>
>> 2346799 us: 18
>>
>> 2816159 us: 0
>>
>> 3379391 us: 0
>>
>> 4055269 us: 0
>>
>> 4866323 us: 0
>>
>> 5839588 us: 0
>>
>> 7007506 us: 34
>>
>>
>> Partition Size (bytes)
>>
>> 35 bytes: 25
>>
>> 42 bytes: 0
>>
>> 50 bytes: 0
>>
>> 60 bytes: 0
>>
>> 72 bytes: 21761
>>
>> 86 bytes: 694466658
>>
>> 103 bytes: 219007195
>>
>> 124 bytes: 7034596
>>
>> 149 bytes: 110269079
>>
>> 179 bytes: 101276811
>>
>> 215 bytes: 38518027
>>
>> 258 bytes: 103242356
>>
>> 310 bytes: 51434914
>>
>> 372 bytes: 38048245
>>
>> 446 bytes: 42558831
>>
>> 535 bytes: 32869176
>>
>> 642 bytes: 17188026
>>
>> 770 bytes: 14750559
>>
>> 924 bytes: 15963256
>>
>> 1109 bytes: 16125324
>>
>> 1331 bytes: 11462272
>>
>> 1597 bytes: 6810452
>>
>> 1916 bytes: 5685739
>>
>> 2299 bytes: 10232324
>>
>> 2759 bytes: 5138522
>>
>> 3311 bytes: 5847998
>>
>> 3973 bytes: 4065752
>>
>> 4768 bytes: 3468263
>>
>> 5722 bytes: 2545431
>>
>> 6866 bytes: 1701913
>>
>> 8239 bytes: 993808
>>
>> 9887 bytes: 495282
>>
>> 11864 bytes: 208644
>>
>> 14237 bytes: 74307
>>
>> 17084 bytes: 22286
>>
>> 20501 bytes: 5474
>>
>> 24601 bytes: 1225
>>
>> 29521 bytes: 210
>>
>> 35425 bytes: 97
>>
>> 42510 bytes: 48
>>
>> 51012 bytes: 21
>>
>> 61214 bytes: 15
>>
>> 73457 bytes: 10
>>
>> 88148 bytes: 3
>>
>> 105778 bytes: 4
>>
>> 126934 bytes: 1
>>
>> 152321 bytes: 2
>>
>> 182785 bytes: 1
>>
>>
>> Cell Count per Partition
>>
>> 1 cells: 914323128
>>
>> 2 cells: 196020828
>>
>> 3 cells: 165502164
>>
>> 4 cells: 71735648
>>
>> 5 cells: 54450466
>>
>> 6 cells: 28995756
>>
>> 7 cells: 9625976
>>
>> 8 cells: 10257788
>>
>> 10 cells: 24004757
>>
>> 12 cells: 13087147
>>
>> 14 cells: 13588347
>>
>> 17 cells: 11976353
>>
>> 20 cells: 6245374
>>
>> 24 cells: 5627821
>>
>> 29 cells: 9480533
>>
>> 35 cells: 6008276
>>
>> 42 cells: 5221357
>>
>> 50 cells: 4355454
>>
>> 60 cells: 3955488
>>
>> 72 cells: 2872578
>>
>> 86 cells: 1914447
>>
>> 103 cells: 1216276
>>
>> 124 cells: 652035
>>
>> 149 cells: 279565
>>
>> 179 cells: 99836
>>
>> 215 cells: 29132
>>
>> 258 cells: 6682
>>
>> 310 cells: 1276
>>
>> 372 cells: 258
>>
>> 446 cells: 96
>>
>> 535 cells: 48
>>
>> 642 cells: 25
>>
>> 770 cells: 9
>>
>> 924 cells: 9
>>
>> 1109 cells: 4
>>
>> 1331 cells: 3
>>
>> 1597 cells: 1
>>
>> 1916 cells: 2
>>
>> From: Jack Krupansky <ja...@basetechnology.com>
>> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>> Date: Thursday, July 24, 2014 at 4:07 PM
>>
>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>> Subject: Re: Hot, large row
>>
>> Could it be some “fat columns” (cells with large blob or text values)
>> rather than the cell-count per se? IOW, a “big row” rather than a “wide
>> row”?
>>
>> And, could it be a large partition rather than a large row (many rows in
>> a single partition)? Are clustering columns being used in the primary key?
>>
>> -- Jack Krupansky
>>
>> *From:* DuyHai Doan <do...@gmail.com>
>> *Sent:* Thursday, July 24, 2014 3:53 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Hot, large row
>>
>> Your extract of cfhistograms shows that there are no particularly "wide
>> rows". The widest has 61214 cells, which is big but not huge enough to be
>> a real concern.
>>
>> Turning on trace probability only gives you some "hints" about what
>> kind of queries are done; it does not give the exact partition key or
>> other statement values, especially when you are using prepared statements
>> ...
>>
>>
>> "I am considering reducing read_request_timeout_in_ms: 5000 in
>> cassandra.yaml so that it reduces the impact when this occurs." --> Don't
>> do that, you'll only sweep dust under the carpet. Find the real issue and
>> fix it instead of changing parameter to hide it.
>>
>> One solution would be, on the client side, to activate some logging to
>> show the CQL3 statements the application is issuing that may overload the
>> server. I know that's easier said than done, but I don't have any other
>> idea for the moment
>>
>> -------- Shameless self-promotion ------
>>
>> To support this kind of live prod debugging & investigation, I added a
>> new dynamic query logging feature in Achilles:
>> https://github.com/doanduyhai/Achilles/wiki/Statements-Logging-and-Tracing#dynamic-statements-logging
>>
>> Once you hit the issue, this kind of feature may save your day...
>>
>>
>>
>>
>>
>>
>> On Thu, Jul 24, 2014 at 9:22 PM, Keith Wright <kw...@nanigans.com>
>> wrote:
>>
>>> I can see from cfhistograms that I do have some wide rows (see below).
>>> I set trace probability as you suggested but the output doesn’t appear to
>>> tell me what row was actually read unless I missed something. I just see
>>> executing prepared statement. Any ideas how I can find the row in
>>> question?
>>>
>>> I am considering reducing read_request_timeout_in_ms: 5000 in
>>> cassandra.yaml so that it reduces the impact when this occurs.
>>>
>>> Any help in identifying my issue would be GREATLY appreciated
>>>
>>>
>>> Cell Count per Partition
>>>
>>> 1 cells: 50449950
>>>
>>> 2 cells: 14281828
>>>
>>> 3 cells: 8093366
>>>
>>> 4 cells: 5029200
>>>
>>> 5 cells: 3103023
>>>
>>> 6 cells: 3059903
>>>
>>> 7 cells: 1903018
>>>
>>> 8 cells: 1509297
>>>
>>> 10 cells: 2420359
>>>
>>> 12 cells: 1624895
>>>
>>> 14 cells: 1171678
>>>
>>> 17 cells: 1289391
>>>
>>> 20 cells: 909777
>>>
>>> 24 cells: 852081
>>>
>>> 29 cells: 722925
>>>
>>> 35 cells: 587067
>>>
>>> 42 cells: 459473
>>>
>>> 50 cells: 358744
>>>
>>> 60 cells: 304146
>>>
>>> 72 cells: 244682
>>>
>>> 86 cells: 191045
>>>
>>> 103 cells: 155337
>>>
>>> 124 cells: 127061
>>>
>>> 149 cells: 98913
>>>
>>> 179 cells: 77454
>>>
>>> 215 cells: 59849
>>>
>>> 258 cells: 46117
>>>
>>> 310 cells: 35321
>>>
>>> 372 cells: 26319
>>>
>>> 446 cells: 19379
>>>
>>> 535 cells: 13783
>>>
>>> 642 cells: 9993
>>>
>>> 770 cells: 6973
>>>
>>> 924 cells: 4713
>>>
>>> 1109 cells: 3229
>>>
>>> 1331 cells: 2062
>>>
>>> 1597 cells: 1338
>>>
>>> 1916 cells: 773
>>>
>>> 2299 cells: 495
>>>
>>> 2759 cells: 268
>>>
>>> 3311 cells: 150
>>>
>>> 3973 cells: 100
>>>
>>> 4768 cells: 42
>>>
>>> 5722 cells: 24
>>>
>>> 6866 cells: 12
>>>
>>> 8239 cells: 9
>>>
>>> 9887 cells: 3
>>>
>>> 11864 cells: 0
>>>
>>> 14237 cells: 5
>>>
>>> 17084 cells: 1
>>>
>>> 20501 cells: 0
>>>
>>> 24601 cells: 2
>>>
>>> 29521 cells: 0
>>>
>>> 35425 cells: 0
>>>
>>> 42510 cells: 0
>>>
>>> 51012 cells: 0
>>>
>>> 61214 cells: 2
>>>
>>> From: DuyHai Doan <do...@gmail.com>
>>> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>>> Date: Thursday, July 24, 2014 at 3:01 PM
>>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>>> Subject: Re: Hot, large row
>>>
>>> "How can I detect wide rows?" -->
>>>
>>> nodetool cfhistograms <keyspace> <suspected column family>
>>>
>>> Look at column "Column count" (last column) and identify a line in this
>>> column with very high value of "Offset". In a well designed application you
>>> should have a gaussian distribution where 80% of your row have a similar
>>> number of columns.
>>>
>>> "Anyone know what debug level I can set so that I can see what reads the
>>> hot node is handling? " -->
>>>
>>> "nodetool settraceprobability <value>", where value is a small number
>>> (0.001) on the node where you encounter the issue. Activate the tracing for
>>> a while (5 mins) and deactivate it (value = 0). Then look into
>>> system_traces tables "events" & "sessions". It may help or not since the
>>> tracing is done once every 1000.
>>>
>>> "Any way to get the server to blacklist these wide rows automatically?"
>>> --> No
>>>
>>>
>>> On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kw...@nanigans.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We are seeing an issue where basically daily one of our nodes spikes
>>>> in load and is churning in CMS heap pressure. It appears that reads are
>>>> backing up and my guess is that our application is reading a large row
>>>> repeatedly. Our write structure can lead itself to wide rows very
>>>> infrequently (<0.001%) and we do our best to detect and delete them but
>>>> obviously we’re missing a case. Hoping for assistance on the following
>>>> questions:
>>>>
>>>> - How can I detect wide rows?
>>>> - Anyone know what debug level I can set so that I can see what
>>>> reads the hot node is handling? I’m hoping to see the “bad” row
>>>> - Any way to get the server to blacklist these wide rows
>>>> automatically?
>>>>
>>>> We’re using C* 2.0.6 with Vnodes.
>>>>
>>>> Thanks
>>>>
>>>
>>>
>>
>>
>
>
--
*Ken Hancock *| System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hancock@schange.com | www.schange.com | NASDAQ:SEAC
<http://www.schange.com/en-US/Company/InvestorRelations.aspx>
Office: +1 (978) 889-3329 | Google Talk: ken.hancock@schange.com |
Skype: hancockks | Yahoo IM: hancockks | LinkedIn:
<http://www.linkedin.com/in/kenhancock>
Re: Hot, large row
Posted by DuyHai Doan <do...@gmail.com>.
Hello Keith
1. Periodically seeing one node stuck in CMS GC causing high read
latency. Seems to recover on its own after an hour or so
How many nodes do you have? And roughly how many distinct user_id values
are there?
Looking at your JVM settings it seems that you have the GC log enabled. It
is worth having a look into it, and also grepping for the pattern "GC for"
in the Cassandra system.log file.
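That grep can be scripted to summarize the pauses; a rough sketch (my own
illustration, assuming the 2.0-era GCInspector line shape
"GC for <collector>: <N> ms ...", which may differ in your build):

```python
import re

# GCInspector lines in a 2.0-era system.log look roughly like:
#   INFO ... GCInspector.java ... GC for ConcurrentMarkSweep: 11520 ms for 2 collections, ...
GC_LINE = re.compile(r"GC for (\w+): (\d+) ms")

def summarize_gc(lines, threshold_ms=200):
    """Return total pause ms per collector, plus pauses above threshold_ms
    sorted worst-first."""
    totals, worst = {}, []
    for line in lines:
        m = GC_LINE.search(line)
        if not m:
            continue
        collector, ms = m.group(1), int(m.group(2))
        totals[collector] = totals.get(collector, 0) + ms
        if ms >= threshold_ms:
            worst.append((collector, ms))
    return totals, sorted(worst, key=lambda t: -t[1])

log = [
    "INFO GCInspector GC for ParNew: 35 ms for 1 collections",
    "INFO GCInspector GC for ConcurrentMarkSweep: 11520 ms for 2 collections",
]
totals, worst = summarize_gc(log)
```

Long ConcurrentMarkSweep pauses showing up in `worst` would line up with the
stuck-node symptom described above.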
The symptom you mention looks like there are activity bursts on one
particular node. The rows are not so wide, since the largest has only 61k
cells and C* can deal with rows larger than that. It all depends now on
your data access pattern.
Also, Jack Krupansky's question is interesting. Even though you limit a
request to 5000 cells, if each cell is a big blob or block of text, it may
add up to a lot in the JVM heap ...
Did you try to do a select without a limit and use the paging feature of
the Java Driver? Or lower the limit in the select to 500, as Duncan said,
and paginate manually.
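Manual pagination boils down to re-issuing the select with a
greater-than restriction on the last clustering key seen, with a small
LIMIT. A sketch of the loop (the `fetch_page` stub stands in for the
actual driver call, and the clustering-key tuples are hypothetical):

```python
def fetch_page(rows, after_key, limit):
    """Stand-in for: SELECT ... WHERE user_id = ?
       AND (app_id, type, name, sku) > <after_key> LIMIT <limit>.
       `rows` is a sorted list of clustering-key tuples."""
    start = 0
    if after_key is not None:
        while start < len(rows) and rows[start] <= after_key:
            start += 1
    return rows[start:start + limit]

def read_all(rows, page_size=500):
    """Drain a wide partition page by page instead of one LIMIT 5000 read."""
    out, last = [], None
    while True:
        page = fetch_page(rows, last, page_size)
        if not page:
            return out
        out.extend(page)
        last = page[-1]  # resume after the last clustering key seen

data = [(1, "view", "n", "sku%04d" % i) for i in range(1234)]
result = read_all(data, page_size=500)
```

Each round trip then materializes at most `page_size` cells on the heap,
which is the point of the suggestion above.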
Hope that helps
Duy Hai
On Fri, Jul 25, 2014 at 3:10 PM, Duncan Sands <du...@gmail.com>
wrote:
> Hi Keith,
>
>
> On 25/07/14 14:43, Keith Wright wrote:
>
>> Answers to your questions below but in the end I believe the root issue
>> here is
>> that LCS is clearly not compacting away as it should resulting in reads
>> across
>> many SSTables which as you noted is “fishy”. I’m considering filing a
>> JIRA for
>> this, sound reasonable?
>>
>> We are running OOTB JMV tuning (see below) and using the datastax client.
>> When
>> we read from the table in question, we put a limit of 5000 to help reduce
>> the
>> read volume but yes the standard scenario is: “select * from
>> global_user_event_skus_v2 where user_id = ? limit 5000”
>>
>
> does reducing the limit, eg to 500, help? I've had similar sounding
> problems when many clients were doing wide row reads in parallel, with the
> reads returning thousands of rows.
>
> Ciao, Duncan.
>
Re: Hot, large row
Posted by Duncan Sands <du...@gmail.com>.
Hi Keith,
On 25/07/14 14:43, Keith Wright wrote:
> Answers to your questions below but in the end I believe the root issue here is
> that LCS is clearly not compacting away as it should resulting in reads across
> many SSTables which as you noted is “fishy”. I’m considering filing a JIRA for
> this, sound reasonable?
>
> We are running OOTB JMV tuning (see below) and using the datastax client. When
> we read from the table in question, we put a limit of 5000 to help reduce the
> read volume but yes the standard scenario is: “select * from
> global_user_event_skus_v2 where user_id = ? limit 5000”
does reducing the limit, eg to 500, help? I've had similar sounding problems
when many clients were doing wide row reads in parallel, with the reads
returning thousands of rows.
Ciao, Duncan.
Re: Hot, large row
Posted by Keith Wright <kw...@nanigans.com>.
Answers to your questions below, but in the end I believe the root issue here is that LCS is clearly not compacting away as it should, resulting in reads across many SSTables, which as you noted is "fishy". I'm considering filing a JIRA for this; sound reasonable?
We are running OOTB JVM tuning (see below) and using the datastax client. When we read from the table in question, we put a limit of 5000 to help reduce the read volume, but yes, the standard scenario is: “select * from global_user_event_skus_v2 where user_id = ? limit 5000”
If you recall, there are 2 issues occurring. Turning on client side paging would just mask these IMO:
1. Periodically seeing one node stuck in CMS GC causing high read latency. Seems to recover on its own after an hour or so
2. Generally bad read/write times for some LCS tables that have multiple updates over time
Thanks!
-XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8018M -Xmx8018M -Xmn2004M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/var/log/cassandra/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
From: DuyHai Doan <do...@gmail.com>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Friday, July 25, 2014 at 2:53 AM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: Re: Hot, large row
What are your JVM settings?
Your read pattern implies that you may fetch lots of data into memory (reading all SKUs for a given user); maybe it stresses the JVM too much.
Did you use the native paging of the Java Driver to avoid loading all columns at once?
And is loading all SKUs for one user a rare scenario, or is it your main use case for this column family?
On 25 Jul 2014 at 04:11, "Keith Wright" <kw...@nanigans.com> wrote:
One last item to add to this thread: we have consistently experienced this behavior where over time performance degrades (previously we were unable to bootstrap nodes due to long GC pauses from existing nodes). I believe it's due to tombstone build-up (as I mentioned previously, one of the tables mentioned is showing a droppable tombstone ratio of > 30%). The sku table is used to hold SKUs that recent users viewed/purchased. When we write a new SKU we set the TTL to 30 days where the row key is the user id; our read case is to fetch ALL skus the user has seen within the TTL. Since the user sees SKUs consistently over time, this can result in a row with many columns, much of which are likely tombstoned (see CASSANDRA-6654, which I filed for this and which shows that C* does not handle this case well).
I guess I’m just surprised that others aren’t using C* for similar use cases and thus hitting the same issue?
I am hoping to upgrade to 2.0.9 which has improvements to remove tombstones.
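A rough back-of-the-envelope for that build-up (my own illustrative
arithmetic, not from the thread): if a user keeps generating SKU cells at a
steady rate and every cell carries a 30-day TTL, the expired fraction of
the row grows with the account's age:

```python
def expired_fraction(row_age_days, ttl_days=30):
    """Fraction of a steadily-written row's cells that have passed their
    TTL, assuming a uniform write rate over the row's lifetime. Expired
    cells still occupy the row until compaction purges them (gc_grace
    seconds after expiry)."""
    if row_age_days <= ttl_days:
        return 0.0
    return (row_age_days - ttl_days) / row_age_days

# after 120 days of steady activity, 3/4 of the row is expired cells
f = expired_fraction(120)
```

Until compaction actually purges those cells, every "fetch ALL skus" read
has to skim past them, which is the pathology CASSANDRA-6654 describes.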
From: Keith Wright <kw...@nanigans.com>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Thursday, July 24, 2014 at 4:50 PM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Cc: Don Jackson <dj...@nanigans.com>>
Subject: Re: Hot, large row
When a node is showing the high CMS issue, I/O is actually low, likely because none is going on during CMS GC. On a node not showing the issue, iostat shows disk usage around 50% (these are SSDs) and load hovers around 10, which is fine for a dual octo-core machine.
In addition, nodetool compactionstats does not show that we are falling behind in compactions.
So not sure what is going on here. We are running CentOS 6.5 with java 1.7.0_51. It does seem like things are getting worse and I’m considering dropping and rebuilding all the tables (as I have the data in Hadoop). This seems to be a repeated problem for us with Cassandra and now that Aerospike has an open source version, we are very much considering switching.
Thanks again for the help and any insight you might have!
avg-cpu: %user %nice %system %iowait %steal %idle
23.43 12.40 11.40 6.20 0.00 46.57
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 720.50 0.00 5.51 0.00 15.66 1.05 1.54 0.57 41.35
sdc 0.00 0.00 5930.50 2100.50 55.42 8.21 16.23 2.30 0.29 0.06 51.25
sdb 0.00 0.00 6951.50 2052.50 65.82 8.02 16.80 4.31 0.48 0.07 59.60
avg-cpu: %user %nice %system %iowait %steal %idle
9.48 14.72 5.60 3.67 0.00 66.52
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 327.00 19.50 2.55 0.08 15.53 0.04 0.13 0.12 4.25
sdc 0.00 0.00 3974.50 1403.50 36.37 5.48 15.94 0.99 0.18 0.08 45.10
sdb 0.00 0.00 4357.50 1535.00 40.83 6.00 16.28 1.10 0.19 0.08 47.45
From: DuyHai Doan <do...@gmail.com>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Thursday, July 24, 2014 at 4:44 PM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: Re: Hot, large row
For global_user_event_skus_v2
1. The number of SSTables per read is quite huge. Considering you're using LCS, it means that LCS cannot keep up with the write rate and has fallen behind. AFAIK LCS uses SizeTieredCompaction at L0 to cope with extreme write bursts. Your high number of SSTables per read is quite fishy here.
2. Write latency is widespread, up to 73.457 millisecs, meaning that your node is falling behind on writes in some cases. Most writes are still below 1 millisec, but what matters here is the long tail of write latency climbing up to 73 millisecs.
3. Same remark for read latency, which is worse because the distribution is even "flatter", with worst cases going up to 100 ms.
If I were you, I'd check disk I/O first and maybe CPU usage.
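Those latency tails can be quantified straight from the cfhistograms
buckets; a sketch that computes an approximate percentile from
(offset, count) pairs (offsets are bucket upper bounds, so the result is
an upper estimate):

```python
def approx_percentile(buckets, pct):
    """buckets: list of (offset, count) pairs from cfhistograms output.
    Returns the smallest bucket offset at or below which at least pct
    percent of the samples fall (an upper-bound estimate)."""
    total = sum(c for _, c in buckets)
    target = total * pct / 100.0
    running = 0
    for offset, count in sorted(buckets):
        running += count
        if running >= target:
            return offset
    return buckets[-1][0] if buckets else None

# a few read-latency buckets (microseconds) taken from the output above
reads = [(310, 82205), (372, 69641), (446, 64710), (1331, 122306), (73457, 3664)]
p50 = approx_percentile(reads, 50)
```

Running it over the full Read Latency histogram makes the "flat"
distribution DuyHai describes concrete: the p99 lands orders of magnitude
above the median.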
On Thu, Jul 24, 2014 at 10:32 PM, Keith Wright <kw...@nanigans.com>> wrote:
Cfhistograms for the tables I believe are most likely the issue are below on the node that most recently presented the issue. Any ideas? Note that these tables are LCS and have droppable tombstone ratios of 27% for global_user_event_skus_v2 and 2.7% for the other. Table definitions also below, you can see that they do use compound keys.
Thanks
CREATE TABLE global_user_event_skus_v2 (
user_id timeuuid,
app_id int,
type text,
name text,
sku text,
creationtime timestamp,
PRIMARY KEY (user_id, app_id, type, name, sku)
) WITH
bloom_filter_fp_chance=0.100000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=86400 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=2592000 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'LeveledCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
users/global_user_event_skus_v2 histograms
SSTables per Read
1 sstables: 1398122
2 sstables: 231100
3 sstables: 97334
4 sstables: 56496
5 sstables: 43046
6 sstables: 35471
7 sstables: 26888
8 sstables: 21102
10 sstables: 29522
12 sstables: 18314
14 sstables: 11368
17 sstables: 10283
20 sstables: 5766
24 sstables: 3554
29 sstables: 1681
35 sstables: 521
42 sstables: 102
50 sstables: 1
Write Latency (microseconds)
2 us: 532
3 us: 10728
4 us: 49608
5 us: 173718
6 us: 404131
7 us: 697468
8 us: 981048
10 us: 2643680
12 us: 3183727
14 us: 3150881
17 us: 3746333
20 us: 2298481
24 us: 1647604
29 us: 1070733
35 us: 693300
42 us: 442120
50 us: 288618
60 us: 205061
72 us: 141194
86 us: 98781
103 us: 73629
124 us: 54900
149 us: 40518
179 us: 30330
215 us: 23953
258 us: 18875
310 us: 14147
372 us: 10509
446 us: 7984
535 us: 6257
642 us: 4903
770 us: 4086
924 us: 3252
1109 us: 2676
1331 us: 2129
1597 us: 1753
1916 us: 1488
2299 us: 1237
2759 us: 1078
3311 us: 866
3973 us: 704
4768 us: 551
5722 us: 553
6866 us: 774
8239 us: 443
9887 us: 303
11864 us: 222
14237 us: 181
17084 us: 146
20501 us: 147
24601 us: 499
29521 us: 842
35425 us: 548
42510 us: 303
51012 us: 560
61214 us: 842
73457 us: 600
88148 us: 283
105778 us: 173
126934 us: 81
152321 us: 18
182785 us: 2
Read Latency (microseconds)
35 us: 2
42 us: 72
50 us: 737
60 us: 3555
72 us: 6926
86 us: 8373
103 us: 15287
124 us: 30054
149 us: 46164
179 us: 67828
215 us: 86741
258 us: 92178
310 us: 82205
372 us: 69641
446 us: 64710
535 us: 58295
642 us: 55247
770 us: 57918
924 us: 56928
1109 us: 75762
1331 us: 122306
1597 us: 141564
1916 us: 117301
2299 us: 102293
2759 us: 98789
3311 us: 86923
3973 us: 75304
4768 us: 64012
5722 us: 54973
6866 us: 47221
8239 us: 41483
9887 us: 35015
11864 us: 30249
14237 us: 23505
17084 us: 18364
20501 us: 12992
24601 us: 8837
29521 us: 6497
35425 us: 4840
42510 us: 3536
51012 us: 2471
61214 us: 2784
73457 us: 3664
88148 us: 3219
105778 us: 1980
126934 us: 1050
152321 us: 519
182785 us: 184
219342 us: 82
263210 us: 39
315852 us: 22
379022 us: 16
454826 us: 0
545791 us: 0
654949 us: 1
785939 us: 1
943127 us: 0
1131752 us: 0
1358102 us: 0
1629722 us: 3
1955666 us: 3
2346799 us: 1
2816159 us: 0
3379391 us: 0
4055269 us: 0
4866323 us: 0
5839588 us: 0
7007506 us: 4
Partition Size (bytes)
35 bytes: 31
42 bytes: 0
50 bytes: 0
60 bytes: 0
72 bytes: 0
86 bytes: 0
103 bytes: 761108
124 bytes: 23596164
149 bytes: 22963275
179 bytes: 3517971
215 bytes: 6395335
258 bytes: 6197982
310 bytes: 4662589
372 bytes: 5261934
446 bytes: 3032032
535 bytes: 3775310
642 bytes: 2879935
770 bytes: 3002743
924 bytes: 2279900
1109 bytes: 1976357
1331 bytes: 1643875
1597 bytes: 1422199
1916 bytes: 1142746
2299 bytes: 901584
2759 bytes: 760173
3311 bytes: 596874
3973 bytes: 480502
4768 bytes: 380185
5722 bytes: 303315
6866 bytes: 239013
8239 bytes: 187337
9887 bytes: 147218
11864 bytes: 115684
14237 bytes: 89718
17084 bytes: 69160
20501 bytes: 53070
24601 bytes: 40496
29521 bytes: 30344
35425 bytes: 22656
42510 bytes: 16419
51012 bytes: 11621
61214 bytes: 8020
73457 bytes: 5561
88148 bytes: 3708
105778 bytes: 2434
126934 bytes: 1561
152321 bytes: 1017
182785 bytes: 559
219342 bytes: 349
263210 bytes: 181
315852 bytes: 121
379022 bytes: 66
454826 bytes: 22
545791 bytes: 14
654949 bytes: 10
785939 bytes: 2
943127 bytes: 0
1131752 bytes: 4
1358102 bytes: 2
1629722 bytes: 1
1955666 bytes: 2
2346799 bytes: 0
2816159 bytes: 0
3379391 bytes: 0
4055269 bytes: 0
4866323 bytes: 1
Cell Count per Partition
1 cells: 50599458
2 cells: 13975363
3 cells: 7927898
4 cells: 4981142
5 cells: 3081166
6 cells: 3034895
7 cells: 1896720
8 cells: 1508849
10 cells: 2418845
12 cells: 1631289
14 cells: 1179269
17 cells: 1301647
20 cells: 919744
24 cells: 864313
29 cells: 735203
35 cells: 597749
42 cells: 469611
50 cells: 367422
60 cells: 312697
72 cells: 251782
86 cells: 197717
103 cells: 161244
124 cells: 132074
149 cells: 103711
179 cells: 81760
215 cells: 63360
258 cells: 48982
310 cells: 37932
372 cells: 28328
446 cells: 21347
535 cells: 15373
642 cells: 10959
770 cells: 7643
924 cells: 5346
1109 cells: 3613
1331 cells: 2291
1597 cells: 1558
1916 cells: 885
2299 cells: 577
2759 cells: 327
3311 cells: 169
3973 cells: 116
4768 cells: 57
5722 cells: 23
6866 cells: 15
8239 cells: 9
9887 cells: 3
11864 cells: 0
14237 cells: 5
17084 cells: 1
20501 cells: 0
24601 cells: 2
29521 cells: 0
35425 cells: 0
42510 cells: 0
51012 cells: 0
61214 cells: 1
CREATE TABLE global_user_events (
user_id timeuuid,
app_id int,
type text,
name text,
extra_param map<text, text>,
paid boolean,
value float,
PRIMARY KEY (user_id, app_id, type, name)
) WITH
bloom_filter_fp_chance=0.100000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=86400 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'LeveledCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
users/global_user_events histograms
SSTables per Read
1 sstables: 1024446
2 sstables: 632948
3 sstables: 230768
4 sstables: 112428
5 sstables: 82651
6 sstables: 61439
7 sstables: 47518
8 sstables: 38490
10 sstables: 58054
12 sstables: 41128
14 sstables: 29642
17 sstables: 31056
20 sstables: 20474
24 sstables: 17799
29 sstables: 13744
35 sstables: 9485
42 sstables: 5933
50 sstables: 3607
60 sstables: 1618
72 sstables: 263
86 sstables: 10
Write Latency (microseconds)
10 us: 10
12 us: 75
14 us: 319
17 us: 3550
20 us: 20617
24 us: 105254
29 us: 311290
35 us: 494404
42 us: 516301
50 us: 534061
60 us: 421438
72 us: 314377
86 us: 375240
103 us: 333402
124 us: 816891
149 us: 667832
179 us: 430659
215 us: 336063
258 us: 380606
310 us: 235592
372 us: 68679
446 us: 14988
535 us: 6169
642 us: 3763
770 us: 2591
924 us: 2270
1109 us: 1985
1331 us: 1744
1597 us: 1564
1916 us: 1489
2299 us: 1326
2759 us: 1114
3311 us: 1024
3973 us: 984
4768 us: 833
5722 us: 962
6866 us: 1076
8239 us: 891
9887 us: 611
11864 us: 489
14237 us: 324
17084 us: 306
20501 us: 222
24601 us: 426
29521 us: 791
35425 us: 820
42510 us: 534
51012 us: 530
61214 us: 946
73457 us: 1122
88148 us: 714
105778 us: 313
126934 us: 127
152321 us: 38
182785 us: 3
219342 us: 2
263210 us: 0
315852 us: 0
379022 us: 0
454826 us: 2
545791 us: 0
654949 us: 3
785939 us: 0
943127 us: 0
1131752 us: 0
1358102 us: 0
1629722 us: 0
1955666 us: 0
2346799 us: 1
2816159 us: 0
3379391 us: 0
4055269 us: 0
4866323 us: 0
5839588 us: 0
7007506 us: 1
Read Latency (microseconds)
42 us: 1
50 us: 2
60 us: 2
72 us: 4
86 us: 15
103 us: 23
124 us: 29
149 us: 48
179 us: 67
215 us: 80
258 us: 316
310 us: 1812
372 us: 4999
446 us: 8876
535 us: 14586
642 us: 24417
770 us: 31943
924 us: 37453
1109 us: 77449
1331 us: 139135
1597 us: 177538
1916 us: 169568
2299 us: 177617
2759 us: 202552
3311 us: 198101
3973 us: 179807
4768 us: 159149
5722 us: 142282
6866 us: 126124
8239 us: 111529
9887 us: 97658
11864 us: 84913
14237 us: 70905
17084 us: 54917
20501 us: 40623
24601 us: 29545
29521 us: 21448
35425 us: 15255
42510 us: 11222
51012 us: 8150
61214 us: 7473
73457 us: 10060
88148 us: 10530
105778 us: 7573
126934 us: 4331
152321 us: 2097
182785 us: 836
219342 us: 255
263210 us: 84
315852 us: 28
379022 us: 8
454826 us: 2
545791 us: 0
654949 us: 2
785939 us: 1
943127 us: 0
1131752 us: 0
1358102 us: 0
1629722 us: 2
1955666 us: 6
2346799 us: 18
2816159 us: 0
3379391 us: 0
4055269 us: 0
4866323 us: 0
5839588 us: 0
7007506 us: 34
Partition Size (bytes)
35 bytes: 25
42 bytes: 0
50 bytes: 0
60 bytes: 0
72 bytes: 21761
86 bytes: 694466658
103 bytes: 219007195
124 bytes: 7034596
149 bytes: 110269079
179 bytes: 101276811
215 bytes: 38518027
258 bytes: 103242356
310 bytes: 51434914
372 bytes: 38048245
446 bytes: 42558831
535 bytes: 32869176
642 bytes: 17188026
770 bytes: 14750559
924 bytes: 15963256
1109 bytes: 16125324
1331 bytes: 11462272
1597 bytes: 6810452
1916 bytes: 5685739
2299 bytes: 10232324
2759 bytes: 5138522
3311 bytes: 5847998
3973 bytes: 4065752
4768 bytes: 3468263
5722 bytes: 2545431
6866 bytes: 1701913
8239 bytes: 993808
9887 bytes: 495282
11864 bytes: 208644
14237 bytes: 74307
17084 bytes: 22286
20501 bytes: 5474
24601 bytes: 1225
29521 bytes: 210
35425 bytes: 97
42510 bytes: 48
51012 bytes: 21
61214 bytes: 15
73457 bytes: 10
88148 bytes: 3
105778 bytes: 4
126934 bytes: 1
152321 bytes: 2
182785 bytes: 1
Cell Count per Partition
1 cells: 914323128
2 cells: 196020828
3 cells: 165502164
4 cells: 71735648
5 cells: 54450466
6 cells: 28995756
7 cells: 9625976
8 cells: 10257788
10 cells: 24004757
12 cells: 13087147
14 cells: 13588347
17 cells: 11976353
20 cells: 6245374
24 cells: 5627821
29 cells: 9480533
35 cells: 6008276
42 cells: 5221357
50 cells: 4355454
60 cells: 3955488
72 cells: 2872578
86 cells: 1914447
103 cells: 1216276
124 cells: 652035
149 cells: 279565
179 cells: 99836
215 cells: 29132
258 cells: 6682
310 cells: 1276
372 cells: 258
446 cells: 96
535 cells: 48
642 cells: 25
770 cells: 9
924 cells: 9
1109 cells: 4
1331 cells: 3
1597 cells: 1
1916 cells: 2
From: Jack Krupansky <ja...@basetechnology.com>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Thursday, July 24, 2014 at 4:07 PM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: Re: Hot, large row
Could it be some “fat columns” (cells with large blob or text values) rather than the cell-count per se? IOW, a “big row” rather than a “wide row”?
And, could it be a large partition rather than a large row (many rows in a single partition)? Are clustering columns being used in the primary key?
-- Jack Krupansky
From: DuyHai Doan<ma...@gmail.com>
Sent: Thursday, July 24, 2014 3:53 PM
To: user@cassandra.apache.org<ma...@cassandra.apache.org>
Subject: Re: Hot, large row
Your extract of cfhistograms shows that there are no particularly "wide rows". The widest has 61214 cells, which is big but not huge enough to be a real concern.
Turning on trace probability only gives you some "hints" about what kind of queries are done; it does not give the exact partition key nor other statement values, especially when you are using prepared statements ...
"I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the impact when this occurs." --> Don't do that, you'll only sweep dust under the carpet. Find the real issue and fix it instead of changing parameter to hide it.
One solution would be, on the client side, to activate some logging to show the CQL3 statements the application is issuing that may overload the server. I know that's easier said than done, but I don't have any other idea for the moment
-------- Shameless self-promotion ------
To support this kind of live prod debugging & investigation, I added a new dynamic query logging feature in Achilles: https://github.com/doanduyhai/Achilles/wiki/Statements-Logging-and-Tracing#dynamic-statements-logging
Once you hit the issue, this kind of feature may save your day...
On Thu, Jul 24, 2014 at 9:22 PM, Keith Wright <kw...@nanigans.com>> wrote:
I can see from cfhistograms that I do have some wide rows (see below). I set trace probability as you suggested but the output doesn’t appear to tell me what row was actually read unless I missed something. I just see executing prepared statement. Any ideas how I can find the row in question?
I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the impact when this occurs.
Any help in identifying my issue would be GREATLY appreciated
Cell Count per Partition
1 cells: 50449950
2 cells: 14281828
3 cells: 8093366
4 cells: 5029200
5 cells: 3103023
6 cells: 3059903
7 cells: 1903018
8 cells: 1509297
10 cells: 2420359
12 cells: 1624895
14 cells: 1171678
17 cells: 1289391
20 cells: 909777
24 cells: 852081
29 cells: 722925
35 cells: 587067
42 cells: 459473
50 cells: 358744
60 cells: 304146
72 cells: 244682
86 cells: 191045
103 cells: 155337
124 cells: 127061
149 cells: 98913
179 cells: 77454
215 cells: 59849
258 cells: 46117
310 cells: 35321
372 cells: 26319
446 cells: 19379
535 cells: 13783
642 cells: 9993
770 cells: 6973
924 cells: 4713
1109 cells: 3229
1331 cells: 2062
1597 cells: 1338
1916 cells: 773
2299 cells: 495
2759 cells: 268
3311 cells: 150
3973 cells: 100
4768 cells: 42
5722 cells: 24
6866 cells: 12
8239 cells: 9
9887 cells: 3
11864 cells: 0
14237 cells: 5
17084 cells: 1
20501 cells: 0
24601 cells: 2
29521 cells: 0
35425 cells: 0
42510 cells: 0
51012 cells: 0
61214 cells: 2
From: DuyHai Doan <do...@gmail.com>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Thursday, July 24, 2014 at 3:01 PM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: Re: Hot, large row
"How can I detect wide rows?" -->
nodetool cfhistograms <keyspace> <suspected column family>
Look at column "Column count" (last column) and identify a line in this column with very high value of "Offset". In a well designed application you should have a gaussian distribution where 80% of your row have a similar number of columns.
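Scanning those histogram lines for outliers is easy to script; a small
illustrative sketch (my own helper, not a Cassandra tool) that flags
non-empty cell-count buckets above a chosen threshold:

```python
def flag_wide_rows(cell_count_hist, min_cells=10000):
    """cell_count_hist: list of (cells_per_partition, partition_count)
    pairs as printed by `nodetool cfhistograms`. Returns the non-empty
    buckets at or above min_cells -- the partitions worth hunting down."""
    return [(cells, n) for cells, n in cell_count_hist
            if n > 0 and cells >= min_cells]

# a few buckets shaped like the "Cell Count per Partition" output above
hist = [(1, 50449950), (5722, 24), (14237, 5), (24601, 2), (61214, 2), (51012, 0)]
wide = flag_wide_rows(hist)
```

Note this only tells you that such partitions exist and how many; finding
which keys they are still requires tracing or client-side logging.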
"Anyone know what debug level I can set so that I can see what reads the hot node is handling? " -->
"nodetool settraceprobability <value>", where value is a small number (0.001), on the node where you encounter the issue. Activate the tracing for a while (5 mins) and deactivate it (value = 0). Then look into the system_traces tables "events" & "sessions". It may help or not, since with a probability of 0.001 only one request in 1000 is traced.
"Any way to get the server to blacklist these wide rows automatically?" --> No
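The server won't blacklist wide rows, but the same effect can be approximated on the client side. A minimal sketch of the idea, assuming a wrapper the application would call before each read (this is hypothetical application code, not a driver or server feature):

```python
class WideRowGuard:
    """Client-side blacklist: once reads for a partition key have
    timed out max_failures times, further reads for that key are
    refused locally instead of hammering the hot node again.
    Hypothetical sketch, not a Cassandra feature."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = {}

    def allow(self, key):
        # True while the key has not exceeded its failure budget.
        return self.failures.get(key, 0) < self.max_failures

    def record_timeout(self, key):
        # Call this whenever a read for `key` times out.
        self.failures[key] = self.failures.get(key, 0) + 1

guard = WideRowGuard(max_failures=2)
guard.record_timeout("user-123")
guard.record_timeout("user-123")
print(guard.allow("user-123"), guard.allow("user-456"))  # False True
```

In production the failure map would need an expiry (otherwise a transient hiccup blacklists a key forever), but the core idea is just bookkeeping around the existing read path.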
On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kw...@nanigans.com> wrote:
Hi all,
We are seeing an issue where, basically daily, one of our nodes spikes in load and churns under CMS heap pressure. It appears that reads are backing up, and my guess is that our application is reading a large row repeatedly. Our write structure can lend itself to wide rows very infrequently (<0.001%) and we do our best to detect and delete them, but obviously we’re missing a case. Hoping for assistance on the following questions:
* How can I detect wide rows?
* Anyone know what debug level I can set so that I can see what reads the hot node is handling? I’m hoping to see the “bad” row
* Any way to get the server to blacklist these wide rows automatically?
We’re using C* 2.0.6 with Vnodes.
Thanks
Re: Hot, large row
Posted by DuyHai Doan <do...@gmail.com>.
What are your JVM settings?
Your read pattern implies that you may fetch lots of data into memory
(reading all skus for a given user); maybe it is stressing the JVM too much.
Did you use the Java Driver's native paging to avoid loading all columns at a
time?
And loading all skus for one user: is it a rare scenario or is it your
main use case for this column family?
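For reference, paging bounds client memory by pulling a wide partition in fixed-size chunks instead of all at once. A minimal illustration of the idea (a plain Python generator over an in-memory list, standing in for the driver's fetch_size-based paging; not the driver API itself):

```python
def paged(rows, page_size=100):
    """Yield rows one page at a time instead of materializing them all,
    mimicking how driver-side paging keeps client memory bounded."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]

# A wide partition of 250 SKUs is consumed as 3 pages: 100 + 100 + 50.
skus = ["sku-%d" % i for i in range(250)]
pages = list(paged(skus))
print([len(p) for p in pages])  # [100, 100, 50]
```

With real paging the server never has to assemble the whole partition result in one response, which is exactly the pressure the "fetch ALL skus for a user" pattern creates.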
On Jul 25, 2014 at 04:11, "Keith Wright" <kw...@nanigans.com> wrote:
> One last item to add to this thread: we have consistently experienced
> this behavior where over time performance degrades (previously we were
> unable to bootstrap nodes due to long GC pauses from existing nodes). I
> believe it's due to tombstone build-up (as I mentioned previously, one of the
> tables mentioned is showing a droppable tombstone ratio of > 30%). The
> sku table is used to hold SKUs that recent users viewed/purchased. When we
> write a new SKU we set the TTL to 30 days where the row key is the user id;
> our read case is to fetch ALL skus the user has seen within the TTL. Since
> the user sees SKUs consistently over time, this can result in a row with
> many columns much of which are likely tombstoned (see CASSANDRA-6654 which
> I filed for this which shows that C* does not handle this case well).
>
> I guess I’m just surprised that others aren’t using C* for similar usage
> cases and thus having the same issue?
>
> I am hoping to upgrade to 2.0.9 which has improvements to remove
> tombstones.
>
> From: Keith Wright <kw...@nanigans.com>
> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Date: Thursday, July 24, 2014 at 4:50 PM
> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Cc: Don Jackson <dj...@nanigans.com>
> Subject: Re: Hot, large row
>
> When a node is showing the high CMS issue, IO is actually low, likely due
> to the fact that none is going on during CMS GC. On a node not showing the
> issue, iostat shows disk usage around 50% (these are SSDs) and load hovers
> around 10, which is fine for a dual octo-core machine.
>
> In addition, nodetool compactionstats does not show that we are falling
> behind in compactions.
>
> So not sure what is going on here. We are running CentOS 6.5 with java
> 1.7.0_51. It does seem like things are getting worse and I’m considering
> dropping and rebuilding all the tables (as I have the data in Hadoop).
> This seems to be a repeated problem for us with Cassandra and now that
> Aerospike has an open source version, we are very much considering
> switching.
>
> Thanks again for the help and any insight you might have!
>
> avg-cpu: %user %nice %system %iowait %steal %idle
>
> 23.43 12.40 11.40 6.20 0.00 46.57
>
>
> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz
> avgqu-sz await svctm %util
>
> sda 0.00 0.00 720.50 0.00 5.51 0.00 15.66
> 1.05 1.54 0.57 41.35
>
> sdc 0.00 0.00 5930.50 2100.50 55.42 8.21 16.23
> 2.30 0.29 0.06 51.25
>
> sdb 0.00 0.00 6951.50 2052.50 65.82 8.02 16.80
> 4.31 0.48 0.07 59.60
>
>
> avg-cpu: %user %nice %system %iowait %steal %idle
>
> 9.48 14.72 5.60 3.67 0.00 66.52
>
>
> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz
> avgqu-sz await svctm %util
>
> sda 0.00 0.00 327.00 19.50 2.55 0.08 15.53
> 0.04 0.13 0.12 4.25
>
> sdc 0.00 0.00 3974.50 1403.50 36.37 5.48 15.94
> 0.99 0.18 0.08 45.10
>
> sdb 0.00 0.00 4357.50 1535.00 40.83 6.00 16.28
> 1.10 0.19 0.08 47.45
>
> From: DuyHai Doan <do...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Date: Thursday, July 24, 2014 at 4:44 PM
> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Subject: Re: Hot, large row
>
> For global_user_event_skus_v2
>
> 1. The number of SSTables per read is quite high. Considering you're using
> LCS, it means that LCS cannot keep up with the write rate and is falling behind.
> AFAIK LCS uses SizeTieredCompaction at L0 to cope with extreme write
> bursts. Your high number of SSTables per read is quite fishy here.
>
> 2. Write latency is widespread, up to 73.457 millisecs, meaning that your
> node is getting behind on writes in some cases. Most writes are still
> below 1 millisec, but what matters here is the long tail of
> write latency climbing up to 73 millisecs.
>
> 3. Same remark for read latency, which is worse because the distribution
> is even "flatter", with worst cases going up to 100 ms.
>
> If I were you, I'd check disk I/O first and maybe CPU usage.
>
>
> On Thu, Jul 24, 2014 at 10:32 PM, Keith Wright <kw...@nanigans.com>
> wrote:
>
>> Cfhistograms for the tables I believe are most likely the issue are below
>> on the node that most recently presented the issue. Any ideas? Note that
>> these tables are LCS and have droppable tombstone ratios of 27% for
>> global_user_event_skus_v2 and 2.7% for the other. Table definitions also
>> below, you can see that they do use compound keys.
>>
>> Thanks
>>
>> CREATE TABLE global_user_event_skus_v2 (
>>
>> user_id timeuuid,
>>
>> app_id int,
>>
>> type text,
>>
>> name text,
>>
>> sku text,
>>
>> creationtime timestamp,
>>
>> PRIMARY KEY (user_id, app_id, type, name, sku)
>>
>> ) WITH
>>
>> bloom_filter_fp_chance=0.100000 AND
>>
>> caching='KEYS_ONLY' AND
>>
>> comment='' AND
>>
>> dclocal_read_repair_chance=0.000000 AND
>>
>> gc_grace_seconds=86400 AND
>>
>> index_interval=128 AND
>>
>> read_repair_chance=0.100000 AND
>>
>> replicate_on_write='true' AND
>>
>> populate_io_cache_on_flush='false' AND
>>
>> default_time_to_live=2592000 AND
>>
>> speculative_retry='99.0PERCENTILE' AND
>>
>> memtable_flush_period_in_ms=0 AND
>>
>> compaction={'class': 'LeveledCompactionStrategy'} AND
>>
>> compression={'sstable_compression': 'LZ4Compressor'};
>>
>>
>> users/global_user_event_skus_v2 histograms
>>
>>
>> SSTables per Read
>>
>> 1 sstables: 1398122
>>
>> 2 sstables: 231100
>>
>> 3 sstables: 97334
>>
>> 4 sstables: 56496
>>
>> 5 sstables: 43046
>>
>> 6 sstables: 35471
>>
>> 7 sstables: 26888
>>
>> 8 sstables: 21102
>>
>> 10 sstables: 29522
>>
>> 12 sstables: 18314
>>
>> 14 sstables: 11368
>>
>> 17 sstables: 10283
>>
>> 20 sstables: 5766
>>
>> 24 sstables: 3554
>>
>> 29 sstables: 1681
>>
>> 35 sstables: 521
>>
>> 42 sstables: 102
>>
>> 50 sstables: 1
>>
>>
>> Write Latency (microseconds)
>>
>> 2 us: 532
>>
>> 3 us: 10728
>>
>> 4 us: 49608
>>
>> 5 us: 173718
>>
>> 6 us: 404131
>>
>> 7 us: 697468
>>
>> 8 us: 981048
>>
>> 10 us: 2643680
>>
>> 12 us: 3183727
>>
>> 14 us: 3150881
>>
>> 17 us: 3746333
>>
>> 20 us: 2298481
>>
>> 24 us: 1647604
>>
>> 29 us: 1070733
>>
>> 35 us: 693300
>>
>> 42 us: 442120
>>
>> 50 us: 288618
>>
>> 60 us: 205061
>>
>> 72 us: 141194
>>
>> 86 us: 98781
>>
>> 103 us: 73629
>>
>> 124 us: 54900
>>
>> 149 us: 40518
>>
>> 179 us: 30330
>>
>> 215 us: 23953
>>
>> 258 us: 18875
>>
>> 310 us: 14147
>>
>> 372 us: 10509
>>
>> 446 us: 7984
>>
>> 535 us: 6257
>>
>> 642 us: 4903
>>
>> 770 us: 4086
>>
>> 924 us: 3252
>>
>> 1109 us: 2676
>>
>> 1331 us: 2129
>>
>> 1597 us: 1753
>>
>> 1916 us: 1488
>>
>> 2299 us: 1237
>>
>> 2759 us: 1078
>>
>> 3311 us: 866
>>
>> 3973 us: 704
>>
>> 4768 us: 551
>>
>> 5722 us: 553
>>
>> 6866 us: 774
>>
>> 8239 us: 443
>>
>> 9887 us: 303
>>
>> 11864 us: 222
>>
>> 14237 us: 181
>>
>> 17084 us: 146
>>
>> 20501 us: 147
>>
>> 24601 us: 499
>>
>> 29521 us: 842
>>
>> 35425 us: 548
>>
>> 42510 us: 303
>>
>> 51012 us: 560
>>
>> 61214 us: 842
>>
>> 73457 us: 600
>>
>> 88148 us: 283
>>
>> 105778 us: 173
>>
>> 126934 us: 81
>>
>> 152321 us: 18
>>
>> 182785 us: 2
>>
>>
>> Read Latency (microseconds)
>>
>> 35 us: 2
>>
>> 42 us: 72
>>
>> 50 us: 737
>>
>> 60 us: 3555
>>
>> 72 us: 6926
>>
>> 86 us: 8373
>>
>> 103 us: 15287
>>
>> 124 us: 30054
>>
>> 149 us: 46164
>>
>> 179 us: 67828
>>
>> 215 us: 86741
>>
>> 258 us: 92178
>>
>> 310 us: 82205
>>
>> 372 us: 69641
>>
>> 446 us: 64710
>>
>> 535 us: 58295
>>
>> 642 us: 55247
>>
>> 770 us: 57918
>>
>> 924 us: 56928
>>
>> 1109 us: 75762
>>
>> 1331 us: 122306
>>
>> 1597 us: 141564
>>
>> 1916 us: 117301
>>
>> 2299 us: 102293
>>
>> 2759 us: 98789
>>
>> 3311 us: 86923
>>
>> 3973 us: 75304
>>
>> 4768 us: 64012
>>
>> 5722 us: 54973
>>
>> 6866 us: 47221
>>
>> 8239 us: 41483
>>
>> 9887 us: 35015
>>
>> 11864 us: 30249
>>
>> 14237 us: 23505
>>
>> 17084 us: 18364
>>
>> 20501 us: 12992
>>
>> 24601 us: 8837
>>
>> 29521 us: 6497
>>
>> 35425 us: 4840
>>
>> 42510 us: 3536
>>
>> 51012 us: 2471
>>
>> 61214 us: 2784
>>
>> 73457 us: 3664
>>
>> 88148 us: 3219
>>
>> 105778 us: 1980
>>
>> 126934 us: 1050
>>
>> 152321 us: 519
>>
>> 182785 us: 184
>>
>> 219342 us: 82
>>
>> 263210 us: 39
>>
>> 315852 us: 22
>>
>> 379022 us: 16
>>
>> 454826 us: 0
>>
>> 545791 us: 0
>>
>> 654949 us: 1
>>
>> 785939 us: 1
>>
>> 943127 us: 0
>>
>> 1131752 us: 0
>>
>> 1358102 us: 0
>>
>> 1629722 us: 3
>>
>> 1955666 us: 3
>>
>> 2346799 us: 1
>>
>> 2816159 us: 0
>>
>> 3379391 us: 0
>>
>> 4055269 us: 0
>>
>> 4866323 us: 0
>>
>> 5839588 us: 0
>>
>> 7007506 us: 4
>>
>>
>> Partition Size (bytes)
>>
>> 35 bytes: 31
>>
>> 42 bytes: 0
>>
>> 50 bytes: 0
>>
>> 60 bytes: 0
>>
>> 72 bytes: 0
>>
>> 86 bytes: 0
>>
>> 103 bytes: 761108
>>
>> 124 bytes: 23596164
>>
>> 149 bytes: 22963275
>>
>> 179 bytes: 3517971
>>
>> 215 bytes: 6395335
>>
>> 258 bytes: 6197982
>>
>> 310 bytes: 4662589
>>
>> 372 bytes: 5261934
>>
>> 446 bytes: 3032032
>>
>> 535 bytes: 3775310
>>
>> 642 bytes: 2879935
>>
>> 770 bytes: 3002743
>>
>> 924 bytes: 2279900
>>
>> 1109 bytes: 1976357
>>
>> 1331 bytes: 1643875
>>
>> 1597 bytes: 1422199
>>
>> 1916 bytes: 1142746
>>
>> 2299 bytes: 901584
>>
>> 2759 bytes: 760173
>>
>> 3311 bytes: 596874
>>
>> 3973 bytes: 480502
>>
>> 4768 bytes: 380185
>>
>> 5722 bytes: 303315
>>
>> 6866 bytes: 239013
>>
>> 8239 bytes: 187337
>>
>> 9887 bytes: 147218
>>
>> 11864 bytes: 115684
>>
>> 14237 bytes: 89718
>>
>> 17084 bytes: 69160
>>
>> 20501 bytes: 53070
>>
>> 24601 bytes: 40496
>>
>> 29521 bytes: 30344
>>
>> 35425 bytes: 22656
>>
>> 42510 bytes: 16419
>>
>> 51012 bytes: 11621
>>
>> 61214 bytes: 8020
>>
>> 73457 bytes: 5561
>>
>> 88148 bytes: 3708
>>
>> 105778 bytes: 2434
>>
>> 126934 bytes: 1561
>>
>> 152321 bytes: 1017
>>
>> 182785 bytes: 559
>>
>> 219342 bytes: 349
>>
>> 263210 bytes: 181
>>
>> 315852 bytes: 121
>>
>> 379022 bytes: 66
>>
>> 454826 bytes: 22
>>
>> 545791 bytes: 14
>>
>> 654949 bytes: 10
>>
>> 785939 bytes: 2
>>
>> 943127 bytes: 0
>>
>> 1131752 bytes: 4
>>
>> 1358102 bytes: 2
>>
>> 1629722 bytes: 1
>>
>> 1955666 bytes: 2
>>
>> 2346799 bytes: 0
>>
>> 2816159 bytes: 0
>>
>> 3379391 bytes: 0
>>
>> 4055269 bytes: 0
>>
>> 4866323 bytes: 1
>>
>>
>> Cell Count per Partition
>>
>> 1 cells: 50599458
>>
>> 2 cells: 13975363
>>
>> 3 cells: 7927898
>>
>> 4 cells: 4981142
>>
>> 5 cells: 3081166
>>
>> 6 cells: 3034895
>>
>> 7 cells: 1896720
>>
>> 8 cells: 1508849
>>
>> 10 cells: 2418845
>>
>> 12 cells: 1631289
>>
>> 14 cells: 1179269
>>
>> 17 cells: 1301647
>>
>> 20 cells: 919744
>>
>> 24 cells: 864313
>>
>> 29 cells: 735203
>>
>> 35 cells: 597749
>>
>> 42 cells: 469611
>>
>> 50 cells: 367422
>>
>> 60 cells: 312697
>>
>> 72 cells: 251782
>>
>> 86 cells: 197717
>>
>> 103 cells: 161244
>>
>> 124 cells: 132074
>>
>> 149 cells: 103711
>>
>> 179 cells: 81760
>>
>> 215 cells: 63360
>>
>> 258 cells: 48982
>>
>> 310 cells: 37932
>>
>> 372 cells: 28328
>>
>> 446 cells: 21347
>>
>> 535 cells: 15373
>>
>> 642 cells: 10959
>>
>> 770 cells: 7643
>>
>> 924 cells: 5346
>>
>> 1109 cells: 3613
>>
>> 1331 cells: 2291
>>
>> 1597 cells: 1558
>>
>> 1916 cells: 885
>>
>> 2299 cells: 577
>>
>> 2759 cells: 327
>>
>> 3311 cells: 169
>>
>> 3973 cells: 116
>>
>> 4768 cells: 57
>>
>> 5722 cells: 23
>>
>> 6866 cells: 15
>>
>> 8239 cells: 9
>>
>> 9887 cells: 3
>>
>> 11864 cells: 0
>>
>> 14237 cells: 5
>>
>> 17084 cells: 1
>>
>> 20501 cells: 0
>>
>> 24601 cells: 2
>>
>> 29521 cells: 0
>>
>> 35425 cells: 0
>>
>> 42510 cells: 0
>>
>> 51012 cells: 0
>>
>> 61214 cells: 1
>>
>>
>> CREATE TABLE global_user_events (
>>
>> user_id timeuuid,
>>
>> app_id int,
>>
>> type text,
>>
>> name text,
>>
>> extra_param map<text, text>,
>>
>> paid boolean,
>>
>> value float,
>>
>> PRIMARY KEY (user_id, app_id, type, name)
>>
>> ) WITH
>>
>> bloom_filter_fp_chance=0.100000 AND
>>
>> caching='KEYS_ONLY' AND
>>
>> comment='' AND
>>
>> dclocal_read_repair_chance=0.000000 AND
>>
>> gc_grace_seconds=86400 AND
>>
>> index_interval=128 AND
>>
>> read_repair_chance=0.100000 AND
>>
>> replicate_on_write='true' AND
>>
>> populate_io_cache_on_flush='false' AND
>>
>> default_time_to_live=0 AND
>>
>> speculative_retry='99.0PERCENTILE' AND
>>
>> memtable_flush_period_in_ms=0 AND
>>
>> compaction={'class': 'LeveledCompactionStrategy'} AND
>>
>> compression={'sstable_compression': 'LZ4Compressor'};
>>
>>
>> users/global_user_events histograms
>>
>>
>> SSTables per Read
>>
>> 1 sstables: 1024446
>>
>> 2 sstables: 632948
>>
>> 3 sstables: 230768
>>
>> 4 sstables: 112428
>>
>> 5 sstables: 82651
>>
>> 6 sstables: 61439
>>
>> 7 sstables: 47518
>>
>> 8 sstables: 38490
>>
>> 10 sstables: 58054
>>
>> 12 sstables: 41128
>>
>> 14 sstables: 29642
>>
>> 17 sstables: 31056
>>
>> 20 sstables: 20474
>>
>> 24 sstables: 17799
>>
>> 29 sstables: 13744
>>
>> 35 sstables: 9485
>>
>> 42 sstables: 5933
>>
>> 50 sstables: 3607
>>
>> 60 sstables: 1618
>>
>> 72 sstables: 263
>>
>> 86 sstables: 10
>>
>>
>> Write Latency (microseconds)
>>
>> 10 us: 10
>>
>> 12 us: 75
>>
>> 14 us: 319
>>
>> 17 us: 3550
>>
>> 20 us: 20617
>>
>> 24 us: 105254
>>
>> 29 us: 311290
>>
>> 35 us: 494404
>>
>> 42 us: 516301
>>
>> 50 us: 534061
>>
>> 60 us: 421438
>>
>> 72 us: 314377
>>
>> 86 us: 375240
>>
>> 103 us: 333402
>>
>> 124 us: 816891
>>
>> 149 us: 667832
>>
>> 179 us: 430659
>>
>> 215 us: 336063
>>
>> 258 us: 380606
>>
>> 310 us: 235592
>>
>> 372 us: 68679
>>
>> 446 us: 14988
>>
>> 535 us: 6169
>>
>> 642 us: 3763
>>
>> 770 us: 2591
>>
>> 924 us: 2270
>>
>> 1109 us: 1985
>>
>> 1331 us: 1744
>>
>> 1597 us: 1564
>>
>> 1916 us: 1489
>>
>> 2299 us: 1326
>>
>> 2759 us: 1114
>>
>> 3311 us: 1024
>>
>> 3973 us: 984
>>
>> 4768 us: 833
>>
>> 5722 us: 962
>>
>> 6866 us: 1076
>>
>> 8239 us: 891
>>
>> 9887 us: 611
>>
>> 11864 us: 489
>>
>> 14237 us: 324
>>
>> 17084 us: 306
>>
>> 20501 us: 222
>>
>> 24601 us: 426
>>
>> 29521 us: 791
>>
>> 35425 us: 820
>>
>> 42510 us: 534
>>
>> 51012 us: 530
>>
>> 61214 us: 946
>>
>> 73457 us: 1122
>>
>> 88148 us: 714
>>
>> 105778 us: 313
>>
>> 126934 us: 127
>>
>> 152321 us: 38
>>
>> 182785 us: 3
>>
>> 219342 us: 2
>>
>> 263210 us: 0
>>
>> 315852 us: 0
>>
>> 379022 us: 0
>>
>> 454826 us: 2
>>
>> 545791 us: 0
>>
>> 654949 us: 3
>>
>> 785939 us: 0
>>
>> 943127 us: 0
>>
>> 1131752 us: 0
>>
>> 1358102 us: 0
>>
>> 1629722 us: 0
>>
>> 1955666 us: 0
>>
>> 2346799 us: 1
>>
>> 2816159 us: 0
>>
>> 3379391 us: 0
>>
>> 4055269 us: 0
>>
>> 4866323 us: 0
>>
>> 5839588 us: 0
>>
>> 7007506 us: 1
>>
>>
>> Read Latency (microseconds)
>>
>> 42 us: 1
>>
>> 50 us: 2
>>
>> 60 us: 2
>>
>> 72 us: 4
>>
>> 86 us: 15
>>
>> 103 us: 23
>>
>> 124 us: 29
>>
>> 149 us: 48
>>
>> 179 us: 67
>>
>> 215 us: 80
>>
>> 258 us: 316
>>
>> 310 us: 1812
>>
>> 372 us: 4999
>>
>> 446 us: 8876
>>
>> 535 us: 14586
>>
>> 642 us: 24417
>>
>> 770 us: 31943
>>
>> 924 us: 37453
>>
>> 1109 us: 77449
>>
>> 1331 us: 139135
>>
>> 1597 us: 177538
>>
>> 1916 us: 169568
>>
>> 2299 us: 177617
>>
>> 2759 us: 202552
>>
>> 3311 us: 198101
>>
>> 3973 us: 179807
>>
>> 4768 us: 159149
>>
>> 5722 us: 142282
>>
>> 6866 us: 126124
>>
>> 8239 us: 111529
>>
>> 9887 us: 97658
>>
>> 11864 us: 84913
>>
>> 14237 us: 70905
>>
>> 17084 us: 54917
>>
>> 20501 us: 40623
>>
>> 24601 us: 29545
>>
>> 29521 us: 21448
>>
>> 35425 us: 15255
>>
>> 42510 us: 11222
>>
>> 51012 us: 8150
>>
>> 61214 us: 7473
>>
>> 73457 us: 10060
>>
>> 88148 us: 10530
>>
>> 105778 us: 7573
>>
>> 126934 us: 4331
>>
>> 152321 us: 2097
>>
>> 182785 us: 836
>>
>> 219342 us: 255
>>
>> 263210 us: 84
>>
>> 315852 us: 28
>>
>> 379022 us: 8
>>
>> 454826 us: 2
>>
>> 545791 us: 0
>>
>> 654949 us: 2
>>
>> 785939 us: 1
>>
>> 943127 us: 0
>>
>> 1131752 us: 0
>>
>> 1358102 us: 0
>>
>> 1629722 us: 2
>>
>> 1955666 us: 6
>>
>> 2346799 us: 18
>>
>> 2816159 us: 0
>>
>> 3379391 us: 0
>>
>> 4055269 us: 0
>>
>> 4866323 us: 0
>>
>> 5839588 us: 0
>>
>> 7007506 us: 34
>>
>>
>> Partition Size (bytes)
>>
>> 35 bytes: 25
>>
>> 42 bytes: 0
>>
>> 50 bytes: 0
>>
>> 60 bytes: 0
>>
>> 72 bytes: 21761
>>
>> 86 bytes: 694466658
>>
>> 103 bytes: 219007195
>>
>> 124 bytes: 7034596
>>
>> 149 bytes: 110269079
>>
>> 179 bytes: 101276811
>>
>> 215 bytes: 38518027
>>
>> 258 bytes: 103242356
>>
>> 310 bytes: 51434914
>>
>> 372 bytes: 38048245
>>
>> 446 bytes: 42558831
>>
>> 535 bytes: 32869176
>>
>> 642 bytes: 17188026
>>
>> 770 bytes: 14750559
>>
>> 924 bytes: 15963256
>>
>> 1109 bytes: 16125324
>>
>> 1331 bytes: 11462272
>>
>> 1597 bytes: 6810452
>>
>> 1916 bytes: 5685739
>>
>> 2299 bytes: 10232324
>>
>> 2759 bytes: 5138522
>>
>> 3311 bytes: 5847998
>>
>> 3973 bytes: 4065752
>>
>> 4768 bytes: 3468263
>>
>> 5722 bytes: 2545431
>>
>> 6866 bytes: 1701913
>>
>> 8239 bytes: 993808
>>
>> 9887 bytes: 495282
>>
>> 11864 bytes: 208644
>>
>> 14237 bytes: 74307
>>
>> 17084 bytes: 22286
>>
>> 20501 bytes: 5474
>>
>> 24601 bytes: 1225
>>
>> 29521 bytes: 210
>>
>> 35425 bytes: 97
>>
>> 42510 bytes: 48
>>
>> 51012 bytes: 21
>>
>> 61214 bytes: 15
>>
>> 73457 bytes: 10
>>
>> 88148 bytes: 3
>>
>> 105778 bytes: 4
>>
>> 126934 bytes: 1
>>
>> 152321 bytes: 2
>>
>> 182785 bytes: 1
>>
>>
>> Cell Count per Partition
>>
>> 1 cells: 914323128
>>
>> 2 cells: 196020828
>>
>> 3 cells: 165502164
>>
>> 4 cells: 71735648
>>
>> 5 cells: 54450466
>>
>> 6 cells: 28995756
>>
>> 7 cells: 9625976
>>
>> 8 cells: 10257788
>>
>> 10 cells: 24004757
>>
>> 12 cells: 13087147
>>
>> 14 cells: 13588347
>>
>> 17 cells: 11976353
>>
>> 20 cells: 6245374
>>
>> 24 cells: 5627821
>>
>> 29 cells: 9480533
>>
>> 35 cells: 6008276
>>
>> 42 cells: 5221357
>>
>> 50 cells: 4355454
>>
>> 60 cells: 3955488
>>
>> 72 cells: 2872578
>>
>> 86 cells: 1914447
>>
>> 103 cells: 1216276
>>
>> 124 cells: 652035
>>
>> 149 cells: 279565
>>
>> 179 cells: 99836
>>
>> 215 cells: 29132
>>
>> 258 cells: 6682
>>
>> 310 cells: 1276
>>
>> 372 cells: 258
>>
>> 446 cells: 96
>>
>> 535 cells: 48
>>
>> 642 cells: 25
>>
>> 770 cells: 9
>>
>> 924 cells: 9
>>
>> 1109 cells: 4
>>
>> 1331 cells: 3
>>
>> 1597 cells: 1
>>
>> 1916 cells: 2
>>
>> From: Jack Krupansky <ja...@basetechnology.com>
>> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>> Date: Thursday, July 24, 2014 at 4:07 PM
>>
>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>> Subject: Re: Hot, large row
>>
>> Could it be some “fat columns” (cells with large blob or text values)
>> rather than the cell-count per se? IOW, a “big row” rather than a “wide
>> row”?
>>
>> And, could it be a large partition rather than a large row (many rows in
>> a single partition)? Are clustering columns being used in the primary key?
>>
>> -- Jack Krupansky
>>
>> *From:* DuyHai Doan <do...@gmail.com>
>> *Sent:* Thursday, July 24, 2014 3:53 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Hot, large row
>>
>> Your extract of cfhistograms shows that there are no particular "wide
>> rows". The widest has 61214 cells, which is big but not huge enough to be
>> really a concern.
>>
>> Turning on trace probability only gives you some "hints" about what
>> kind of queries are done; it does not give the exact partition key nor
>> other statement values, especially when you are using prepared statements
>> ...
>>
>>
>> "I am considering reducing read_request_timeout_in_ms: 5000 in
>> cassandra.yaml so that it reduces the impact when this occurs." --> Don't
>> do that, you'll only sweep dust under the carpet. Find the real issue and
>> fix it instead of changing parameter to hide it.
>>
>> One solution would be, on the client side, to activate some logging to show
>> the CQL3 statements the application is issuing that may overload the
>> server. I know that's easier said than done, but I don't have any other
>> idea for the moment.
>>
>> -------- Shameless self-promotion ------
>>
>> To support this kind of live prod debugging & investigation, I added a
>> new dynamic query logging feature in Achilles:
>> https://github.com/doanduyhai/Achilles/wiki/Statements-Logging-and-Tracing#dynamic-statements-logging
>>
>> Once you hit the issue, this kind of feature may save your day...
>>
>>
>>
>>
>>
>>
>> On Thu, Jul 24, 2014 at 9:22 PM, Keith Wright <kw...@nanigans.com>
>> wrote:
>>
>>> I can see from cfhistograms that I do have some wide rows (see below).
>>> I set trace probability as you suggested but the output doesn’t appear to
>>> tell me what row was actually read unless I missed something. I just see
>>> executing prepared statement. Any ideas how I can find the row in
>>> question?
>>>
>>> I am considering reducing read_request_timeout_in_ms: 5000 in
>>> cassandra.yaml so that it reduces the impact when this occurs.
>>>
>>> Any help in identifying my issue would be GREATLY appreciated
>>>
>>>
>>> Cell Count per Partition
>>>
>>> 1 cells: 50449950
>>>
>>> 2 cells: 14281828
>>>
>>> 3 cells: 8093366
>>>
>>> 4 cells: 5029200
>>>
>>> 5 cells: 3103023
>>>
>>> 6 cells: 3059903
>>>
>>> 7 cells: 1903018
>>>
>>> 8 cells: 1509297
>>>
>>> 10 cells: 2420359
>>>
>>> 12 cells: 1624895
>>>
>>> 14 cells: 1171678
>>>
>>> 17 cells: 1289391
>>>
>>> 20 cells: 909777
>>>
>>> 24 cells: 852081
>>>
>>> 29 cells: 722925
>>>
>>> 35 cells: 587067
>>>
>>> 42 cells: 459473
>>>
>>> 50 cells: 358744
>>>
>>> 60 cells: 304146
>>>
>>> 72 cells: 244682
>>>
>>> 86 cells: 191045
>>>
>>> 103 cells: 155337
>>>
>>> 124 cells: 127061
>>>
>>> 149 cells: 98913
>>>
>>> 179 cells: 77454
>>>
>>> 215 cells: 59849
>>>
>>> 258 cells: 46117
>>>
>>> 310 cells: 35321
>>>
>>> 372 cells: 26319
>>>
>>> 446 cells: 19379
>>>
>>> 535 cells: 13783
>>>
>>> 642 cells: 9993
>>>
>>> 770 cells: 6973
>>>
>>> 924 cells: 4713
>>>
>>> 1109 cells: 3229
>>>
>>> 1331 cells: 2062
>>>
>>> 1597 cells: 1338
>>>
>>> 1916 cells: 773
>>>
>>> 2299 cells: 495
>>>
>>> 2759 cells: 268
>>>
>>> 3311 cells: 150
>>>
>>> 3973 cells: 100
>>>
>>> 4768 cells: 42
>>>
>>> 5722 cells: 24
>>>
>>> 6866 cells: 12
>>>
>>> 8239 cells: 9
>>>
>>> 9887 cells: 3
>>>
>>> 11864 cells: 0
>>>
>>> 14237 cells: 5
>>>
>>> 17084 cells: 1
>>>
>>> 20501 cells: 0
>>>
>>> 24601 cells: 2
>>>
>>> 29521 cells: 0
>>>
>>> 35425 cells: 0
>>>
>>> 42510 cells: 0
>>>
>>> 51012 cells: 0
>>>
>>> 61214 cells: 2
>>>
>>> From: DuyHai Doan <do...@gmail.com>
>>> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>>> Date: Thursday, July 24, 2014 at 3:01 PM
>>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>>> Subject: Re: Hot, large row
>>>
>>> "How can I detect wide rows?" -->
>>>
>>> nodetool cfhistograms <keyspace> <suspected column family>
>>>
>>> Look at the "Column count" column (the last one) and identify lines with
>>> a very high "Offset" value. In a well designed application you
>>> should have a gaussian distribution where 80% of your rows have a similar
>>> number of columns.
>>>
>>> "Anyone know what debug level I can set so that I can see what reads the
>>> hot node is handling? " -->
>>>
>>> "nodetool settraceprobability <value>", where value is a small number
>>> (0.001), on the node where you encounter the issue. Activate the tracing for
>>> a while (5 mins) and deactivate it (value = 0). Then look into the
>>> system_traces tables "events" & "sessions". It may help or not, since with
>>> a probability of 0.001 only one request in 1000 is traced.
>>>
>>> "Any way to get the server to blacklist these wide rows automatically?"
>>> --> No
>>>
>>>
>>> On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kw...@nanigans.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We are seeing an issue where basically daily one of our nodes spikes
>>>> in load and is churning in CMS heap pressure. It appears that reads are
>>>> backing up and my guess is that our application is reading a large row
>>>> repeatedly. Our write structure can lend itself to wide rows very
>>>> infrequently (<0.001%) and we do our best to detect and delete them but
>>>> obviously we’re missing a case. Hoping for assistance on the following
>>>> questions:
>>>>
>>>> - How can I detect wide rows?
>>>> - Anyone know what debug level I can set so that I can see what
>>>> reads the hot node is handling? I’m hoping to see the “bad” row
>>>> - Any way to get the server to blacklist these wide rows
>>>> automatically?
>>>>
>>>> We’re using C* 2.0.6 with Vnodes.
>>>>
>>>> Thanks
>>>>
>>>
>>>
>>
>>
>
>
785939 bytes: 2
943127 bytes: 0
1131752 bytes: 4
1358102 bytes: 2
1629722 bytes: 1
1955666 bytes: 2
2346799 bytes: 0
2816159 bytes: 0
3379391 bytes: 0
4055269 bytes: 0
4866323 bytes: 1
Cell Count per Partition
1 cells: 50599458
2 cells: 13975363
3 cells: 7927898
4 cells: 4981142
5 cells: 3081166
6 cells: 3034895
7 cells: 1896720
8 cells: 1508849
10 cells: 2418845
12 cells: 1631289
14 cells: 1179269
17 cells: 1301647
20 cells: 919744
24 cells: 864313
29 cells: 735203
35 cells: 597749
42 cells: 469611
50 cells: 367422
60 cells: 312697
72 cells: 251782
86 cells: 197717
103 cells: 161244
124 cells: 132074
149 cells: 103711
179 cells: 81760
215 cells: 63360
258 cells: 48982
310 cells: 37932
372 cells: 28328
446 cells: 21347
535 cells: 15373
642 cells: 10959
770 cells: 7643
924 cells: 5346
1109 cells: 3613
1331 cells: 2291
1597 cells: 1558
1916 cells: 885
2299 cells: 577
2759 cells: 327
3311 cells: 169
3973 cells: 116
4768 cells: 57
5722 cells: 23
6866 cells: 15
8239 cells: 9
9887 cells: 3
11864 cells: 0
14237 cells: 5
17084 cells: 1
20501 cells: 0
24601 cells: 2
29521 cells: 0
35425 cells: 0
42510 cells: 0
51012 cells: 0
61214 cells: 1
CREATE TABLE global_user_events (
user_id timeuuid,
app_id int,
type text,
name text,
extra_param map<text, text>,
paid boolean,
value float,
PRIMARY KEY (user_id, app_id, type, name)
) WITH
bloom_filter_fp_chance=0.100000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=86400 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'LeveledCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
users/global_user_events histograms
SSTables per Read
1 sstables: 1024446
2 sstables: 632948
3 sstables: 230768
4 sstables: 112428
5 sstables: 82651
6 sstables: 61439
7 sstables: 47518
8 sstables: 38490
10 sstables: 58054
12 sstables: 41128
14 sstables: 29642
17 sstables: 31056
20 sstables: 20474
24 sstables: 17799
29 sstables: 13744
35 sstables: 9485
42 sstables: 5933
50 sstables: 3607
60 sstables: 1618
72 sstables: 263
86 sstables: 10
Write Latency (microseconds)
10 us: 10
12 us: 75
14 us: 319
17 us: 3550
20 us: 20617
24 us: 105254
29 us: 311290
35 us: 494404
42 us: 516301
50 us: 534061
60 us: 421438
72 us: 314377
86 us: 375240
103 us: 333402
124 us: 816891
149 us: 667832
179 us: 430659
215 us: 336063
258 us: 380606
310 us: 235592
372 us: 68679
446 us: 14988
535 us: 6169
642 us: 3763
770 us: 2591
924 us: 2270
1109 us: 1985
1331 us: 1744
1597 us: 1564
1916 us: 1489
2299 us: 1326
2759 us: 1114
3311 us: 1024
3973 us: 984
4768 us: 833
5722 us: 962
6866 us: 1076
8239 us: 891
9887 us: 611
11864 us: 489
14237 us: 324
17084 us: 306
20501 us: 222
24601 us: 426
29521 us: 791
35425 us: 820
42510 us: 534
51012 us: 530
61214 us: 946
73457 us: 1122
88148 us: 714
105778 us: 313
126934 us: 127
152321 us: 38
182785 us: 3
219342 us: 2
263210 us: 0
315852 us: 0
379022 us: 0
454826 us: 2
545791 us: 0
654949 us: 3
785939 us: 0
943127 us: 0
1131752 us: 0
1358102 us: 0
1629722 us: 0
1955666 us: 0
2346799 us: 1
2816159 us: 0
3379391 us: 0
4055269 us: 0
4866323 us: 0
5839588 us: 0
7007506 us: 1
Read Latency (microseconds)
42 us: 1
50 us: 2
60 us: 2
72 us: 4
86 us: 15
103 us: 23
124 us: 29
149 us: 48
179 us: 67
215 us: 80
258 us: 316
310 us: 1812
372 us: 4999
446 us: 8876
535 us: 14586
642 us: 24417
770 us: 31943
924 us: 37453
1109 us: 77449
1331 us: 139135
1597 us: 177538
1916 us: 169568
2299 us: 177617
2759 us: 202552
3311 us: 198101
3973 us: 179807
4768 us: 159149
5722 us: 142282
6866 us: 126124
8239 us: 111529
9887 us: 97658
11864 us: 84913
14237 us: 70905
17084 us: 54917
20501 us: 40623
24601 us: 29545
29521 us: 21448
35425 us: 15255
42510 us: 11222
51012 us: 8150
61214 us: 7473
73457 us: 10060
88148 us: 10530
105778 us: 7573
126934 us: 4331
152321 us: 2097
182785 us: 836
219342 us: 255
263210 us: 84
315852 us: 28
379022 us: 8
454826 us: 2
545791 us: 0
654949 us: 2
785939 us: 1
943127 us: 0
1131752 us: 0
1358102 us: 0
1629722 us: 2
1955666 us: 6
2346799 us: 18
2816159 us: 0
3379391 us: 0
4055269 us: 0
4866323 us: 0
5839588 us: 0
7007506 us: 34
Partition Size (bytes)
35 bytes: 25
42 bytes: 0
50 bytes: 0
60 bytes: 0
72 bytes: 21761
86 bytes: 694466658
103 bytes: 219007195
124 bytes: 7034596
149 bytes: 110269079
179 bytes: 101276811
215 bytes: 38518027
258 bytes: 103242356
310 bytes: 51434914
372 bytes: 38048245
446 bytes: 42558831
535 bytes: 32869176
642 bytes: 17188026
770 bytes: 14750559
924 bytes: 15963256
1109 bytes: 16125324
1331 bytes: 11462272
1597 bytes: 6810452
1916 bytes: 5685739
2299 bytes: 10232324
2759 bytes: 5138522
3311 bytes: 5847998
3973 bytes: 4065752
4768 bytes: 3468263
5722 bytes: 2545431
6866 bytes: 1701913
8239 bytes: 993808
9887 bytes: 495282
11864 bytes: 208644
14237 bytes: 74307
17084 bytes: 22286
20501 bytes: 5474
24601 bytes: 1225
29521 bytes: 210
35425 bytes: 97
42510 bytes: 48
51012 bytes: 21
61214 bytes: 15
73457 bytes: 10
88148 bytes: 3
105778 bytes: 4
126934 bytes: 1
152321 bytes: 2
182785 bytes: 1
Cell Count per Partition
1 cells: 914323128
2 cells: 196020828
3 cells: 165502164
4 cells: 71735648
5 cells: 54450466
6 cells: 28995756
7 cells: 9625976
8 cells: 10257788
10 cells: 24004757
12 cells: 13087147
14 cells: 13588347
17 cells: 11976353
20 cells: 6245374
24 cells: 5627821
29 cells: 9480533
35 cells: 6008276
42 cells: 5221357
50 cells: 4355454
60 cells: 3955488
72 cells: 2872578
86 cells: 1914447
103 cells: 1216276
124 cells: 652035
149 cells: 279565
179 cells: 99836
215 cells: 29132
258 cells: 6682
310 cells: 1276
372 cells: 258
446 cells: 96
535 cells: 48
642 cells: 25
770 cells: 9
924 cells: 9
1109 cells: 4
1331 cells: 3
1597 cells: 1
1916 cells: 2
From: Jack Krupansky <ja...@basetechnology.com>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Thursday, July 24, 2014 at 4:07 PM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: Re: Hot, large row
Could it be some “fat columns” (cells with large blob or text values) rather than the cell-count per se? IOW, a “big row” rather than a “wide row”?
And, could it be a large partition rather than a large row (many rows in a single partition)? Are clustering columns being used in the primary key?
-- Jack Krupansky
From: DuyHai Doan<ma...@gmail.com>
Sent: Thursday, July 24, 2014 3:53 PM
To: user@cassandra.apache.org<ma...@cassandra.apache.org>
Subject: Re: Hot, large row
Your extract of cfhistograms shows that there are no particularly wide rows. The widest has 61214 cells, which is big but not huge enough to be a real concern.
Turning on trace probability only gives you some "hints" about what kind of queries are being run; it does not give the exact partition key or other statement values, especially when you are using prepared statements ...
"I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the impact when this occurs." --> Don't do that, you'll only sweep dust under the carpet. Find the real issue and fix it instead of changing a parameter to hide it.
One option would be, on the client side, to activate logging of the CQL3 statements the application is issuing that may be overloading the server. I know that's easier said than done, but I don't have any other idea for the moment.
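As a rough illustration of the client-side logging idea (the names here are hypothetical stand-ins, not any specific driver's or Achilles' API), a thin wrapper around statement execution can record the statement text and bound values, so that a repeatedly read hot partition key shows up in the application log:

```python
import logging

log = logging.getLogger("cql.trace")

def logged_execute(session, statement, params):
    """Execute a (prepared) statement via the driver session, logging the
    statement text and bound values first so that a hot partition key
    shows up in the application log. `session` and `statement` stand in
    for whatever driver objects the application actually uses."""
    query_text = getattr(statement, "query_string", str(statement))
    log.info("CQL execute: %s params=%s", query_text, params)
    return session.execute(statement, params)
```

Aggregating these log lines by bound partition key (e.g. with a simple counter) would then point directly at the row being hammered.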
-------- Shameless self-promotion ------
To support this kind of live production debugging & investigation, I added a new dynamic query logging feature in Achilles: https://github.com/doanduyhai/Achilles/wiki/Statements-Logging-and-Tracing#dynamic-statements-logging
Once you hit the issue, this kind of feature may save your day...
On Thu, Jul 24, 2014 at 9:22 PM, Keith Wright <kw...@nanigans.com>> wrote:
I can see from cfhistograms that I do have some wide rows (see below). I set the trace probability as you suggested, but the output doesn’t appear to tell me which row was actually read, unless I missed something; I just see “executing prepared statement”. Any ideas how I can find the row in question?
I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the impact when this occurs.
Any help in identifying my issue would be GREATLY appreciated
Cell Count per Partition
1 cells: 50449950
2 cells: 14281828
3 cells: 8093366
4 cells: 5029200
5 cells: 3103023
6 cells: 3059903
7 cells: 1903018
8 cells: 1509297
10 cells: 2420359
12 cells: 1624895
14 cells: 1171678
17 cells: 1289391
20 cells: 909777
24 cells: 852081
29 cells: 722925
35 cells: 587067
42 cells: 459473
50 cells: 358744
60 cells: 304146
72 cells: 244682
86 cells: 191045
103 cells: 155337
124 cells: 127061
149 cells: 98913
179 cells: 77454
215 cells: 59849
258 cells: 46117
310 cells: 35321
372 cells: 26319
446 cells: 19379
535 cells: 13783
642 cells: 9993
770 cells: 6973
924 cells: 4713
1109 cells: 3229
1331 cells: 2062
1597 cells: 1338
1916 cells: 773
2299 cells: 495
2759 cells: 268
3311 cells: 150
3973 cells: 100
4768 cells: 42
5722 cells: 24
6866 cells: 12
8239 cells: 9
9887 cells: 3
11864 cells: 0
14237 cells: 5
17084 cells: 1
20501 cells: 0
24601 cells: 2
29521 cells: 0
35425 cells: 0
42510 cells: 0
51012 cells: 0
61214 cells: 2
From: DuyHai Doan <do...@gmail.com>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Thursday, July 24, 2014 at 3:01 PM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: Re: Hot, large row
"How can I detect wide rows?" -->
nodetool cfhistograms <keyspace> <suspected column family>
Look at the "Column count" column (the last one) and identify a line with a very high "Offset" value. In a well-designed application you should have a Gaussian distribution where ~80% of your rows have a similar number of columns.
"Anyone know what debug level I can set so that I can see what reads the hot node is handling? " -->
"nodetool settraceprobability <value>", where value is a small number (e.g. 0.001), on the node where you encounter the issue. Activate tracing for a while (~5 mins) and then deactivate it (value = 0). Then look into the system_traces tables "events" & "sessions". It may or may not help, since at 0.001 only one query in 1,000 is traced.
"Any way to get the server to blacklist these wide rows automatically?" --> No
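One way to act on the cfhistograms advice above is to script the threshold check. This sketch (an illustration, assuming the "Cell Count per Partition" line format shown elsewhere in this thread, e.g. "61214 cells: 2") flags histogram buckets whose offset exceeds a chosen cell limit:

```python
def wide_partition_buckets(histogram_text, cell_limit=10000):
    """Return (offset, count) pairs from a `nodetool cfhistograms`
    'Cell Count per Partition' dump where partitions exceed cell_limit.
    Expects lines like '61214 cells: 2'."""
    hits = []
    for line in histogram_text.splitlines():
        if "cells:" not in line:
            continue  # skip the section title and any non-bucket lines
        offset_part, count_part = line.split("cells:")
        offset = int(offset_part)
        count = int(count_part)
        if offset > cell_limit and count > 0:
            hits.append((offset, count))
    return hits
```

Running this periodically against each node's cfhistograms output would at least tell you when new wide partitions appear, even though it cannot name the offending key.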
On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kw...@nanigans.com>> wrote:
Hi all,
We are seeing an issue where, basically daily, one of our nodes spikes in load and churns under CMS heap pressure. It appears that reads are backing up, and my guess is that our application is reading a large row repeatedly. Our write structure can lend itself to wide rows very infrequently (<0.001%) and we do our best to detect and delete them, but obviously we’re missing a case. Hoping for assistance on the following questions:
* How can I detect wide rows?
* Anyone know what debug level I can set so that I can see what reads the hot node is handling? I’m hoping to see the “bad” row
* Any way to get the server to blacklist these wide rows automatically?
We’re using C* 2.0.6 with Vnodes.
Thanks
Re: Hot, large row
Posted by Keith Wright <kw...@nanigans.com>.
When a node is showing the high CMS issue, I/O is actually low, likely because nothing is happening during CMS GC. On a node not showing the issue, iostat shows disk usage around 50% (these are SSDs) and load hovers around 10, which is fine for a dual octo-core machine.
In addition, nodetool compactionstats does not show that we are falling behind in compactions.
So I’m not sure what is going on here. We are running CentOS 6.5 with Java 1.7.0_51. It does seem like things are getting worse, and I’m considering dropping and rebuilding all the tables (as I have the data in Hadoop). This has been a repeated problem for us with Cassandra, and now that Aerospike has an open-source version, we are seriously considering switching.
Thanks again for the help and any insight you might have!
avg-cpu: %user %nice %system %iowait %steal %idle
23.43 12.40 11.40 6.20 0.00 46.57
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 720.50 0.00 5.51 0.00 15.66 1.05 1.54 0.57 41.35
sdc 0.00 0.00 5930.50 2100.50 55.42 8.21 16.23 2.30 0.29 0.06 51.25
sdb 0.00 0.00 6951.50 2052.50 65.82 8.02 16.80 4.31 0.48 0.07 59.60
avg-cpu: %user %nice %system %iowait %steal %idle
9.48 14.72 5.60 3.67 0.00 66.52
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 327.00 19.50 2.55 0.08 15.53 0.04 0.13 0.12 4.25
sdc 0.00 0.00 3974.50 1403.50 36.37 5.48 15.94 0.99 0.18 0.08 45.10
sdb 0.00 0.00 4357.50 1535.00 40.83 6.00 16.28 1.10 0.19 0.08 47.45
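The %util figures above can also be checked programmatically rather than eyeballed. This is a sketch only, assuming the extended iostat layout pasted above (device name first, %util as the last column):

```python
def saturated_devices(iostat_text, util_threshold=80.0):
    """Scan `iostat -x`-style output and return (device, %util) pairs
    above util_threshold. Assumes %util is the final column of each
    device line, as in the output pasted above."""
    hot = []
    for line in iostat_text.splitlines():
        fields = line.split()
        # device lines here look like: sdb 0.00 0.00 ... 59.60 (12 fields)
        if len(fields) >= 12 and fields[0].startswith("sd"):
            util = float(fields[-1])
            if util > util_threshold:
                hot.append((fields[0], util))
    return hot
```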
From: DuyHai Doan <do...@gmail.com>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Thursday, July 24, 2014 at 4:44 PM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: Re: Hot, large row
For global_user_event_skus_v2
1. The number of SSTables per read is quite high. Since you're using LCS, this means LCS cannot keep up with the write rate and has fallen behind. AFAIK LCS falls back to size-tiered compaction at L0 to cope with extreme write bursts. Your high SSTables-per-read count is quite fishy here.
2. Write latency is spread out up to ~73 ms, meaning your node falls behind on writes in some cases. Most writes are still below 1 ms, but what matters here is the long tail of write latency climbing up to 73 ms.
3. The same goes for read latency, which is worse because the distribution is even flatter, with worst cases reaching ~100 ms.
If I were you, I'd check disk I/O first, and maybe CPU usage.
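To put a number on point 1 above, the mean SSTables touched per read can be computed directly from the cfhistograms dump (a sketch, assuming the "N sstables: count" line format shown earlier in this thread):

```python
def avg_sstables_per_read(histogram_text):
    """Compute the mean SSTables-touched-per-read from a
    `nodetool cfhistograms` 'SSTables per Read' dump with lines
    like '10 sstables: 29522'."""
    total_reads = 0
    weighted = 0
    for line in histogram_text.splitlines():
        if "sstables:" not in line:
            continue  # skip the section title line
        offset_part, count_part = line.split("sstables:")
        offset, count = int(offset_part), int(count_part)
        total_reads += count
        weighted += offset * count
    return weighted / total_reads if total_reads else 0.0
```

For a healthy LCS table this average should stay close to 1; a mean drifting toward 2 or higher, with a tail out to 50+, is consistent with L0 falling behind as described above.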
On Thu, Jul 24, 2014 at 10:32 PM, Keith Wright <kw...@nanigans.com>> wrote:
Re: Hot, large row
Posted by DuyHai Doan <do...@gmail.com>.
For global_user_event_skus_v2
1. The number of SSTables per read is quite high. Since you're using LCS, it
means that LCS cannot keep up with the write rate and has fallen behind. AFAIK
LCS falls back to SizeTieredCompaction at L0 to cope with extreme write bursts.
Your high number of SSTables per read is quite fishy here.
2. Write latency is widely spread, up to 73.457 millisecs, meaning that your
node is falling behind on writes in some cases. Most writes are still
below 1 millisec, but that's not the concern. What matters here is the long tail
of write latency climbing up to 73 millisecs.
3. Same remark for read latency, which is worse because the distribution is
even "flatter", with worst cases going up to 100 ms.
If I were you, I'd check disk I/O first, and maybe CPU usage.
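The latency tails called out above can be quantified straight from the histogram buckets. A small sketch (a standalone helper, not a Cassandra tool) that estimates a percentile from (offset, count) pairs as printed by nodetool cfhistograms:

```python
def percentile_from_histogram(buckets, pct):
    """Estimate the pct-th percentile from (offset, count) histogram buckets.

    Each bucket means `count` observations fell at or below `offset` (and
    above the previous offset), mirroring nodetool cfhistograms output.
    """
    total = sum(count for _, count in buckets)
    threshold = total * pct / 100.0
    running = 0
    for offset, count in sorted(buckets):
        running += count
        if running >= threshold:
            return offset
    return buckets[-1][0] if buckets else None
```

Feeding it the tail of a write-latency histogram like the one below makes the gap between the median and the p99 concrete, which is the "long tail" being described.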
On Thu, Jul 24, 2014 at 10:32 PM, Keith Wright <kw...@nanigans.com> wrote:
> Cfhistograms for the tables I believe are most likely the issue are below
> on the node that most recently presented the issue. Any ideas? Note that
> these tables are LCS and have droppable tombstone ratios of 27% for
> global_user_event_skus_v2 and 2.7% for the other. Table definitions also
> below, you can see that they do use compound keys.
>
> Thanks
>
> CREATE TABLE global_user_event_skus_v2 (
>
> user_id timeuuid,
>
> app_id int,
>
> type text,
>
> name text,
>
> sku text,
>
> creationtime timestamp,
>
> PRIMARY KEY (user_id, app_id, type, name, sku)
>
> ) WITH
>
> bloom_filter_fp_chance=0.100000 AND
>
> caching='KEYS_ONLY' AND
>
> comment='' AND
>
> dclocal_read_repair_chance=0.000000 AND
>
> gc_grace_seconds=86400 AND
>
> index_interval=128 AND
>
> read_repair_chance=0.100000 AND
>
> replicate_on_write='true' AND
>
> populate_io_cache_on_flush='false' AND
>
> default_time_to_live=2592000 AND
>
> speculative_retry='99.0PERCENTILE' AND
>
> memtable_flush_period_in_ms=0 AND
>
> compaction={'class': 'LeveledCompactionStrategy'} AND
>
> compression={'sstable_compression': 'LZ4Compressor'};
>
>
> users/global_user_event_skus_v2 histograms
>
>
> SSTables per Read
>
> 1 sstables: 1398122
>
> 2 sstables: 231100
>
> 3 sstables: 97334
>
> 4 sstables: 56496
>
> 5 sstables: 43046
>
> 6 sstables: 35471
>
> 7 sstables: 26888
>
> 8 sstables: 21102
>
> 10 sstables: 29522
>
> 12 sstables: 18314
>
> 14 sstables: 11368
>
> 17 sstables: 10283
>
> 20 sstables: 5766
>
> 24 sstables: 3554
>
> 29 sstables: 1681
>
> 35 sstables: 521
>
> 42 sstables: 102
>
> 50 sstables: 1
>
>
> Write Latency (microseconds)
>
> 2 us: 532
>
> 3 us: 10728
>
> 4 us: 49608
>
> 5 us: 173718
>
> 6 us: 404131
>
> 7 us: 697468
>
> 8 us: 981048
>
> 10 us: 2643680
>
> 12 us: 3183727
>
> 14 us: 3150881
>
> 17 us: 3746333
>
> 20 us: 2298481
>
> 24 us: 1647604
>
> 29 us: 1070733
>
> 35 us: 693300
>
> 42 us: 442120
>
> 50 us: 288618
>
> 60 us: 205061
>
> 72 us: 141194
>
> 86 us: 98781
>
> 103 us: 73629
>
> 124 us: 54900
>
> 149 us: 40518
>
> 179 us: 30330
>
> 215 us: 23953
>
> 258 us: 18875
>
> 310 us: 14147
>
> 372 us: 10509
>
> 446 us: 7984
>
> 535 us: 6257
>
> 642 us: 4903
>
> 770 us: 4086
>
> 924 us: 3252
>
> 1109 us: 2676
>
> 1331 us: 2129
>
> 1597 us: 1753
>
> 1916 us: 1488
>
> 2299 us: 1237
>
> 2759 us: 1078
>
> 3311 us: 866
>
> 3973 us: 704
>
> 4768 us: 551
>
> 5722 us: 553
>
> 6866 us: 774
>
> 8239 us: 443
>
> 9887 us: 303
>
> 11864 us: 222
>
> 14237 us: 181
>
> 17084 us: 146
>
> 20501 us: 147
>
> 24601 us: 499
>
> 29521 us: 842
>
> 35425 us: 548
>
> 42510 us: 303
>
> 51012 us: 560
>
> 61214 us: 842
>
> 73457 us: 600
>
> 88148 us: 283
>
> 105778 us: 173
>
> 126934 us: 81
>
> 152321 us: 18
>
> 182785 us: 2
>
>
> Read Latency (microseconds)
>
> 35 us: 2
>
> 42 us: 72
>
> 50 us: 737
>
> 60 us: 3555
>
> 72 us: 6926
>
> 86 us: 8373
>
> 103 us: 15287
>
> 124 us: 30054
>
> 149 us: 46164
>
> 179 us: 67828
>
> 215 us: 86741
>
> 258 us: 92178
>
> 310 us: 82205
>
> 372 us: 69641
>
> 446 us: 64710
>
> 535 us: 58295
>
> 642 us: 55247
>
> 770 us: 57918
>
> 924 us: 56928
>
> 1109 us: 75762
>
> 1331 us: 122306
>
> 1597 us: 141564
>
> 1916 us: 117301
>
> 2299 us: 102293
>
> 2759 us: 98789
>
> 3311 us: 86923
>
> 3973 us: 75304
>
> 4768 us: 64012
>
> 5722 us: 54973
>
> 6866 us: 47221
>
> 8239 us: 41483
>
> 9887 us: 35015
>
> 11864 us: 30249
>
> 14237 us: 23505
>
> 17084 us: 18364
>
> 20501 us: 12992
>
> 24601 us: 8837
>
> 29521 us: 6497
>
> 35425 us: 4840
>
> 42510 us: 3536
>
> 51012 us: 2471
>
> 61214 us: 2784
>
> 73457 us: 3664
>
> 88148 us: 3219
>
> 105778 us: 1980
>
> 126934 us: 1050
>
> 152321 us: 519
>
> 182785 us: 184
>
> 219342 us: 82
>
> 263210 us: 39
>
> 315852 us: 22
>
> 379022 us: 16
>
> 454826 us: 0
>
> 545791 us: 0
>
> 654949 us: 1
>
> 785939 us: 1
>
> 943127 us: 0
>
> 1131752 us: 0
>
> 1358102 us: 0
>
> 1629722 us: 3
>
> 1955666 us: 3
>
> 2346799 us: 1
>
> 2816159 us: 0
>
> 3379391 us: 0
>
> 4055269 us: 0
>
> 4866323 us: 0
>
> 5839588 us: 0
>
> 7007506 us: 4
>
>
> Partition Size (bytes)
>
> 35 bytes: 31
>
> 42 bytes: 0
>
> 50 bytes: 0
>
> 60 bytes: 0
>
> 72 bytes: 0
>
> 86 bytes: 0
>
> 103 bytes: 761108
>
> 124 bytes: 23596164
>
> 149 bytes: 22963275
>
> 179 bytes: 3517971
>
> 215 bytes: 6395335
>
> 258 bytes: 6197982
>
> 310 bytes: 4662589
>
> 372 bytes: 5261934
>
> 446 bytes: 3032032
>
> 535 bytes: 3775310
>
> 642 bytes: 2879935
>
> 770 bytes: 3002743
>
> 924 bytes: 2279900
>
> 1109 bytes: 1976357
>
> 1331 bytes: 1643875
>
> 1597 bytes: 1422199
>
> 1916 bytes: 1142746
>
> 2299 bytes: 901584
>
> 2759 bytes: 760173
>
> 3311 bytes: 596874
>
> 3973 bytes: 480502
>
> 4768 bytes: 380185
>
> 5722 bytes: 303315
>
> 6866 bytes: 239013
>
> 8239 bytes: 187337
>
> 9887 bytes: 147218
>
> 11864 bytes: 115684
>
> 14237 bytes: 89718
>
> 17084 bytes: 69160
>
> 20501 bytes: 53070
>
> 24601 bytes: 40496
>
> 29521 bytes: 30344
>
> 35425 bytes: 22656
>
> 42510 bytes: 16419
>
> 51012 bytes: 11621
>
> 61214 bytes: 8020
>
> 73457 bytes: 5561
>
> 88148 bytes: 3708
>
> 105778 bytes: 2434
>
> 126934 bytes: 1561
>
> 152321 bytes: 1017
>
> 182785 bytes: 559
>
> 219342 bytes: 349
>
> 263210 bytes: 181
>
> 315852 bytes: 121
>
> 379022 bytes: 66
>
> 454826 bytes: 22
>
> 545791 bytes: 14
>
> 654949 bytes: 10
>
> 785939 bytes: 2
>
> 943127 bytes: 0
>
> 1131752 bytes: 4
>
> 1358102 bytes: 2
>
> 1629722 bytes: 1
>
> 1955666 bytes: 2
>
> 2346799 bytes: 0
>
> 2816159 bytes: 0
>
> 3379391 bytes: 0
>
> 4055269 bytes: 0
>
> 4866323 bytes: 1
>
>
> Cell Count per Partition
>
> 1 cells: 50599458
>
> 2 cells: 13975363
>
> 3 cells: 7927898
>
> 4 cells: 4981142
>
> 5 cells: 3081166
>
> 6 cells: 3034895
>
> 7 cells: 1896720
>
> 8 cells: 1508849
>
> 10 cells: 2418845
>
> 12 cells: 1631289
>
> 14 cells: 1179269
>
> 17 cells: 1301647
>
> 20 cells: 919744
>
> 24 cells: 864313
>
> 29 cells: 735203
>
> 35 cells: 597749
>
> 42 cells: 469611
>
> 50 cells: 367422
>
> 60 cells: 312697
>
> 72 cells: 251782
>
> 86 cells: 197717
>
> 103 cells: 161244
>
> 124 cells: 132074
>
> 149 cells: 103711
>
> 179 cells: 81760
>
> 215 cells: 63360
>
> 258 cells: 48982
>
> 310 cells: 37932
>
> 372 cells: 28328
>
> 446 cells: 21347
>
> 535 cells: 15373
>
> 642 cells: 10959
>
> 770 cells: 7643
>
> 924 cells: 5346
>
> 1109 cells: 3613
>
> 1331 cells: 2291
>
> 1597 cells: 1558
>
> 1916 cells: 885
>
> 2299 cells: 577
>
> 2759 cells: 327
>
> 3311 cells: 169
>
> 3973 cells: 116
>
> 4768 cells: 57
>
> 5722 cells: 23
>
> 6866 cells: 15
>
> 8239 cells: 9
>
> 9887 cells: 3
>
> 11864 cells: 0
>
> 14237 cells: 5
>
> 17084 cells: 1
>
> 20501 cells: 0
>
> 24601 cells: 2
>
> 29521 cells: 0
>
> 35425 cells: 0
>
> 42510 cells: 0
>
> 51012 cells: 0
>
> 61214 cells: 1
>
>
> CREATE TABLE global_user_events (
>
> user_id timeuuid,
>
> app_id int,
>
> type text,
>
> name text,
>
> extra_param map<text, text>,
>
> paid boolean,
>
> value float,
>
> PRIMARY KEY (user_id, app_id, type, name)
>
> ) WITH
>
> bloom_filter_fp_chance=0.100000 AND
>
> caching='KEYS_ONLY' AND
>
> comment='' AND
>
> dclocal_read_repair_chance=0.000000 AND
>
> gc_grace_seconds=86400 AND
>
> index_interval=128 AND
>
> read_repair_chance=0.100000 AND
>
> replicate_on_write='true' AND
>
> populate_io_cache_on_flush='false' AND
>
> default_time_to_live=0 AND
>
> speculative_retry='99.0PERCENTILE' AND
>
> memtable_flush_period_in_ms=0 AND
>
> compaction={'class': 'LeveledCompactionStrategy'} AND
>
> compression={'sstable_compression': 'LZ4Compressor'};
>
>
> users/global_user_events histograms
>
>
> SSTables per Read
>
> 1 sstables: 1024446
>
> 2 sstables: 632948
>
> 3 sstables: 230768
>
> 4 sstables: 112428
>
> 5 sstables: 82651
>
> 6 sstables: 61439
>
> 7 sstables: 47518
>
> 8 sstables: 38490
>
> 10 sstables: 58054
>
> 12 sstables: 41128
>
> 14 sstables: 29642
>
> 17 sstables: 31056
>
> 20 sstables: 20474
>
> 24 sstables: 17799
>
> 29 sstables: 13744
>
> 35 sstables: 9485
>
> 42 sstables: 5933
>
> 50 sstables: 3607
>
> 60 sstables: 1618
>
> 72 sstables: 263
>
> 86 sstables: 10
>
>
> Write Latency (microseconds)
>
> 10 us: 10
>
> 12 us: 75
>
> 14 us: 319
>
> 17 us: 3550
>
> 20 us: 20617
>
> 24 us: 105254
>
> 29 us: 311290
>
> 35 us: 494404
>
> 42 us: 516301
>
> 50 us: 534061
>
> 60 us: 421438
>
> 72 us: 314377
>
> 86 us: 375240
>
> 103 us: 333402
>
> 124 us: 816891
>
> 149 us: 667832
>
> 179 us: 430659
>
> 215 us: 336063
>
> 258 us: 380606
>
> 310 us: 235592
>
> 372 us: 68679
>
> 446 us: 14988
>
> 535 us: 6169
>
> 642 us: 3763
>
> 770 us: 2591
>
> 924 us: 2270
>
> 1109 us: 1985
>
> 1331 us: 1744
>
> 1597 us: 1564
>
> 1916 us: 1489
>
> 2299 us: 1326
>
> 2759 us: 1114
>
> 3311 us: 1024
>
> 3973 us: 984
>
> 4768 us: 833
>
> 5722 us: 962
>
> 6866 us: 1076
>
> 8239 us: 891
>
> 9887 us: 611
>
> 11864 us: 489
>
> 14237 us: 324
>
> 17084 us: 306
>
> 20501 us: 222
>
> 24601 us: 426
>
> 29521 us: 791
>
> 35425 us: 820
>
> 42510 us: 534
>
> 51012 us: 530
>
> 61214 us: 946
>
> 73457 us: 1122
>
> 88148 us: 714
>
> 105778 us: 313
>
> 126934 us: 127
>
> 152321 us: 38
>
> 182785 us: 3
>
> 219342 us: 2
>
> 263210 us: 0
>
> 315852 us: 0
>
> 379022 us: 0
>
> 454826 us: 2
>
> 545791 us: 0
>
> 654949 us: 3
>
> 785939 us: 0
>
> 943127 us: 0
>
> 1131752 us: 0
>
> 1358102 us: 0
>
> 1629722 us: 0
>
> 1955666 us: 0
>
> 2346799 us: 1
>
> 2816159 us: 0
>
> 3379391 us: 0
>
> 4055269 us: 0
>
> 4866323 us: 0
>
> 5839588 us: 0
>
> 7007506 us: 1
>
>
> Read Latency (microseconds)
>
> 42 us: 1
>
> 50 us: 2
>
> 60 us: 2
>
> 72 us: 4
>
> 86 us: 15
>
> 103 us: 23
>
> 124 us: 29
>
> 149 us: 48
>
> 179 us: 67
>
> 215 us: 80
>
> 258 us: 316
>
> 310 us: 1812
>
> 372 us: 4999
>
> 446 us: 8876
>
> 535 us: 14586
>
> 642 us: 24417
>
> 770 us: 31943
>
> 924 us: 37453
>
> 1109 us: 77449
>
> 1331 us: 139135
>
> 1597 us: 177538
>
> 1916 us: 169568
>
> 2299 us: 177617
>
> 2759 us: 202552
>
> 3311 us: 198101
>
> 3973 us: 179807
>
> 4768 us: 159149
>
> 5722 us: 142282
>
> 6866 us: 126124
>
> 8239 us: 111529
>
> 9887 us: 97658
>
> 11864 us: 84913
>
> 14237 us: 70905
>
> 17084 us: 54917
>
> 20501 us: 40623
>
> 24601 us: 29545
>
> 29521 us: 21448
>
> 35425 us: 15255
>
> 42510 us: 11222
>
> 51012 us: 8150
>
> 61214 us: 7473
>
> 73457 us: 10060
>
> 88148 us: 10530
>
> 105778 us: 7573
>
> 126934 us: 4331
>
> 152321 us: 2097
>
> 182785 us: 836
>
> 219342 us: 255
>
> 263210 us: 84
>
> 315852 us: 28
>
> 379022 us: 8
>
> 454826 us: 2
>
> 545791 us: 0
>
> 654949 us: 2
>
> 785939 us: 1
>
> 943127 us: 0
>
> 1131752 us: 0
>
> 1358102 us: 0
>
> 1629722 us: 2
>
> 1955666 us: 6
>
> 2346799 us: 18
>
> 2816159 us: 0
>
> 3379391 us: 0
>
> 4055269 us: 0
>
> 4866323 us: 0
>
> 5839588 us: 0
>
> 7007506 us: 34
>
>
> Partition Size (bytes)
>
> 35 bytes: 25
>
> 42 bytes: 0
>
> 50 bytes: 0
>
> 60 bytes: 0
>
> 72 bytes: 21761
>
> 86 bytes: 694466658
>
> 103 bytes: 219007195
>
> 124 bytes: 7034596
>
> 149 bytes: 110269079
>
> 179 bytes: 101276811
>
> 215 bytes: 38518027
>
> 258 bytes: 103242356
>
> 310 bytes: 51434914
>
> 372 bytes: 38048245
>
> 446 bytes: 42558831
>
> 535 bytes: 32869176
>
> 642 bytes: 17188026
>
> 770 bytes: 14750559
>
> 924 bytes: 15963256
>
> 1109 bytes: 16125324
>
> 1331 bytes: 11462272
>
> 1597 bytes: 6810452
>
> 1916 bytes: 5685739
>
> 2299 bytes: 10232324
>
> 2759 bytes: 5138522
>
> 3311 bytes: 5847998
>
> 3973 bytes: 4065752
>
> 4768 bytes: 3468263
>
> 5722 bytes: 2545431
>
> 6866 bytes: 1701913
>
> 8239 bytes: 993808
>
> 9887 bytes: 495282
>
> 11864 bytes: 208644
>
> 14237 bytes: 74307
>
> 17084 bytes: 22286
>
> 20501 bytes: 5474
>
> 24601 bytes: 1225
>
> 29521 bytes: 210
>
> 35425 bytes: 97
>
> 42510 bytes: 48
>
> 51012 bytes: 21
>
> 61214 bytes: 15
>
> 73457 bytes: 10
>
> 88148 bytes: 3
>
> 105778 bytes: 4
>
> 126934 bytes: 1
>
> 152321 bytes: 2
>
> 182785 bytes: 1
>
>
> Cell Count per Partition
>
> 1 cells: 914323128
>
> 2 cells: 196020828
>
> 3 cells: 165502164
>
> 4 cells: 71735648
>
> 5 cells: 54450466
>
> 6 cells: 28995756
>
> 7 cells: 9625976
>
> 8 cells: 10257788
>
> 10 cells: 24004757
>
> 12 cells: 13087147
>
> 14 cells: 13588347
>
> 17 cells: 11976353
>
> 20 cells: 6245374
>
> 24 cells: 5627821
>
> 29 cells: 9480533
>
> 35 cells: 6008276
>
> 42 cells: 5221357
>
> 50 cells: 4355454
>
> 60 cells: 3955488
>
> 72 cells: 2872578
>
> 86 cells: 1914447
>
> 103 cells: 1216276
>
> 124 cells: 652035
>
> 149 cells: 279565
>
> 179 cells: 99836
>
> 215 cells: 29132
>
> 258 cells: 6682
>
> 310 cells: 1276
>
> 372 cells: 258
>
> 446 cells: 96
>
> 535 cells: 48
>
> 642 cells: 25
>
> 770 cells: 9
>
> 924 cells: 9
>
> 1109 cells: 4
>
> 1331 cells: 3
>
> 1597 cells: 1
>
> 1916 cells: 2
>
> From: Jack Krupansky <ja...@basetechnology.com>
> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Date: Thursday, July 24, 2014 at 4:07 PM
>
> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Subject: Re: Hot, large row
>
> Could it be some “fat columns” (cells with large blob or text values)
> rather than the cell-count per se? IOW, a “big row” rather than a “wide
> row”?
>
> And, could it be a large partition rather than a large row (many rows in a
> single partition)? Are clustering columns being used in the primary key?
>
> -- Jack Krupansky
>
> *From:* DuyHai Doan <do...@gmail.com>
> *Sent:* Thursday, July 24, 2014 3:53 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Hot, large row
>
> Your extract of cfhistograms shows that there are no particular "wide
> rows". The widest has 61214 cells, which is big but not huge enough to be
> a real concern.
>
> Turning on trace probability only gives you some "hints" about what
> kind of queries are run; it does not give the exact partition key nor
> the other statement values, especially when you are using prepared
> statements ...
>
>
> "I am considering reducing read_request_timeout_in_ms: 5000 in
> cassandra.yaml so that it reduces the impact when this occurs." --> Don't
> do that, you'll only sweep dust under the carpet. Find the real issue and
> fix it instead of changing a parameter to hide it.
>
> One solution would be, on the client side, to activate some logging to show
> the CQL3 statements the application is issuing that may be overloading the
> server. I know that's easier said than done, but I don't have any other idea
> for the moment.
>
> -------- Shameless self-promotion ------
>
> To support this kind of live prod debugging & investigation, I added a
> new dynamic query logging feature in Achilles:
> https://github.com/doanduyhai/Achilles/wiki/Statements-Logging-and-Tracing#dynamic-statements-logging
>
> Once you hit the issue, this kind of feature may save your day...
>
>
>
>
>
>
> On Thu, Jul 24, 2014 at 9:22 PM, Keith Wright <kw...@nanigans.com>
> wrote:
>
>> I can see from cfhistograms that I do have some wide rows (see below).
>> I set trace probability as you suggested but the output doesn’t appear to
>> tell me what row was actually read unless I missed something. I just see
>> executing prepared statement. Any ideas how I can find the row in
>> question?
>>
>> I am considering reducing read_request_timeout_in_ms: 5000 in
>> cassandra.yaml so that it reduces the impact when this occurs.
>>
>> Any help in identifying my issue would be GREATLY appreciated
>>
>>
>> Cell Count per Partition
>>
>> 1 cells: 50449950
>>
>> 2 cells: 14281828
>>
>> 3 cells: 8093366
>>
>> 4 cells: 5029200
>>
>> 5 cells: 3103023
>>
>> 6 cells: 3059903
>>
>> 7 cells: 1903018
>>
>> 8 cells: 1509297
>>
>> 10 cells: 2420359
>>
>> 12 cells: 1624895
>>
>> 14 cells: 1171678
>>
>> 17 cells: 1289391
>>
>> 20 cells: 909777
>>
>> 24 cells: 852081
>>
>> 29 cells: 722925
>>
>> 35 cells: 587067
>>
>> 42 cells: 459473
>>
>> 50 cells: 358744
>>
>> 60 cells: 304146
>>
>> 72 cells: 244682
>>
>> 86 cells: 191045
>>
>> 103 cells: 155337
>>
>> 124 cells: 127061
>>
>> 149 cells: 98913
>>
>> 179 cells: 77454
>>
>> 215 cells: 59849
>>
>> 258 cells: 46117
>>
>> 310 cells: 35321
>>
>> 372 cells: 26319
>>
>> 446 cells: 19379
>>
>> 535 cells: 13783
>>
>> 642 cells: 9993
>>
>> 770 cells: 6973
>>
>> 924 cells: 4713
>>
>> 1109 cells: 3229
>>
>> 1331 cells: 2062
>>
>> 1597 cells: 1338
>>
>> 1916 cells: 773
>>
>> 2299 cells: 495
>>
>> 2759 cells: 268
>>
>> 3311 cells: 150
>>
>> 3973 cells: 100
>>
>> 4768 cells: 42
>>
>> 5722 cells: 24
>>
>> 6866 cells: 12
>>
>> 8239 cells: 9
>>
>> 9887 cells: 3
>>
>> 11864 cells: 0
>>
>> 14237 cells: 5
>>
>> 17084 cells: 1
>>
>> 20501 cells: 0
>>
>> 24601 cells: 2
>>
>> 29521 cells: 0
>>
>> 35425 cells: 0
>>
>> 42510 cells: 0
>>
>> 51012 cells: 0
>>
>> 61214 cells: 2
>>
>> From: DuyHai Doan <do...@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>> Date: Thursday, July 24, 2014 at 3:01 PM
>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>> Subject: Re: Hot, large row
>>
>> "How can I detect wide rows?" -->
>>
>> nodetool cfhistograms <keyspace> <suspected column family>
>>
>> Look at the "Column count" column (the last one) and identify lines with a
>> very high "Offset" value. In a well-designed application you should have a
>> roughly Gaussian distribution where 80% of your rows have a similar number
>> of columns.
>>
>> "Anyone know what debug level I can set so that I can see what reads the
>> hot node is handling? " -->
>>
>> "nodetool settraceprobability <value>", where value is a small number
>> (e.g. 0.001), on the node where you encounter the issue. Activate the
>> tracing for a while (5 mins) and deactivate it (value = 0). Then look into
>> the system_traces tables "events" & "sessions". It may help or not, since
>> only one request in 1,000 is traced.
>>
>> "Any way to get the server to blacklist these wide rows automatically?"
>> --> No
>>
>>
>> On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kw...@nanigans.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> We are seeing an issue where basically daily one of our nodes spikes
>>> in load and is churning in CMS heap pressure. It appears that reads are
>>> backing up and my guess is that our application is reading a large row
>>> repeatedly. Our write structure can lend itself to wide rows very
>>> infrequently (<0.001%) and we do our best to detect and delete them but
>>> obviously we’re missing a case. Hoping for assistance on the following
>>> questions:
>>>
>>> - How can I detect wide rows?
>>> - Anyone know what debug level I can set so that I can see what
>>> reads the hot node is handling? I’m hoping to see the “bad” row
>>> - Any way to get the server to blacklist these wide rows
>>> automatically?
>>>
>>> We’re using C* 2.0.6 with Vnodes.
>>>
>>> Thanks
>>>
>>
>>
>
>
Re: Hot, large row
Posted by Keith Wright <kw...@nanigans.com>.
Cfhistograms for the tables I believe are most likely the issue are below on the node that most recently presented the issue. Any ideas? Note that these tables are LCS and have droppable tombstone ratios of 27% for global_user_event_skus_v2 and 2.7% for the other. Table definitions also below, you can see that they do use compound keys.
Thanks
CREATE TABLE global_user_event_skus_v2 (
user_id timeuuid,
app_id int,
type text,
name text,
sku text,
creationtime timestamp,
PRIMARY KEY (user_id, app_id, type, name, sku)
) WITH
bloom_filter_fp_chance=0.100000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=86400 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=2592000 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'LeveledCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
users/global_user_event_skus_v2 histograms
SSTables per Read
1 sstables: 1398122
2 sstables: 231100
3 sstables: 97334
4 sstables: 56496
5 sstables: 43046
6 sstables: 35471
7 sstables: 26888
8 sstables: 21102
10 sstables: 29522
12 sstables: 18314
14 sstables: 11368
17 sstables: 10283
20 sstables: 5766
24 sstables: 3554
29 sstables: 1681
35 sstables: 521
42 sstables: 102
50 sstables: 1
Write Latency (microseconds)
2 us: 532
3 us: 10728
4 us: 49608
5 us: 173718
6 us: 404131
7 us: 697468
8 us: 981048
10 us: 2643680
12 us: 3183727
14 us: 3150881
17 us: 3746333
20 us: 2298481
24 us: 1647604
29 us: 1070733
35 us: 693300
42 us: 442120
50 us: 288618
60 us: 205061
72 us: 141194
86 us: 98781
103 us: 73629
124 us: 54900
149 us: 40518
179 us: 30330
215 us: 23953
258 us: 18875
310 us: 14147
372 us: 10509
446 us: 7984
535 us: 6257
642 us: 4903
770 us: 4086
924 us: 3252
1109 us: 2676
1331 us: 2129
1597 us: 1753
1916 us: 1488
2299 us: 1237
2759 us: 1078
3311 us: 866
3973 us: 704
4768 us: 551
5722 us: 553
6866 us: 774
8239 us: 443
9887 us: 303
11864 us: 222
14237 us: 181
17084 us: 146
20501 us: 147
24601 us: 499
29521 us: 842
35425 us: 548
42510 us: 303
51012 us: 560
61214 us: 842
73457 us: 600
88148 us: 283
105778 us: 173
126934 us: 81
152321 us: 18
182785 us: 2
Read Latency (microseconds)
35 us: 2
42 us: 72
50 us: 737
60 us: 3555
72 us: 6926
86 us: 8373
103 us: 15287
124 us: 30054
149 us: 46164
179 us: 67828
215 us: 86741
258 us: 92178
310 us: 82205
372 us: 69641
446 us: 64710
535 us: 58295
642 us: 55247
770 us: 57918
924 us: 56928
1109 us: 75762
1331 us: 122306
1597 us: 141564
1916 us: 117301
2299 us: 102293
2759 us: 98789
3311 us: 86923
3973 us: 75304
4768 us: 64012
5722 us: 54973
6866 us: 47221
8239 us: 41483
9887 us: 35015
11864 us: 30249
14237 us: 23505
17084 us: 18364
20501 us: 12992
24601 us: 8837
29521 us: 6497
35425 us: 4840
42510 us: 3536
51012 us: 2471
61214 us: 2784
73457 us: 3664
88148 us: 3219
105778 us: 1980
126934 us: 1050
152321 us: 519
182785 us: 184
219342 us: 82
263210 us: 39
315852 us: 22
379022 us: 16
454826 us: 0
545791 us: 0
654949 us: 1
785939 us: 1
943127 us: 0
1131752 us: 0
1358102 us: 0
1629722 us: 3
1955666 us: 3
2346799 us: 1
2816159 us: 0
3379391 us: 0
4055269 us: 0
4866323 us: 0
5839588 us: 0
7007506 us: 4
Partition Size (bytes)
35 bytes: 31
42 bytes: 0
50 bytes: 0
60 bytes: 0
72 bytes: 0
86 bytes: 0
103 bytes: 761108
124 bytes: 23596164
149 bytes: 22963275
179 bytes: 3517971
215 bytes: 6395335
258 bytes: 6197982
310 bytes: 4662589
372 bytes: 5261934
446 bytes: 3032032
535 bytes: 3775310
642 bytes: 2879935
770 bytes: 3002743
924 bytes: 2279900
1109 bytes: 1976357
1331 bytes: 1643875
1597 bytes: 1422199
1916 bytes: 1142746
2299 bytes: 901584
2759 bytes: 760173
3311 bytes: 596874
3973 bytes: 480502
4768 bytes: 380185
5722 bytes: 303315
6866 bytes: 239013
8239 bytes: 187337
9887 bytes: 147218
11864 bytes: 115684
14237 bytes: 89718
17084 bytes: 69160
20501 bytes: 53070
24601 bytes: 40496
29521 bytes: 30344
35425 bytes: 22656
42510 bytes: 16419
51012 bytes: 11621
61214 bytes: 8020
73457 bytes: 5561
88148 bytes: 3708
105778 bytes: 2434
126934 bytes: 1561
152321 bytes: 1017
182785 bytes: 559
219342 bytes: 349
263210 bytes: 181
315852 bytes: 121
379022 bytes: 66
454826 bytes: 22
545791 bytes: 14
654949 bytes: 10
785939 bytes: 2
943127 bytes: 0
1131752 bytes: 4
1358102 bytes: 2
1629722 bytes: 1
1955666 bytes: 2
2346799 bytes: 0
2816159 bytes: 0
3379391 bytes: 0
4055269 bytes: 0
4866323 bytes: 1
Cell Count per Partition
1 cells: 50599458
2 cells: 13975363
3 cells: 7927898
4 cells: 4981142
5 cells: 3081166
6 cells: 3034895
7 cells: 1896720
8 cells: 1508849
10 cells: 2418845
12 cells: 1631289
14 cells: 1179269
17 cells: 1301647
20 cells: 919744
24 cells: 864313
29 cells: 735203
35 cells: 597749
42 cells: 469611
50 cells: 367422
60 cells: 312697
72 cells: 251782
86 cells: 197717
103 cells: 161244
124 cells: 132074
149 cells: 103711
179 cells: 81760
215 cells: 63360
258 cells: 48982
310 cells: 37932
372 cells: 28328
446 cells: 21347
535 cells: 15373
642 cells: 10959
770 cells: 7643
924 cells: 5346
1109 cells: 3613
1331 cells: 2291
1597 cells: 1558
1916 cells: 885
2299 cells: 577
2759 cells: 327
3311 cells: 169
3973 cells: 116
4768 cells: 57
5722 cells: 23
6866 cells: 15
8239 cells: 9
9887 cells: 3
11864 cells: 0
14237 cells: 5
17084 cells: 1
20501 cells: 0
24601 cells: 2
29521 cells: 0
35425 cells: 0
42510 cells: 0
51012 cells: 0
61214 cells: 1
CREATE TABLE global_user_events (
user_id timeuuid,
app_id int,
type text,
name text,
extra_param map<text, text>,
paid boolean,
value float,
PRIMARY KEY (user_id, app_id, type, name)
) WITH
bloom_filter_fp_chance=0.100000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=86400 AND
index_interval=128 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'LeveledCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
users/global_user_events histograms
SSTables per Read
1 sstables: 1024446
2 sstables: 632948
3 sstables: 230768
4 sstables: 112428
5 sstables: 82651
6 sstables: 61439
7 sstables: 47518
8 sstables: 38490
10 sstables: 58054
12 sstables: 41128
14 sstables: 29642
17 sstables: 31056
20 sstables: 20474
24 sstables: 17799
29 sstables: 13744
35 sstables: 9485
42 sstables: 5933
50 sstables: 3607
60 sstables: 1618
72 sstables: 263
86 sstables: 10
Write Latency (microseconds)
10 us: 10
12 us: 75
14 us: 319
17 us: 3550
20 us: 20617
24 us: 105254
29 us: 311290
35 us: 494404
42 us: 516301
50 us: 534061
60 us: 421438
72 us: 314377
86 us: 375240
103 us: 333402
124 us: 816891
149 us: 667832
179 us: 430659
215 us: 336063
258 us: 380606
310 us: 235592
372 us: 68679
446 us: 14988
535 us: 6169
642 us: 3763
770 us: 2591
924 us: 2270
1109 us: 1985
1331 us: 1744
1597 us: 1564
1916 us: 1489
2299 us: 1326
2759 us: 1114
3311 us: 1024
3973 us: 984
4768 us: 833
5722 us: 962
6866 us: 1076
8239 us: 891
9887 us: 611
11864 us: 489
14237 us: 324
17084 us: 306
20501 us: 222
24601 us: 426
29521 us: 791
35425 us: 820
42510 us: 534
51012 us: 530
61214 us: 946
73457 us: 1122
88148 us: 714
105778 us: 313
126934 us: 127
152321 us: 38
182785 us: 3
219342 us: 2
263210 us: 0
315852 us: 0
379022 us: 0
454826 us: 2
545791 us: 0
654949 us: 3
785939 us: 0
943127 us: 0
1131752 us: 0
1358102 us: 0
1629722 us: 0
1955666 us: 0
2346799 us: 1
2816159 us: 0
3379391 us: 0
4055269 us: 0
4866323 us: 0
5839588 us: 0
7007506 us: 1
Read Latency (microseconds)
42 us: 1
50 us: 2
60 us: 2
72 us: 4
86 us: 15
103 us: 23
124 us: 29
149 us: 48
179 us: 67
215 us: 80
258 us: 316
310 us: 1812
372 us: 4999
446 us: 8876
535 us: 14586
642 us: 24417
770 us: 31943
924 us: 37453
1109 us: 77449
1331 us: 139135
1597 us: 177538
1916 us: 169568
2299 us: 177617
2759 us: 202552
3311 us: 198101
3973 us: 179807
4768 us: 159149
5722 us: 142282
6866 us: 126124
8239 us: 111529
9887 us: 97658
11864 us: 84913
14237 us: 70905
17084 us: 54917
20501 us: 40623
24601 us: 29545
29521 us: 21448
35425 us: 15255
42510 us: 11222
51012 us: 8150
61214 us: 7473
73457 us: 10060
88148 us: 10530
105778 us: 7573
126934 us: 4331
152321 us: 2097
182785 us: 836
219342 us: 255
263210 us: 84
315852 us: 28
379022 us: 8
454826 us: 2
545791 us: 0
654949 us: 2
785939 us: 1
943127 us: 0
1131752 us: 0
1358102 us: 0
1629722 us: 2
1955666 us: 6
2346799 us: 18
2816159 us: 0
3379391 us: 0
4055269 us: 0
4866323 us: 0
5839588 us: 0
7007506 us: 34
Partition Size (bytes)
35 bytes: 25
42 bytes: 0
50 bytes: 0
60 bytes: 0
72 bytes: 21761
86 bytes: 694466658
103 bytes: 219007195
124 bytes: 7034596
149 bytes: 110269079
179 bytes: 101276811
215 bytes: 38518027
258 bytes: 103242356
310 bytes: 51434914
372 bytes: 38048245
446 bytes: 42558831
535 bytes: 32869176
642 bytes: 17188026
770 bytes: 14750559
924 bytes: 15963256
1109 bytes: 16125324
1331 bytes: 11462272
1597 bytes: 6810452
1916 bytes: 5685739
2299 bytes: 10232324
2759 bytes: 5138522
3311 bytes: 5847998
3973 bytes: 4065752
4768 bytes: 3468263
5722 bytes: 2545431
6866 bytes: 1701913
8239 bytes: 993808
9887 bytes: 495282
11864 bytes: 208644
14237 bytes: 74307
17084 bytes: 22286
20501 bytes: 5474
24601 bytes: 1225
29521 bytes: 210
35425 bytes: 97
42510 bytes: 48
51012 bytes: 21
61214 bytes: 15
73457 bytes: 10
88148 bytes: 3
105778 bytes: 4
126934 bytes: 1
152321 bytes: 2
182785 bytes: 1
Cell Count per Partition
1 cells: 914323128
2 cells: 196020828
3 cells: 165502164
4 cells: 71735648
5 cells: 54450466
6 cells: 28995756
7 cells: 9625976
8 cells: 10257788
10 cells: 24004757
12 cells: 13087147
14 cells: 13588347
17 cells: 11976353
20 cells: 6245374
24 cells: 5627821
29 cells: 9480533
35 cells: 6008276
42 cells: 5221357
50 cells: 4355454
60 cells: 3955488
72 cells: 2872578
86 cells: 1914447
103 cells: 1216276
124 cells: 652035
149 cells: 279565
179 cells: 99836
215 cells: 29132
258 cells: 6682
310 cells: 1276
372 cells: 258
446 cells: 96
535 cells: 48
642 cells: 25
770 cells: 9
924 cells: 9
1109 cells: 4
1331 cells: 3
1597 cells: 1
1916 cells: 2
From: Jack Krupansky <ja...@basetechnology.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Thursday, July 24, 2014 at 4:07 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Hot, large row
Could it be some “fat columns” (cells with large blob or text values) rather than the cell-count per se? IOW, a “big row” rather than a “wide row”?
And, could it be a large partition rather than a large row (many rows in a single partition)? Are clustering columns being used in the primary key?
-- Jack Krupansky
From: DuyHai Doan <do...@gmail.com>
Sent: Thursday, July 24, 2014 3:53 PM
To: user@cassandra.apache.org
Subject: Re: Hot, large row
Your extract of cfhistograms shows that there are no particular "wide rows". The widest has 61214 cells, which is big but not huge enough to be a real concern.
Turning on trace probability only gives you some "hints" about what kind of queries are run; it does not give the exact partition key nor the other statement values, especially when you are using prepared statements ...
"I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the impact when this occurs." --> Don't do that, you'll only sweep dust under the carpet. Find the real issue and fix it instead of changing a parameter to hide it.
One solution would be, on the client side, to activate some logging to show the CQL3 statements the application is issuing that may be overloading the server. I know that's easier said than done, but I don't have any other idea for the moment.
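That client-side logging can be sketched with a generic wrapper (the names and the execute callable here are hypothetical, not a specific driver API): wrap the statement-executing function and log any statement, with its bound values, that runs longer than a threshold.

```python
import logging
import time

logger = logging.getLogger("cql.slow")


def log_slow_statements(execute_fn, threshold_s=0.5):
    """Wrap a driver-style execute(statement, params) callable and log any
    statement, together with its bound values, that exceeds threshold_s."""
    def wrapped(statement, params=None):
        start = time.monotonic()
        try:
            return execute_fn(statement, params)
        finally:
            elapsed = time.monotonic() - start
            if elapsed > threshold_s:
                logger.warning("slow CQL (%.3fs): %s params=%r",
                               elapsed, statement, params)
    return wrapped
```

Because the wrapper sees the bound values, this surfaces exactly what server-side tracing of prepared statements hides: the partition key being hammered.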
-------- Shameless self-promotion ------
To support this kind of live prod debugging & investigation, I added a new dynamic query logging feature in Achilles: https://github.com/doanduyhai/Achilles/wiki/Statements-Logging-and-Tracing#dynamic-statements-logging
Once you hit the issue, this kind of feature may save your day...
On Thu, Jul 24, 2014 at 9:22 PM, Keith Wright <kw...@nanigans.com> wrote:
I can see from cfhistograms that I do have some wide rows (see below). I set trace probability as you suggested but the output doesn’t appear to tell me what row was actually read unless I missed something. I just see executing prepared statement. Any ideas how I can find the row in question?
I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the impact when this occurs.
Any help in identifying my issue would be GREATLY appreciated
Cell Count per Partition
1 cells: 50449950
2 cells: 14281828
3 cells: 8093366
4 cells: 5029200
5 cells: 3103023
6 cells: 3059903
7 cells: 1903018
8 cells: 1509297
10 cells: 2420359
12 cells: 1624895
14 cells: 1171678
17 cells: 1289391
20 cells: 909777
24 cells: 852081
29 cells: 722925
35 cells: 587067
42 cells: 459473
50 cells: 358744
60 cells: 304146
72 cells: 244682
86 cells: 191045
103 cells: 155337
124 cells: 127061
149 cells: 98913
179 cells: 77454
215 cells: 59849
258 cells: 46117
310 cells: 35321
372 cells: 26319
446 cells: 19379
535 cells: 13783
642 cells: 9993
770 cells: 6973
924 cells: 4713
1109 cells: 3229
1331 cells: 2062
1597 cells: 1338
1916 cells: 773
2299 cells: 495
2759 cells: 268
3311 cells: 150
3973 cells: 100
4768 cells: 42
5722 cells: 24
6866 cells: 12
8239 cells: 9
9887 cells: 3
11864 cells: 0
14237 cells: 5
17084 cells: 1
20501 cells: 0
24601 cells: 2
29521 cells: 0
35425 cells: 0
42510 cells: 0
51012 cells: 0
61214 cells: 2
From: DuyHai Doan <do...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Thursday, July 24, 2014 at 3:01 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Hot, large row
"How can I detect wide rows?" -->
nodetool cfhistograms <keyspace> <suspected column family>
Look at the "Column count" column (the last column) and identify a line with a very high "Offset" value. In a well-designed application you should have a Gaussian distribution where 80% of your rows have a similar number of columns.
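That bucket scan is easy to automate. This is a hypothetical helper (not part of nodetool) that parses the "N cells: count" lines from cfhistograms output and flags buckets whose cell count crosses a chosen threshold:

```python
import re

def wide_partition_buckets(histogram_lines, cell_threshold=10000):
    """Return (cells, partitions) pairs for non-empty buckets at or above
    cell_threshold, parsed from `nodetool cfhistograms`-style output."""
    wide = []
    for line in histogram_lines:
        m = re.match(r"\s*(\d+)\s+cells:\s+(\d+)\s*$", line)
        if not m:
            continue  # skip headers and unrelated output
        cells, partitions = int(m.group(1)), int(m.group(2))
        if cells >= cell_threshold and partitions > 0:
            wide.append((cells, partitions))
    return wide

sample = [
    "Cell Count per Partition",
    "1 cells: 50449950",
    "8239 cells: 9",
    "14237 cells: 5",
    "61214 cells: 2",
]
print(wide_partition_buckets(sample))  # [(14237, 5), (61214, 2)]
```

Run against the histogram quoted below, it would flag the handful of partitions in the 14237/17084/24601/61214-cell buckets while ignoring the empty buckets in between.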
"Anyone know what debug level I can set so that I can see what reads the hot node is handling? " -->
"nodetool settraceprobability <value>", where value is a small number (0.001), on the node where you encounter the issue. Activate the tracing for a while (5 mins) and then deactivate it (value = 0). Then look into the system_traces tables "events" & "sessions". It may or may not help, since only one query in every 1000 is traced.
"Any way to get the server to blacklist these wide rows automatically?" --> No
On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kw...@nanigans.com> wrote:
Hi all,
We are seeing an issue where, roughly daily, one of our nodes spikes in load and churns under CMS heap pressure. It appears that reads are backing up, and my guess is that our application is reading a large row repeatedly. Our write structure can lend itself to wide rows very infrequently (<0.001%) and we do our best to detect and delete them, but obviously we’re missing a case. Hoping for assistance on the following questions:
* How can I detect wide rows?
* Anyone know what debug level I can set so that I can see what reads the hot node is handling? I’m hoping to see the “bad” row
* Any way to get the server to blacklist these wide rows automatically?
We’re using C* 2.0.6 with Vnodes.
Thanks
Re: Hot, large row
Posted by Jack Krupansky <ja...@basetechnology.com>.
Could it be some “fat columns” (cells with large blob or text values) rather than the cell-count per se? IOW, a “big row” rather than a “wide row”?
And, could it be a large partition rather than a large row (many rows in a single partition)? Are clustering columns being used in the primary key?
-- Jack Krupansky
From: DuyHai Doan
Sent: Thursday, July 24, 2014 3:53 PM
To: user@cassandra.apache.org
Subject: Re: Hot, large row
Your extract of cfhistograms shows that there are no particularly "wide rows". The widest has 61214 cells, which is big but not huge enough to really be a concern.
Turning on trace probability only gives you some "hints" about what kinds of queries are being run; it does not give the exact partition key or other statement values, especially when you are using prepared statements ...
"I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the impact when this occurs." --> Don't do that, you'll only sweep dust under the carpet. Find the real issue and fix it instead of changing a parameter to hide it.
One solution would be, on the client side, to activate some logging to show the CQL3 statements the application is issuing that may be overloading the server. I know that's easier said than done, but I don't have any other idea for the moment.
-------- Shameless self-promotion ------
To support this kind of live prod debugging & investigation, I added a new dynamic query-logging feature in Achilles: https://github.com/doanduyhai/Achilles/wiki/Statements-Logging-and-Tracing#dynamic-statements-logging
Once you hit the issue, this kind of feature may save your day...
On Thu, Jul 24, 2014 at 9:22 PM, Keith Wright <kw...@nanigans.com> wrote:
I can see from cfhistograms that I do have some wide rows (see below). I set trace probability as you suggested but the output doesn’t appear to tell me what row was actually read unless I missed something. I just see executing prepared statement. Any ideas how I can find the row in question?
I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the impact when this occurs.
Any help in identifying my issue would be GREATLY appreciated
Cell Count per Partition
1 cells: 50449950
2 cells: 14281828
3 cells: 8093366
4 cells: 5029200
5 cells: 3103023
6 cells: 3059903
7 cells: 1903018
8 cells: 1509297
10 cells: 2420359
12 cells: 1624895
14 cells: 1171678
17 cells: 1289391
20 cells: 909777
24 cells: 852081
29 cells: 722925
35 cells: 587067
42 cells: 459473
50 cells: 358744
60 cells: 304146
72 cells: 244682
86 cells: 191045
103 cells: 155337
124 cells: 127061
149 cells: 98913
179 cells: 77454
215 cells: 59849
258 cells: 46117
310 cells: 35321
372 cells: 26319
446 cells: 19379
535 cells: 13783
642 cells: 9993
770 cells: 6973
924 cells: 4713
1109 cells: 3229
1331 cells: 2062
1597 cells: 1338
1916 cells: 773
2299 cells: 495
2759 cells: 268
3311 cells: 150
3973 cells: 100
4768 cells: 42
5722 cells: 24
6866 cells: 12
8239 cells: 9
9887 cells: 3
11864 cells: 0
14237 cells: 5
17084 cells: 1
20501 cells: 0
24601 cells: 2
29521 cells: 0
35425 cells: 0
42510 cells: 0
51012 cells: 0
61214 cells: 2
From: DuyHai Doan <do...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Thursday, July 24, 2014 at 3:01 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Hot, large row
"How can I detect wide rows?" -->
nodetool cfhistograms <keyspace> <suspected column family>
Look at the "Column count" column (the last column) and identify a line with a very high "Offset" value. In a well-designed application you should have a Gaussian distribution where 80% of your rows have a similar number of columns.
"Anyone know what debug level I can set so that I can see what reads the hot node is handling? " -->
"nodetool settraceprobability <value>", where value is a small number (0.001), on the node where you encounter the issue. Activate the tracing for a while (5 mins) and then deactivate it (value = 0). Then look into the system_traces tables "events" & "sessions". It may or may not help, since only one query in every 1000 is traced.
"Any way to get the server to blacklist these wide rows automatically?" --> No
On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kw...@nanigans.com> wrote:
Hi all,
We are seeing an issue where, roughly daily, one of our nodes spikes in load and churns under CMS heap pressure. It appears that reads are backing up, and my guess is that our application is reading a large row repeatedly. Our write structure can lend itself to wide rows very infrequently (<0.001%) and we do our best to detect and delete them, but obviously we’re missing a case. Hoping for assistance on the following questions:
* How can I detect wide rows?
* Anyone know what debug level I can set so that I can see what reads the hot node is handling? I’m hoping to see the “bad” row
* Any way to get the server to blacklist these wide rows automatically?
We’re using C* 2.0.6 with Vnodes.
Thanks
Re: Hot, large row
Posted by Keith Wright <kw...@nanigans.com>.
I appreciate the feedback, doing it on the client side is interesting and I will start looking into that. To be clear, here are the symptoms I am seeing:
* A node will start showing high load and the CMS collection time jumps to 100+ ms/sec (ParNew is also up)
* If I run nodetool tpstats, I see a high number of items in the Pending phase for ReadStage. Other items mostly appear near empty. In addition, I see dropped reads.
* If I shut down the affected node, two other nodes show increased load. I am at RF=3, so this would again suggest slow reads against a single row
Does it sound correct that my best course of action is to investigate a large row? If it were a small row being called repeatedly, I assume OS/key cache would make that a VERY fast operation.
Thanks
From: DuyHai Doan <do...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Thursday, July 24, 2014 at 3:53 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Hot, large row
Your extract of cfhistograms shows that there are no particularly "wide rows". The widest has 61214 cells, which is big but not huge enough to really be a concern.
Turning on trace probability only gives you some "hints" about what kinds of queries are being run; it does not give the exact partition key or other statement values, especially when you are using prepared statements ...
"I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the impact when this occurs." --> Don't do that, you'll only sweep dust under the carpet. Find the real issue and fix it instead of changing a parameter to hide it.
One solution would be, on the client side, to activate some logging to show the CQL3 statements the application is issuing that may be overloading the server. I know that's easier said than done, but I don't have any other idea for the moment.
-------- Shameless self-promotion ------
To support this kind of live prod debugging & investigation, I added a new dynamic query-logging feature in Achilles: https://github.com/doanduyhai/Achilles/wiki/Statements-Logging-and-Tracing#dynamic-statements-logging
Once you hit the issue, this kind of feature may save your day...
On Thu, Jul 24, 2014 at 9:22 PM, Keith Wright <kw...@nanigans.com> wrote:
I can see from cfhistograms that I do have some wide rows (see below). I set trace probability as you suggested but the output doesn’t appear to tell me what row was actually read unless I missed something. I just see executing prepared statement. Any ideas how I can find the row in question?
I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the impact when this occurs.
Any help in identifying my issue would be GREATLY appreciated
Cell Count per Partition
1 cells: 50449950
2 cells: 14281828
3 cells: 8093366
4 cells: 5029200
5 cells: 3103023
6 cells: 3059903
7 cells: 1903018
8 cells: 1509297
10 cells: 2420359
12 cells: 1624895
14 cells: 1171678
17 cells: 1289391
20 cells: 909777
24 cells: 852081
29 cells: 722925
35 cells: 587067
42 cells: 459473
50 cells: 358744
60 cells: 304146
72 cells: 244682
86 cells: 191045
103 cells: 155337
124 cells: 127061
149 cells: 98913
179 cells: 77454
215 cells: 59849
258 cells: 46117
310 cells: 35321
372 cells: 26319
446 cells: 19379
535 cells: 13783
642 cells: 9993
770 cells: 6973
924 cells: 4713
1109 cells: 3229
1331 cells: 2062
1597 cells: 1338
1916 cells: 773
2299 cells: 495
2759 cells: 268
3311 cells: 150
3973 cells: 100
4768 cells: 42
5722 cells: 24
6866 cells: 12
8239 cells: 9
9887 cells: 3
11864 cells: 0
14237 cells: 5
17084 cells: 1
20501 cells: 0
24601 cells: 2
29521 cells: 0
35425 cells: 0
42510 cells: 0
51012 cells: 0
61214 cells: 2
From: DuyHai Doan <do...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Thursday, July 24, 2014 at 3:01 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Hot, large row
"How can I detect wide rows?" -->
nodetool cfhistograms <keyspace> <suspected column family>
Look at the "Column count" column (the last column) and identify a line with a very high "Offset" value. In a well-designed application you should have a Gaussian distribution where 80% of your rows have a similar number of columns.
"Anyone know what debug level I can set so that I can see what reads the hot node is handling? " -->
"nodetool settraceprobability <value>", where value is a small number (0.001), on the node where you encounter the issue. Activate the tracing for a while (5 mins) and then deactivate it (value = 0). Then look into the system_traces tables "events" & "sessions". It may or may not help, since only one query in every 1000 is traced.
"Any way to get the server to blacklist these wide rows automatically?" --> No
On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kw...@nanigans.com> wrote:
Hi all,
We are seeing an issue where, roughly daily, one of our nodes spikes in load and churns under CMS heap pressure. It appears that reads are backing up, and my guess is that our application is reading a large row repeatedly. Our write structure can lend itself to wide rows very infrequently (<0.001%) and we do our best to detect and delete them, but obviously we’re missing a case. Hoping for assistance on the following questions:
* How can I detect wide rows?
* Anyone know what debug level I can set so that I can see what reads the hot node is handling? I’m hoping to see the “bad” row
* Any way to get the server to blacklist these wide rows automatically?
We’re using C* 2.0.6 with Vnodes.
Thanks
Re: Hot, large row
Posted by DuyHai Doan <do...@gmail.com>.
Your extract of cfhistograms shows that there are no particularly "wide
rows". The widest has 61214 cells, which is big but not huge enough to
really be a concern.
Turning on trace probability only gives you some "hints" about what
kinds of queries are being run; it does not give the exact partition key
or other statement values, especially when you are using prepared
statements ...
"I am considering reducing read_request_timeout_in_ms: 5000 in
cassandra.yaml so that it reduces the impact when this occurs." --> Don't
do that, you'll only sweep dust under the carpet. Find the real issue and
fix it instead of changing a parameter to hide it.
One solution would be, on the client side, to activate some logging to
show the CQL3 statements the application is issuing that may be
overloading the server. I know that's easier said than done, but I don't
have any other idea for the moment.
-------- Shameless self-promotion ------
To support this kind of live prod debugging & investigation, I added a
new dynamic query-logging feature in Achilles:
https://github.com/doanduyhai/Achilles/wiki/Statements-Logging-and-Tracing#dynamic-statements-logging
Once you hit the issue, this kind of feature may save your day...
On Thu, Jul 24, 2014 at 9:22 PM, Keith Wright <kw...@nanigans.com> wrote:
> I can see from cfhistograms that I do have some wide rows (see below). I
> set trace probability as you suggested but the output doesn’t appear to
> tell me what row was actually read unless I missed something. I just see
> executing prepared statement. Any ideas how I can find the row in
> question?
>
> I am considering reducing read_request_timeout_in_ms: 5000 in
> cassandra.yaml so that it reduces the impact when this occurs.
>
> Any help in identifying my issue would be GREATLY appreciated
>
> Cell Count per Partition
>
> 1 cells: 50449950
>
> 2 cells: 14281828
>
> 3 cells: 8093366
>
> 4 cells: 5029200
>
> 5 cells: 3103023
>
> 6 cells: 3059903
>
> 7 cells: 1903018
>
> 8 cells: 1509297
>
> 10 cells: 2420359
>
> 12 cells: 1624895
>
> 14 cells: 1171678
>
> 17 cells: 1289391
>
> 20 cells: 909777
>
> 24 cells: 852081
>
> 29 cells: 722925
>
> 35 cells: 587067
>
> 42 cells: 459473
>
> 50 cells: 358744
>
> 60 cells: 304146
>
> 72 cells: 244682
>
> 86 cells: 191045
>
> 103 cells: 155337
>
> 124 cells: 127061
>
> 149 cells: 98913
>
> 179 cells: 77454
>
> 215 cells: 59849
>
> 258 cells: 46117
>
> 310 cells: 35321
>
> 372 cells: 26319
>
> 446 cells: 19379
>
> 535 cells: 13783
>
> 642 cells: 9993
>
> 770 cells: 6973
>
> 924 cells: 4713
>
> 1109 cells: 3229
>
> 1331 cells: 2062
>
> 1597 cells: 1338
>
> 1916 cells: 773
>
> 2299 cells: 495
>
> 2759 cells: 268
>
> 3311 cells: 150
>
> 3973 cells: 100
>
> 4768 cells: 42
>
> 5722 cells: 24
>
> 6866 cells: 12
>
> 8239 cells: 9
>
> 9887 cells: 3
>
> 11864 cells: 0
>
> 14237 cells: 5
>
> 17084 cells: 1
>
> 20501 cells: 0
>
> 24601 cells: 2
>
> 29521 cells: 0
>
> 35425 cells: 0
>
> 42510 cells: 0
>
> 51012 cells: 0
>
> 61214 cells: 2
>
> From: DuyHai Doan <do...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Date: Thursday, July 24, 2014 at 3:01 PM
> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Subject: Re: Hot, large row
>
> "How can I detect wide rows?" -->
>
> nodetool cfhistograms <keyspace> <suspected column family>
>
> Look at the "Column count" column (the last column) and identify a line in
> this column with a very high "Offset" value. In a well-designed application
> you should have a Gaussian distribution where 80% of your rows have a
> similar number of columns.
>
> "Anyone know what debug level I can set so that I can see what reads the
> hot node is handling? " -->
>
> "nodetool settraceprobability <value>", where value is a small number
> (0.001) on the node where you encounter the issue. Activate the tracing for
> a while (5 mins) and deactivate it (value = 0). Then look into
> system_traces tables "events" & "sessions". It may or may not help, since
> only one query in every 1000 is traced.
>
> "Any way to get the server to blacklist these wide rows automatically?"
> --> No
>
>
> On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kw...@nanigans.com>
> wrote:
>
>> Hi all,
>>
>> We are seeing an issue where basically daily one of our nodes spikes
>> in load and is churning in CMS heap pressure. It appears that reads are
>> backing up and my guess is that our application is reading a large row
>> repeatedly. Our write structure can lend itself to wide rows very
>> infrequently (<0.001%) and we do our best to detect and delete them but
>> obviously we’re missing a case. Hoping for assistance on the following
>> questions:
>>
>> - How can I detect wide rows?
>> - Anyone know what debug level I can set so that I can see what reads
>> the hot node is handling? I’m hoping to see the “bad” row
>> - Any way to get the server to blacklist these wide rows
>> automatically?
>>
>> We’re using C* 2.0.6 with Vnodes.
>>
>> Thanks
>>
>
>
Re: Hot, large row
Posted by Keith Wright <kw...@nanigans.com>.
I can see from cfhistograms that I do have some wide rows (see below). I set trace probability as you suggested but the output doesn’t appear to tell me what row was actually read unless I missed something. I just see executing prepared statement. Any ideas how I can find the row in question?
I am considering reducing read_request_timeout_in_ms: 5000 in cassandra.yaml so that it reduces the impact when this occurs.
Any help in identifying my issue would be GREATLY appreciated
Cell Count per Partition
1 cells: 50449950
2 cells: 14281828
3 cells: 8093366
4 cells: 5029200
5 cells: 3103023
6 cells: 3059903
7 cells: 1903018
8 cells: 1509297
10 cells: 2420359
12 cells: 1624895
14 cells: 1171678
17 cells: 1289391
20 cells: 909777
24 cells: 852081
29 cells: 722925
35 cells: 587067
42 cells: 459473
50 cells: 358744
60 cells: 304146
72 cells: 244682
86 cells: 191045
103 cells: 155337
124 cells: 127061
149 cells: 98913
179 cells: 77454
215 cells: 59849
258 cells: 46117
310 cells: 35321
372 cells: 26319
446 cells: 19379
535 cells: 13783
642 cells: 9993
770 cells: 6973
924 cells: 4713
1109 cells: 3229
1331 cells: 2062
1597 cells: 1338
1916 cells: 773
2299 cells: 495
2759 cells: 268
3311 cells: 150
3973 cells: 100
4768 cells: 42
5722 cells: 24
6866 cells: 12
8239 cells: 9
9887 cells: 3
11864 cells: 0
14237 cells: 5
17084 cells: 1
20501 cells: 0
24601 cells: 2
29521 cells: 0
35425 cells: 0
42510 cells: 0
51012 cells: 0
61214 cells: 2
From: DuyHai Doan <do...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Thursday, July 24, 2014 at 3:01 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Hot, large row
"How can I detect wide rows?" -->
nodetool cfhistograms <keyspace> <suspected column family>
Look at the "Column count" column (the last column) and identify a line with a very high "Offset" value. In a well-designed application you should have a Gaussian distribution where 80% of your rows have a similar number of columns.
"Anyone know what debug level I can set so that I can see what reads the hot node is handling? " -->
"nodetool settraceprobability <value>", where value is a small number (0.001), on the node where you encounter the issue. Activate the tracing for a while (5 mins) and then deactivate it (value = 0). Then look into the system_traces tables "events" & "sessions". It may or may not help, since only one query in every 1000 is traced.
"Any way to get the server to blacklist these wide rows automatically?" --> No
On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kw...@nanigans.com> wrote:
Hi all,
We are seeing an issue where, roughly daily, one of our nodes spikes in load and churns under CMS heap pressure. It appears that reads are backing up, and my guess is that our application is reading a large row repeatedly. Our write structure can lend itself to wide rows very infrequently (<0.001%) and we do our best to detect and delete them, but obviously we’re missing a case. Hoping for assistance on the following questions:
* How can I detect wide rows?
* Anyone know what debug level I can set so that I can see what reads the hot node is handling? I’m hoping to see the “bad” row
* Any way to get the server to blacklist these wide rows automatically?
We’re using C* 2.0.6 with Vnodes.
Thanks
Re: Hot, large row
Posted by DuyHai Doan <do...@gmail.com>.
"How can I detect wide rows?" -->
nodetool cfhistograms <keyspace> <suspected column family>
Look at column "Column count" (last column) and identify a line in this
column with very high value of "Offset". In a well designed application you
> should have a Gaussian distribution where 80% of your rows have a similar
number of columns.
"Anyone know what debug level I can set so that I can see what reads the
hot node is handling? " -->
"nodetool settraceprobability <value>", where value is a small number
(0.001) on the node where you encounter the issue. Activate the tracing for
a while (5 mins) and deactivate it (value = 0). Then look into
system_traces tables "events" & "sessions". It may or may not help, since
only one query in every 1000 is traced.
"Any way to get the server to blacklist these wide rows automatically?" -->
No
On Thu, Jul 24, 2014 at 8:48 PM, Keith Wright <kw...@nanigans.com> wrote:
> Hi all,
>
> We are seeing an issue where basically daily one of our nodes spikes in
> load and is churning in CMS heap pressure. It appears that reads are
> backing up and my guess is that our application is reading a large row
> repeatedly. Our write structure can lend itself to wide rows very
> infrequently (<0.001%) and we do our best to detect and delete them but
> obviously we’re missing a case. Hoping for assistance on the following
> questions:
>
> - How can I detect wide rows?
> - Anyone know what debug level I can set so that I can see what reads
> the hot node is handling? I’m hoping to see the “bad” row
> - Any way to get the server to blacklist these wide rows automatically?
>
> We’re using C* 2.0.6 with Vnodes.
>
> Thanks
>