Posted to user@cassandra.apache.org by Rudi Bruchez <ru...@babaluga.com> on 2017/08/27 21:45:56 UTC
timeouts on counter tables
Hello,
On a 3-node cluster (each node: 48 cores, 32 GB RAM, SSD), I'm getting timeouts
on counter table UPDATEs.
One node in particular is slow and generates the timeouts; it is IO bound. iotop
consistently shows about 300 MB/s of reads, while writes fluctuate around 100 KB/s.
The keys seem well distributed.
The application uses a PHP driver, token aware, and sends updates
asynchronously from 11 client machines.
I don't know what the cause could be:
- too many concurrent UPDATEs in async mode?
- a counter type problem? We've given 1 GB to the counter cache.
- disk? SSD with software RAID 1
- a key hotspot?
I've compiled some information below. If someone has suggestions, or other
checks or lines of thought I might pursue, that would be great!
----------------------------------------
Cassandra version 3.11.0
*iostat* shows something like this on the slow node (software RAID 1 on
sda and sdb):
Device:  rrqm/s  wrqm/s      r/s    w/s   rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda        2,00    0,00  2160,00   0,00  169,20   0,00   160,43   147,10  68,53   68,53    0,00   0,46 100,00
sdb        1,00    0,00  1289,00   0,00   87,35   0,00   138,79   148,00 109,07  109,07    0,00   0,78 100,00
*nodetool status*
UN X.X.X.X 52.15 GiB 256 66,7%
UN X.X.X.X 54.86 GiB 256 69,3%
UN X.X.X.X 49.18 GiB 256 64,0%
*table structure*
CREATE TABLE document_search (
id_document bigint,
search_type ascii,
searchkeyword_id bigint,
nb_click counter,
nb_display counter,
PRIMARY KEY ((id_document, search_type), searchkeyword_id)
) WITH CLUSTERING ORDER BY(searchkeyword_id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
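For context, each application write against this table is a counter increment sent through the PHP driver, roughly along these lines (a simplified sketch; the exact statement and the key values here are made up):

$prepared = $cassandrasession->prepare(
    'UPDATE document_search ' .
    'SET nb_click = nb_click + 1, nb_display = nb_display + 1 ' .
    'WHERE id_document = ? AND search_type = ? AND searchkeyword_id = ?'
);
// bigint columns are bound as Cassandra\Bigint, ascii as a plain string
$cassandrasession->executeAsync($prepared, array(
    'arguments' => array(new Cassandra\Bigint(42), 'keyword', new Cassandra\Bigint(1001))
));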
*2 examples of nodetool tpstats at 2 different times*
1)
Pool Name                  Active  Pending  Completed  Blocked  All time blocked
Native-Transport-Requests     128     1083    1824166        0                 0
CounterMutationStage           32      338     710480        0                 0
2)
Pool Name                  Active  Pending  Completed  Blocked  All time blocked
ReadStage                      32      758     418822        0                 0
CounterMutationStage            0        0      98310        0                 0
*tablestats*
nodetool tablestats document_search
Total number of tables: 43
----------------
Read Count: 0
Read Latency: NaN ms.
Write Count: 288636
Write Latency: 2.354803579595061 ms.
Pending Flushes: 0
SSTable count: 11
Space used (live): 19683318113
Space used (total): 19683318113
Space used by snapshots (total): 0
Off heap memory used (total): 39258415
SSTable Compression Ratio: 0.3099081738824526
Number of keys (estimate): 4397936
Memtable cell count: 169182
Memtable data size: 20761379
Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 0
Local read latency: NaN ms
Local write count: 169182
Local write latency: NaN ms
Pending flushes: 0
Percent repaired: 61.58
Bloom filter false positives: 1
Bloom filter false ratio: 0,00000
Bloom filter space used: 26271840
Bloom filter off heap memory used: 26271752
Index summary off heap memory used: 5496319
Compression metadata off heap memory used: 7490344
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 4055269
Compacted partition mean bytes: 3206
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 19804
*nodetool info*
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 53.85 GiB
Generation No : 1503674199
Uptime (seconds) : 194310
Heap Memory (MB) : 4663,19 / 7774,75
Off Heap Memory (MB) : 208,24
Exceptions : 0
Key Cache : entries 11987913, size 1,09 GiB, capacity 2 GiB, 129046135 hits, 144375554 requests, 0,894 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 7579853, size 1 GiB, capacity 1 GiB, 9479923 hits, 39619041 requests, 0,239 recent hit rate, 7200 save period in seconds
Chunk Cache : entries 97792, size 5,97 GiB, capacity 5,97 GiB, 38965356 misses, 182409581 requests, 0,786 recent hit rate, 56,113 microseconds miss latency
Percent Repaired : 46.78765116584098%
Re: timeouts on counter tables
Posted by Rudi Bruchez <ru...@babaluga.com>.
On 28/08/2017 at 03:30, kurt greaves wrote:
> If every node is a replica it sounds like you've got hardware issues.
> Have you compared iostat to the "normal" nodes? I assume there is
> nothing different in the logs on this one node?
> Also sanity check, you are using DCAwareRoundRobinPolicy?
>
Thanks for the answer. I had to concentrate on other things for a few
days; I'm back on this problem now.
The PHP driver call is:
$cassandrabuilder->withDatacenterAwareRoundRobinLoadBalancingPolicy("mycluster", 0, false)
    ->withTokenAwareRouting(true)
    ->withSchemaMetadata(true);
After that, the calls are made like this:
$result = $cassandra->execute(new Cassandra\SimpleStatement($query));
$cassandrasession->executeAsync($this->queryPrepared, array('arguments' => $values));
Could the async calls put too much pressure on the server? They come from 11
client machines.
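One thing we could try on the client side is to cap the number of in-flight async requests instead of firing them all unthrottled. A rough, untested sketch (the $updates array and the window size of 128 are only illustrative):

$window  = 128;          // illustrative cap on in-flight requests, to be tuned
$futures = array();

foreach ($updates as $values) {
    // send the prepared counter UPDATE asynchronously
    $futures[] = $cassandrasession->executeAsync(
        $this->queryPrepared,
        array('arguments' => $values)
    );

    // once the window is full, wait for the outstanding responses
    if (count($futures) >= $window) {
        foreach ($futures as $future) {
            $future->get(10);   // block up to 10 seconds per response
        }
        $futures = array();
    }
}

// drain whatever is still in flight at the end
foreach ($futures as $future) {
    $future->get(10);
}

That way each client machine never has more than $window counter updates outstanding at once.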
Thanks!
Re: timeouts on counter tables
Posted by Rudi Bruchez <ru...@babaluga.com>.
I'm going to try different options. Do any of you have experience with
tweaking one of these configuration parameters to improve read throughput,
especially in the case of counter tables?
1/ for SSDs:
trickle_fsync: true
trickle_fsync_interval_in_kb: 1024
2/ concurrent_compactors set to the number of cores
3/ concurrent_counter_writes
4/ row cache vs. chunk cache
5/ changing the compaction strategy to leveled, specifically when using
counter columns?
Thanks!
>> On 3 September 2017 at 20:25, Rudi Bruchez <rudi@babaluga.com
>> <ma...@babaluga.com>> wrote:
>>
>> On 30/08/2017 at 05:33, Erick Ramirez wrote:
>>> Is it possible at all that you may have a data hotspot if it's
>>> not hardware-related?
>>>
>>>
>> It does not seem so, The partition key seems well distributed and
>> the queries update different keys.
>>
>> We have dropped counter_mutation messages in the log :
>>
>> COUNTER_MUTATION messages were dropped in last 5000 ms: 0
>> internal and 2 cross node. Mean internal dropped latency: 0 ms
>> and Mean cross-node dropped latency: 5960 ms
>>
>> Pool Name                  Active  Pending  Completed  Blocked  All Time Blocked
>> ReadStage                      32      503    7481787        0                 0
>> CounterMutationStage           32      221    5722101        0                 0
>>
>> The load could be too high ?
>>
>> Thanks
>>
>>
>
Re: timeouts on counter tables
Posted by Rudi Bruchez <ru...@babaluga.com>.
It can happen on any of the nodes. We can have a large number of pending
tasks on ReadStage and CounterMutationStage. We'll try increasing
concurrent_counter_writes to see how it changes things.
> Likely. I believe counter mutations are a tad more expensive than a
> normal mutation. If you're doing a lot of counter updates that
> probably doesn't help. Regardless, high amounts of pending
> reads/mutations is generally not good and indicates the node being
> overloaded. Are you just seeing this on the 1 node with IO issues or
> do other nodes have this problem as well?
>
> On 3 September 2017 at 20:25, Rudi Bruchez <rudi@babaluga.com
> <ma...@babaluga.com>> wrote:
>
> On 30/08/2017 at 05:33, Erick Ramirez wrote:
>> Is it possible at all that you may have a data hotspot if it's
>> not hardware-related?
>>
>>
> It does not seem so, The partition key seems well distributed and
> the queries update different keys.
>
> We have dropped counter_mutation messages in the log :
>
> COUNTER_MUTATION messages were dropped in last 5000 ms: 0 internal
> and 2 cross node. Mean internal dropped latency: 0 ms and Mean
> cross-node dropped latency: 5960 ms
>
> Pool Name                  Active  Pending  Completed  Blocked  All Time Blocked
> ReadStage                      32      503    7481787        0                 0
> CounterMutationStage           32      221    5722101        0                 0
>
> The load could be too high ?
>
> Thanks
>
>
Re: timeouts on counter tables
Posted by kurt greaves <ku...@instaclustr.com>.
Likely. I believe counter mutations are a tad more expensive than a normal
mutation. If you're doing a lot of counter updates that probably doesn't
help. Regardless, a high number of pending reads/mutations is generally not
good and indicates that the node is overloaded. Are you just seeing this on
the one node with IO issues, or do other nodes have this problem as well?
On 3 September 2017 at 20:25, Rudi Bruchez <ru...@babaluga.com> wrote:
> On 30/08/2017 at 05:33, Erick Ramirez wrote:
>
> Is it possible at all that you may have a data hotspot if it's not
> hardware-related?
>
>
> It does not seem so, The partition key seems well distributed and the
> queries update different keys.
>
> We have dropped counter_mutation messages in the log :
>
> COUNTER_MUTATION messages were dropped in last 5000 ms: 0 internal and 2
> cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped
> latency: 5960 ms
>
> Pool Name                  Active  Pending  Completed  Blocked  All Time Blocked
> ReadStage                      32      503    7481787        0                 0
> CounterMutationStage           32      221    5722101        0                 0
>
> The load could be too high ?
>
> Thanks
>
Re: timeouts on counter tables
Posted by Rudi Bruchez <ru...@babaluga.com>.
On 30/08/2017 at 05:33, Erick Ramirez wrote:
> Is it possible at all that you may have a data hotspot if it's not
> hardware-related?
>
>
It does not seem so. The partition key seems well distributed and the
queries update different keys.
We see dropped COUNTER_MUTATION messages in the log:
COUNTER_MUTATION messages were dropped in last 5000 ms: 0 internal and 2
cross node. Mean internal dropped latency: 0 ms and Mean cross-node
dropped latency: 5960 ms
Pool Name                  Active  Pending  Completed  Blocked  All Time Blocked
ReadStage                      32      503    7481787        0                 0
CounterMutationStage           32      221    5722101        0                 0
Could the load be too high?
Thanks
Re: timeouts on counter tables
Posted by Erick Ramirez <fl...@gmail.com>.
Is it possible at all that you may have a data hotspot if it's not
hardware-related?
On Mon, Aug 28, 2017 at 11:30 AM, kurt greaves <ku...@instaclustr.com> wrote:
> If every node is a replica it sounds like you've got hardware issues. Have
> you compared iostat to the "normal" nodes? I assume there is nothing
> different in the logs on this one node?
> Also sanity check, you are using DCAwareRoundRobinPolicy?
>
>
Re: timeouts on counter tables
Posted by kurt greaves <ku...@instaclustr.com>.
If every node is a replica it sounds like you've got hardware issues. Have
you compared iostat to the "normal" nodes? I assume there is nothing
different in the logs on this one node?
Also sanity check, you are using DCAwareRoundRobinPolicy?
Re: timeouts on counter tables
Posted by Rudi Bruchez <ru...@babaluga.com>.
On 28/08/2017 at 00:11, kurt greaves wrote:
> What is your RF?
>
> Also, as a side note RAID 1 shouldn't be necessary if you have >1 RF
> and would give you worse performance
RF 2, plus 1 on a single backup node. Consistency is ONE. You're right about
RAID 1; if disk performance is the problem, dropping it might be a way to
improve on that side. Still, it's strange that only one node suffers from IO problems.
Re: timeouts on counter tables
Posted by kurt greaves <ku...@instaclustr.com>.
What is your RF?
Also, as a side note, RAID 1 shouldn't be necessary if you have RF > 1, and
it would give you worse performance.