Posted to commits@cassandra.apache.org by "Nikolai Grigoriev (JIRA)" <ji...@apache.org> on 2014/11/09 04:40:34 UTC
[jira] [Commented] (CASSANDRA-7949) LCS compaction low performance, many pending compactions, nodes are almost idle
[ https://issues.apache.org/jira/browse/CASSANDRA-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203751#comment-14203751 ]
Nikolai Grigoriev commented on CASSANDRA-7949:
----------------------------------------------
Here is another extreme (but, unfortunately, real) example of LCS going a bit crazy.
{code}
# nodetool cfstats myks.mytable
Keyspace: myks
Read Count: 3006212
Read Latency: 21.02595119106703 ms.
Write Count: 11226340
Write Latency: 1.8405579886231844 ms.
Pending Tasks: 0
Table: wm_contacts
SSTable count: 6530
SSTables in each level: [2369/4, 10, 104/100, 1043/1000, 3004, 0, 0, 0, 0]
Space used (live), bytes: 1113384288740
Space used (total), bytes: 1113406795020
SSTable Compression Ratio: 0.3307170610260717
Number of keys (estimate): 26294144
Memtable cell count: 782994
Memtable data size, bytes: 213472460
Memtable switch count: 3493
Local read count: 3006239
Local read latency: 21.026 ms
Local write count: 11226517
Local write latency: 1.841 ms
Pending tasks: 0
Bloom filter false positives: 41835779
Bloom filter false ratio: 0.97500
Bloom filter space used, bytes: 19666944
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 3379391
Compacted partition mean bytes: 139451
Average live cells per slice (last five minutes): 444.0
Average tombstones per slice (last five minutes): 0.0
{code}
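The per-level breakdown above already tells the story: nodetool prints "count/limit" for any level that is over its LCS target. A quick way to flag the overloaded levels (a sketch, assuming the common LCS defaults of 4 sstables in L0 and 10^n in Ln for the levels where nodetool printed no explicit limit):

```python
# Hedged sketch: parse the "SSTables in each level" line from nodetool cfstats
# and report levels over their LCS target. Assumed defaults: 4 sstables in L0,
# 10**n in Ln; "2369/4" in the output already means count/limit.

def over_capacity_levels(level_line):
    """level_line e.g. '[2369/4, 10, 104/100, 1043/1000, 3004, 0, 0, 0, 0]'"""
    entries = level_line.strip('[]').split(', ')
    report = []
    for level, entry in enumerate(entries):
        if '/' in entry:                       # nodetool prints count/limit when over
            count, limit = map(int, entry.split('/'))
        else:
            count = int(entry)
            limit = 4 if level == 0 else 10 ** level
        if count > limit:
            report.append((level, count, limit))
    return report

print(over_capacity_levels('[2369/4, 10, 104/100, 1043/1000, 3004, 0, 0, 0, 0]'))
# → [(0, 2369, 4), (2, 104, 100), (3, 1043, 1000)]
```

L0 is roughly 590x over its target, so the backlog is concentrated exactly where every read has to look.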
{code}
# nodetool compactionstats
pending tasks: 190
compaction type keyspace table completed total unit progress
Compaction myks mytable2 7198353690 7446734394 bytes 96.66%
Compaction myks mytable 4851429651 10717052513 bytes 45.27%
Active compaction remaining time : 0h00m04s
{code}
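For reference, the progress column appears to be simply completed bytes divided by total bytes; the two active rows above confirm the arithmetic:

```python
# Sanity check: the "progress" column in nodetool compactionstats looks like
# plain completed_bytes / total_bytes, matching the two rows above.
rows = [
    ("mytable2", 7198353690, 7446734394),   # printed as 96.66%
    ("mytable",  4851429651, 10717052513),  # printed as 45.27%
]
for table, completed, total in rows:
    print(f"{table}: {100.0 * completed / total:.2f}%")
```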
Note the cfstats: the number of sstables at L0 is insane (2369 against a target of 4). Yet C* is sitting quietly, compacting the data using 2 cores out of 32.
Once it gets into this state I immediately start seeing large sstables forming - instead of 256Mb, sstables of 1-2Gb and more start appearing. And that creates a snowball effect.
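The bloom filter false ratio of 0.975 in the cfstats above fits this picture: with thousands of sstables piled up in L0, a single partition read has an enormous number of files to consider and the filters stop helping. A rough worst-case read-amplification estimate (an assumption-laden sketch: it assumes a read may consult every L0 sstable plus at most one sstable per non-empty higher level, which is the usual LCS invariant for L1+):

```python
# Rough worst-case read amplification for the cfstats above, assuming a
# single-partition read may touch every L0 sstable plus one sstable per
# non-empty leveled level (the usual LCS guarantee for L1 and above).
levels = [2369, 10, 104, 1043, 3004, 0, 0, 0, 0]  # from "SSTables in each level"
candidates = levels[0] + sum(1 for n in levels[1:] if n > 0)
print(candidates)  # → 2373 sstable candidates for one read
```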
> LCS compaction low performance, many pending compactions, nodes are almost idle
> -------------------------------------------------------------------------------
>
> Key: CASSANDRA-7949
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7949
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: DSE 4.5.1-1, Cassandra 2.0.8
> Reporter: Nikolai Grigoriev
> Attachments: iostats.txt, nodetool_compactionstats.txt, nodetool_tpstats.txt, pending compactions 2day.png, system.log.gz, vmstat.txt
>
>
> I've been evaluating a new cluster of 15 nodes (32 cores, 6x800Gb SSD disks + 2x600Gb SAS, 128Gb RAM, OEL 6.5) and I've built a simulator that creates a load similar to the load of our future product. Before running the simulator I had to pre-generate enough data. This was done using Java code and the DataStax Java driver. To avoid going deep into details: two tables have been generated, each currently with about 55M rows and between a few dozen and a few thousand columns per row.
> This data generation process was generating a massive amount of non-overlapping data, so the activity was write-only and highly parallel. This is not the type of traffic the system will ultimately have to deal with; in the future it will be a mix of reads and updates to existing data. This is just to explain the choice of LCS, not to mention the expensive SSD disk space.
> At some point while generating the data I noticed that the compactions started to pile up. I knew that I was overloading the cluster, but I still wanted the generation test to complete. I was expecting to give the cluster enough time afterwards to finish the pending compactions and get ready for real traffic.
> However, after the storm of write requests had stopped I noticed that the number of pending compactions remained constant (and even climbed up a little bit) on all nodes. After trying to tune some parameters (like setting the compaction bandwidth cap to 0) I noticed a strange pattern: the nodes were compacting one of the CFs in a single stream using virtually no CPU and no disk I/O. This process was taking hours. After that it would be followed by a short burst of a few dozen compactions running in parallel (CPU at 2000%, some disk I/O - up to 10-20%) and then getting stuck again for many hours doing one compaction at a time. So it looks like this:
> # nodetool compactionstats
> pending tasks: 3351
> compaction type keyspace table completed total unit progress
> Compaction myks table_list1 66499295588 1910515889913 bytes 3.48%
> Active compaction remaining time : n/a
> # df -h
> ...
> /dev/sdb 1.5T 637G 854G 43% /cassandra-data/disk1
> /dev/sdc 1.5T 425G 1.1T 29% /cassandra-data/disk2
> /dev/sdd 1.5T 429G 1.1T 29% /cassandra-data/disk3
> # find . -name **table_list1**Data** | grep -v snapshot | wc -l
> 1310
> Among these files I see:
> 1043 files of 161Mb (my sstable size is 160Mb)
> 9 large files - 3 between 1 and 2Gb, 3 of 5-8Gb, 55Gb, 70Gb and 370Gb
> 263 files of various sizes - between a few dozen Kb and 160Mb
> I've been running the heavy load for about 1.5 days, it's been close to 3 days since then, and the number of pending compactions does not go down.
> I have applied one of the not-so-obvious recommendations - disabling multithreaded compaction - and that seems to be helping a bit: I see that some nodes have started to have fewer pending compactions. About half of the cluster, in fact. But even there I see them sitting idle most of the time, lazily compacting in one stream with CPU at ~140% and occasionally doing bursts of compaction work for a few minutes.
> I am wondering if this is really a bug or something in the LCS logic that would manifest itself only in such an edge-case scenario, where I have loaded lots of unique data quickly.
> By the way, I see this pattern only for one of two tables - the one that has about 4 times more data than another (space-wise, number of rows is the same). Looks like all these pending compactions are really only for that larger table.
> I'll be attaching the relevant logs shortly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)