Posted to user@cassandra.apache.org by Terje Marthinussen <tm...@gmail.com> on 2011/04/24 16:12:58 UTC

multithreaded compaction causes mutation storms?

Tested out multithreaded compaction in 0.8 last night.

We had first fed some data with compaction disabled, so there were 1000+
sstables on the nodes, and I decided to enable multithreaded compaction on
one of them to see how it performed vs. the nodes with no compaction at all.

Since the point was mostly to see what it could manage, I set the throughput
to 128 MB/sec (knowing that this was probably a bit more than it could handle).
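
For completeness, the knobs involved were roughly these (sketched from
memory, so verify the exact option names against your 0.8 cassandra.yaml):

  # cassandra.yaml: enable the parallel compaction path
  multithreaded_compaction: true
  # raise the throughput cap from the 16 MB/sec default
  compaction_throughput_mb_per_sec: 128

(The cap can also be changed on a live node with
"nodetool setcompactionthroughput".)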

It quickly generated 24 tmp files for the main CF (24 compaction threads?),
the CPUs maxed out at around 90% (2x6 cores), and I started seeing these:

 INFO [FlushWriter:1] 2011-04-24 03:07:46,776 Memtable.java (line 238) Writing Memtable-Test@757483679(23549136/385697094 serialized/live bytes, 32026 ops)
 WARN [ScheduledTasks:1] 2011-04-24 03:07:46,946 MessagingService.java (line 548) Dropped 36506 MUTATION messages in the last 5000ms
 INFO [ScheduledTasks:1] 2011-04-24 03:07:46,947 StatusLogger.java (line 50) Pool Name                    Active   Pending
 INFO [ScheduledTasks:1] 2011-04-24 03:07:46,947 StatusLogger.java (line 65) ReadStage                         0         0
 INFO [ScheduledTasks:1] 2011-04-24 03:07:46,948 StatusLogger.java (line 65) RequestResponseStage              0         3
 INFO [ScheduledTasks:1] 2011-04-24 03:07:46,948 StatusLogger.java (line 65) ReadRepairStage                   0         0
 INFO [ScheduledTasks:1] 2011-04-24 03:07:46,948 StatusLogger.java (line 65) MutationStage                    10     39549
 INFO [ScheduledTasks:1] 2011-04-24 03:07:46,949 StatusLogger.java (line 65) ReplicateOnWriteStage             0         0

That the system is a bit overloaded is not really the question (I wanted to
find out what it could manage). The curious part is that, when checking
tpstats, the MutationStage was mostly idle, yet at seemingly regular
intervals it would receive massive bursts of mutations.
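
For anyone who wants to watch the same thing, polling tpstats in a loop is
enough, something like this (a rough sketch; host and sampling interval are
arbitrary):

  # sample MutationStage active/pending counts once per second
  while true; do
    nodetool -h localhost tpstats | grep MutationStage
    sleep 1
  done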

Not sure if it is related, but the dropped-messages warning always showed up
just before a "StatusLogger" printout (though not necessarily before all of
them).

Is some sort of internal event occurring that causes these mutation storms,
or is something ending up synchronizing the compaction threads in a way that
causes mutation storms like these?

The messages went away a little while after I reduced the throughput
significantly, to 6 MB/sec...

It does not seem to be a problem normally, just when doing something extreme
like enabling multithreaded compaction when there are already hundreds or
thousands of sstables on disk.

Regards,
Terje