Posted to user@cassandra.apache.org by Vasileios Vlachos <va...@gmail.com> on 2016/08/24 22:40:44 UTC

Flush activity and dropped messages

Hello,





We have an 8-node cluster spread out in 2 DCs, 4 nodes in each one. We run
C* 2.0.17 on Ubuntu 12.04 at the moment.




Our C# application often logs errors, and these correlate with dropped
messages (usually counter mutations) in the Cassandra logs. We think that if
a specific mutation stays in the queue for more than 5 seconds, Cassandra
drops it. This is also suggested by these lines in system.log:

ERROR [ScheduledTasks:1] 2016-08-23 13:29:51,454 MessagingService.java
(line 912) MUTATION messages were dropped in last 5000 ms: 317 for internal
timeout and 0 for cross node timeout
ERROR [ScheduledTasks:1] 2016-08-23 13:29:51,454 MessagingService.java
(line 912) COUNTER_MUTATION messages were dropped in last 5000 ms: 6 for
internal timeout and 0 for cross node timeout
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,455 StatusLogger.java (line
55) Pool Name                    Active   Pending      Completed   Blocked
 All Time Blocked
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,455 StatusLogger.java (line
70) ReadStage                         0         0      245177190         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,455 StatusLogger.java (line
70) RequestResponseStage              0         0     3530334509         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,456 StatusLogger.java (line
70) ReadRepairStage                   0         0        1549567         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,456 StatusLogger.java (line
70) MutationStage                    48      1380     2540965500         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,456 StatusLogger.java (line
70) ReplicateOnWriteStage             0         0      189615571         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line
70) GossipStage                       0         0       20586077         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line
70) CacheCleanupExecutor              0         0              0         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line
70) MigrationStage                    0         0            106         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line
70) MemoryMeter                       0         0         303029         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,458 StatusLogger.java (line
70) ValidationExecutor                0         0              0         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,458 StatusLogger.java (line
70) FlushWriter                       1         5         322604         1
             8227
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,458 StatusLogger.java (line
70) InternalResponseStage             0         0             35         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,459 StatusLogger.java (line
70) AntiEntropyStage                  0         0              0         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,459 StatusLogger.java (line
70) MemtablePostFlusher               1         5         424104         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,459 StatusLogger.java (line
70) MiscStage                         0         0              0         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,460 StatusLogger.java (line
70) PendingRangeCalculator            0         0             37         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,460 StatusLogger.java (line
70) commitlog_archiver                0         0              0         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line
70) CompactionExecutor                4         4        5144499         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line
70) HintedHandoff                     0         0           3194         0
                0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line
79) CompactionManager                 1         4
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line
81) Commitlog                       n/a         0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line
93) MessagingService                n/a       0/0
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,462 StatusLogger.java (line
103) Cache Type                     Size                 Capacity
    KeysToSave
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,462 StatusLogger.java (line
105) KeyCache                  104828280                104857600
           all
 INFO [ScheduledTasks:1] 2016-08-23 13:29:51,462 StatusLogger.java (line
111) RowCache                          0                        0
           all
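
As a side note, a minimal Python sketch like the one below can pull the
dropped-message counts out of system.log so they can be lined up against the
application errors. The log path and the assumption that each entry sits on a
single line are ours; the regex simply matches the MessagingService ERROR
format shown above:

import re

# Matches entries such as:
# ERROR [ScheduledTasks:1] 2016-08-23 13:29:51,454 MessagingService.java (line 912)
#   MUTATION messages were dropped in last 5000 ms: 317 for internal timeout
#   and 0 for cross node timeout
DROPPED = re.compile(
    r"ERROR \[ScheduledTasks:\d+\] (?P<ts>\S+ \S+) MessagingService\.java.*?"
    r"(?P<kind>\w+) messages were dropped in last \d+ ms: "
    r"(?P<internal>\d+) for internal timeout and (?P<cross>\d+) for cross node timeout"
)

def dropped_messages(path="/var/log/cassandra/system.log"):  # path is an assumption
    """Yield (timestamp, message type, internal drops, cross-node drops)."""
    with open(path) as log:
        for line in log:
            m = DROPPED.search(line)
            if m:
                yield (m.group("ts"), m.group("kind"),
                       int(m.group("internal")), int(m.group("cross")))

if __name__ == "__main__":
    for ts, kind, internal, cross in dropped_messages():
        print(ts, kind, "internal:", internal, "cross-node:", cross)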



So far we have noticed that when our application logs errors, Cassandra has
FlushWriter Active=1 and Pending=4+ tasks (as shown above). Just before the
tpstats-like output appears in system.log we see a run of consecutive
flushes which together take anywhere between 7 and 20 seconds, as shown
below:

INFO [FlushWriter:26567] 2016-08-22 21:15:30,550 Memtable.java (line 402)
Completed flushing
/var/lib/cassandra/data/LIVEStats/statsperdealer/LIVEStats-statsperdealer-jb-50127-Data.db
(640031 bytes) for commitlog position
ReplayPosition(segmentId=1465909831301, position=225)
INFO [FlushWriter:26567] 2016-08-22 21:15:30,678 Memtable.java (line 402)
Completed flushing
/var/lib/cassandra/data/LIVEStats/statsperdealerpervrm/LIVEStats-statsperdealerpervrm-jb-33131-Data.db
(893342 bytes) for commitlog position
ReplayPosition(segmentId=1465909831301, position=225)
INFO [FlushWriter:26567] 2016-08-22 21:15:30,706 Memtable.java (line 402)
Completed flushing
/var/lib/cassandra/data/LIVEStats/statsperserver/LIVEStats-statsperserver-jb-48158-Data.db
(4862 bytes) for commitlog position ReplayPosition(segmentId=1465909831301,
position=225)
INFO [FlushWriter:26567] 2016-08-22 21:15:33,881 Memtable.java (line 402)
Completed flushing
/var/lib/cassandra/data/Organami/sessionstate/Organami-sessionstate-jb-55718-Data.db
(76781692 bytes) for commitlog position
ReplayPosition(segmentId=1465909831301, position=225)

INFO [FlushWriter:26567] 2016-08-22 21:15:35,314 Memtable.java (line 402)
Completed flushing
/var/lib/cassandra/data/LIVEStock/stock/LIVEStock-stock-jb-36747-Data.db
(9302046 bytes) for commitlog position
ReplayPosition(segmentId=1465909831301, position=8363)
INFO [FlushWriter:26567] 2016-08-22 21:15:35,341 Memtable.java (line 402)
Completed flushing
/var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018208-Data.db
(161 bytes) for commitlog position ReplayPosition(segmentId=1465909831301,
position=226441)
INFO [FlushWriter:26567] 2016-08-22 21:15:38,719 Memtable.java (line 402)
Completed flushing
/var/lib/cassandra/data/DataMining/quotebyquotereference/DataMining-quotebyquotereference-jb-579845-Data.db
(18754778 bytes) for commitlog position
ReplayPosition(segmentId=1465909831301, position=325555)
INFO [FlushWriter:26567] 2016-08-22 21:15:38,770 Memtable.java (line 402)
Completed flushing
/var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018209-Data.db
(167 bytes) for commitlog position ReplayPosition(segmentId=1465909831301,
position=429568)
INFO [FlushWriter:26567] 2016-08-22 21:15:38,793 Memtable.java (line 402)
Completed flushing
/var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018210-Data.db
(161 bytes) for commitlog position ReplayPosition(segmentId=1465909831301,
position=430537)
INFO [FlushWriter:26567] 2016-08-22 21:15:38,813 Memtable.java (line 402)
Completed flushing
/var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-jb-1018211-Data.db
(158 bytes) for commitlog position ReplayPosition(segmentId=1465909831301,
position=433145)
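
For what it's worth, a rough way to put numbers on this from the lines above
is to diff the timestamps of consecutive "Completed flushing" entries for the
same FlushWriter thread and divide the flushed bytes by that interval. The
sketch below (which assumes each log entry is on a single line) does exactly
that; the gap also includes any time spent before the write started, so treat
the result as a lower bound on flush throughput:

import re
from datetime import datetime, timedelta

FLUSH = re.compile(
    r"INFO \[(?P<thread>FlushWriter:\d+)\] (?P<ts>\S+ \S+) Memtable\.java.*?"
    r"Completed flushing (?P<path>\S+) \((?P<bytes>\d+) bytes\)"
)

def parse_ts(ts):
    # Cassandra log timestamps carry milliseconds after the comma.
    base, millis = ts.split(",")
    return datetime.strptime(base, "%Y-%m-%d %H:%M:%S") + timedelta(milliseconds=int(millis))

def flush_rates(lines):
    """Yield (sstable path, MB flushed, approximate MB/s) per FlushWriter thread."""
    last_seen = {}  # thread name -> completion time of its previous flush
    for line in lines:
        m = FLUSH.search(line)
        if not m:
            continue
        done = parse_ts(m.group("ts"))
        size_mb = int(m.group("bytes")) / 1e6
        prev = last_seen.get(m.group("thread"))
        if prev is not None:
            seconds = max((done - prev).total_seconds(), 1e-3)
            yield m.group("path"), size_mb, size_mb / seconds
        last_seen[m.group("thread")] = done

# Example with two of the entries above: the 76781692-byte sessionstate flush
# completed about 3.2 s after the previous one, i.e. roughly 24 MB/s at best.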

Even though no individual memtable seems to take very long to flush, the
commitlog segment ID for all of the flushes shown above is the same. We think
this happens because the commitlog has reached its size limit and is rotating
this particular segment, which triggers a flush of every memtable referenced
in that segment.

Our questions:





1. Will writes be blocked for the tables listed in the above FlushWriter
output while this commitlog segment is being rotated? If so, will the
blocking last for the entire duration or just for the time taken for the
respective memtable to be flushed to disk?

2. We have also noticed that flushes can be triggered by MemoryMeter
(probably when the occupancy fraction reaches the specified limit, at which
point the largest memtable is flushed to disk). Would it be wise to increase
the total commitlog size, so that MemoryMeter becomes the main trigger for
flush activity? When flushes are triggered due to "memory pressure" our
application never seems to log errors.

3. Is there a recommended value for the ratio (total size of commitlog
segments)/(total size of memtables)? We use the defaults for both, which
makes the total size of the commitlog segments 1G (32 files of 32M) and the
total space used for memtables 2G, since we have allocated 8G in total for
the heap (a small sketch of this arithmetic follows the questions).

4. The total space for all memtables seems to be Heap/4; this is 2G for us.
Is there a maximum size for an individual memtable?

5. What is the maximum queue size for MemtablePostFlusher, and what exactly
happens after a flush (which I presume relates to MemtablePostFlusher)? Is it
concerning that we also have five pending tasks there?
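
For context, here is the arithmetic behind the numbers in questions 3 and 4,
written out as a small sketch. The figures are simply the ones quoted above
(1G of commitlog segments, heap/4 for memtables with an 8G heap); the setting
names in the comments are the cassandra.yaml options we believe control them:

# Back-of-the-envelope sizing for questions 3 and 4, using the figures above.
heap_mb = 8 * 1024

# 32 segments of 32 MB each (commitlog_segment_size_in_mb and
# commitlog_total_space_in_mb are the relevant settings, as we understand it).
commitlog_total_mb = 32 * 32                  # 1024 MB

# Total memtable budget observed to be heap/4 (memtable_total_space_in_mb
# when left unset, as far as we can tell).
memtable_total_mb = heap_mb // 4              # 2048 MB

print("commitlog total:", commitlog_total_mb, "MB")
print("memtable total: ", memtable_total_mb, "MB")
print("commitlog/memtable ratio: %.2f" % (commitlog_total_mb / memtable_total_mb))  # 0.50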

Many thanks,


Vasilis

Re: Flush activity and dropped messages

Posted by Vasileios Vlachos <va...@gmail.com>.
Hi Benedict,

This makes sense now. Thank you very much for your input.

Regards,
Vasilis


Re: Flush activity and dropped messages

Posted by Patrick McFadin <pm...@gmail.com>.
It's not that your disks are getting full. I suspect you don't have enough
throughput to handle the kind of stress that compaction and memtable flushing
produce. Blocked flush writers are almost always a disk problem.

Any storage with the words SAN, NAS, NFS or SATA in it is going to make your
life miserable. Avoid them at all costs.

Take a look at this doc and scan down to the section on disks:
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
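
If it helps, a very crude way to sanity-check sequential write throughput on
the commitlog and data volumes is a script along these lines (the paths are
placeholders for your own mount points, and a proper tool such as fio will
give far more meaningful numbers):

import os
import time

def rough_write_throughput(path, total_mb=512, block_kb=256):
    """Very crude sequential-write check: write total_mb in block_kb chunks,
    fsync, report MB/s. Not a substitute for a real benchmark."""
    block = os.urandom(block_kb * 1024)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(total_mb * 1024 // block_kb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.time() - start
    os.remove(path)
    return total_mb / elapsed

# Placeholder paths: point these at the commitlog and data mounts.
for tmp in ("/var/lib/cassandra/commitlog/throughput.tmp",
            "/var/lib/cassandra/data/throughput.tmp"):
    print(tmp, "%.1f MB/s" % rough_write_throughput(tmp))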

Patrick



Re: Flush activity and dropped messages

Posted by Vasileios Vlachos <va...@gmail.com>.
Hi Patrick and thanks for your reply,

We are monitoring disk usage (among other things) and we don't seem to be
running out of space at the moment. We have separate partitions/disks for the
commitlog and data. Which one do you suspect, and why?

Regards,
Vasilis


Re: Flush activity and dropped messages

Posted by Patrick McFadin <pm...@gmail.com>.
This looks like you've run out of disk. What are your hardware specs?

Patrick

On Thursday, August 25, 2016, Benedict Elliott Smith <be...@apache.org>
wrote:

> You should update from 2.0 to avoid this behaviour, is the simple answer.
> You are correct that when the commit log gets full the memtables are
> flushed to make room.  2.0 has several interrelated problems here though:
>
> There is a maximum flush queue length property (I cannot recall its name),
> and once there are this many memtables flushing, no more writes can take
> place on the box, whatsoever.  You cannot simply increase this length,
> though, because that shrinks the maximum size of any single memtable (it
> is, iirc, total_memtable_space / (1 + flush_writers + max_queue_length)),
> which worsens write-amplification from compaction.
>
> This is because the memory management for memtables in 2.0 was really
> terrible, and this queue length was used to try to ensure the space
> allocated was not exceeded.
>
> Compounding this, when clearing the commit log 2.0 will flush all
> memtables with data in them regardless of it is useful to do so, meaning
> having more tables (that are actively written to) than your max queue
> length will necessarily cause stalls every time you run out of commit log
> space.
>
> In 2.1, none of these concerns apply.
>
>
> On 24 August 2016 at 23:40, Vasileios Vlachos <vasileiosvlachos@gmail.com
> <javascript:_e(%7B%7D,'cvml','vasileiosvlachos@gmail.com');>> wrote:
>
>> Hello,
>>
>>
>>
>>
>>
>> We have an 8-node cluster spread out in 2 DCs, 4 nodes in each one. We
>> run C* 2.0.17 on Ubuntu 12.04 at the moment.
>>
>>
>>
>>
>> Our C# application often throws logs, which correlate with dropped
>> messages (counter mutations usually) in our logs. We think that if a
>> specific mutaiton stays in the queue for more than 5 seconds, Cassandra
>> drops it. This is also suggested by these lines in system.log:
>>
>> ERROR [ScheduledTasks:1] 2016-08-23 13:29:51,454 MessagingService.java
>> (line 912) MUTATION messages were dropped in last 5000 ms: 317 for internal
>> timeout and 0 for cross node timeout
>> ERROR [ScheduledTasks:1] 2016-08-23 13:29:51,454 MessagingService.java
>> (line 912) COUNTER_MUTATION messages were dropped in last 5000 ms: 6 for
>> internal timeout and 0 for cross node timeout
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,455 StatusLogger.java (line
>> 55) Pool Name                    Active   Pending      Completed   Blocked
>>  All Time Blocked
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,455 StatusLogger.java (line
>> 70) ReadStage                         0         0      245177190         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,455 StatusLogger.java (line
>> 70) RequestResponseStage              0         0     3530334509         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,456 StatusLogger.java (line
>> 70) ReadRepairStage                   0         0        1549567         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,456 StatusLogger.java (line
>> 70) MutationStage                    48      1380     2540965500         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,456 StatusLogger.java (line
>> 70) ReplicateOnWriteStage             0         0      189615571         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line
>> 70) GossipStage                       0         0       20586077         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line
>> 70) CacheCleanupExecutor              0         0              0         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line
>> 70) MigrationStage                    0         0            106         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,457 StatusLogger.java (line
>> 70) MemoryMeter                       0         0         303029         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,458 StatusLogger.java (line
>> 70) ValidationExecutor                0         0              0         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,458 StatusLogger.java (line
>> 70) FlushWriter                       1         5         322604         1
>>              8227
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,458 StatusLogger.java (line
>> 70) InternalResponseStage             0         0             35         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,459 StatusLogger.java (line
>> 70) AntiEntropyStage                  0         0              0         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,459 StatusLogger.java (line
>> 70) MemtablePostFlusher               1         5         424104         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,459 StatusLogger.java (line
>> 70) MiscStage                         0         0              0         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,460 StatusLogger.java (line
>> 70) PendingRangeCalculator            0         0             37         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,460 StatusLogger.java (line
>> 70) commitlog_archiver                0         0              0         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line
>> 70) CompactionExecutor                4         4        5144499         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line
>> 70) HintedHandoff                     0         0           3194         0
>>                 0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line
>> 79) CompactionManager                 1         4
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line
>> 81) Commitlog                       n/a         0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,461 StatusLogger.java (line
>> 93) MessagingService                n/a       0/0
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,462 StatusLogger.java (line
>> 103) Cache Type                     Size                 Capacity
>>     KeysToSave
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,462 StatusLogger.java (line
>> 105) KeyCache                  104828280                104857600
>>            all
>>  INFO [ScheduledTasks:1] 2016-08-23 13:29:51,462 StatusLogger.java (line
>> 111) RowCache                          0                        0
>>            all
>>
>>
>>
>> So far we have noticed that when our application throws logs, Cassandra
>> will have FlushWriter Active=1 and Pending=4+ tasks (like shown above).
>> Just before the tpstats-like output that appears in system.log we seem to
>> have consecutive flushes which take anywhere between 7-20 seconds, like
>> shown below:
>>
>> INFO [FlushWriter:26567] 2016-08-22 21:15:30,550 Memtable.java (line 402)
>> Completed flushing /var/lib/cassandra/data/LIVEStats/statsperdealer/
>> LIVEStats-statsperdealer-jb-50127-Data.db (640031 bytes) for commitlog
>> position ReplayPosition(segmentId=1465909831301, position=225)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:30,678 Memtable.java (line 402)
>> Completed flushing /var/lib/cassandra/data/LIVEStats/statsperdealerpervrm/
>> LIVEStats-statsperdealerpervrm-jb-33131-Data.db (893342 bytes) for
>> commitlog position ReplayPosition(segmentId=1465909831301, position=225)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:30,706 Memtable.java (line 402)
>> Completed flushing /var/lib/cassandra/data/LIVEStats/statsperserver/
>> LIVEStats-statsperserver-jb-48158-Data.db (4862 bytes) for commitlog
>> position ReplayPosition(segmentId=1465909831301, position=225)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:33,881 Memtable.java (line 402)
>> Completed flushing /var/lib/cassandra/data/Organami/sessionstate/
>> Organami-sessionstate-jb-55718-Data.db (76781692 bytes) for commitlog
>> position ReplayPosition(segmentId=1465909831301, position=225)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:35,314 Memtable.java (line 402)
>> Completed flushing /var/lib/cassandra/data/LIVEStock/stock/
>> LIVEStock-stock-jb-36747-Data.db (9302046 bytes) for commitlog position
>> ReplayPosition(segmentId=1465909831301, position=8363)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:35,341 Memtable.java (line 402)
>> Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/
>> system-compactions_in_progress-jb-1018208-Data.db (161 bytes) for commitlog
>> position ReplayPosition(segmentId=1465909831301, position=226441)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,719 Memtable.java (line 402)
>> Completed flushing /var/lib/cassandra/data/DataMining/quotebyquotereference/
>> DataMining-quotebyquotereference-jb-579845-Data.db (18754778 bytes) for
>> commitlog position ReplayPosition(segmentId=1465909831301, position=325555)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,770 Memtable.java (line 402)
>> Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/
>> system-compactions_in_progress-jb-1018209-Data.db (167 bytes) for commitlog
>> position ReplayPosition(segmentId=1465909831301, position=429568)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,793 Memtable.java (line 402)
>> Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/
>> system-compactions_in_progress-jb-1018210-Data.db (161 bytes) for commitlog
>> position ReplayPosition(segmentId=1465909831301, position=430537)
>> INFO [FlushWriter:26567] 2016-08-22 21:15:38,813 Memtable.java (line 402)
>> Completed flushing /var/lib/cassandra/data/system/compactions_in_progress/
>> system-compactions_in_progress-jb-1018211-Data.db (158 bytes) for commitlog
>> position ReplayPosition(segmentId=1465909831301, position=433145)
>>
>> Even though no individual memtable seems to take very long to flush, the
>> commitlog segment ID is the same for all of the flushes shown above. We
>> think this happens because the commitlog has reached its size limit and is
>> rotating this particular segment, which triggers flush activity for every
>> memtable referenced in that segment.
>>
>> Our questions:
>>
>>
>>
>>
>>
>> 1. Will writes be blocked for the tables listed in the above FlushWriter
>> output while this commitlog segment is being rotated? If so, will the
>> blocking last for the entire duration or just for the time taken for the
>> respective memtable to be flushed to disk?
>>
>> 2. We have also noticed that flushes can be triggered by MemoryMeter
>> (probably when the occupancy fraction reaches the specified limit, at which
>> point the largest memtable is flushed to disk). Would it be wise to
>> increase the commitlog total size, so that MemoryMeter is the main trigger
>> for flush activity? When flushes are triggered due to "memory pressure" we
>> never seem to throw logs in our application.
>>
>> 3. Is there a recommended value for the ratio (total size of Commitlog
>> segments)/(total size of Memtables)? We use the defaults for both values
>> which makes the total size for Commitlog segments 1G (32 files of 32M) and
>> the total size used for Memtables 2G, since we have allocated 8G in total
>> for Heap.
>>
>> 4. The total space for all memtables seems to be Heap/4; this is 2G for
>> us. Is there a maximum size for an individual memtable?
>>
>> 5. What is the max queue size for MemtablePostFlusher, and what exactly
>> happens after a flush (which I presume relates to MemtablePostFlusher)? Is
>> it concerning that we also have five pending tasks there?
>>
>> Many thanks,
>>
>>
>> Vasilis
>>
>
>

Re: Flush activity and dropped messages

Posted by Benedict Elliott Smith <be...@apache.org>.
The simple answer is that you should upgrade from 2.0 to avoid this behaviour.
You are correct that when the commit log gets full the memtables are
flushed to make room.  2.0 has several interrelated problems here though:

There is a maximum flush queue length property (I cannot recall its name),
and once that many memtables are queued for flushing, no more writes can take
place on the box whatsoever. You cannot simply increase this length, though,
because doing so shrinks the maximum size of any single memtable (it is, iirc,
total_memtable_space / (1 + flush_writers + max_queue_length)), which worsens
write amplification from compaction.
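
For illustration only -- and assuming the property in question is
memtable_flush_queue_size (default 4) together with a single flush writer,
which I believe matches the defaults on a node with one data directory -- the
numbers from the original post would work out roughly as:

    memtable_total_space_in_mb = 2048   (1/4 of an 8G heap)
    memtable_flush_writers     = 1
    memtable_flush_queue_size  = 4

    max size of one memtable  ~= 2048 / (1 + 1 + 4)  ~= 341 MB

Raising the queue length to 8, for example, would shrink that to about
2048 / (1 + 1 + 8) ~= 205 MB, i.e. smaller and more numerous SSTables, and
therefore more compaction work.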

This is because the memory management for memtables in 2.0 was really
terrible, and this queue length was used to try to ensure the space
allocated was not exceeded.

Compounding this, when clearing the commit log 2.0 will flush all memtables
with data in them regardless of whether it is useful to do so, meaning that
having more actively written tables than your max queue length will
necessarily cause stalls every time you run out of commit log space.
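
All of the knobs involved here live in cassandra.yaml. Purely as an
illustrative sketch, using the sizes described in the original post (1G of
commit log space, 2G of memtable space) rather than recommended values:

    # cassandra.yaml (2.0.x) -- illustrative values only
    commitlog_segment_size_in_mb: 32    # size of each commit log segment
    commitlog_total_space_in_mb: 1024   # once exceeded, the oldest segments are
                                        # recycled, forcing a flush of every
                                        # memtable they still reference
    memtable_total_space_in_mb: 2048    # total space for all memtables
    memtable_flush_writers: 1           # concurrent flush threads
    memtable_flush_queue_size: 4        # memtables allowed to wait for a flush
                                        # writer before writes start blocking

Increasing commitlog_total_space_in_mb (question 2) should make memory
pressure, rather than commit log recycling, the usual flush trigger, at the
cost of a longer commit log replay after a restart.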

In 2.1, none of these concerns apply.
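
For anyone planning that upgrade, a sketch only, based on my reading of the
2.1 defaults (check the 2.1 cassandra.yaml and NEWS.txt before relying on
this):

    # cassandra.yaml (2.1.x) -- the flush queue length setting is gone; instead:
    memtable_allocation_type: heap_buffers   # or offheap_buffers / offheap_objects
    memtable_heap_space_in_mb: 2048          # defaults to 1/4 of the heap if unset
    memtable_offheap_space_in_mb: 2048       # used by the off-heap allocation types
    memtable_cleanup_threshold: 0.11         # fraction of that space in use which
                                             # triggers flushing the largest
                                             # memtable; defaults to
                                             # 1 / (memtable_flush_writers + 1)

so flushes are driven continuously by memtable occupancy rather than by a
fixed-length queue that can stall writes.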

