Posted to commits@cassandra.apache.org by "Sean Fulton (Jira)" <ji...@apache.org> on 2021/07/01 19:18:00 UTC

[jira] [Commented] (CASSANDRA-13810) Overload because of hint pressure + MVs

    [ https://issues.apache.org/jira/browse/CASSANDRA-13810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373006#comment-17373006 ] 

Sean Fulton commented on CASSANDRA-13810:
-----------------------------------------

We had a similar situation, and since there is very little out there on how to troubleshoot it, I wanted to append our findings here:

We have one cluster with 5 data centers and about 30 nodes. About a week ago, one node in a 7-node DC went to a system load of 133+, and a day later three nodes of a 5-node DC did the same. The only error was the one shown above: a burst of "Failed to apply hint ... received only 0 responses" messages.

We spent several days on this using a variety of diagnostic tools. All we could determine was that the load was not being generated by an application and was not hitting the disk: it was massive writes to the memtables that were simply consuming CPU.

Finally we took the step of shutting down hinted handoff across the cluster, node by node, while running htop on one of the impacted nodes. After hints had been turned off on five nodes, the load on all of the impacted nodes dropped to normal. Examining those five nodes revealed that they had a huge stockpile of old hint files, averaging about 6,000 hint files per machine.
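The rollout was roughly the following. The host names are hypothetical, and "run" is a local stand-in for "ssh $host nodetool ..." so the loop can be dry-run without a cluster:

```shell
# Pause hinted handoff on each candidate node in turn, watching the
# overloaded nodes between steps. Host names below are made up; replace
# the run() stub with: ssh "$1" nodetool "$2"
NODES="dc2-node1 dc2-node2 dc2-node3 dc2-node4 dc2-node5"
run() { echo "would run on $1: nodetool $2"; }
for host in $NODES; do
    run "$host" pausehandoff
    # Watch htop / `nodetool tpstats` on the impacted nodes here; in our
    # case the load fell back to normal after the fifth node was paused.
done
```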

Using find . -mtime +1 -print we identified and deleted the hint files older than one day, then turned hinted handoff back on (nodetool resumehandoff). Everything was fine. We re-enabled hinted handoff on all the nodes, and everything remained stable.
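Concretely, the cleanup looked roughly like this. The real hints directory is whatever hints_directory points at in cassandra.yaml (commonly /var/lib/cassandra/hints); a scratch directory stands in for it here so the commands can be rehearsed safely:

```shell
# Scratch stand-in for the node's hints directory so this is a safe dry run.
HINTS_DIR="$(mktemp -d)"
touch "$HINTS_DIR/fresh.hints"                    # recent hint file: keep
touch -d '3 days ago' "$HINTS_DIR/stale.hints"    # old hint file: delete

# On the real node, pause delivery first so nothing is mid-replay:
#   nodetool pausehandoff
find "$HINTS_DIR" -name '*.hints' -mtime +1 -print -delete
# ...then resume once the old files are gone:
#   nodetool resumehandoff
```

Note that GNU find's -mtime +1 matches files whose age, rounded down to whole 24-hour periods, is greater than 1, i.e. files at least two days old.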

Our theory is that this backlog of old hint files was being replayed onto servers that either no longer had the data or did not need it. It is also unclear whether this had anything to do with materialized views: there was a lot of data going to the MVs, but there was a lot of writing in general, so it is difficult to tell whether the MV activity was a byproduct of the original problem.

I will say that there was next to no diagnostic information available, even with the logging level set to trace. The error (just like the one above) doesn't say which hint file was being replayed, or even which node the file originated from.

I would recommend adding debug output, at least at trace level, that gives the name of the hint file being processed and where it came from, so that users can troubleshoot this kind of problem in the future. Had we known it was file X coming from node Y, it would have been a simple matter to go to node Y and see that there were old hint files lying around that needed to be cleaned out.
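In the meantime, the closest thing we had was grepping debug.log for the failure bursts and bucketing them per second, which at least shows when the replay storms hit. A sketch (the log lines are faked here so it runs standalone; on a real node, point it at /var/log/cassandra/debug.log):

```shell
# Fake a few lines in the format shown in the issue below, then count
# "Failed to apply hint" occurrences per second.
LOG="$(mktemp)"
cat > "$LOG" <<'EOF'
DEBUG [SharedPool-Worker-107] 2017-08-27 13:16:51,098 HintVerbHandler.java:95 - Failed to apply hint
DEBUG [SharedPool-Worker-108] 2017-08-27 13:16:51,212 HintVerbHandler.java:95 - Failed to apply hint
DEBUG [SharedPool-Worker-109] 2017-08-27 13:16:52,003 HintVerbHandler.java:95 - Failed to apply hint
EOF
# Field $3 is the date, $4 the time with millis; truncate to whole seconds.
grep 'Failed to apply hint' "$LOG" | awk '{print $3, substr($4,1,8)}' | sort | uniq -c
```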

 

> Overload because of hint pressure + MVs
> ---------------------------------------
>
>                 Key: CASSANDRA-13810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13810
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/Materialized Views
>            Reporter: Tom van der Woerdt
>            Priority: Urgent
>              Labels: materializedviews
>             Fix For: 3.0.x, 3.11.x, 4.x
>
>
> Cluster setup: 3 DCs, 20 Cassandra nodes each, all 3.0.14, with approx. 200GB data per machine. Many tables have MVs associated.
> During some maintenance we did a rolling restart of all nodes in the cluster. This caused a buildup of hints/batches, as expected. Most nodes came back just fine, except for two nodes.
> These two nodes came back with a loadavg of >100, and 'nodetool tpstats' showed a million (not exaggerating) MutationStage tasks per second(!). It was clear that these were mostly (all?) mutations coming from hints, as indicated by thousands of log entries per second in debug.log :
> {noformat}
> DEBUG [SharedPool-Worker-107] 2017-08-27 13:16:51,098 HintVerbHandler.java:95 - Failed to apply hint
> java.util.concurrent.CompletionException: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[na:1.8.0_144]
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[na:1.8.0_144]
>     at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:647) ~[na:1.8.0_144]
>     at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) ~[na:1.8.0_144]
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[na:1.8.0_144]
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) ~[na:1.8.0_144]
>     at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:481) ~[apache-cassandra-3.0.14.jar:3.0.14]
>     at org.apache.cassandra.db.Keyspace.lambda$applyInternal$0(Keyspace.java:495) ~[apache-cassandra-3.0.14.jar:3.0.14]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_144]
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[apache-cassandra-3.0.14.jar:3.0.14]
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) ~[apache-cassandra-3.0.14.jar:3.0.14]
>     at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
> Caused by: org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - received only 0 responses.
>     ... 6 common frames omitted
> {noformat}
> After reading the relevant code, it seems that a hint is considered droppable, and in the mutation path when the table contains a MV and the lock fails to acquire and the mutation is droppable, it throws a WTE without waiting until the timeout expires. This explains why Cassandra is able to process a million mutations per second without actually considering them 'dropped' in the 'nodetool tpstats' output.
> I managed to recover the two nodes by stopping handoffs on all nodes in the cluster and reenabling them one at a time. It's likely that the hint/batchlog settings were sub-optimal on this cluster, but I think that the retry behavior(?) of hints should be improved as it's hard to express hint throughput in kb/s when the mutations can involve MVs.
> More data available upon request -- I'm not sure which bits are relevant and which aren't.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org