Posted to user@cassandra.apache.org by Michał Łowicki <ml...@gmail.com> on 2015/09/26 16:34:01 UTC
compaction became super slow after interrupted repair
Hi,
I'm running a C* 2.1.8 cluster in two data centers with 6 nodes each. I've
started running repair sequentially on each node (`nodetool repair
--parallel --in-local-dc`).
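A minimal sketch of what "sequentially on each node" could look like when scripted. The function name `repair_sequentially` and the host names passed to it are hypothetical; only the repair flags come from the command above, and `-h` is nodetool's standard host option.

```shell
# Hypothetical helper: run repair on each host in turn and stop at the
# first failure. Host names are supplied by the caller.
repair_sequentially() {
    for host in "$@"; do
        echo "repairing $host"
        # Same flags as in the post; -h selects the node to connect to.
        nodetool -h "$host" repair --parallel --in-local-dc || return 1
    done
}
```

It would be called with the node list as arguments, e.g. `repair_sequentially node1 node2 node3`, so the next repair only starts once the previous one has returned.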
While repair is running, the number of SSTables grows dramatically, as does
the number of pending compaction tasks. That's fine, as a node usually
recovers within a couple of hours after repair finishes (
https://www.dropbox.com/s/xzcndf5596mq7rm/Screenshot%202015-09-26%2016.17.44.png?dl=0).
One experiment showed that increasing compaction throughput and the number
of compactors mitigates this problem.
Unfortunately, one node didn't recover (
https://www.dropbox.com/s/nphnsaf2rbfm0bq/Screenshot%202015-09-26%2016.20.56.png?dl=0).
I had to interrupt the repair because the node was running out of disk
space. I hoped the node would catch up with compaction within a couple of
hours, but it hasn't, even after 5 days.
I've tried increasing throughput, disabling throttling, increasing the
number of compactors, disabling binary / thrift / gossip, increasing heap
size, and restarting, but compaction is still super slow.
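For reference, a minimal sketch of how the throttling change and a backlog check might be scripted. `setcompactionthroughput`, `getcompactionthroughput` and `compactionstats` are standard nodetool subcommands; the function names are hypothetical, and the parsing assumes that `compactionstats` prints a line of the form `pending tasks: N`.

```shell
# Hypothetical helpers; the names are illustrative, not part of Cassandra.

# Print the number of pending compaction tasks on the local node.
# Assumes a "pending tasks: N" line in the compactionstats output.
pending_tasks() {
    nodetool compactionstats | awk -F': *' '/^pending tasks:/ { print $2; exit }'
}

# Remove the compaction throughput cap (0 means unthrottled),
# then print the value the node reports.
unthrottle_compaction() {
    nodetool setcompactionthroughput 0 &&
    nodetool getcompactionthroughput
}
```

Sampling `pending_tasks` at intervals gives a rough idea of whether the backlog is shrinking at all. The number of compactors is not covered here, since it is set through `concurrent_compactors` in cassandra.yaml rather than through these commands.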
Today I tried to run scrub:
root@db2:~# nodetool scrub sync
Aborted scrubbing atleast one column family in keyspace sync, check server logs for more information.
error: nodetool failed, check server logs
-- StackTrace --
java.lang.RuntimeException: nodetool failed, check server logs
    at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
    at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
as well as cleanup:
root@db2:~# nodetool cleanup
Aborted cleaning up atleast one column family in keyspace sync, check server logs for more information.
error: nodetool failed, check server logs
-- StackTrace --
java.lang.RuntimeException: nodetool failed, check server logs
    at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
    at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
I couldn't find anything in the logs regarding these runtime exceptions (see
the log here - https://www.dropbox.com/s/flmii7fgpyp07q2/db2.lati.system.log?dl=0).
Note that I'm also hitting CASSANDRA-9935 while running repair on each node
in the cluster.
Any help will be much appreciated.
--
BR,
Michał Łowicki