Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2013/12/04 10:58:37 UTC

[jira] [Commented] (CASSANDRA-6364) There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit

    [ https://issues.apache.org/jira/browse/CASSANDRA-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838786#comment-13838786 ] 

Benedict commented on CASSANDRA-6364:
-------------------------------------

How far do we want to go with this?

Adding a simple exit on error is very straightforward, but in my experience you can also get hang-style failures, so we should definitely have a separate thread checking the liveness of the CLSegmentManager and CLService. A user-configurable not-alive time in the yaml could be used to mark the CL as non-responsive if either hasn't heartbeated within that time. We probably also don't want to die immediately on an error, but rather stop heartbeating and die only if the error doesn't recover within some interval, so that anyone monitoring the error logs has time to correct the problem (say the disk is simply out of space) before the node exits.
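
A rough sketch of the kind of watchdog I have in mind (CommitLogWatchdog and the not-alive interval below are illustrative names only, nothing that exists today):

{code}
// Sketch only: CommitLogWatchdog and the not-alive interval are illustrative,
// not existing Cassandra classes or yaml settings.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class CommitLogWatchdog implements Runnable
{
    // updated by the CLSegmentManager / CLService whenever they make progress
    private final AtomicLong lastHeartbeatNanos = new AtomicLong(System.nanoTime());

    // hypothetical user-configurable not-alive time from the yaml
    private final long notAliveNanos = TimeUnit.MILLISECONDS.toNanos(30_000);

    public void heartbeat()
    {
        lastHeartbeatNanos.set(System.nanoTime());
    }

    @Override
    public void run()
    {
        while (!Thread.currentThread().isInterrupted())
        {
            long stalled = System.nanoTime() - lastHeartbeatNanos.get();
            if (stalled > notAliveNanos)
            {
                // the not-alive interval doubles as the grace period: an error suppresses
                // heartbeats, and only if it persists this long do we take the node down
                System.err.println("Commit log unresponsive for "
                                   + TimeUnit.NANOSECONDS.toMillis(stalled) + "ms, exiting");
                System.exit(1);
            }
            try
            {
                Thread.sleep(1000);
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
            }
        }
    }
}
{code}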

The bigger question is, do we want to do anything clever if we don't want to die? Should we start draining the mutation stage and just dropping the messages? If so, should we attempt to recover if the drive starts responding again after draining the mutation stage?
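
To make the non-fatal option concrete, I could imagine a policy knob alongside disk_failure_policy, roughly along these lines (purely illustrative; neither the enum nor the handler exists today):

{code}
// Illustrative sketch only: CommitFailurePolicy / CommitFailureHandler are made-up names.
public class CommitFailureHandler
{
    public enum CommitFailurePolicy
    {
        die,    // exit the JVM once the grace period has elapsed
        stop,   // stop accepting writes but keep the process up for inspection
        ignore  // drain the mutation stage, drop the messages, retry the commit log later
    }

    private final CommitFailurePolicy policy;

    public CommitFailureHandler(CommitFailurePolicy policy)
    {
        this.policy = policy;
    }

    public void onCommitLogError(Throwable t)
    {
        switch (policy)
        {
            case die:
                System.err.println("Commit log failed: " + t + ", exiting");
                System.exit(1);
                break;
            case stop:
                // stop gossip and client transports here so the node sheds load
                break;
            case ignore:
                // drop queued mutations instead of letting them pile up on the heap;
                // if the drive starts responding again we could resume normal writes
                break;
        }
    }
}
{code}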



> There should be different disk_failure_policies for data and commit volumes or commit volume failure should always cause node exit
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6364
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6364
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: JBOD, single dedicated commit disk
>            Reporter: J. Ryan Earl
>            Assignee: Benedict
>             Fix For: 2.0.4
>
>
> We're doing fault testing on a pre-production Cassandra cluster.  One of the tests was to simulate failure of the commit volume/disk, which in our case is on a dedicated disk.  We expected failure of the commit volume to be handled somehow, but what we found was that Cassandra took no action when the commit volume failed.  We simulated this simply by pulling the physical disk that backed the commit volume, which resulted in filesystem I/O errors on the mount point.
> What then happened was that the JVM heap filled up to the point that Cassandra was spending 90% of its time doing garbage collection.  No errors were logged regarding the failed commit volume.  Gossip on other nodes in the cluster eventually flagged the node as down.  Gossip on the local node showed itself as up, and all other nodes as down.
> The most serious problem was that connections to the coordinator on this node became very slow due to the ongoing GC, as I assume uncommitted writes piled up on the JVM heap.  What we believe should have happened is that Cassandra should have caught the I/O error and exited with a useful log message, or otherwise done some sort of useful cleanup.  Otherwise the node goes into a sort of zombie state, spending most of its time in GC, and thus slowing down any transactions that happen to use the coordinator on said node.
> A limit on in-memory, unflushed writes before refusing requests may also work.  Point being, something should be done to handle the commit volume dying, as doing nothing ends up affecting the entire cluster.  I should note we are using disk_failure_policy: best_effort.



--
This message was sent by Atlassian JIRA
(v6.1#6144)