Posted to user@cassandra.apache.org by Hannu Kröger <hk...@gmail.com> on 2016/03/22 12:32:04 UTC

Scenarios when a node can be missing writes

Hi,

I'm trying to reason about the possible scenarios in which a node of a C*
cluster does not get the writes and the data needs some sort of
anti-entropy (repair, read-repair, etc.). In what cases does the
coordinator not realize that a write failed and not replay the write from
the hinted handoff table?

1) The obvious case: a node is down and doesn't recover before the hinted
handoff window (max_hint_window_in_ms) has passed, or hinted handoff is
disabled altogether. In this case the node will miss data and repair is
needed.
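
A minimal sketch of case 1, assuming the coordinator simply stops storing
hints for a replica that has been down longer than max_hint_window_in_ms
(a cassandra.yaml setting whose default is 3 hours) and stores none at all
when hinted_handoff_enabled is false. The function needs_repair is
hypothetical, and the model ignores everything else that can happen to
hints.

```python
# Hypothetical model of case 1: can hints alone cover a replica's outage?
# Assumption: hints are stored only for the first max_hint_window_ms of the
# outage, and not at all when hinted handoff is disabled.

DEFAULT_MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000  # 3 hours, the cassandra.yaml default


def needs_repair(downtime_ms: int,
                 hinted_handoff_enabled: bool = True,
                 max_hint_window_ms: int = DEFAULT_MAX_HINT_WINDOW_MS) -> bool:
    """Return True if some writes missed during the outage are not covered by hints."""
    if not hinted_handoff_enabled:
        # No hints at all: any outage means missed writes on this replica.
        return downtime_ms > 0
    # Hints only cover the first max_hint_window_ms of the outage.
    return downtime_ms > max_hint_window_ms


print(needs_repair(30 * 60 * 1000))       # 30 min outage, hints cover it
print(needs_repair(4 * 60 * 60 * 1000))   # 4 h outage, window exceeded
```

A short outage is fully covered by hints, while one longer than the window,
or any outage with hinted handoff disabled, leaves a gap that only repair
(or read repair on the affected data) closes.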

2) Another obvious one: disk / filesystem problems. Repair is needed.

3) A node is up and receives the write but is too overloaded to handle it
and drops the mutation. This should be visible in tpstats as dropped
mutations. Does the write still stay in the hinted handoff table of the
coordinator and, if so, when is it replayed if the node is seemingly up
all the time? Or is it assumed that if dropped mutations > 0 then repair
is needed?
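
To check case 3 on a node, one could read the "Message type / Dropped"
section of nodetool tpstats. A minimal sketch, assuming that section is a
header line starting with "Message type" followed by one "NAME COUNT ..."
line per message type; SAMPLE_TPSTATS below is illustrative text with
made-up numbers, not real output, and the exact layout varies between
Cassandra versions. parse_dropped is a hypothetical helper.

```python
from typing import Dict

# Illustrative text shaped like `nodetool tpstats` output (numbers are made up).
SAMPLE_TPSTATS = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0        1234567         0                 0
ReadStage                         0         0         765432         0                 0

Message type           Dropped
READ                         0
RANGE_SLICE                  0
MUTATION                    42
COUNTER_MUTATION             0
REQUEST_RESPONSE             0
"""


def parse_dropped(tpstats_output: str) -> Dict[str, int]:
    """Return {message type: dropped count} from the dropped-messages section."""
    dropped: Dict[str, int] = {}
    in_section = False
    for line in tpstats_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if line.startswith("Message type"):
            in_section = True  # everything after this header is a dropped count
            continue
        if in_section and len(fields) >= 2 and fields[1].isdigit():
            dropped[fields[0]] = int(fields[1])
    return dropped


print(parse_dropped(SAMPLE_TPSTATS).get("MUTATION", 0) > 0)
```

On a real node the text could come from
subprocess.run(["nodetool", "tpstats"], capture_output=True, text=True).stdout.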

4) A node receives the write but goes down while writing the data to disk.
The write should be either in the commit log, or the coordinator does not
receive an OK for it. There is a small window (10 s) when the OK is given
but the data is not synced to disk, if "commitlog_sync" is "periodic" (the
default) and "commitlog_sync_period_in_ms" is 10000 (10 seconds). Can this
cause a node to miss writes if the server has stayed on the whole time and
only Cassandra has restarted?
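
A rough simulation of the window in case 4, assuming writes are
acknowledged immediately, the commit log is fsynced every
commitlog_sync_period_in_ms starting from t = 0, and a crash loses every
acknowledged write made since the last completed sync. This is the
pessimistic view: the sketch does not model whether unsynced data could
survive in the OS page cache when only the Cassandra process dies.
lost_writes is a hypothetical helper.

```python
from typing import List


def lost_writes(write_times_ms: List[int], crash_ms: int,
                sync_period_ms: int = 10_000) -> List[int]:
    """Return acknowledged writes that were not yet fsynced when the process crashed."""
    # Syncs happen at sync_period_ms, 2 * sync_period_ms, ...; this is the last
    # one completed at or before the crash (0 if none has happened yet).
    last_sync_ms = (crash_ms // sync_period_ms) * sync_period_ms
    # A write landing exactly on a sync instant is treated as missing that sync.
    return [t for t in write_times_ms if last_sync_ms <= t < crash_ms]


print(lost_writes([1_000, 9_000, 12_000, 19_500], crash_ms=19_999))
```

Every lost write is less than sync_period_ms older than the crash, which is
where the "up to 10 s" window in the question comes from.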

Any other scenarios?

Cheers,
Hannu Kröger

Re: Scenarios when a node can be missing writes

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi,

I understand 1) and 2) the same way you do :-).

> 3) Node is up and receives the write but is too overloaded to handle it
> and drops the mutation. This should be visible in tpstats as dropped
> mutation. Does the write still stay in the hinted handoff table of the
> coordinator and if so, when is it replayed if the node is seemingly up all
> the time? Or is it assumed that if dropped mutations > 0 then repair is
> needed?



Off the top of my head:

Yes, hints are stored directly for nodes already marked down, but also
after a write times out. It just takes longer to get there, as the
coordinator waits for the write to the node (marked up) to fail before
storing the hint. I believe hints are handed off every time gossip marks a
node up, and every 10 minutes there is a check in case some hints were
stored during a brief outage where the node wasn't marked down.
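
A minimal model of the behaviour described above, not Cassandra's internal
API: HintStore, its methods and the deliver callback are hypothetical, and
the 10-minute scheduled scan is taken from the description above rather
than from the code. It shows why a timed-out write to a node that stays
marked up still gets delivered: the scheduled scan replays it even though
gossip never reports a down/up transition.

```python
from collections import defaultdict

SCHEDULED_SCAN_INTERVAL_S = 10 * 60  # "every 10 min", per the description above


class HintStore:
    """Coordinator-side hint bookkeeping (simplified, hypothetical)."""

    def __init__(self):
        self.hints = defaultdict(list)  # replica -> pending mutations
        self.marked_up = {}             # replica -> gossip state (unknown = up)

    def write(self, replica, mutation, acked):
        """Store a hint if the replica is marked down or the write timed out."""
        marked_down = not self.marked_up.get(replica, True)
        if marked_down or not acked:
            self.hints[replica].append(mutation)

    def on_gossip_down(self, replica):
        self.marked_up[replica] = False

    def on_gossip_up(self, replica, deliver):
        # Gossip marking a node up triggers hint delivery for it.
        self.marked_up[replica] = True
        self._replay(replica, deliver)

    def scheduled_scan(self, deliver):
        # Run every SCHEDULED_SCAN_INTERVAL_S: replay hints for replicas that
        # are up, e.g. hints stored after a timeout on a node never marked down.
        for replica in list(self.hints):
            if self.marked_up.get(replica, True):
                self._replay(replica, deliver)

    def _replay(self, replica, deliver):
        for mutation in self.hints.pop(replica, []):
            deliver(replica, mutation)
```

In this model a dropped mutation on an overloaded but "up" node is just a
timeout from the coordinator's point of view, so it ends up in the hint
store and is delivered by the next scan.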

> Or is it assumed that if dropped mutations > 0 then repair is needed?

So no, hinted handoff should handle that too. Also read repair might
eventually help.
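
A minimal sketch of the general idea behind read repair, not Cassandra's
code: the coordinator compares the versions returned by the replicas it
read from, keeps the one with the highest write timestamp (last write
wins), and sends it to the replicas that returned something older or
nothing. read_repair and the Cell tuple are hypothetical. It can only
repair data that is actually read, which is why it only helps eventually.

```python
from typing import Dict, Optional, Tuple

# (value, write timestamp); only the relative order of timestamps matters here.
Cell = Tuple[str, int]


def read_repair(responses: Dict[str, Optional[Cell]]) -> Tuple[Cell, Dict[str, Cell]]:
    """Pick the newest cell among replica responses and list the replicas to repair."""
    present = [cell for cell in responses.values() if cell is not None]
    if not present:
        raise ValueError("no replica returned data")
    # Last write wins: the highest timestamp is the current value.
    # (Tie-breaking between equal timestamps is not modelled here.)
    newest = max(present, key=lambda cell: cell[1])
    # Every replica that returned an older version, or nothing, gets the newest one.
    repairs = {replica: newest for replica, cell in responses.items() if cell != newest}
    return newest, repairs


newest, repairs = read_repair({"n1": ("v2", 200), "n2": ("v1", 100), "n3": None})
```

Here n2 (stale) and n3 (missing the write) would both be sent ("v2", 200).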

More information about this here:
http://fr.slideshare.net/ClmentLARDEUR/deep-into-cassandra-data-repair-mechanisms.
This doc is old, but I believe it is still mostly relevant.

> 4) Node receives the write but goes down while writing the stuff to disk.
> The write should be either in the commit log OR the coordinator does not
> receive an OK for it. There is the small window (10s) when OK is given but
> data is not synced to disk if "commitlog_sync" is "periodic" (which it is
> by default) and "commitlog_sync_period_in_ms" is 10 seconds. Can this be a
> cause of node missing writes if the server has stayed on for the whole time
> and only cassandra has restarted?


From the outdated Cassandra doc:
https://wiki.apache.org/cassandra/Durability

"Cassandra's default configuration sets the commitlog_sync mode to
periodic, causing the commitlog to be synced every
commitlog_sync_period_in_ms milliseconds, so you can potentially lose up to
that much data if all replicas crash within that window of time. This
default behavior is decently performant even when the commitlog shares a
disk with data directories. You can also select batch mode, where Cassandra
will guarantee that it syncs before acknowledging writes. To avoid syncing
after every write, Cassandra groups the mutations into batches and syncs
every commitlog_batch_window_in_ms. When using this mode, we strongly
recommend putting your commitlog on a separate, dedicated device, as
described above."
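
One simple way to model the batch mode described in that quote, under
stated assumptions: each mutation joins the group for the
commitlog_batch_window_in_ms window it arrives in, windows start at t = 0,
the group is synced at the end of its window, and only then are its
mutations acknowledged. The 2 ms window and batch_ack_times are
assumptions for illustration; the real sync thread's scheduling may
differ.

```python
from typing import List


def batch_ack_times(write_times_ms: List[int], batch_window_ms: int = 2) -> List[int]:
    """Return when each write is acknowledged under this batch-mode model."""
    acks = []
    for t in write_times_ms:
        # The write joins the group for the window it falls in; that group is
        # synced at the end of the window, and only then is the write acknowledged.
        window_end_ms = (t // batch_window_ms + 1) * batch_window_ms
        acks.append(window_end_ms)
    return acks


print(batch_ack_times([0, 1, 2, 5]))
```

Unlike the periodic case, an acknowledged write has always been synced in
this model, so a crash cannot lose it; the price is up to one batch window
of extra latency per write.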

I guess in this case the hint indeed does not save you (if Cassandra
crashes; if you stop it with nodetool drain followed by a stop, you're
fine). But I could be wrong.

> Any other scenarios?

I guess it is always possible to imagine a twisted scenario or some corner
case. Any time one is spotted, people report and fix it to make Cassandra
more reliable, and they have been doing so for 5+ years now. I would say
that you should be fine in most cases. If in doubt, running repairs
regularly should get rid of any remaining entropy in your cluster.


Overall, the way to go is probably to go through the code and check all
of this, or wait for someone wiser to answer your questions :-).

I can add that in 4+ years of using Cassandra I have never lost data. At
least, I never noticed it. Doing operational work properly should keep you
out of trouble; Cassandra is now quite reliable.

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
