Posted to user@cassandra.apache.org by Jonathan Ellis <jb...@gmail.com> on 2012/10/17 23:27:05 UTC

potential data loss in Cassandra 1.1.0 .. 1.1.4

I wanted to call out a particularly important bug for those who aren't
in the habit of reading CHANGES.

Summary: the bug was fixed in 1.1.5, with a follow-on fix in 1.1.6
that only affects users of 1.1.0 .. 1.1.4.  Thus, if you upgraded from
1.0.x or earlier directly to 1.1.5, you're okay as far as this is
concerned.  But if you used an earlier 1.1 release, you should upgrade
to 1.1.6.

Explanation:

A rewrite of the commitlog code for 1.1.0 used Java's nanoTime API to
generate commitlog segment IDs.  This could cause data loss in the
event of a power failure, since we assume commitlog IDs are strictly
increasing in our replay logic.  Simplified, the replay logic looks like this:

1. Take the most recent flush time X for each columnfamily
2. Replay all activity in the commitlog that occurred after X

The problem is that nanoTime's origin is arbitrary: it effectively gets
a new random seed after a reboot.  If the new seed is substantially below the old one, any new
commitlog segments will never be "after" the pre-reboot flush
timestamps.  Subsequently, restarting Cassandra will not replay any
unflushed updates.
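
The failure mode boils down to one comparison.  Here is a minimal sketch
of it in Java; the class, method, and numbers are made up for
illustration and are not Cassandra's actual code:

```java
// Illustration of the replay bug, NOT Cassandra's actual code.
// Segment IDs came from System.nanoTime(), whose origin is arbitrary
// and can restart far below its pre-reboot values after a power cycle.
public class ReplaySketch {

    // "Replay everything in the commitlog that happened after the last flush."
    static boolean shouldReplay(long segmentId, long lastFlushTime) {
        return segmentId > lastFlushTime;
    }

    public static void main(String[] args) {
        long flushedAt = 9_000_000L;  // flush "time" recorded before the power failure
        long newSegmentId = 1_000L;   // post-reboot nanoTime restarts from a new, lower origin

        // The fresh segment looks *older* than the flush, so its unflushed
        // writes are silently skipped on restart -- the data-loss bug.
        System.out.println(shouldReplay(newSegmentId, flushedAt)); // false
    }
}
```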

We fixed the nanotime problem in 1.1.5 (CASSANDRA-4601).  But, we
didn't realize the implications for replay timestamps until later
(CASSANDRA-4782).  To fix these retroactively, 1.1.6 sets the flush
time of pre-1.1.6 sstables to zero.  Thus, the first startup of 1.1.6
will result in replaying the entire commitlog, including data that may
have already been flushed.
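
As a sketch (illustrative code, not the real implementation), zeroing
the recorded flush time makes every segment compare as "after" the
flush, so nothing can be skipped:

```java
// Illustration of the 1.1.6 mitigation, NOT Cassandra's actual code:
// with the persisted flush time reset to zero, every commitlog segment
// compares as "after" the last flush and is therefore replayed.
public class MitigationSketch {

    static boolean shouldReplay(long segmentId, long lastFlushTime) {
        return segmentId > lastFlushTime;
    }

    public static void main(String[] args) {
        long zeroedFlushTime = 0L;   // 1.1.6 resets flush times of pre-1.1.6 sstables
        long anySegmentId = 1_000L;  // even a small post-reboot nanoTime-based ID

        // Everything replays once, trading a slower first startup for
        // not losing unflushed writes.
        System.out.println(shouldReplay(anySegmentId, zeroedFlushTime)); // true
    }
}
```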

Replaying already-flushed data a second time is harmless -- except for
counters.  So, to avoid replaying flushed counter data, we recommend
running nodetool drain when shutting down the pre-1.1.6 C* prior to upgrade.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: potential data loss in Cassandra 1.1.0 .. 1.1.4

Posted by Jonathan Ellis <jb...@gmail.com>.
On Thu, Oct 18, 2012 at 7:30 AM, Alain RODRIGUEZ <ar...@gmail.com> wrote:
> Hi Jonathan.
>
> We are currently running the datastax AMI on amazon. Cassandra is in version
> 1.1.2.
>
> I guess that the datastax repo (deb http://debian.datastax.com/community
> stable main) will be updated directly in 1.1.6 ?

Yes.

> Could you ask your team to add this specific warning in your documentation
> like here : http://www.datastax.com/docs/1.1/install/expand_ami (we use to
> update to last stable release before expand) or here :
> http://www.datastax.com/docs/1.1/install/upgrading or in any other place
> where this could be useful ?

Good idea, I'll get that noted.  Thanks!

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: potential data loss in Cassandra 1.1.0 .. 1.1.4

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi Jonathan.

We are currently running the datastax AMI on amazon. Cassandra is in
version 1.1.2.

I guess that the datastax repo (deb
http://debian.datastax.com/community stable main) will be updated
directly to 1.1.6?

"Replaying already-flushed data a second time is harmless -- except
for counters.
 So, to avoid replaying flushed counter data, we recommend performing drain
when shutting down the pre-1.1.6 C* prior to upgrade."

I'm afraid I'll forget to drain my nodes before my next update or update +
expand.

Could you ask your team to add this specific warning in your documentation,
like here: http://www.datastax.com/docs/1.1/install/expand_ami (we usually
update to the latest stable release before expanding) or here:
http://www.datastax.com/docs/1.1/install/upgrading, or in any other place
where it could be useful?

Having counters replayed would lead to a big mess in our app, and I guess
there are more people in our situation who could save a lot of time and
money with up-to-date documentation.

Anyway, thank you for this bug fix and this warning.

Alain

2012/10/17 Jonathan Ellis <jb...@gmail.com>

> I wanted to call out a particularly important bug for those who aren't
> in the habit of reading CHANGES.
>
> Summary: the bug was fixed in 1.1.5, with a follow-on fix in 1.1.6
> that only affects users of 1.1.0 .. 1.1.4.  Thus, if you upgraded from
> 1.0.x or earlier directly to 1.1.5, you're okay as far as this is
> concerned.  But if you used an earlier 1.1 release, you should upgrade
> to 1.1.6.
>
> Explanation:
>
> A rewrite of the commitlog code for 1.1.0 used Java's nanotime api to
> generate commitlog segment IDs.  This could cause data loss in the
> event of a power failure, since we assume commitlog IDs are strictly
> increasing in our replay logic.  Simplified, the replay logic looks like
> this:
>
> 1. Take the most recent flush time X for each columnfamily
> 2. Replay all activity in the commitlog that occurred after X
>
> The problem is that nanotime gets effectively a new random seed after
> a reboot.  If the new seed is substantially below the old one, any new
> commitlog segments will never be "after" the pre-reboot flush
> timestamps.  Subsequently, restarting Cassandra will not replay any
> unflushed updates.
>
> We fixed the nanotime problem in 1.1.5 (CASSANDRA-4601).  But, we
> didn't realize the implications for replay timestamps until later
> (CASSANDRA-4782).  To fix these retroactively, 1.1.6 sets the flush
> time of pre-1.1.6 sstables to zero.  Thus, the first startup of 1.1.6
> will result in replaying the entire commitlog, including data that may
> have already been flushed.
>
> Replaying already-flushed data a second time is harmless -- except for
> counters.  So, to avoid replaying flushed counter data, we recommend
> performing drain when shutting down the pre-1.1.6 C* prior to upgrade.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>