You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Morten Vejen Nielsen <mv...@gmail.com> on 2018/12/17 14:04:44 UTC
CommitLog Recovery replay stop on first timestamp after restore-point-in-time

Hi,

I have found a statement in the Datastax documentation regarding CommitLog
recovery that concerns me, namely:

"*Restore stops when the first client-supplied timestamp is greater than
the restore point timestamp. Because the order in which the database
receives mutations does not strictly follow the timestamp order, this can
leave some mutations unrecovered.*"

From:
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configLogArchive.html
Which to me means that point in time restore really doesn't guarantee point
in time replay for the configured time. Since we expect to have mutations
out of order in our setup.

I conducted a few experiments on this myself by forcing my Cassandra
instance to do CommitLog replay with changes ahead in time. But I was not
able to reproduce this behavior.
I used a fresh instance taken from the official Cassandra docker image to
run the tests, so no changes to any configs was done other than setting the
restore_point_in_time as specified below.
I did the experiment as follows:

--edit /etc/cassandra/commitlog_archiving.properties, set
*restore_point_in_time* to something in the near future (lets say 2
hours ahead of server-time)

ssh into instance

cqlsh
create keyspace thezoo with replication =
{'class':'SimpleStrategy','replication_factor':1};
use thezoo;
create table animal (id int primary key, name varchar);
insert into animal (id, name) values (1, 'Bear1');insert into animal
(id, name) values (2, 'Bear2');insert into animal (id, name) values
(3, 'Bear3');insert into animal (id, name) values (4, 'Bear4');insert
into animal (id, name) values (5, 'Bear5');insert into animal (id,
name) values (6, 'Bear6');insert into animal (id, name) values (7,
'Bear7');insert into animal (id, name) values (8, 'Bear8');insert into
animal (id, name) values (9, 'Bear9');insert into animal (id, name)
values (10, 'Bear10');
select id,name,writetime(name) from animal;
--Add some to timestamp, and use this as future_timestamp, must be
ahead of what was defined in commitlog config file
insert into animal (id, name) values (11, 'DuckFromFuture') using
timestamp <future_timestamp>
insert into animal (id, name) values (12, 'Bird1');insert into animal
(id, name) values (13, 'Bird2');insert into animal (id, name) values
(14, 'Bird3');insert into animal (id, name) values (15,
'Bird4');insert into animal (id, name) values (16, 'Bird5');insert
into animal (id, name) values (17, 'Bird6');insert into animal (id,
name) values (18, 'Bird7');insert into animal (id, name) values (19,
'Bird8');insert into animal (id, name) values (20, 'Bird9');insert
into animal (id, name) values (21, 'Bird10');

--Now I simply forced the power off the machine held the power button
down. And restarted

--During startup verify that commitlog replay has been done in log

ssh into instance and enter cqlsh

cqlsh:thezoo> select * from animal;

--Which shows all the bears and birds have been replayed but not the duck!

I also did some digging in the Cassandra source code, and made the
following findings:

I think the code that skips mutations ahead of time is in CommitLogReplayer
class:
See lines: 194-195 (at the time of writing)
if (commitLogReplayer.pointInTimeExceeded(mutation))
       return;
This code is triggerred from CommitLogReader, where readSection seems to be
responsible for reading the commit logs, this is wrapped in a while loop,
that just reads the file until EOF.
See:
 while (statusTracker.shouldContinue() && reader.getFilePointer() < end &&
!reader.isEOF())
This method is called file by file from CommitLog.recover to recover all
commitlog segment files.
And just a note statusTracker.shouldContinue will fail if
statusTracker.requestTermination(); is called but I dont see this being the
case for the pointInTimeExceeded case.

I am a bit concerned if this is some hidden feature in Cassandra, and as
such we might have to revise our backup strategies, if this is the case.
However as far as I can see the Datastax documentation on this is simply
wrong unfortunately the official documentation on this just seems to be
work in progress.
The fact that it doesn't do this is in fact a positive result for me as I
would also expect the point-in-time to guarantee that all mutatations up
until this point-in-time is in fact recovered.

Can anyone confirm if this is just the documentation that is wrong or maybe
if I did something wrong in my experiments.

(For reference I also conducted some experiments with larger amount of data
where the recovery went through multiple commitlog files, but I got the
same results, namely that it recovered ALL records before
restore_point_in_time.)

Best regards
Morten V. Nielsen