You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@cassandra.apache.org by Morten Vejen Nielsen <mv...@gmail.com> on 2018/12/20 10:06:53 UTC

CommitLog Recovery replay stop on first timestamp after restore-point-in-time

Hi,

(Moved from user mailing list to here)

I have found a statement in the Datastax documentation regarding CommitLog
recovery that concerns me, namely:

"*Restore stops when the first client-supplied timestamp is greater than
the restore point timestamp. Because the order in which the database
receives mutations does not strictly follow the timestamp order, this can
leave some mutations unrecovered.*"

From:
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configLogArchive.html
Which to me means that point in time restore really doesn't guarantee point
in time replay for the configured time. Since we expect to have mutations
out of order in our setup.

I conducted a few experiments on this myself by forcing my Cassandra
instance to do CommitLog replay with changes ahead in time. But I was not
able to reproduce this behavior.
I used a fresh instance taken from the official Cassandra docker image to
run the tests, so no changes to any configs was done other than setting the
restore_point_in_time as specified below.
I did the experiment as follows:

--edit /etc/cassandra/commitlog_archiving.properties, set
*restore_point_in_time* to something in the near future (lets say 2
hours ahead of server-time)

ssh into instance

cqlsh
create keyspace thezoo with replication =
{'class':'SimpleStrategy','replication_factor':1};
use thezoo;
create table animal (id int primary key, name varchar);
insert into animal (id, name) values (1, 'Bear1');insert into animal
(id, name) values (2, 'Bear2');insert into animal (id, name) values
(3, 'Bear3');insert into animal (id, name) values (4, 'Bear4');insert
into animal (id, name) values (5, 'Bear5');insert into animal (id,
name) values (6, 'Bear6');insert into animal (id, name) values (7,
'Bear7');insert into animal (id, name) values (8, 'Bear8');insert into
animal (id, name) values (9, 'Bear9');insert into animal (id, name)
values (10, 'Bear10');
select id,name,writetime(name) from animal;
--Add some to timestamp, and use this as future_timestamp, must be
ahead of what was defined in commitlog config file
insert into animal (id, name) values (11, 'DuckFromFuture') using
timestamp <future_timestamp>
insert into animal (id, name) values (12, 'Bird1');insert into animal
(id, name) values (13, 'Bird2');insert into animal (id, name) values
(14, 'Bird3');insert into animal (id, name) values (15,
'Bird4');insert into animal (id, name) values (16, 'Bird5');insert
into animal (id, name) values (17, 'Bird6');insert into animal (id,
name) values (18, 'Bird7');insert into animal (id, name) values (19,
'Bird8');insert into animal (id, name) values (20, 'Bird9');insert
into animal (id, name) values (21, 'Bird10');

--Now I simply forced the power off the machine held the power button
down. And restarted

--During startup verify that commitlog replay has been done in log

ssh into instance and enter cqlsh

cqlsh:thezoo> select * from animal;

--Which shows all the bears and birds have been replayed but not the duck!

I also did some digging in the Cassandra source code, and made the
following findings:

I think the code that skips mutations ahead of time is in CommitLogReplayer
class:
See lines: 194-195 (at the time of writing)
if (commitLogReplayer.pointInTimeExceeded(mutation))
       return;
This code is triggerred from CommitLogReader, where readSection seems to be
responsible for reading the commit logs, this is wrapped in a while loop,
that just reads the file until EOF.
See:
 while (statusTracker.shouldContinue() && reader.getFilePointer() < end &&
!reader.isEOF())
This method is called file by file from CommitLog.recover to recover all
commitlog segment files.
And just a note statusTracker.shouldContinue will fail if
statusTracker.requestTermination(); is called but I dont see this being the
case for the pointInTimeExceeded case.

I am a bit concerned if this is some hidden feature in Cassandra, and as
such we might have to revise our backup strategies, if this is the case.
However as far as I can see the Datastax documentation on this is simply
wrong unfortunately the official documentation on this just seems to be
work in progress.
The fact that it doesn't do this is in fact a positive result for me as I
would also expect the point-in-time to guarantee that all mutatations up
until this point-in-time is in fact recovered.

Can anyone confirm if this is just the documentation that is wrong or maybe
if I did something wrong in my experiments.

(For reference I also conducted some experiments with larger amount of data
where the recovery went through multiple commitlog files, but I got the
same results, namely that it recovered ALL records before
restore_point_in_time.)

Best regards
Morten V. Nielsen

Re: CommitLog Recovery replay stop on first timestamp after restore-point-in-time

Posted by Morten Vejen Nielsen <mv...@gmail.com>.

Thanks for the response Ben this was exactly what I was looking for!
I think will write a test to ensure this condition holds when we do
Cassandra upgrades, and the linked test in CASSANDRA-6905
<https://issues.apache.org/jira/browse/CASSANDRA-6905> seems to be a nice
starting point.

Best regards
Morten V. Nielsen

On Thu, Dec 20, 2018 at 12:36 PM Ben Slater <be...@instaclustr.com>
wrote:

> I don’t have any personal knowledge of the fix but out of interest I took a
> look in Jira and it looks sounds to me like the behaviour was fixed here
> (in 2.0.10): https://issues.apache.org/jira/browse/CASSANDRA-6905
>
> ---
>
>
> *Ben Slater*
> *Chief Product Officer*
>
>
> <https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
> <https://www.linkedin.com/company/instaclustr>
>
> Read our latest technical blog posts here
> <https://www.instaclustr.com/blog/>.
>
> This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
> and Instaclustr Inc (USA).
>
> This email and any attachments may contain confidential and legally
> privileged information.  If you are not the intended recipient, do not copy
> or disclose its content, but please reply to this email immediately and
> highlight the error to the sender and then immediately delete the message.
>
>
> On Thu, 20 Dec 2018 at 21:07, Morten Vejen Nielsen <mv...@gmail.com>
> wrote:
>
> > Hi,
> >
> > (Moved from user mailing list to here)
> >
> > I have found a statement in the Datastax documentation regarding
> CommitLog
> > recovery that concerns me, namely:
> >
> > "*Restore stops when the first client-supplied timestamp is greater than
> > the restore point timestamp. Because the order in which the database
> > receives mutations does not strictly follow the timestamp order, this can
> > leave some mutations unrecovered.*"
> >
> > From:
> >
> >
> https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configLogArchive.html
> > Which to me means that point in time restore really doesn't guarantee
> point
> > in time replay for the configured time. Since we expect to have mutations
> > out of order in our setup.
> >
> > I conducted a few experiments on this myself by forcing my Cassandra
> > instance to do CommitLog replay with changes ahead in time. But I was not
> > able to reproduce this behavior.
> > I used a fresh instance taken from the official Cassandra docker image to
> > run the tests, so no changes to any configs was done other than setting
> the
> > restore_point_in_time as specified below.
> > I did the experiment as follows:
> >
> > --edit /etc/cassandra/commitlog_archiving.properties, set
> > *restore_point_in_time* to something in the near future (lets say 2
> > hours ahead of server-time)
> >
> > ssh into instance
> >
> > cqlsh
> > create keyspace thezoo with replication =
> > {'class':'SimpleStrategy','replication_factor':1};
> > use thezoo;
> > create table animal (id int primary key, name varchar);
> > insert into animal (id, name) values (1, 'Bear1');insert into animal
> > (id, name) values (2, 'Bear2');insert into animal (id, name) values
> > (3, 'Bear3');insert into animal (id, name) values (4, 'Bear4');insert
> > into animal (id, name) values (5, 'Bear5');insert into animal (id,
> > name) values (6, 'Bear6');insert into animal (id, name) values (7,
> > 'Bear7');insert into animal (id, name) values (8, 'Bear8');insert into
> > animal (id, name) values (9, 'Bear9');insert into animal (id, name)
> > values (10, 'Bear10');
> > select id,name,writetime(name) from animal;
> > --Add some to timestamp, and use this as future_timestamp, must be
> > ahead of what was defined in commitlog config file
> > insert into animal (id, name) values (11, 'DuckFromFuture') using
> > timestamp <future_timestamp>
> > insert into animal (id, name) values (12, 'Bird1');insert into animal
> > (id, name) values (13, 'Bird2');insert into animal (id, name) values
> > (14, 'Bird3');insert into animal (id, name) values (15,
> > 'Bird4');insert into animal (id, name) values (16, 'Bird5');insert
> > into animal (id, name) values (17, 'Bird6');insert into animal (id,
> > name) values (18, 'Bird7');insert into animal (id, name) values (19,
> > 'Bird8');insert into animal (id, name) values (20, 'Bird9');insert
> > into animal (id, name) values (21, 'Bird10');
> >
> > --Now I simply forced the power off the machine held the power button
> > down. And restarted
> >
> > --During startup verify that commitlog replay has been done in log
> >
> > ssh into instance and enter cqlsh
> >
> > cqlsh:thezoo> select * from animal;
> >
> > --Which shows all the bears and birds have been replayed but not the
> duck!
> >
> > I also did some digging in the Cassandra source code, and made the
> > following findings:
> >
> > I think the code that skips mutations ahead of time is in
> CommitLogReplayer
> > class:
> > See lines: 194-195 (at the time of writing)
> > if (commitLogReplayer.pointInTimeExceeded(mutation))
> >        return;
> > This code is triggerred from CommitLogReader, where readSection seems to
> be
> > responsible for reading the commit logs, this is wrapped in a while loop,
> > that just reads the file until EOF.
> > See:
> >  while (statusTracker.shouldContinue() && reader.getFilePointer() < end
> &&
> > !reader.isEOF())
> > This method is called file by file from CommitLog.recover to recover all
> > commitlog segment files.
> > And just a note statusTracker.shouldContinue will fail if
> > statusTracker.requestTermination(); is called but I dont see this being
> the
> > case for the pointInTimeExceeded case.
> >
> > I am a bit concerned if this is some hidden feature in Cassandra, and as
> > such we might have to revise our backup strategies, if this is the case.
> > However as far as I can see the Datastax documentation on this is simply
> > wrong unfortunately the official documentation on this just seems to be
> > work in progress.
> > The fact that it doesn't do this is in fact a positive result for me as I
> > would also expect the point-in-time to guarantee that all mutatations up
> > until this point-in-time is in fact recovered.
> >
> > Can anyone confirm if this is just the documentation that is wrong or
> maybe
> > if I did something wrong in my experiments.
> >
> > (For reference I also conducted some experiments with larger amount of
> data
> > where the recovery went through multiple commitlog files, but I got the
> > same results, namely that it recovered ALL records before
> > restore_point_in_time.)
> >
> > Best regards
> > Morten V. Nielsen
> >
>


-- 
//Morten Vejen Nielsen - mvejen

Re: CommitLog Recovery replay stop on first timestamp after restore-point-in-time

Posted by Ben Slater <be...@instaclustr.com>.

I don’t have any personal knowledge of the fix but out of interest I took a
look in Jira and it looks sounds to me like the behaviour was fixed here
(in 2.0.10): https://issues.apache.org/jira/browse/CASSANDRA-6905

---


*Ben Slater*
*Chief Product Officer*


<https://www.facebook.com/instaclustr>   <https://twitter.com/instaclustr>
<https://www.linkedin.com/company/instaclustr>

Read our latest technical blog posts here
<https://www.instaclustr.com/blog/>.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia)
and Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally
privileged information.  If you are not the intended recipient, do not copy
or disclose its content, but please reply to this email immediately and
highlight the error to the sender and then immediately delete the message.


On Thu, 20 Dec 2018 at 21:07, Morten Vejen Nielsen <mv...@gmail.com> wrote:

> Hi,
>
> (Moved from user mailing list to here)
>
> I have found a statement in the Datastax documentation regarding CommitLog
> recovery that concerns me, namely:
>
> "*Restore stops when the first client-supplied timestamp is greater than
> the restore point timestamp. Because the order in which the database
> receives mutations does not strictly follow the timestamp order, this can
> leave some mutations unrecovered.*"
>
> From:
>
> https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configLogArchive.html
> Which to me means that point in time restore really doesn't guarantee point
> in time replay for the configured time. Since we expect to have mutations
> out of order in our setup.
>
> I conducted a few experiments on this myself by forcing my Cassandra
> instance to do CommitLog replay with changes ahead in time. But I was not
> able to reproduce this behavior.
> I used a fresh instance taken from the official Cassandra docker image to
> run the tests, so no changes to any configs was done other than setting the
> restore_point_in_time as specified below.
> I did the experiment as follows:
>
> --edit /etc/cassandra/commitlog_archiving.properties, set
> *restore_point_in_time* to something in the near future (lets say 2
> hours ahead of server-time)
>
> ssh into instance
>
> cqlsh
> create keyspace thezoo with replication =
> {'class':'SimpleStrategy','replication_factor':1};
> use thezoo;
> create table animal (id int primary key, name varchar);
> insert into animal (id, name) values (1, 'Bear1');insert into animal
> (id, name) values (2, 'Bear2');insert into animal (id, name) values
> (3, 'Bear3');insert into animal (id, name) values (4, 'Bear4');insert
> into animal (id, name) values (5, 'Bear5');insert into animal (id,
> name) values (6, 'Bear6');insert into animal (id, name) values (7,
> 'Bear7');insert into animal (id, name) values (8, 'Bear8');insert into
> animal (id, name) values (9, 'Bear9');insert into animal (id, name)
> values (10, 'Bear10');
> select id,name,writetime(name) from animal;
> --Add some to timestamp, and use this as future_timestamp, must be
> ahead of what was defined in commitlog config file
> insert into animal (id, name) values (11, 'DuckFromFuture') using
> timestamp <future_timestamp>
> insert into animal (id, name) values (12, 'Bird1');insert into animal
> (id, name) values (13, 'Bird2');insert into animal (id, name) values
> (14, 'Bird3');insert into animal (id, name) values (15,
> 'Bird4');insert into animal (id, name) values (16, 'Bird5');insert
> into animal (id, name) values (17, 'Bird6');insert into animal (id,
> name) values (18, 'Bird7');insert into animal (id, name) values (19,
> 'Bird8');insert into animal (id, name) values (20, 'Bird9');insert
> into animal (id, name) values (21, 'Bird10');
>
> --Now I simply forced the power off the machine held the power button
> down. And restarted
>
> --During startup verify that commitlog replay has been done in log
>
> ssh into instance and enter cqlsh
>
> cqlsh:thezoo> select * from animal;
>
> --Which shows all the bears and birds have been replayed but not the duck!
>
> I also did some digging in the Cassandra source code, and made the
> following findings:
>
> I think the code that skips mutations ahead of time is in CommitLogReplayer
> class:
> See lines: 194-195 (at the time of writing)
> if (commitLogReplayer.pointInTimeExceeded(mutation))
>        return;
> This code is triggerred from CommitLogReader, where readSection seems to be
> responsible for reading the commit logs, this is wrapped in a while loop,
> that just reads the file until EOF.
> See:
>  while (statusTracker.shouldContinue() && reader.getFilePointer() < end &&
> !reader.isEOF())
> This method is called file by file from CommitLog.recover to recover all
> commitlog segment files.
> And just a note statusTracker.shouldContinue will fail if
> statusTracker.requestTermination(); is called but I dont see this being the
> case for the pointInTimeExceeded case.
>
> I am a bit concerned if this is some hidden feature in Cassandra, and as
> such we might have to revise our backup strategies, if this is the case.
> However as far as I can see the Datastax documentation on this is simply
> wrong unfortunately the official documentation on this just seems to be
> work in progress.
> The fact that it doesn't do this is in fact a positive result for me as I
> would also expect the point-in-time to guarantee that all mutatations up
> until this point-in-time is in fact recovered.
>
> Can anyone confirm if this is just the documentation that is wrong or maybe
> if I did something wrong in my experiments.
>
> (For reference I also conducted some experiments with larger amount of data
> where the recovery went through multiple commitlog files, but I got the
> same results, namely that it recovered ALL records before
> restore_point_in_time.)
>
> Best regards
> Morten V. Nielsen
>