Posted to user@cassandra.apache.org by "Christopher J. Bottaro" <cj...@academicworks.com> on 2013/11/25 18:00:21 UTC

Data loss when swapping out cluster

Hello,

We recently experienced (pretty severe) data loss after moving our 4 node
Cassandra cluster from one EC2 availability zone to another.  Our strategy
for doing so was as follows:

   - One at a time, bring up new nodes in the new availability zone and
   have them join the cluster.
   - One at a time, decommission the old nodes in the old availability zone
   and turn them off (stop the Cassandra process).

Everything seemed to work as expected.  As we decommissioned each node, we
checked the logs for messages indicating "yes, this node is done
decommissioning" before turning the node off.

Pretty quickly after the old nodes left the cluster, we started getting
client calls about data missing.

We immediately turned the old nodes back on and when they rejoined the
cluster *most* of the reported missing data returned.  For the rest of the
missing data, we had to spin up a new cluster from EBS snapshots and copy
it over.

What did we do wrong?

In hindsight, we noticed a few things which may be clues...

   - The new nodes had much lower load after joining the cluster than the
   old ones (3-4 GB as opposed to 10 GB).
   - We have EC2Snitch turned on, although we're using SimpleStrategy for
   replication.
   - The new nodes showed even ownership (via nodetool status) after
   joining the cluster.

Here's more info about our cluster...

   - Cassandra 1.2.10
   - Replication factor of 3
   - Vnodes with 256 tokens
   - All tables made via CQL
   - Data dirs on EBS (yes, we are aware of the performance implications)
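
For reference, a keyspace with the replication settings listed above would
look roughly like this in CQL (the keyspace and table names are made up):

    CREATE KEYSPACE app
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

    -- all tables were created via CQL, e.g.:
    CREATE TABLE app.users (
      id uuid PRIMARY KEY,
      name text
    );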


Thanks for the help.

Re: Data loss when swapping out cluster

Posted by Janne Jalkanen <ja...@ecyrd.com>.
A-yup. Got burned by this too some time ago myself. If you do accidentally try to bootstrap a seed node, the solution is to run repair after adding the new node but before removing the old one. However, during this time the node will advertise itself as owning a range, but when queried, it'll return no data until the repair has completed :-(.
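
In command terms, something like this on the node that skipped bootstrap,
before any old node is removed (a sketch, not an exact recipe):

    # on the new node that came up as its own seed and streamed no data:
    nodetool repair          # pulls the data for its ranges from the other replicas
    # only after repair has completed is it safe to run, on the old node:
    nodetool decommission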

Honestly, with reference to the JIRA ticket, I just don't see a situation where the current behaviour would really be useful. It's a nasty thing that you "just have to know" when upgrading your cluster - there's no warning, no logging, no documentation; just something that you might accidentally do and which will manifest itself as random data loss.

/Janne

On 26 Nov 2013, at 21:20, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Nov 26, 2013 at 9:48 AM, Christopher J. Bottaro <cj...@academicworks.com> wrote:
> One thing that I didn't mention, and I think may be the culprit after doing a lot of mailing list reading, is that when we brought the 4 new nodes into the cluster, they had themselves listed in the seeds list.  I read yesterday that if a node has itself in the seeds list, then it won't bootstrap properly.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-5836
> 
> =Rob 


Re: Data loss when swapping out cluster

Posted by Robert Coli <rc...@eventbrite.com>.
On Fri, Nov 29, 2013 at 6:36 PM, Anthony Grasso <an...@gmail.com> wrote:

> In this case would it be possible to do the following to replace a seed
> node?
>

With the quoted procedure, you are essentially just "changing the ip
address of a node", which will work as long as you set auto_bootstrap:false
in cassandra.yaml. This works because the node already has its data and is
*not* bootstrapping; replacing a node this way is different from
bootstrapping an empty one.

Details:

https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/
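
In cassandra.yaml terms that is just (a sketch; the linked post has the full
procedure):

    auto_bootstrap: false    # the node's data is already on disk, so don't stream
    # the line can be removed again (it defaults to true) once the node is up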


=Rob

Re: Data loss when swapping out cluster

Posted by Anthony Grasso <an...@gmail.com>.
Hi Robert,

In this case would it be possible to do the following to replace a seed
node?

nodetool disablethrift
nodetool disablegossip
nodetool drain

stop Cassandra

deep copy /var/lib/cassandra/* on old seed node to new seed node

start Cassandra on new seed node
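
As a rough shell sketch of the above (the host name, init script and use of
rsync are assumptions, not part of the procedure as written):

    # on the old seed node:
    nodetool disablethrift
    nodetool disablegossip
    nodetool drain
    sudo service cassandra stop

    # copy the data, commitlog and saved_caches directories across:
    rsync -avH /var/lib/cassandra/ new-seed-host:/var/lib/cassandra/

    # on the new seed node (it reuses the copied system tables and tokens):
    sudo service cassandra start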

Regards,
Anthony


On Wed, Nov 27, 2013 at 6:20 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Nov 26, 2013 at 9:48 AM, Christopher J. Bottaro <
> cjbottaro@academicworks.com> wrote:
>
>> One thing that I didn't mention, and I think may be the culprit after
>> doing a lot of mailing list reading, is that when we brought the 4 new
>> nodes into the cluster, they had themselves listed in the seeds list.  I
>> read yesterday that if a node has itself in the seeds list, then it won't
>> bootstrap properly.
>>
>
> https://issues.apache.org/jira/browse/CASSANDRA-5836
>
> =Rob
>

Re: Data loss when swapping out cluster

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Nov 26, 2013 at 9:48 AM, Christopher J. Bottaro <
cjbottaro@academicworks.com> wrote:

> One thing that I didn't mention, and I think may be the culprit after
> doing a lot of mailing list reading, is that when we brought the 4 new
> nodes into the cluster, they had themselves listed in the seeds list.  I
> read yesterday that if a node has itself in the seeds list, then it won't
> bootstrap properly.
>

https://issues.apache.org/jira/browse/CASSANDRA-5836

=Rob

Re: Data loss when swapping out cluster

Posted by "Christopher J. Bottaro" <cj...@academicworks.com>.
Once we realized there was a problem, we added the 4 original nodes back
into the cluster and ran repair -pr on each node of the resulting 8 node
cluster (the 4 old plus the 4 new).

We are using quorum reads and writes.
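
Concretely, the repair step was along these lines (a sketch):

    # run on each of the 8 nodes, one at a time:
    nodetool repair -pr      # repairs only that node's primary ranges

With RF=3, QUORUM means 2 replicas per read and 2 per write, which only
guarantees anything if the replicas actually hold the data.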

One thing that I didn't mention, and I think may be the culprit after doing
a lot of mailing list reading, is that when we brought the 4 new nodes into
the cluster, they had themselves listed in the seeds list.  I read
yesterday that if a node has itself in the seeds list, then it won't
bootstrap properly.
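
For the record, the relevant part of cassandra.yaml on each new node looked
something like this (addresses are illustrative):

    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              # broken: the node's own address (10.0.1.5) is in its seeds list,
              # so it joined without bootstrapping/streaming any data
              - seeds: "10.0.1.5,10.0.0.1"

    # what it should have been: only nodes already in the cluster, e.g.
    #         - seeds: "10.0.0.1,10.0.0.2"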

-- C


On Tue, Nov 26, 2013 at 8:14 AM, Janne Jalkanen <ja...@ecyrd.com> wrote:

>
> That sounds bad!  Did you run repair at any stage?  Which CL are you
> reading with?
>
> /Janne
>
> On 25 Nov 2013, at 19:00, Christopher J. Bottaro <
> cjbottaro@academicworks.com> wrote:
>
> Hello,
>
> We recently experienced (pretty severe) data loss after moving our 4 node
> Cassandra cluster from one EC2 availability zone to another.  Our strategy
> for doing so was as follows:
>
>    - One at a time, bring up new nodes in the new availability zone and
>    have them join the cluster.
>    - One at a time, decommission the old nodes in the old availability
>    zone and turn them off (stop the Cassandra process).
>
> Everything seemed to work as expected.  As we decommissioned each node, we
> checked the logs for messages indicating "yes, this node is done
> decommissioning" before turning the node off.
>
> Pretty quickly after the old nodes left the cluster, we started getting
> client calls about data missing.
>
> We immediately turned the old nodes back on and when they rejoined the
> cluster *most* of the reported missing data returned.  For the rest of the
> missing data, we had to spin up a new cluster from EBS snapshots and copy
> it over.
>
> What did we do wrong?
>
> In hindsight, we noticed a few things which may be clues...
>
>    - The new nodes had much lower load after joining the cluster than the
>    old ones (3-4 GB as opposed to 10 GB).
>    - We have EC2Snitch turned on, although we're using SimpleStrategy for
>    replication.
>    - The new nodes showed even ownership (via nodetool status) after
>    joining the cluster.
>
> Here's more info about our cluster...
>
>    - Cassandra 1.2.10
>    - Replication factor of 3
>    - Vnodes with 256 tokens
>    - All tables made via CQL
>    - Data dirs on EBS (yes, we are aware of the performance implications)
>
>
> Thanks for the help.
>
>
>

Re: Data loss when swapping out cluster

Posted by Janne Jalkanen <ja...@ecyrd.com>.
That sounds bad!  Did you run repair at any stage?  Which CL are you reading with? 

/Janne

On 25 Nov 2013, at 19:00, Christopher J. Bottaro <cj...@academicworks.com> wrote:

> Hello,
> 
> We recently experienced (pretty severe) data loss after moving our 4 node Cassandra cluster from one EC2 availability zone to another.  Our strategy for doing so was as follows:
> - One at a time, bring up new nodes in the new availability zone and have them join the cluster.
> - One at a time, decommission the old nodes in the old availability zone and turn them off (stop the Cassandra process).
> Everything seemed to work as expected.  As we decommissioned each node, we checked the logs for messages indicating "yes, this node is done decommissioning" before turning the node off.
> 
> Pretty quickly after the old nodes left the cluster, we started getting client calls about data missing.
> 
> We immediately turned the old nodes back on and when they rejoined the cluster *most* of the reported missing data returned.  For the rest of the missing data, we had to spin up a new cluster from EBS snapshots and copy it over.
> 
> What did we do wrong?
> 
> In hindsight, we noticed a few things which may be clues...
> - The new nodes had much lower load after joining the cluster than the old ones (3-4 GB as opposed to 10 GB).
> - We have EC2Snitch turned on, although we're using SimpleStrategy for replication.
> - The new nodes showed even ownership (via nodetool status) after joining the cluster.
> Here's more info about our cluster...
> - Cassandra 1.2.10
> - Replication factor of 3
> - Vnodes with 256 tokens
> - All tables made via CQL
> - Data dirs on EBS (yes, we are aware of the performance implications)
> 
> Thanks for the help.


Re: Data loss when swapping out cluster

Posted by Jeremiah D Jordan <je...@gmail.com>.
TL;DR you need to run repair in between doing those two things.

Full explanation:
https://issues.apache.org/jira/browse/CASSANDRA-2434
https://issues.apache.org/jira/browse/CASSANDRA-5901
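
In other words, the safe order per node swap is roughly (a sketch):

    # 1. bring the new node up and let it join (not listed in its own seeds)
    # 2. before removing anything, make sure the data has actually moved:
    nodetool repair
    # 3. only then, on the old node being replaced:
    nodetool decommission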

Thanks,
-Jeremiah Jordan

On Nov 25, 2013, at 11:00 AM, Christopher J. Bottaro <cj...@academicworks.com> wrote:

> Hello,
> 
> We recently experienced (pretty severe) data loss after moving our 4 node Cassandra cluster from one EC2 availability zone to another.  Our strategy for doing so was as follows:
> - One at a time, bring up new nodes in the new availability zone and have them join the cluster.
> - One at a time, decommission the old nodes in the old availability zone and turn them off (stop the Cassandra process).
> Everything seemed to work as expected.  As we decommissioned each node, we checked the logs for messages indicating "yes, this node is done decommissioning" before turning the node off.
> 
> Pretty quickly after the old nodes left the cluster, we started getting client calls about data missing.
> 
> We immediately turned the old nodes back on and when they rejoined the cluster *most* of the reported missing data returned.  For the rest of the missing data, we had to spin up a new cluster from EBS snapshots and copy it over.
> 
> What did we do wrong?
> 
> In hindsight, we noticed a few things which may be clues...
> - The new nodes had much lower load after joining the cluster than the old ones (3-4 GB as opposed to 10 GB).
> - We have EC2Snitch turned on, although we're using SimpleStrategy for replication.
> - The new nodes showed even ownership (via nodetool status) after joining the cluster.
> Here's more info about our cluster...
> - Cassandra 1.2.10
> - Replication factor of 3
> - Vnodes with 256 tokens
> - All tables made via CQL
> - Data dirs on EBS (yes, we are aware of the performance implications)
> 
> Thanks for the help.