
Posted to user@cassandra.apache.org by Jared Laprise <ja...@webonyx.com> on 2011/03/25 01:49:23 UTC

URGENT HELP PLEASE!

Hello all, I'm running 2 Cassandra 0.6.5 nodes and I brought down the secondary node and restarted the primary node. After Cassandra came back up, all data had been reverted to several months ago.

I could really use some insight here; this is a production website and I need to act quickly. I have a cron job that takes a snapshot every night, but when I tried to restore a snapshot on my local development environment it was also missing a ton of data.

Any help will be so appreciated.

Re: URGENT HELP PLEASE!

Posted by Peter Schuller <pe...@infidyne.com>.

> What happened is this:
> You started your cluster with only one node, so at first, all data was on
> that node. Then you added a second node. Cassandra then moved (approximately)
> half of the data to the second node. In theory, at that
> point the data that was moved to the second node could be removed from
> the first node (since you had RF=1). However, Cassandra
> doesn't do that removal automatically, for safety reasons. You
> have to run cleanup on the first node for that to happen.
> So there was stale data on the first node that never got updated
> because the first node was no longer responsible for that data.

But this doesn't explain why he was able to read the stale data? Or
did I miss something about actually having removed the second node
from the ring after it was shut off?

-- 
/ Peter Schuller

Re: URGENT HELP PLEASE!

Posted by Sylvain Lebresne <sy...@datastax.com>.

> Although after all the help from the Cassandra community I have a much better understanding of why and how my situation happened, there was still one strange side effect I noticed. For context, I store user accounts and other account information in Cassandra. When the second node was offline and I tried to log into the site, I got an error saying invalid password. Out of curiosity I logged into the cassandra-cli tool and looked at what columns and values were present for my user account. My User CF seemed to have data stored from right before I added the second node. I found that really strange assuming that Cassandra doesn't keep any historical or versioned data? Again, once the second node was back online both servers showed the expected more current data.

What happened is this:
You started your cluster with only one node, so at first, all data was on
that node. Then you added a second node. Cassandra then moved (approximately)
half of the data to the second node. In theory, at that
point the data that was moved to the second node could be removed from
the first node (since you had RF=1). However, Cassandra
doesn't do that removal automatically, for safety reasons. You
have to run cleanup on the first node for that to happen.
So there was stale data on the first node that never got updated
because the first node was no longer responsible for that data.
It was garbage that just didn't get removed. What you should have done
is run nodetool cleanup on the first node after having bootstrapped
the second one and checked everything was fine.
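
To make the sequence above concrete, here is a toy Python model of it. It is only a sketch, not Cassandra code: the 0..99 token space, the `token`, `owner`, `write` and `cleanup` helpers and the `user0`..`user19` keys are all made up for illustration. It shows the first node keeping an old copy of every key it handed over, and a cleanup step dropping those copies.

```python
import hashlib

def token(key):
    # Hypothetical stand-in for a partitioner: map a key to a token in 0..99.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 100

def owner(key, ring):
    # ring is a list of (node, highest_token_owned) sorted by token.
    # With RF=1 the first node whose upper token covers the key owns it.
    t = token(key)
    for node, upper in ring:
        if t <= upper:
            return node
    return ring[0][0]

def write(stores, ring, key, value):
    # A write only lands on the node that currently owns the key.
    stores[owner(key, ring)][key] = value

def cleanup(stores, ring, node):
    # Drop every key the node no longer owns.
    stores[node] = {k: v for k, v in stores[node].items()
                    if owner(k, ring) == node}

keys = [f"user{i}" for i in range(20)]
stores = {"node1": {}, "node2": {}}

# 1. Single-node cluster: node1 owns everything.
ring = [("node1", 99)]
for k in keys:
    write(stores, ring, k, "old")

# 2. node2 bootstraps and takes tokens 0..49. Its keys are copied over,
#    but node1 keeps its own (now unowned) copies on disk.
ring = [("node2", 49), ("node1", 99)]
for k in keys:
    if owner(k, ring) == "node2":
        stores["node2"][k] = stores["node1"][k]

# 3. Months of updates: each write goes to the current owner only.
for k in keys:
    write(stores, ring, k, "new")

stale_before = sorted(k for k, v in stores["node1"].items() if v == "old")

# 4. Cleanup on node1 removes the leftovers.
cleanup(stores, ring, "node1")
stale_after = sorted(k for k, v in stores["node1"].items() if v == "old")

print("stale keys on node1 before cleanup:", stale_before)
print("stale keys on node1 after cleanup:", stale_after)
```

Note that this only models which copies sit on each node; it says nothing about which node ends up answering a read.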


Re: URGENT HELP PLEASE!

Posted by Watanabe Maki <wa...@gmail.com>.

With RF=2 and CL=ONE, be aware that you can still read old data that has not been replicated yet.
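
A small Python sketch of that caveat, under stated assumptions: each replica holds one (timestamp, value) cell, a read returns the newest cell among the replicas it contacts, and a CL=ONE write is acknowledged as soon as one replica has it. The `read` and `guaranteed_fresh` helpers are hypothetical names, and the last one is just the usual "readers plus writers must exceed RF" overlap rule.

```python
def read(replicas, contacted):
    # Return the value with the highest timestamp among the replicas asked.
    return max((replicas[n] for n in contacted), key=lambda cell: cell[0])[1]

def guaranteed_fresh(r, w, rf):
    # A read is certain to see the latest acknowledged write only when the
    # read set and the write set must overlap: R + W > RF.
    return r + w > rf

# Both replicas start with the same cell: (timestamp, value).
replicas = {"node1": (1, "old"), "node2": (1, "old")}

# A write at CL=ONE is acknowledged once node1 has it; node2 lags behind.
replicas["node1"] = (2, "new")

stale = read(replicas, ["node2"])            # CL=ONE read hits the lagging node
fresh = read(replicas, ["node1", "node2"])   # reading both replicas

print(stale, fresh)
print(guaranteed_fresh(1, 1, 2), guaranteed_fresh(2, 1, 2))
```

The write is still sent to both replicas in this picture; the consistency level only controls how many acknowledgements are waited for.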

Maki

From iPhone


On 2011/03/26, at 5:10, Jared Laprise <ja...@webonyx.com> wrote:

> In a 2 node environment and RF=2 using consistency level of ONE would still ensure data is replicated to both servers, correct?

RE: URGENT HELP PLEASE!

Posted by Jared Laprise <ja...@webonyx.com>.

No, what initially started it all was that I needed to increase my EC2 server instance size. So I removed said server from the load balancer, stopped Cassandra, and then shut down the server in order to change the instance type. I assumed the other node had all the data and everything should keep running without issue. Almost immediately I realized I was missing a bunch of data. Not fully understanding what had happened, I was hesitant to bring up the other node again for fear of data loss. I ended up bringing the other node back online and then everything seemed to snap back into its expected working order.

Although after all the help from the Cassandra community I have a much better understanding of why and how my situation happened, there was still one strange side effect I noticed. For context, I store user accounts and other account information in Cassandra. When the second node was offline and I tried to log into the site, I got an error saying invalid password. Out of curiosity I logged into the cassandra-cli tool and looked at what columns and values were present for my user account. My User CF seemed to have data stored from right before I added the second node. I found that really strange assuming that Cassandra doesn't keep any historical or versioned data? Again, once the second node was back online both servers showed the expected more current data.

Today I'm preparing to increase my replication factor to 2 and have been reading about the proper way to do that. Although I've found bits and pieces, I haven't found any definitive explanation on how to do it. Could someone please sanity check my intended approach?

1. Change the RF to 2 and restart Cassandra on both nodes
2. Run `nodetool repair` on both nodes, one at a time so as not to tie up both servers (will that sync data between the nodes?)

In a 2 node environment and RF=2 using consistency level of ONE would still ensure data is replicated to both servers, correct?
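
One rough way to picture step 2, as a toy Python model rather than anything Cassandra actually runs. It assumes that with RF=2 on a two-node ring each node should end up holding every row, and that a repair amounts to exchanging whatever is missing or older on either side. The `repair` function, the row keys and the timestamps are hypothetical.

```python
def repair(a, b):
    # For every key seen on either node, keep the (timestamp, value) pair
    # with the highest timestamp and store it on both nodes.
    for key in set(a) | set(b):
        newest = max(v for v in (a.get(key), b.get(key)) if v is not None)
        a[key] = newest
        b[key] = newest

# Under RF=1 each node only held the rows it owned.
node1 = {"alice": (5, "pw-a")}
node2 = {"bob": (7, "pw-b")}
before = (dict(node1), dict(node2))

# After raising RF to 2, nothing moves until the replicas are synced.
repair(node1, node2)

print(before)
print(node1, node2)
```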


Re: URGENT HELP PLEASE!

Posted by Sylvain Lebresne <sy...@datastax.com>.

On Fri, Mar 25, 2011 at 1:49 AM, Jared Laprise <ja...@webonyx.com> wrote:
> Hello all, I’m running 2 Cassandra 0.6.5 nodes and I brought down the
> secondary node and restarted the primary node. After Cassandra came back up
> all data has been reverted to several months ago.

Out of curiosity, when you said 'brought down the secondary node', did
that involve a decommission or removeToken? If so, I have an
explanation for you.

--
Sylvain



Re: URGENT HELP PLEASE!

Posted by Brandon Williams <dr...@gmail.com>.

On Thu, Mar 24, 2011 at 11:58 PM, Jared Laprise <ja...@webonyx.com> wrote:

> My replication factor is 1
>

Then you are living dangerously.

> I haven't run repair until today, I'm using ONE for consistency level.
>

Repair at rf=1 won't do anything.

> I have two servers that are load balanced (per session) which both run
> Cassandra and each server connects to Cassandra on localhost.
>
> Based on what you're saying, and being I'm using session (cookie) based
> load balancing it would be true that data is rarely read or written (per
> user) on a different server, that could be why data isn't replicating.
>
> Again, I'm extremely appreciative of your feedback as I haven't had the
> luxury of delving into all aspects of Cassandra and especially related to
> multi-node deployments. Thanks!

Here is the key point: replication_factor refers to the total replicas
(where replica means 'instance of the data', not 'copy of it'.)  You only
have one, which is why half your data disappeared when a node was down.  I
would go to 2 and then run repair (see
http://wiki.apache.org/cassandra/FAQ#change_replication) so you have
redundancy.
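
A minimal Python sketch of that point, assuming a two-node ring over a made-up 0..99 token space and a placement rule that walks clockwise from the owning node. The `replicas_for` and `readable_fraction` helpers are hypothetical; the sketch only counts how much of the token space still has a live replica when one node is down.

```python
def replicas_for(token, ring, rf):
    # ring is a sorted list of (highest_token_owned, node). The owner is the
    # first node covering the token; further replicas follow clockwise.
    start = next(i for i, (upper, _) in enumerate(ring) if token <= upper)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]

def readable_fraction(ring, rf, down):
    # Fraction of tokens that still have at least one live replica.
    tokens = range(100)
    ok = sum(1 for t in tokens
             if any(node not in down for node in replicas_for(t, ring, rf)))
    return ok / len(tokens)

ring = [(49, "node1"), (99, "node2")]

rf1 = readable_fraction(ring, rf=1, down={"node2"})
rf2 = readable_fraction(ring, rf=2, down={"node2"})

print("RF=1, node2 down:", rf1)
print("RF=2, node2 down:", rf2)
```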

-Brandon

Re: URGENT HELP PLEASE!

Posted by Peter Schuller <pe...@infidyne.com>.

> Based on what you're saying, and being I'm using session (cookie) based load balancing it would be true that data is rarely read or written (per user) on a different server, that could be why data isn't replicating.

You've probably discovered this already but just in case, and for
others finding this ML thread: The placement of data is controlled by
your replication strategy and ring layout. Placement is not affected
by which node you happen to talk to when doing your writes or reads.
The node you talk to - the co-ordinating node - is responsible for
routing requests appropriately, and it won't e.g. store data from a
write if it doesn't happen to also be part of the replica set by
coincidence.

(For those that react: CL.ANY and hinted hand-off kind of violate
this claim as phrased, but not the spirit of it since with CL.ANY
you're making a conscious decision to complete writes prior to any
node in the replica set receiving the data, and you won't be reading
the data until it does end up in the right location.)
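
As an illustration of the routing described above, here is a toy Python sketch. It assumes RF=1, a two-node ring over a made-up 0..99 token space, and ignores CL.ANY and hinted hand-off; `RING`, `replica_for` and `write` are hypothetical names. The coordinator argument deliberately plays no part in deciding where the data lands.

```python
RING = [(49, "node1"), (99, "node2")]  # (highest token owned, node), RF=1

def replica_for(token):
    # Placement depends only on the token and the ring layout.
    return next(node for upper, node in RING if token <= upper)

def write(stores, coordinator, token, value):
    # The coordinator only forwards the write; it keeps a copy only if it
    # happens to be the replica for this token.
    target = replica_for(token)
    stores[target][token] = value
    return target

stores = {"node1": {}, "node2": {}}

# The same key written through two different coordinators.
via_node1 = write(stores, "node1", 70, "a")
via_node2 = write(stores, "node2", 70, "b")

print(via_node1, via_node2, stores)
```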

-- 
/ Peter Schuller

RE: URGENT HELP PLEASE!

Posted by Jared Laprise <ja...@webonyx.com>.

My replication factor is 1

I haven't run repair until today, I'm using ONE for consistency level.

I have two servers that are load balanced (per session) which both run Cassandra and each server connects to Cassandra on localhost.

Based on what you're saying, and being I'm using session (cookie) based load balancing it would be true that data is rarely read or written (per user) on a different server, that could be why data isn't replicating. 

Again, I'm extremely appreciative of your feedback as I haven't had the luxury of delving into all aspects of Cassandra and especially related to multi-node deployments. Thanks!


Re: URGENT HELP PLEASE!

Posted by Benjamin Coverston <be...@datastax.com>.

Hi Jared,

Sounds like you have two nodes in the cluster. What is your replication 
factor set to? 1? 2?

Have you ever run repair? What consistency level do you use for reads 
and writes?

From the way you are speaking it sounds like you are sending all of
your traffic to a single node (primary, secondary).

If you never repair the ranges, and you read the data infrequently (or 
only with range slices) I can guess that your data never got replicated 
to your secondary node.

Ben


-- 
Ben Coverston
DataStax -- The Apache Cassandra Company
http://www.datastax.com/

RE: URGENT HELP PLEASE!

Posted by Jared Laprise <ja...@webonyx.com>.

Correct, replication factor of 1. 

I've been reading and researching as fast as possible so I'm also starting to realize what some of the configurations actually mean and getting a clearer picture. My request to the Cassandra community was a desperate, "Oh man! I F'd up!" moment, and didn't have the time to Google it myself :-)



Re: URGENT HELP PLEASE!

Posted by Jonathan Ellis <jb...@gmail.com>.

Each row is replicated to replication_factor nodes, not the entire
cluster (or you couldn't scale writes by adding machines).

Sounds like you're running with RF=1.

On Thu, Mar 24, 2011 at 10:44 PM, Jared Laprise <ja...@webonyx.com> wrote:
> Thanks for the responses. I got everything working again, and have some ideas on why but am not completely sure.
>
> How I got it working again was simply bring the second node back online. I was under the assumption that all data is replicated between nodes (eventually). Am I incorrect? It would seem that each node stores different data and delegates the read request to whichever node holds the data. Although I've spent a lot of time with Cassandra in a single node environment I think I may be lacking a bit of understanding on how Cassandra behaves in a clustered environment.



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

RE: URGENT HELP PLEASE!

Posted by Jared Laprise <ja...@webonyx.com>.

Thanks for the responses. I got everything working again, and have some ideas on why but am not completely sure.

How I got it working again was simply bring the second node back online. I was under the assumption that all data is replicated between nodes (eventually). Am I incorrect? It would seem that each node stores different data and delegates the read request to whichever node holds the data. Although I've spent a lot of time with Cassandra in a single node environment I think I may be lacking a bit of understanding on how Cassandra behaves in a clustered environment.


Re: URGENT HELP PLEASE!

Posted by Jonathan Ellis <jb...@gmail.com>.

Right, Cassandra doesn't keep old versions around so to see an old
version you have to have uncompacted data and whack the new data --
either by blowing away sstables or not replaying the commitlog.

Snapshots flush before creating their hard links, which rules out any
commitlog problems.

If you ran out of disk space you wouldn't get past the commitlog
append, so you'd never get new data in at all after that.

Sounds like an environmental problem, not Cassandra specific.
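
A toy Python model of the first point, offered as a sketch only. It assumes each sstable is a dict of key -> (timestamp, value), that a read takes the newest timestamp across sstables, and that compaction merges them the same way; `read`, `compact` and the sample values are hypothetical.

```python
def read(sstables, key):
    # The newest timestamp across all sstables wins.
    versions = [table[key] for table in sstables if key in table]
    return max(versions)[1] if versions else None

def compact(sstables):
    # Merge all sstables into one, keeping only the newest cell per key.
    merged = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    return [merged]

old = {"password": (1, "hunter1")}
new = {"password": (2, "hunter2")}

normal = read([old, new], "password")        # both sstables present
reverted = read([old], "password")           # newer sstable blown away
compacted = compact([old, new])
after_compaction = read(compacted, "password")

print(normal, reverted, after_compaction)
```

Once the two tables are compacted there is no older cell left to fall back to, which is why an old version can only resurface from uncompacted data.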

On Thu, Mar 24, 2011 at 9:10 PM, aaron morton <aa...@thelastpickle.com> wrote:
> Was there anything in the server logs during startup ?
> I've not heard of this happening before and it's hard to think of how / why
> cassandra could revert its data, other than something external playing with
> the files on disk.
> Aaron

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: URGENT HELP PLEASE!

Posted by aaron morton <aa...@thelastpickle.com>.

Was there anything in the server logs during startup ?

I've not heard of this happening before and it's hard to think of how / why cassandra could revert its data, other than something external playing with the files on disk.

Aaron

On 25 Mar 2011, at 13:49, Jared Laprise wrote:

> Hello all, I’m running 2 Cassandra 0.6.5 nodes and I brought down the secondary node and restarted the primary node. After Cassandra came back up, all data had been reverted to several months ago.
>  
> I could really use some insight here; this is a production website and I need to act quickly. I have a cron job that takes a snapshot every night, but when I tried to restore a snapshot on my local development environment it was also missing a ton of data.
>  
> Any help will be so appreciated.
>  
>