Posted to user@cassandra.apache.org by Thomas Stets <th...@gmail.com> on 2012/09/19 16:43:44 UTC

Losing keyspace on cassandra upgrade

I keep losing my keyspace when upgrading from cassandra 1.1.1 to 1.1.5.

I have the same cassandra keyspace on all our staging systems:

development:  a 3-node cluster
integration: a 3-node cluster
QS: a 2-node cluster
(production will be a 4-node cluster, which is not yet active)

All clusters were running cassandra 1.1.1. Before going into production I
wanted to upgrade to the latest production release of cassandra.

In all cases my keyspace disappeared when I started the cluster with
cassandra 1.1.5. On the development system I didn't realize at first what
was happening; I only wondered why nodetool showed such a small amount of
data. On integration I spotted the problem quickly, but could not recover
the data. I re-installed the cassandra cluster from scratch and populated it
with our test data, so our developers could work.

I am currently using the QS system to reproduce the problem, to find out
what I am doing wrong and how I can avoid losing production data once we are
live.

Basically I was doing the following:

1. create a snapshot on every node
2. create a tar.gz of my data directory, just to be safe
3. shut down and re-start cassandra 1.1.1 (just to see that it is not the
   re-start that is creating the problem)
4. verify that the keyspace is still known, and the data present.
5. shut down cassandra 1.1.1
6. copy the config to cassandra 1.1.5 (doing a diff of cassandra.yaml to
   the new one first, to see whether anything important has changed)
7. start cassandra 1.1.5
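
Step 2 of the list above (the tar.gz safety copy) could be scripted roughly as follows. This is only a sketch using Python's standard tarfile module; the function name backup_data_dir and any paths passed to it are hypothetical and not taken from the original setup.

```python
import tarfile
from pathlib import Path


def backup_data_dir(data_dir: str, archive_path: str) -> list[str]:
    """Pack data_dir into a gzip-compressed tar archive and return its member names."""
    src = Path(data_dir)
    if not src.is_dir():
        raise NotADirectoryError(f"{data_dir} is not a directory")
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the directory name, not the full path
        tar.add(src, arcname=src.name)
    # Re-open the archive to confirm it is readable before relying on it
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

The archive should be written outside the data directory, otherwise the backup would try to include itself.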

In the log file, after the "Replaying ..." messages I find the following:

 INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 759 mutations from unknown (probably removed) CF with id 1187
 INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 606 mutations from unknown (probably removed) CF with id 1186
 INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 53 mutations from unknown (probably removed) CF with id 1185
 INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 1945 mutations from unknown (probably removed) CF with id 1184
 INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 1945 mutations from unknown (probably removed) CF with id 1191
 INFO [main] 2012-09-19 15:15:50,323 CommitLogReplayer.java (line 103) Skipped 7506 mutations from unknown (probably removed) CF with id 1190
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 88 mutations from unknown (probably removed) CF with id 1189
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 87 mutations from unknown (probably removed) CF with id 1188
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 354 mutations from unknown (probably removed) CF with id 1195
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 87 mutations from unknown (probably removed) CF with id 1194
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 45 mutations from unknown (probably removed) CF with id 1192
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 82 mutations from unknown (probably removed) CF with id 1197
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 46386 mutations from unknown (probably removed) CF with id 1177
 INFO [main] 2012-09-19 15:15:50,324 CommitLogReplayer.java (line 103) Skipped 69 mutations from unknown (probably removed) CF with id 1178
 INFO [main] 2012-09-19 15:15:50,325 CommitLogReplayer.java (line 103) Skipped 73 mutations from unknown (probably removed) CF with id 1179
 INFO [main] 2012-09-19 15:15:50,325 CommitLogReplayer.java (line 103) Skipped 88 mutations from unknown (probably removed) CF with id 1181
 INFO [main] 2012-09-19 15:15:50,325 CommitLogReplayer.java (line 103) Skipped 46386 mutations from unknown (probably removed) CF with id 1182
 INFO [main] 2012-09-19 15:15:50,325 CommitLogReplayer.java (line 103) Skipped 7506 mutations from unknown (probably removed) CF with id 1183
 INFO [main] 2012-09-19 15:15:50,325 CommitLog.java (line 131) Log replay complete, 0 replayed mutations
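
To see at a glance how much was skipped during replay, log output like the above can be summarised per column family id. The following is a minimal sketch that relies only on the message wording shown in this log; the function name skipped_mutations is made up for illustration.

```python
import re

# Matches the "Skipped N mutations ... CF with id X" message shown in the log above
SKIP_RE = re.compile(
    r"Skipped (\d+) mutations from unknown \(probably removed\) CF with id (\d+)"
)


def skipped_mutations(log_text: str) -> dict[int, int]:
    """Map each unknown column family id to its number of skipped mutations."""
    skipped: dict[int, int] = {}
    for count, cf_id in SKIP_RE.findall(log_text):
        key = int(cf_id)
        # Add up counts in case the same CF id is reported more than once
        skipped[key] = skipped.get(key, 0) + int(count)
    return skipped
```

A non-empty result combined with "0 replayed mutations" would suggest that the node no longer knows any of the column families the commit log refers to.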

This is the first obvious indication that something is wrong. Going further
up in the log file, I see that the SSTableReader logs only system keyspace
files.

Currently my cluster is in the following state:

node 1 runs cassandra 1.1.5, and doesn't know my keyspace
node 2 runs cassandra 1.1.1, and still knows my keyspace.

nodetool ring confirms this: node 1 has a load of 29 KB, node 2 of roughly
1 GB. The cluster itself is still intact, i.e. nodetool ring shows both
nodes.
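
A load mismatch like this can be flagged automatically once the load values have been collected by hand or by a script. The sketch below assumes load strings of the form "29 KB" or "1 GB"; the helper names and the 1% threshold are arbitrary choices for illustration, not anything prescribed by cassandra.

```python
import re

# Binary multiples; the unit spellings accepted here are an assumption
UNITS = {"bytes": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def load_to_bytes(load: str) -> float:
    """Convert a load string such as '29 KB' into a number of bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*(bytes|KB|MB|GB|TB)\s*", load)
    if match is None:
        raise ValueError(f"unrecognised load value: {load!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def suspicious_nodes(loads: dict[str, str], ratio: float = 0.01) -> list[str]:
    """Return nodes whose load is below ratio times the largest load in the ring."""
    sizes = {node: load_to_bytes(value) for node, value in loads.items()}
    if not sizes:
        return []
    biggest = max(sizes.values())
    return sorted(node for node, size in sizes.items() if size < ratio * biggest)
```

With the values from this cluster, suspicious_nodes({"node1": "29 KB", "node2": "1 GB"}) singles out node1.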

I tried nodetool resetlocalschema and nodetool repair, but that didn't
change anything.

Any idea what I have been doing wrong (the preferred solution), or whether
I have stumbled over a cassandra bug (not so nice)?


  TIA, Thomas

Re: Losing keyspace on cassandra upgrade

Posted by Thomas Stets <th...@gmail.com>.

On Wed, Sep 19, 2012 at 5:12 PM, Michael Kjellman <mk...@barracuda.com> wrote:

> Sounds like you are losing your system keyspace. When you say nothing
> important changed between yaml files, do you mean with or without your
> changes?
>

I compared the 1.1.1 cassandra.yaml (with my changes) to the cassandra.yaml
distributed with 1.1.5. The only differences were my changes (hosts, ports
and paths), and some comments.
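
A comparison like this is easier to read when comments and blank lines are filtered out first. Here is a minimal sketch using Python's difflib; the function names are hypothetical, and full-line comments are simply assumed to start with "#".

```python
import difflib


def significant_lines(yaml_text: str) -> list[str]:
    """Drop blank lines and full-line comments; strip trailing whitespace."""
    kept = []
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        kept.append(line.rstrip())
    return kept


def config_diff(old_text: str, new_text: str) -> list[str]:
    """Unified diff of two config texts, ignoring comments and blank lines."""
    return list(
        difflib.unified_diff(
            significant_lines(old_text),
            significant_lines(new_text),
            fromfile="old",
            tofile="new",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )
```

Note that this also hides settings that are commented out in one file and active in the other only on the commented side, so a changed default that lives in a comment would not show up.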

>
> Did your data directories change in the migration? Permissions okay?
>

The data directory containing my keyspace has not changed. Directly after
startup cassandra began a compaction of its system keyspace (something I saw
in all cases), so that has obviously changed. Permissions are OK.

  Thomas

Re: Losing keyspace on cassandra upgrade

Posted by Edward Sargisson <ed...@globalrelay.net>.

https://issues.apache.org/jira/browse/CASSANDRA-4583

On 12-09-19 08:30 AM, Michael Kjellman wrote:
> @Edward Do you have a bug number for that by chance?

-- 

Edward Sargisson

senior java developer
Global Relay

Re: Losing keyspace on cassandra upgrade

Posted by Michael Kjellman <mk...@barracuda.com>.

@Edward Do you have a bug number for that by chance?

Re: Losing keyspace on cassandra upgrade

Posted by Edward Sargisson <ed...@globalrelay.net>.

We've seen that before too - supposedly it was fixed in 1.1.5. Your
experience casts some doubt on that.

Our workaround, thus far, is to shut down the entire ring and then bring
each node back up, starting with a known-good one.
Then you run nodetool resetlocalschema on the node that's confused and
make sure it gets the schema linked up properly.
Then nodetool repair.

I see you've done that, but we found a complete ring restart was
necessary. This was on Cass 1.1.1.
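
The ordering of that workaround can be written down as a small plan builder. This is only a sketch: the function name recovery_plan and the action labels "stop" and "start" are placeholders, and starting the confused node last is an assumption, since the description above only says to begin with a known-good node.

```python
def recovery_plan(nodes: list[str], known_good: str, confused: str) -> list[tuple[str, str]]:
    """Return ordered (action, node) steps for the full-ring-restart workaround."""
    if known_good not in nodes or confused not in nodes:
        raise ValueError("known_good and confused must both be ring members")
    if known_good == confused:
        raise ValueError("the confused node cannot also be the known-good node")
    # 1. Shut down the entire ring
    plan = [("stop", node) for node in nodes]
    # 2. Bring nodes back up, known-good first; the confused node goes last (assumption)
    others = [node for node in nodes if node not in (known_good, confused)]
    plan += [("start", node) for node in [known_good, *others, confused]]
    # 3. Reset the schema on the confused node, then repair it
    plan.append(("nodetool resetlocalschema", confused))
    plan.append(("nodetool repair", confused))
    return plan
```

For the two-node QS cluster from the original post, recovery_plan(["node1", "node2"], known_good="node2", confused="node1") stops both nodes, starts node2 and then node1, and finishes with the two nodetool commands on node1.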

Cheers,
Edward

Re: Losing keyspace on cassandra upgrade

Posted by Michael Kjellman <mk...@barracuda.com>.

Sounds like you are losing your system keyspace. When you say nothing important changed between yaml files, do you mean with or without your changes?

Did your data directories change in the migration? Permissions okay?

I've done a 1.1.1 to 1.1.5 upgrade on many of my nodes without issue.

> 
> Currently my cluster is in the following state:
> 
> node 1 runs cassandra 1.1.5, and doesn't know my keyspace
> node 2 runs cassandra 1.1.1, and still knows my keyspace.
> 
> nodetool ring confirms this: node 1 has a load of 29 KB, node 2 of roughly 1 GB. The cluster itself is still intact, i.e. nodetool ring shows both nodes.
> 
> I tried a nodetool resetlocalschema, and nodetool repair, but that didn't change anything.
> 
> Any idea what I have been doing wrong (the preferred solution), or whether I stumbled over a cassandra bug (not so nice)?
> 
> 
>   TIA, Thomas

Re: Losing keyspace on cassandra upgrade

Posted by Thomas Stets <th...@gmail.com>.

On Fri, Sep 21, 2012 at 10:39 AM, aaron morton <aa...@thelastpickle.com> wrote:

> Have you tried nodetool resetlocalschema on the 1.1.5 node?
>

Yes, I tried a resetlocalschema, and a repair. This didn't change anything.

BTW, I could find no documentation on what resetlocalschema actually does...

  regards, Thomas

Re: Losing keyspace on cassandra upgrade

Posted by aaron morton <aa...@thelastpickle.com>.

Have you tried nodetool resetlocalschema on the 1.1.5 node?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/09/2012, at 11:41 PM, Thomas Stets <th...@gmail.com> wrote:

> A follow-up:
> 
> Currently I'm back on version 1.1.1.
> 
> I tried - unsuccessfully - the following things:
> 
> 1. Create the missing keyspace on the 1.1.5 node, then copy the files back into the data directory.
> This failed, since the keyspace was already known on the other node in the cluster.
> 
> 2. Shut down the 1.1.1 node that still has the keyspace, then create the keyspace on the 1.1.5 node.
> This failed, since the node could not distribute the information through the cluster.
> 
> 3. Restore the system keyspace from the snapshot I made before the upgrade.
> The restore seemed to work, but the node behaved just like after the update: it just forgot my keyspace.
> 
> Right now I'm at a loss on how to proceed. Any ideas? I'm pretty sure I can reproduce the problem,
> so if anyone has an idea on what to try, or where to look, I can do some tests (within limits)
> 
> 
> On Wed, Sep 19, 2012 at 4:43 PM, Thomas Stets <th...@gmail.com> wrote:
> I consistently keep losing my keyspace on upgrading from cassandra 1.1.1 to 1.1.5
> 
> I have the same cassandra keyspace on all our staging systems:
> 
> development:  a 3-node cluster
> integration: a 3-node cluster
> QS: a 2-node cluster
> (productive will be a 4-node cluster, which is as yet not active)
> 
> All clusters were running cassandra 1.1.1. Before going productive I wanted to upgrade to the
> latest productive version of cassandra.
> 
> In all cases my keyspace disappeared when I started the cluster with cassandra 1.1.5.
> On the development system I didn't realize at first what was happening. I just wondered that nodetool
> showed a very low amount of data. On integration I saw the problem quickly, but could not recover the
> data. I re-installed the cassandra cluster from scratch, and populated it with our test data, so our
> developers could work.
>  ...  
> 
> 
>   TIA, Thomas
>

Re: Losing keyspace on cassandra upgrade

Posted by Thomas Stets <th...@gmail.com>.

A follow-up:

Currently I'm back on version 1.1.1.

I tried - unsuccessfully - the following things:

1. Create the missing keyspace on the 1.1.5 node, then copy the files back
into the data directory.
This failed, since the keyspace was already known on the other node in the
cluster.

2. Shut down the 1.1.1 node that still has the keyspace, then create the
keyspace on the 1.1.5 node.
This failed, since the node could not distribute the information through the
cluster.

3. Restore the system keyspace from the snapshot I made before the upgrade.
The restore seemed to work, but the node behaved just like after the
update: it just forgot my keyspace.

Right now I'm at a loss on how to proceed. Any ideas? I'm pretty sure I can
reproduce the problem,
so if anyone has an idea on what to try, or where to look, I can do some
tests (within limits)
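
For comparing test runs, the "Skipped N mutations ... CF with id X" lines
from the original post can be tallied with a small script. This is only a
sketch: the function name summarize_replay is made up for illustration, the
two patterns are taken from the log lines quoted in this thread, and the
path to the log file depends on the installation.

```python
import re
import sys

# Patterns copied from the CommitLogReplayer / CommitLog lines quoted in
# this thread; other Cassandra versions may word these messages differently.
SKIPPED = re.compile(
    r"Skipped (\d+) mutations from unknown \(probably removed\) CF with id (\d+)"
)
REPLAYED = re.compile(r"Log replay complete, (\d+) replayed mutations")


def summarize_replay(lines):
    """Return (skipped_by_cf_id, replayed) for an iterable of log lines.

    skipped_by_cf_id maps CF id -> total skipped mutations.
    replayed is the count from the last "Log replay complete" line,
    or None if no such line was seen.
    """
    skipped = {}
    replayed = None
    for line in lines:
        m = SKIPPED.search(line)
        if m:
            count, cf_id = int(m.group(1)), int(m.group(2))
            skipped[cf_id] = skipped.get(cf_id, 0) + count
            continue
        m = REPLAYED.search(line)
        if m:
            replayed = int(m.group(1))
    return skipped, replayed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (script name is hypothetical): python replay_summary.py /path/to/system.log
    with open(sys.argv[1]) as f:
        skipped, replayed = summarize_replay(f)
    for cf_id in sorted(skipped):
        print("CF id %d: %d skipped" % (cf_id, skipped[cf_id]))
    print("total skipped: %d, replayed: %s" % (sum(skipped.values()), replayed))
```

A run where every CF id shows up as skipped and the replayed count is 0
matches the symptom described in the original post.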

On Wed, Sep 19, 2012 at 4:43 PM, Thomas Stets <th...@gmail.com> wrote:

> I consistently keep losing my keyspace on upgrading from cassandra 1.1.1
> to 1.1.5
>
> I have the same cassandra keyspace on all our staging systems:
>
> development:  a 3-node cluster
> integration: a 3-node cluster
> QS: a 2-node cluster
> (productive will be a 4-node cluster, which is as yet not active)
>
> All clusters were running cassandra 1.1.1. Before going productive I
> wanted to upgrade to the
> latest productive version of cassandra.
>
> In all cases my keyspace disappeared when I started the cluster with
> cassandra 1.1.5.
> On the development system I didn't realize at first what was happening. I
> just wondered that nodetool
> showed a very low amount of data. On integration I saw the problem
> quickly, but could not recover the
> data. I re-installed the cassandra cluster from scratch, and populated it
> with our test data, so our
> developers could work.
>
 ...

>
>
>   TIA, Thomas
>