Posted to user@cassandra.apache.org by Mike Neir <mi...@liquidweb.com> on 2013/08/30 17:57:09 UTC

Upgrade from 1.0.9 to 1.2.8

Greetings folks,

I'm faced with the need to update a 36 node cluster with roughly 25T of data on 
disk to a version of cassandra in the 1.2.x series. While it seems that 1.2.8 
will play nicely in the 1.0.9 cluster long enough to do a rolling upgrade, I'd 
still like to have a roll-back plan in case the rolling upgrade goes sideways.

I've tried to upgrade a single node in my dev cluster, then roll back using a 
snapshot taken previously, but things don't appear to be going smoothly. The 
node will rejoin the ring eventually, but not before spending some time in the 
"Joining" state as shown by "nodetool ring", and spewing a ton of error messages 
similar to the following:

ERROR [MutationStage:31] 2013-08-29 14:07:20,530 RowMutationVerbHandler.java 
(line 61) Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1178
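
To see which column family IDs the rolled-back node is rejecting, the exceptions can be tallied per cfId. This is only a minimal sketch: it assumes the log lines look like the excerpt above and that grep supports -o (GNU and BSD grep do). unknown_cfids is a hypothetical helper name, and the log path in the usage comment is an assumption.

```shell
# unknown_cfids LOGFILE
# Count UnknownColumnFamilyException occurrences per cfId in a system.log.
# Prints one "count cfId" pair per line, as produced by `uniq -c`.
unknown_cfids() {
    grep -o "UnknownColumnFamilyException: Couldn't find cfId=[0-9]*" "$1" \
        | sed 's/.*cfId=//' \
        | sort -n \
        | uniq -c
}

# Usage (path is an assumption; adjust to wherever system.log lives):
#   unknown_cfids /var/log/cassandra/system.log
```

If every cfId reported maps to a column family that exists elsewhere in the cluster, that points at the restored node's schema rather than at the data files.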

My test procedure is as follows:
1)  nodetool -h localhost snapshot
2)  nodetool -h localhost drain
3)  service cassandra stop
4)  back up cassandra configs
5)  remove cassandra 1.0.9
6)  install cassandra 1.2.8
7)  restore cassandra configs, alter them to remove configuration entries no 
longer used
8)  start cassandra 1.2.8, let it run for a bit, then drain/stop it
9)  remove cassandra 1.2.8
10) reinstall cassandra 1.0.9
11) restore original cassandra configs
12) remove any commit logs present
13) remove folders for system_auth and system_traces Keyspaces (since they don't 
seem to be present in 1.0.9)
14) Move snapshots back to where they should be for 1.0.9 and remove cass 1.2.8 data
   # cd /var/lib/cassandra/data/$KEYSPACE/
   # mv */snapshots/$TIMESTAMP/* .
   # find . -mindepth 1 -type d -exec rm -rf {} \;
   # cd /var/lib/cassandra/data/system
   # mv */snapshots/$TIMESTAMP/* .
   # find . -mindepth 1 -type d -exec rm -rf {} \;
15) start cassandra 1.0.9
16) observe cassandra system.log
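
Step 14 can be sketched as a small function, mostly to make the restore less error-prone. It assumes, as the mv commands above imply, that after running 1.2.8 the snapshot sits under <keyspace>/<column family>/snapshots/<tag>/. restore_snapshot is a hypothetical helper name, not a Cassandra tool. It refuses to delete anything when the snapshot is not found, and uses -maxdepth 1 so find does not try to descend into directories it has just removed.

```shell
# restore_snapshot KEYSPACE_DIR SNAPSHOT_TAG
# Move snapshotted files back into the flat keyspace directory used by 1.0.x,
# then remove the per-column-family directories left behind by the newer version.
restore_snapshot() {
    ks_dir=$1    # e.g. /var/lib/cassandra/data/$KEYSPACE
    tag=$2       # e.g. $TIMESTAMP
    found=0
    for snap in "$ks_dir"/*/snapshots/"$tag"; do
        [ -d "$snap" ] || continue
        found=1
        # Move each snapshotted file up into the keyspace directory.
        find "$snap" -mindepth 1 -maxdepth 1 -type f -exec mv {} "$ks_dir"/ \;
    done
    if [ "$found" -eq 0 ]; then
        # Bail out before the destructive step if there is nothing to restore.
        echo "no snapshot '$tag' under $ks_dir; nothing changed" >&2
        return 1
    fi
    # Remove only the immediate subdirectories of the keyspace directory.
    find "$ks_dir" -mindepth 1 -maxdepth 1 -type d -exec rm -rf {} +
}

# Usage, with cassandra stopped:
#   restore_snapshot /var/lib/cassandra/data/$KEYSPACE $TIMESTAMP
#   restore_snapshot /var/lib/cassandra/data/system $TIMESTAMP
```

Trying it against a copy of the data directory first is cheap insurance, since the second find is not reversible.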

Does anyone have any insight on things I may be doing wrong, or whether this is 
just an unavoidable pain point caused by rolling back? It seems that since there 
are no schema changes going on, the node should be able to just hop back into 
the cluster without error and without transitioning through the "Joining" state.

-- 



Mike Neir
Liquid Web, Inc.
Infrastructure Administrator

Re: Upgrade from 1.0.9 to 1.2.8

Posted by Jon Haddad <jo...@jonhaddad.com>.

Does your previous snapshot include the system keyspace? I haven't tried upgrading from 1.0.x then rolling back, but it's possible there are some backwards-incompatible changes. Other than that, did you make sure you also rolled back your config files?

On Aug 30, 2013, at 8:57 AM, Mike Neir <mi...@liquidweb.com> wrote:

> Greetings folks,
> 
> I'm faced with the need to update a 36 node cluster with roughly 25T of data on disk to a version of cassandra in the 1.2.x series. While it seems that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling upgrade, I'd still like to have a roll-back plan in case the rolling upgrade goes sideways.
> 
> I've tried to upgrade a single node in my dev cluster, then roll back using a snapshot taken previously, but things don't appear to be going smoothly. The node will rejoin the ring eventually, but not before spending some time in the "Joining" state as shown by "nodetool ring", and spewing a ton of error messages similar to the following:
> 
> ERROR [MutationStage:31] 2013-08-29 14:07:20,530 RowMutationVerbHandler.java (line 61) Error in row mutation
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1178
> 
> My test procedure is as follows:
> 1)  nodetool -h localhost snapshot
> 2)  nodetool -h localhost drain
> 3)  service cassandra stop
> 4)  back up cassandra configs
> 5)  remove cassandra 1.0.9
> 6)  install cassandra 1.2.8
> 7)  restore cassandra configs, alter them to remove configuration entries no longer used
> 8)  start cassandra 1.2.8, let it run for a bit, then drain/stop it
> 9)  remove cassandra 1.2.8
> 10) reinstall cassandra 1.0.9
> 11) restore original cassandra configs
> 12) remove any commit logs present
> 13) remove folders for system_auth and system_traces Keyspaces (since they don't seem to be present in 1.0.9)
> 14) Move snapshots back to where they should be for 1.0.9 and remove cass 1.2.8 data
>  # cd /var/lib/cassandra/data/$KEYSPACE/
>  # mv */snapshots/$TIMESTAMP/* .
>  # find . -mindepth 1 -type d -exec rm -rf {} \;
>  # cd /var/lib/cassandra/data/system
>  # mv */snapshots/$TIMESTAMP/* .
>  # find . -mindepth 1 -type d -exec rm -rf {} \;
> 15) start cassandra 1.0.9
> 16) observe cassandra system.log
> 
> Does anyone have any insight on things I may be doing wrong, or whether this is just an unavoidable pain point caused by rolling back? It seems that since there are no schema changes going on, the node should be able to just hop back into the cluster without error and without transitioning through the "Joining" state.
> 
> -- 
> 
> 
> 
> Mike Neir
> Liquid Web, Inc.
> Infrastructure Administrator
>

Re: Upgrade from 1.0.9 to 1.2.8

Posted by Mike Neir <mi...@liquidweb.com>.

Ah. I was going by the upgrade recommendations in the NEWS.txt file in the 
cassandra source tree, which didn't make mention of that version (1.0.11) 
whatsoever. I didn't see any show-stoppers that would have prevented me from 
going straight from 1.0.9 to 1.2.x.

https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-1.2.4

Looks like a multi-step upgrade is the way I'll be proceeding. Thanks for the 
insight, everyone.

MN

On 09/02/2013 11:04 AM, Jeremiah D Jordan wrote:
>> 1.0.9 -> 1.0.12 -> 1.1.12 -> 1.2.x?
>
> Because this fix in 1.0.11:
> * fix 1.0.x node join to mixed version cluster, other nodes >= 1.1 (CASSANDRA-4195)
>
> -Jeremiah

-- 

Mike Neir
Liquid Web, Inc.
Infrastructure Administrator

Re: Upgrade from 1.0.9 to 1.2.8

Posted by Jeremiah D Jordan <je...@gmail.com>.

> 1.0.9 -> 1.0.12 -> 1.1.12 -> 1.2.x?

Because this fix in 1.0.11:
* fix 1.0.x node join to mixed version cluster, other nodes >= 1.1 (CASSANDRA-4195)

-Jeremiah
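
The stepwise path discussed here can be written down as a tiny helper that says which version to install next on a node. This is only a sketch of the advice in this thread, not an official upgrade matrix: the final target of 1.2.9 follows Robert Coli's suggestion elsewhere in the thread, version_ge and next_hop are hypothetical names, and the comparison relies on `sort -V` from GNU coreutils.

```shell
# version_ge A B: succeed when dotted version A is >= B.
# Relies on `sort -V` (GNU coreutils), which orders 1.0.9 before 1.0.11.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# next_hop CURRENT: print the next version to install, following the path
# 1.0.x -> 1.0.12 -> 1.1.12 -> 1.2.9, or "done" once at 1.2.9 or later.
next_hop() {
    if   version_ge "$1" 1.2.9;  then echo "done"
    elif version_ge "$1" 1.1.12; then echo "1.2.9"
    elif version_ge "$1" 1.0.12; then echo "1.1.12"
    else                              echo "1.0.12"
    fi
}

# Usage:
#   next_hop 1.0.9
```

Every node should finish one hop before any node starts the next, so the cluster is never spread across more than two versions.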

On Aug 30, 2013, at 2:00 PM, Mike Neir <mi...@liquidweb.com> wrote:

> Is there anything that you can link that describes the pitfalls you mention? I'd like a bit more information. Just for clarity's sake, are you recommending 1.0.9 -> 1.0.12 -> 1.1.12 -> 1.2.x? Or would  1.0.9 -> 1.1.12 -> 1.2.x suffice?
> 
> Regarding the placement strategy mentioned in a different post, I'm using the Simple placement strategy, with the RackInferringSnitch. How does that play into the bugs mentioned previously about cross-DC replication?
> 
> MN
> 
> On 08/30/2013 01:28 PM, Jeremiah D Jordan wrote:
>> You probably want to go to 1.0.11/12 first no matter what.  If you want the least chance of issue you should then go to 1.1.12.  While there is a high probability that going from 1.0.X->1.2 will work, you have the best chance at no failures if you go through 1.1.12.  There are some edge cases that can cause errors if you don't do that.
>> 
>> -Jeremiah
>> 
>>

Re: Upgrade from 1.0.9 to 1.2.8

Posted by Mike Neir <mi...@liquidweb.com>.

Is there anything that you can link that describes the pitfalls you mention? I'd 
like a bit more information. Just for clarity's sake, are you recommending 1.0.9 
-> 1.0.12 -> 1.1.12 -> 1.2.x? Or would  1.0.9 -> 1.1.12 -> 1.2.x suffice?

Regarding the placement strategy mentioned in a different post, I'm using the 
Simple placement strategy, with the RackInferringSnitch. How does that play into 
the bugs mentioned previously about cross-DC replication?

MN

On 08/30/2013 01:28 PM, Jeremiah D Jordan wrote:
> You probably want to go to 1.0.11/12 first no matter what.  If you want the least chance of issue you should then go to 1.1.12.  While there is a high probability that going from 1.0.X->1.2 will work, you have the best chance at no failures if you go through 1.1.12.  There are some edge cases that can cause errors if you don't do that.
>
> -Jeremiah
>
>

Re: Upgrade from 1.0.9 to 1.2.8

Posted by Jeremiah D Jordan <je...@gmail.com>.

You probably want to go to 1.0.11/12 first no matter what.  If you want the least chance of issue you should then go to 1.1.12.  While there is a high probability that going from 1.0.X->1.2 will work, you have the best chance at no failures if you go through 1.1.12.  There are some edge cases that can cause errors if you don't do that.

-Jeremiah


On Aug 30, 2013, at 11:41 AM, Mike Neir <mi...@liquidweb.com> wrote:

> In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there is no need to do streaming operations (move/repair/bootstrap/etc). The reading I've done confirms that 1.2.x should be network-compatible with 1.0.x, sans streaming operations. Datastax seems to indicate here that doing a rolling upgrade from 1.0.x to 1.2.x is viable:
> 
> http://www.datastax.com/documentation/cassandra/1.2/webhelp/#upgrade/upgradeC_c.html#concept_ds_nht_czr_ck
> 
> See the second bullet point in the Prerequisites section.
> 
> I'll look into 1.2.9. It wasn't available when I started my testing.
> 
> MN
> 
> On 08/30/2013 12:15 PM, Robert Coli wrote:
>> On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir <mike@liquidweb.com
>> <ma...@liquidweb.com>> wrote:
>> 
>>    I'm faced with the need to update a 36 node cluster with roughly 25T of data
>>    on disk to a version of cassandra in the 1.2.x series. While it seems that
>>    1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling
>>    upgrade, I'd still like to have a roll-back plan in case the rolling upgrade
>>    goes sideways.
>> 
>> 
>> Upgrading two major versions online is an unsupported operation. I would not
>> expect it to work. Is there a detailed reason you believe it should work between
>> these versions? Also, instead of 1.2.8 you should upgrade to 1.2.9, released
>> yesterday. Everyone headed to 2.0 has to pass through 1.2.9.
>> 
>> =Rob
> 
> -- 
> 
> 
> 
> Mike Neir
> Liquid Web, Inc.
> Infrastructure Administrator
>

Re: Upgrade from 1.0.9 to 1.2.8

Posted by Mohit Anchlia <mo...@gmail.com>.

If you have multiple DCs you at least want to upgrade to 1.0.11. There is
an issue where you might get errors during cross DC replication.

On Fri, Aug 30, 2013 at 9:41 AM, Mike Neir <mi...@liquidweb.com> wrote:

> In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there
> is no need to do streaming operations (move/repair/bootstrap/etc). The
> reading I've done confirms that 1.2.x should be network-compatible with
> 1.0.x, sans streaming operations. Datastax seems to indicate here that
> doing a rolling upgrade from 1.0.x to 1.2.x is viable:
>
> http://www.datastax.com/documentation/cassandra/1.2/webhelp/#upgrade/upgradeC_c.html#concept_ds_nht_czr_ck
>
> See the second bullet point in the Prerequisites section.
>
> I'll look into 1.2.9. It wasn't available when I started my testing.
>
> MN
>
>
> On 08/30/2013 12:15 PM, Robert Coli wrote:
>
>> On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir <mike@liquidweb.com
>>  <ma...@liquidweb.com>> wrote:
>>
>>     I'm faced with the need to update a 36 node cluster with roughly 25T of data
>>     on disk to a version of cassandra in the 1.2.x series. While it seems that
>>     1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling
>>     upgrade, I'd still like to have a roll-back plan in case the rolling upgrade
>>     goes sideways.
>>
>>
>> Upgrading two major versions online is an unsupported operation. I would not
>> expect it to work. Is there a detailed reason you believe it should work between
>> these versions? Also, instead of 1.2.8 you should upgrade to 1.2.9, released
>> yesterday. Everyone headed to 2.0 has to pass through 1.2.9.
>>
>> =Rob
>>
>
>  --
>
>
>
> Mike Neir
> Liquid Web, Inc.
> Infrastructure Administrator
>
>

Re: Upgrade from 1.0.9 to 1.2.8

Posted by Mike Neir <mi...@liquidweb.com>.

In my testing, mixing 1.0.9 and 1.2.8 seems to work fine as long as there is no 
need to do streaming operations (move/repair/bootstrap/etc). The reading I've 
done confirms that 1.2.x should be network-compatible with 1.0.x, sans streaming 
operations. Datastax seems to indicate here that doing a rolling upgrade from 
1.0.x to 1.2.x is viable:

http://www.datastax.com/documentation/cassandra/1.2/webhelp/#upgrade/upgradeC_c.html#concept_ds_nht_czr_ck

See the second bullet point in the Prerequisites section.

I'll look into 1.2.9. It wasn't available when I started my testing.

MN

On 08/30/2013 12:15 PM, Robert Coli wrote:
> On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir <mike@liquidweb.com
> <ma...@liquidweb.com>> wrote:
>
>     I'm faced with the need to update a 36 node cluster with roughly 25T of data
>     on disk to a version of cassandra in the 1.2.x series. While it seems that
>     1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling
>     upgrade, I'd still like to have a roll-back plan in case the rolling upgrade
>     goes sideways.
>
>
> Upgrading two major versions online is an unsupported operation. I would not
> expect it to work. Is there a detailed reason you believe it should work between
> these versions? Also, instead of 1.2.8 you should upgrade to 1.2.9, released
> yesterday. Everyone headed to 2.0 has to pass through 1.2.9.
>
> =Rob

-- 

Mike Neir
Liquid Web, Inc.
Infrastructure Administrator

Re: Upgrade from 1.0.9 to 1.2.8

Posted by Robert Coli <rc...@eventbrite.com>.

On Fri, Aug 30, 2013 at 8:57 AM, Mike Neir <mi...@liquidweb.com> wrote:

> I'm faced with the need to update a 36 node cluster with roughly 25T of
> data on disk to a version of cassandra in the 1.2.x series. While it seems
> that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a
> rolling upgrade, I'd still like to have a roll-back plan in case the
> rolling upgrade goes sideways.
>

Upgrading two major versions online is an unsupported operation. I would
not expect it to work. Is there a detailed reason you believe it should
work between these versions? Also, instead of 1.2.8 you should upgrade to
1.2.9, released yesterday. Everyone headed to 2.0 has to pass through 1.2.9.

=Rob

Re: Upgrade from 1.0.9 to 1.2.8

Posted by Jon Haddad <jo...@jonhaddad.com>.

Sorry, I didn't see the test procedure, it's still early.

On Aug 30, 2013, at 8:57 AM, Mike Neir <mi...@liquidweb.com> wrote:

> Greetings folks,
> 
> I'm faced with the need to update a 36 node cluster with roughly 25T of data on disk to a version of cassandra in the 1.2.x series. While it seems that 1.2.8 will play nicely in the 1.0.9 cluster long enough to do a rolling upgrade, I'd still like to have a roll-back plan in case the rolling upgrade goes sideways.
> 
> I've tried to upgrade a single node in my dev cluster, then roll back using a snapshot taken previously, but things don't appear to be going smoothly. The node will rejoin the ring eventually, but not before spending some time in the "Joining" state as shown by "nodetool ring", and spewing a ton of error messages similar to the following:
> 
> ERROR [MutationStage:31] 2013-08-29 14:07:20,530 RowMutationVerbHandler.java (line 61) Error in row mutation
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1178
> 
> My test procedure is as follows:
> 1)  nodetool -h localhost snapshot
> 2)  nodetool -h localhost drain
> 3)  service cassandra stop
> 4)  back up cassandra configs
> 5)  remove cassandra 1.0.9
> 6)  install cassandra 1.2.8
> 7)  restore cassandra configs, alter them to remove configuration entries no longer used
> 8)  start cassandra 1.2.8, let it run for a bit, then drain/stop it
> 9)  remove cassandra 1.2.8
> 10) reinstall cassandra 1.0.9
> 11) restore original cassandra configs
> 12) remove any commit logs present
> 13) remove folders for system_auth and system_traces Keyspaces (since they don't seem to be present in 1.0.9)
> 14) Move snapshots back to where they should be for 1.0.9 and remove cass 1.2.8 data
>  # cd /var/lib/cassandra/data/$KEYSPACE/
>  # mv */snapshots/$TIMESTAMP/* .
>  # find . -mindepth 1 -type d -exec rm -rf {} \;
>  # cd /var/lib/cassandra/data/system
>  # mv */snapshots/$TIMESTAMP/* .
>  # find . -mindepth 1 -type d -exec rm -rf {} \;
> 15) start cassandra 1.0.9
> 16) observe cassandra system.log
> 
> Does anyone have any insight on things I may be doing wrong, or whether this is just an unavoidable pain point caused by rolling back? It seems that since there are no schema changes going on, the node should be able to just hop back into the cluster without error and without transitioning through the "Joining" state.
> 
> -- 
> 
> 
> 
> Mike Neir
> Liquid Web, Inc.
> Infrastructure Administrator
>