Posted to user@cassandra.apache.org by Vasileios Vlachos <va...@gmail.com> on 2015/09/09 21:09:52 UTC

Upgrade Limitations Question

Hello All,

I've asked this on the Cassandra IRC channel earlier, but I am asking the
list as well so that I get feedback from more people.

We have recently upgraded from Cassandra 1.2.19 to 2.0.16 and we are
currently in the stage where all boxes are running 2.0.16 but
upgradesstables has not yet been performed on all of them. Reading the
DataStax docs [1]:

   - Do not issue these types of queries during a rolling restart: DDL,
   TRUNCATE

In our case the restart bit has already been done. Do you know if it would
be a bad idea to create a new KS before all nodes have upgraded their
SSTables? Our concern is the time it takes to go through every single node,
run upgradesstables and wait until it's all done. We think creating a
new KS wouldn't be a problem (someone on the channel said the same thing,
but recommended that we play safe and wait until it's all done). But if
anyone has had any catastrophic experiences doing so, we would appreciate
their input.

Many thanks,
Vasilis

[1]
http://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgradeCassandraDetails.html

Re: Upgrade Limitations Question

Posted by Vasileios Vlachos <va...@gmail.com>.
Any thoughts anyone?
On 9 Sep 2015 20:09, "Vasileios Vlachos" <va...@gmail.com> wrote:

> Hello All,
>
> I've asked this on the Cassandra IRC channel earlier, but I am asking the
> list as well so that I get feedback from more people.
>
> We have recently upgraded from Cassandra 1.2.19 to 2.0.16 and we are
> currently in the stage where all boxes are running 2.0.16 but
> upgradesstables has not yet been performed on all of them. Reading the
> DataStax docs [1]:
>
>    - Do not issue these types of queries during a rolling restart: DDL,
>    TRUNCATE
>
> In our case the restart bit has already been done. Do you know if it would
> be a bad idea to create a new KS before all nodes have upgraded their
> SSTables? Our concern is the time it takes to go through every single node,
> run upgradesstables and wait until it's all done. We think creating a
> new KS wouldn't be a problem (someone on the channel said the same thing,
> but recommended that we play safe and wait until it's all done). But if
> anyone has had any catastrophic experiences doing so, we would appreciate
> their input.
>
> Many thanks,
> Vasilis
>
> [1]
> http://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgradeCassandraDetails.html
>

Re: Upgrade Limitations Question

Posted by Vasileios Vlachos <va...@gmail.com>.
Thank you very much for pointing this out Victor. Really useful to know.

On Wed, Sep 16, 2015 at 4:55 PM, Victor Chen <vi...@gmail.com>
wrote:

> Yes, you can examine the actual sstables in your cassandra data dir. That
> will tell you what version sstables you have on that node.
>
> You can refer to this link:
> http://www.bajb.net/2013/03/cassandra-sstable-format-version-numbers/
> which I found via google search phrase "sstable versions" to see which
> version you need to look for-- the relevant section of the link says:
>
>> Cassandra stores the version of the SSTable within the filename,
>> following the format *Keyspace-ColumnFamily-(optional tmp
>> marker-)SSTableFormat-generation*
>>
>
> FYI-- and at least in the cassandra-2.1 branch of the source code-- you
> can find sstable format generation descriptions in comments of
> Descriptor.java. Looks like for your old and new versions, you'd be looking
> for something like:
>
> for 1.2.1:
> $ find <path to datadir> -name "*-ib-*" -ls
>
> for 2.0.1:
> $ find <path to datadir> -name "*-jb-*" -ls
>
>
> On Wed, Sep 16, 2015 at 10:02 AM, Vasileios Vlachos <
> vasileiosvlachos@gmail.com> wrote:
>
>>
>> Hello Rob and thanks for your reply,
>>
>> In the end we had to wait for upgradesstables to finish on every
>> node, just to eliminate the possibility of this being the cause of any
>> weird behaviour after the upgrade. However, this process might take a long
>> time in a cluster with a large number of nodes, which means no new work can
>> be done for that period.
>>
>>> 1) TRUNCATE requires all known nodes to be available to succeed; if you
>>> are restarting one, it won't be available.
>>>
>>
>> I suppose all means all nodes, not just all replicas here, is that right? Not
>> directly related to the original question, but that might explain why we
>> sometimes end up with peculiar behaviour when we run TRUNCATE. We've now
>> taken the approach of DROPping it and recreating it when possible (even though
>> this is still problematic when reusing the same CF name).
>>
>>
>>> 2) in theory, the newly upgraded nodes might not get the DDL schema
>>> update properly due to some incompatible change
>>>
>>> To check for 2, do:
>>> "
>>> nodetool gossipinfo | grep SCHEMA | sort | uniq -c | sort -n
>>> "
>>>
>>> Before and after, and make sure the schema propagates correctly. There
>>> should be a new schema version on all nodes after each DDL change; if there
>>> is, you will likely be able to see the new schema on all the new nodes.
>>>
>>>
>> Yes, this makes perfect sense. We monitor the schema changes every
>> minute across the cluster with Nagios by checking the JMX console. It is
>> an important thing to monitor in several situations (running migrations, for
>> example, or during upgrades like you describe here).
>>
>> Is there a way to find out if upgradesstables has been run against a
>> particular node or not?
>>
>> Many Thanks,
>> Vasilis
>>
>
>

Re: Upgrade Limitations Question

Posted by Victor Chen <vi...@gmail.com>.
Yes, you can examine the actual sstables in your cassandra data dir. That
will tell you what version sstables you have on that node.

You can refer to this link:
http://www.bajb.net/2013/03/cassandra-sstable-format-version-numbers/ which
I found via google search phrase "sstable versions" to see which version
you need to look for-- the relevant section of the link says:

> Cassandra stores the version of the SSTable within the filename, following
> the format *Keyspace-ColumnFamily-(optional tmp
> marker-)SSTableFormat-generation*
>

FYI-- and at least in the cassandra-2.1 branch of the source code-- you can
find sstable format generation descriptions in comments of Descriptor.java.
Looks like for your old and new versions, you'd be looking for something
like:

for 1.2.1:
$ find <path to datadir> -name "*-ib-*" -ls

for 2.0.1:
$ find <path to datadir> -name "*-jb-*" -ls
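
A rough way to get an overview per node (a sketch only, assuming the default
/var/lib/cassandra/data path and the 1.2/2.0 filename layout quoted above) is
to list the distinct format versions present rather than a particular letter:

$ # summarise sstable format versions by parsing the Data.db filenames,
$ # e.g. MyKeyspace-MyCF-jb-42-Data.db -> "jb"
$ find /var/lib/cassandra/data -name "*-Data.db" \
    | sed 's/.*-\([a-z][a-z]\)-[0-9]*-Data\.db$/\1/' \
    | sort | uniq -c

A node that still shows pre-2.0 formats alongside "jb" presumably has not
finished upgradesstables yet.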


On Wed, Sep 16, 2015 at 10:02 AM, Vasileios Vlachos <
vasileiosvlachos@gmail.com> wrote:

>
> Hello Rob and thanks for your reply,
>
> In the end we had to wait for upgradesstables to finish on every node,
> just to eliminate the possibility of this being the cause of any weird
> behaviour after the upgrade. However, this process might take a long time
> in a cluster with a large number of nodes, which means no new work can be
> done for that period.
>
>> 1) TRUNCATE requires all known nodes to be available to succeed; if you
>> are restarting one, it won't be available.
>>
>
> I suppose all means all nodes, not just all replicas here, is that right? Not
> directly related to the original question, but that might explain why we
> sometimes end up with peculiar behaviour when we run TRUNCATE. We've now
> taken the approach of DROPping it and recreating it when possible (even though
> this is still problematic when reusing the same CF name).
>
>
>> 2) in theory, the newly upgraded nodes might not get the DDL schema
>> update properly due to some incompatible change
>>
>> To check for 2, do:
>> "
>> nodetool gossipinfo | grep SCHEMA | sort | uniq -c | sort -n
>> "
>>
>> Before and after, and make sure the schema propagates correctly. There
>> should be a new schema version on all nodes after each DDL change; if there
>> is, you will likely be able to see the new schema on all the new nodes.
>>
>>
> Yes, this makes perfect sense. We monitor the schema changes every
> minute across the cluster with Nagios by checking the JMX console. It is
> an important thing to monitor in several situations (running migrations, for
> example, or during upgrades like you describe here).
>
> Is there a way to find out if upgradesstables has been run against a
> particular node or not?
>
> Many Thanks,
> Vasilis
>

Re: Upgrade Limitations Question

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Sep 16, 2015 at 7:02 AM, Vasileios Vlachos <
vasileiosvlachos@gmail.com> wrote:

>
> In the end we had to wait for upgradesstables to finish on every node,
> just to eliminate the possibility of this being the cause of any weird
> behaviour after the upgrade. However, this process might take a long time
> in a cluster with a large number of nodes, which means no new work can be
> done for that period.
>

Yes, this is the worst case scenario and it's pretty bad for large clusters
/ large data-size per node.

>> 1) TRUNCATE requires all known nodes to be available to succeed; if you are
>> restarting one, it won't be available.
>>
>
> I suppose all means all nodes, not just all replicas here, is that right? Not
> directly related to the original question, but that might explain why we
> sometimes end up with peculiar behaviour when we run TRUNCATE. We've now
> taken the approach of DROPping it and recreating it when possible (even though
> this is still problematic when reusing the same CF name).
>

Pretty sure that TRUNCATE and DROP have the same behavior wrt node
availability. Yes, I mean all nodes which are supposed to replicate that
table.


> Is there a way to find out if upgradesstables has been run against a
> particular node or not?
>
>
If you run it and it immediately completes [1], it has probably been run
before.
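
One quick way to check (a sketch; it leans on the no-op behaviour from
CASSANDRA-5366 below, so it re-runs the command rather than just inspecting):

$ # effectively a no-op (returns almost immediately) if every sstable on
$ # this node is already on the current format; otherwise it starts
$ # rewriting the old ones
$ time nodetool upgradesstables

The other option is to look at the sstable filenames themselves, as described
elsewhere in this thread.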

=Rob
[1] https://issues.apache.org/jira/browse/CASSANDRA-5366 - 1.2.4 - "NOOP on
upgradesstables for already upgraded node"

Re: Upgrade Limitations Question

Posted by Vasileios Vlachos <va...@gmail.com>.
Hello Rob and thanks for your reply,

In the end we had to wait for upgradesstables to finish on every node,
just to eliminate the possibility of this being the cause of any weird
behaviour after the upgrade. However, this process might take a long time
in a cluster with a large number of nodes, which means no new work can be
done for that period.

> 1) TRUNCATE requires all known nodes to be available to succeed; if you are
> restarting one, it won't be available.
>

I suppose all means all nodes, not just all replicas here, is that right? Not
directly related to the original question, but that might explain why we
sometimes end up with peculiar behaviour when we run TRUNCATE. We've now taken
the approach of DROPping it and recreating it when possible (even though this
is still problematic when reusing the same CF name).


> 2) in theory, the newly upgraded nodes might not get the DDL schema update
> properly due to some incompatible change
>
> To check for 2, do:
> "
> nodetool gossipinfo | grep SCHEMA | sort | uniq -c | sort -n
> "
>
> Before and after, and make sure the schema propagates correctly. There
> should be a new schema version on all nodes after each DDL change; if there
> is, you will likely be able to see the new schema on all the new nodes.
>
>
Yes, this makes perfect sense. We monitor the schema changes every minute
across the cluster with Nagios by checking the JMX console. It is an
important thing to monitor in several situations (running migrations, for
example, or during upgrades like you describe here).
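
For anyone without a JMX poller, a rough equivalent from the command line (a
sketch, not the check we actually run) is:

$ # the 'Schema versions' section should list a single UUID once a change
$ # has reached every reachable node
$ nodetool describecluster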

Is there a way to find out if upgradesstables has been run against a
particular node or not?

Many Thanks,
Vasilis

Re: Upgrade Limitations Question

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Sep 9, 2015 at 12:09 PM, Vasileios Vlachos <
vasileiosvlachos@gmail.com> wrote:

> In our case the restart bit has already been done. Do you know if it would
> be a bad idea to create a new KS before all nodes have upgraded their
> SSTables?
>

This particular case is probably fine. The reason for the caution against
issuing certain kinds of queries during a rolling restart is twofold:

1) TRUNCATE requires all known nodes to be available to succeed; if you are
restarting one, it won't be available.
2) in theory, the newly upgraded nodes might not get the DDL schema update
properly due to some incompatible change

To check for 2, do:
"
nodetool gossipinfo | grep SCHEMA | sort | uniq -c | sort -n
"

Before and after, and make sure the schema propagates correctly. There
should be a new schema version on all nodes after each DDL change; if there
is, you will likely be able to see the new schema on all the new nodes.
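
If you want to drive that check across the whole cluster around a DDL change,
something like this is a reasonable sketch (node1, node2, node3 and ssh access
to each host are assumptions, not part of the above):

$ # each node's own gossip view should converge on a single SCHEMA version
$ for h in node1 node2 node3; do \
    echo "== $h"; \
    ssh "$h" 'nodetool gossipinfo | grep SCHEMA | sort | uniq -c'; \
  done

Run it before the DDL, apply the change, and re-run it until every node
reports one schema version again.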

Test in a test environment with the specific versions before trying in
production.

=Rob