Posted to user@cassandra.apache.org by Keith Wright <kw...@nanigans.com> on 2014/02/05 20:00:47 UTC

Move to smaller nodes

Hi all,

    Earlier today I emailed about issues we’re having bootstrapping nodes into our existing cluster.  One theory is that our nodes are simply too large, so we are considering moving to more, smaller nodes.  However, because we cannot bootstrap, that is difficult.  As I see it, we have two options (assuming the new cluster is already set up and running):

 *   Add the new cluster as another data center.  I am already using NetworkTopologySnitch.  The existing nodes would then stream their data over to the new cluster.  A couple of questions here:
    *   I assume it’s OK if data centers have different node sizes (i.e. smaller) and more nodes?
    *   Is adding a new data center to a cluster basically a large bootstrap, in which case it’s quite possible our existing bootstrap issues would present themselves?  The documentation for nodetool rebuild indicates it is.
 *   Use SSTableLoader to bulk load data from the existing cluster to the new one.  To do so, I would need to take the following steps:
    *   Have clients start dual writes to new and old cluster (only read from old)
    *   Backup data on the nodes.  We are using JNA so this should not result in double the data space usage, correct?  I assume I can then simply ftp the hard links to another server?
    *   Run SSTableLoader on each of the SSTables taken from the backup to the new cluster
    *   When SSTableLoader has completed, new cluster will have all of the data and old cluster can be decommissioned
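
A rough sketch of how the "run SSTableLoader on each of the SSTables" step could be scripted, offered only as an illustration. It assumes the backed-up files have been copied into a hypothetical staging directory laid out as <root>/<keyspace>/<table>/, since sstableloader takes a directory whose last two path components name the keyspace and table. The function name and staging layout are assumptions, and the sketch only builds the commands; it does not run them.

```python
from pathlib import Path


def plan_bulk_load(snapshot_root, new_cluster_hosts):
    """Return one sstableloader command per <keyspace>/<table> directory.

    snapshot_root is a hypothetical staging directory laid out as
    <snapshot_root>/<keyspace>/<table>/*-Data.db.
    """
    hosts = ",".join(new_cluster_hosts)
    # Collect every directory that contains at least one Data component.
    table_dirs = {p.parent for p in Path(snapshot_root).glob("*/*/*-Data.db")}
    # "-d" is sstableloader's list of initial hosts in the target cluster.
    return [["sstableloader", "-d", hosts, str(d)] for d in sorted(table_dirs)]
```

Each returned list could be handed to subprocess.run once the dual writes are in place; checking exit codes and retrying failed directories is left out here.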

Thoughts?  Any automated tools around the SSTableLoader option?

Thanks

Re: Move to smaller nodes

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Feb 5, 2014 at 11:18 AM, Keith Wright <kw...@nanigans.com> wrote:

> Hi Rob, thanks for the response!  Interestingly if we run a repair we
> don't see the bootstrap issue so I am considering doing the empty node
> repair methodology.
>

Weird. Bootstrap should not be more fragile than repair.

>
>    - Update our JRE, we are using 1.7.0_17 and I believe we're up to
>    1.7.0_54
>
Unlikely to be the cause, but couldn't hurt.

>
>    - GC tuning as it does appear that we're suffering from GC issues.  We
>    could just allocate more eden space and then revert after the bootstrap
>    succeeds
>
This is a generalized cause of streaming failures, so sure. I'm not so
sure about the specific proposed solution, but yes, it's possible that
tuning your GC will make bootstrap possible.

>
>    - As I mentioned, don't load data via bootstrap but instead use
>    repair.  With bootstrap disabled in Vnodes, will the node still assign
>    itself tokens?
>
My belief is yes, and I just re-read the code and that's what it appears
to do in the auto_bootstrap:false-with-num_tokens_set case.

You can verify for yourself by reading the code here :

https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob;f=src/java/org/apache/cassandra/service/StorageService.java;hb=HEAD

There are other methods of doing this which would be available to you if
you were not using vnodes. Unfortunately the use of vnodes seems to
preclude any copy-the-sstables method of cluster shifting short of copying
all sstables to all nodes, globally uniquing their filenames first, and
then running cleanup.
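
For illustration only, the "globally uniquing their filenames" step might look something like the sketch below. It assumes the 1.2-era file name layout <keyspace>-<table>-<version>-<generation>-<Component> (for example ks1-users-ic-5-Data.db), which should be checked against the actual files, and it assumes no node has more generations than the hypothetical stride. The function and parameter names are made up for this example; it only computes new names and does not rename anything.

```python
import re

# Assumed layout: <keyspace>-<table>-<version>-<generation>-<Component>
SSTABLE_NAME = re.compile(
    r"^(?P<prefix>.+-[a-z]+)-(?P<gen>\d+)-(?P<component>[A-Za-z]+\.\w+)$"
)


def unique_names(files_by_node, stride=1_000_000):
    """Map (node, old name) -> new name by offsetting the generation per node."""
    renamed = {}
    for node_index, node in enumerate(sorted(files_by_node)):
        for name in files_by_node[node]:
            m = SSTABLE_NAME.match(name)
            if m is None:
                continue  # not recognised as an SSTable component; skip it
            # Every component of one SSTable shares a generation, so all of
            # them receive the same offset and stay grouped together.
            gen = int(m.group("gen")) + node_index * stride
            renamed[(node, name)] = f"{m.group('prefix')}-{gen}-{m.group('component')}"
    return renamed
```

After copying the renamed files to every node, cleanup would still be needed on each node, as described above.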

***** IMPORTANT WARNING ******

https://issues.apache.org/jira/browse/CASSANDRA-6615

Affects versions of Cassandra 1.2.x before 1.2.14, including the version of
Cassandra you are running. It WILL REMOVE NODES FROM YOUR CLUSTER AND MAKE
IT HARD TO GET THEM BACK IN IF YOU USE AUTO_BOOTSTRAP:FALSE UNDER CERTAIN
CIRCUMSTANCES.

If you plan to use auto_bootstrap:false to deal with your issue, I VERY
STRONGLY RECOMMEND UPGRADING TO 1.2.14 BEFORE DOING SO.

(The above warning applies to anyone using auto_bootstrap:false in 1.2.x,
either stop doing that or upgrade to 1.2.14 ASAP.)

***** IMPORTANT WARNING ******
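
A small sketch of the version range stated in the warning above (1.2.x before 1.2.14), in case it is useful for checking a fleet before setting auto_bootstrap:false. The function name is hypothetical, and it encodes only the range given in this message, not anything further from the JIRA ticket.

```python
import re


def affected_by_cassandra_6615(version):
    """True if version is in the 1.2.x line and older than 1.2.14."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(g) for g in m.groups())
    return parts[:2] == (1, 2) and parts < (1, 2, 14)
```

The version string for a node can be read from the output of nodetool version.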

=Rob

Re: Move to smaller nodes

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Feb 5, 2014 at 11:22 AM, Keith Wright <kw...@nanigans.com> wrote:

> Also there is one more option which is we could upgrade to 2.0 in the
> hopes that our issue is fixed as part of the streaming overhaul.  But
> seeing as this is a production cluster and 2.0 does not yet appear
> production ready, that makes me nervous.
>

My canonical opinion on this question :

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

This opinion has not changed as a result of the observed stability of the
existing releases in the 2.x series.

=Rob

Re: Move to smaller nodes

Posted by Keith Wright <kw...@nanigans.com>.
Also, there is one more option: we could upgrade to 2.0 in the hope that our issue is fixed as part of the streaming overhaul.  But since this is a production cluster and 2.0 does not yet appear production-ready, that makes me nervous.


Re: Move to smaller nodes

Posted by Keith Wright <kw...@nanigans.com>.
Hi Rob, thanks for the response!  Interestingly, if we run a repair we don’t see the bootstrap issue, so I am considering the empty-node repair approach.  It’s just that it usually takes a week for that to work.  As I see it, we could try the following to fix the bootstrap issue:

 *   Update our JRE, we are using 1.7.0_17 and I believe we’re up to 1.7.0_54
 *   GC tuning as it does appear that we’re suffering from GC issues.  We could just allocate more eden space and then revert after the bootstrap succeeds
 *   As I mentioned, don’t load data via bootstrap but instead use repair.  With bootstrap disabled in Vnodes, will the node still assign itself tokens?

Thanks

From: Robert Coli <rc...@eventbrite.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Wednesday, February 5, 2014 at 2:10 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Cc: Don Jackson <dj...@nanigans.com>, Dave Carroll <dc...@nanigans.com>
Subject: Re: Move to smaller nodes

http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

Re: Move to smaller nodes

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Feb 5, 2014 at 11:00 AM, Keith Wright <kw...@nanigans.com> wrote:

>     Earlier today I emailed about issues we're having bootstrapping nodes
> into our existing cluster.  One theory we have is that our nodes are simply
> too large and are considering moving to more, smaller nodes.  However,
> because we cannot bootstrap it makes it difficult.  As I see it, we have
> two options (assuming the new cluster is already setup and running):
>

First, the problems you describe seem unusual. There are other people with
1T node sizes who are able to add and remove nodes from their clusters.

Streaming is fragile, especially so before fixes in 1.2 and the wholesale
re-write in 2.0. But it is rare for streaming to be so fragile that
bootstrap never succeeds. If I were you I would expend some more effort on
trying to understand why you are in this somewhat unusual case before
taking the extreme step of resizing your nodes.

The rebuild operation is in fact effectively the same as bootstrap. Repair of
an empty node is also similar, in that it will stream a large set of
SSTables and that streaming could hang.
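
For reference, the add-a-data-center route boils down to a replication change followed by a rebuild on each new node, and that rebuild is where the streaming happens. The sketch below only prints the commands involved; the keyspace name, data center names, replication factors and host addresses are placeholders, not values from this thread.

```python
def add_datacenter_steps(keyspace, replication, source_dc, new_nodes):
    """Return the CQL statement and nodetool commands, in order."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replication.items())
    steps = [
        # Make the keyspace replicate to the new data center as well.
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    ]
    # Each new node then streams its ranges from the existing data center.
    steps += [f"nodetool -h {node} rebuild {source_dc}" for node in new_nodes]
    return steps


for step in add_datacenter_steps(
    "ks1", {"DC1": 3, "DC2": 3}, "DC1", ["10.0.1.1", "10.0.1.2"]
):
    print(step)
```

Since every rebuild streams a node's worth of data, this route would be exposed to the same streaming failures as bootstrap.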

SSTableLoader... also uses streaming. Why will your new SSTableLoader,
streaming to your new cluster, be less likely to hang a stream than your
current cluster?

Depending on the migration in question, you could try the
"copy-the-sstables-and-then-cleanup" method described here :

http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

Other than not using SSTableLoader, it is effectively the dual writes
solution you propose.

=Rob