Posted to user@cassandra.apache.org by Chad Johnson <ch...@gmail.com> on 2011/08/10 23:54:16 UTC

Bootstrapping

Hi,

I have a 15 node cluster with RF=3 running version 0.7.5. I am planning to perform some filesystem maintenance on each of the nodes. The filesystem happens to be on the partition holding the keyspace data. The maintenance means that all the SSTables for our keyspace will be destroyed. Rather than back up all the data to a backup disk and restore, my plan was to bring each node down, perform the maintenance, keep the original initial_token, set auto_bootstrap to true, and let Cassandra repopulate the data through the streaming process. Nodes in the cluster have a load of about 250 to 300 GB.
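
For reference, the relevant cassandra.yaml lines on the node being rebuilt would look something like the following (the token value is just a placeholder for that node's existing token):

  auto_bootstrap: true
  initial_token: <this node's original token>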

I have a couple questions regarding bootstrapping and the streaming process.

1. I realize this will put a heavier I/O load on the replication nodes to AntiCompact the CFs, but what kind of load does this put on the JVM? Are there any gotchas I should be aware of to prevent long GC times or OOM exceptions on the replication nodes?
2. If the initial_token is not changed, is it correct to assume that anticompaction will occur only on the replication nodes and not throughout the cluster, as the key space has not been modified?
3. Documentation at http://wiki.apache.org/cassandra/Operations says that the thrift port is not active on the bootstrapping node during the streaming process. What is the process that brings the node up-to-date with mutations that occurred during the time of the bootstrap? Maybe it's only reads that are disabled and writes are allowed?
4. What happens if schema changes (add/drop column families) occur in the cluster while the bootstrap is in progress?

Thanks for your help

Chad

Re: Bootstrapping

Posted by aaron morton <aa...@thelastpickle.com>.
First, upgrade from 0.7.5 if possible. This is as good a reason as any: https://github.com/apache/cassandra/blob/cassandra-0.7.8/CHANGES.txt#L58

Can you copy the SSTables off the node and then just bring them back? It will be *a lot* faster than using nodetool repair. (Drain the node first to clear the commit log.) Or, if you have a spare machine, perform a rolling migration.
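
A rough sketch of that, assuming the default data directory layout (paths, host names and the backup location are placeholders for whatever fits your setup):

  nodetool -h <node> drain                                    # flush memtables and empty the commit log
  # stop the cassandra process on the node
  cp -a /var/lib/cassandra/data/<Keyspace> /path/to/backup/   # park the SSTables somewhere safe
  # ... do the filesystem maintenance ...
  cp -a /path/to/backup/<Keyspace> /var/lib/cassandra/data/
  # start cassandra again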

If at all possible I would try to do it as described above. It will be much, much easier.

If you plan to turn a node off and clear its data you should remove the node's token from the ring. You can either use nodetool decommission, which will distribute the data around the ring, or turn the node off and then use nodetool removetoken, which will not.
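
i.e. something like (host names and token are placeholders):

  # streams this node's data to the rest of the ring, then removes it
  nodetool -h <node-to-remove> decommission

  # or, once the node is already down, tell a live node to drop the dead node's token
  nodetool -h <any-live-node> removetoken <token-of-dead-node>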

> 1. I realize this will put a heavier I/O load on the replication nodes to AntiCompact the CFs, but what kind of load does this put on the JVM? Are there any gotchas I should be aware of to prevent long GC times or OOM exceptions on the replication nodes?
We don't have the AntiCompaction step any more. If your app is stable I would assume the repair process would be too. Do your normal repair processes complete OK?
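
If you want to check, something like this on one node at a time should do it (keyspace name is a placeholder):

  nodetool -h <node> repair <Keyspace>   # watch the logs on the replicas while it runs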

> 3. Documentation at http://wiki.apache.org/cassandra/Operations says that the thrift port is not active on the bootstrapping node during the streaming process. What is the process that brings the node up-to-date with mutations that occurred during the time of the bootstrap? Maybe it's only reads that are disabled and writes are allowed?

Thrift is the connection the client uses; disabling it means clients cannot read from or write to that node directly. The node will announce its intention to take ownership of a token range in the ring when the bootstrap starts. From that point on, other nodes will include it in write requests but not read requests. During that time your data is replicated to RF+1 nodes.
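
You can watch the bootstrap from any machine with something like (host names are placeholders):

  nodetool -h <bootstrapping-node> netstats   # shows the active and pending streams
  nodetool -h <any-live-node> ring            # the new node should show as Joining until the bootstrap completes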
 
> 4. What happens if schema changes (add/drop column families) occur in the cluster while the bootstrap is in progress?
They will be distributed to the node when it comes back. Until it gets the new updates it will log ERRORs for mutations to non-existent CFs. Best advice is not to make those changes while the bootstrap is running.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
