Posted to user@cassandra.apache.org by cass savy <ca...@gmail.com> on 2015/12/09 22:26:24 UTC

Switching to Vnodes

We want to move our clusters to vnodes. I know the online docs say we
have to create a new DC with vnodes, move to the new DC, and decommission the
old one. We use DSE for our C* clusters. The C* version is 2.0.14.

1. Is there any other way to migrate existing nodes to vnodes?
2. What are the known issues with that approach?
3. We have a few secondary indexes in the keyspace. Will they cause any
issues with moving to vnodes?

4. What issues have people encountered after moving to vnodes in PROD?
5. Does anybody recommend vnodes for Spark nodes?

*Approach: moving to a new DC with vnodes enabled*:
When I tested this on a keyspace that has secondary indexes, the rebuild on
the vnode-enabled datacenter took days, and I don't know when it will complete
or even whether it will complete. I tried 256, 32, and 64 tokens per node, but
no luck.

Please advise.

Re: Switching to Vnodes

Posted by cass savy <ca...@gmail.com>.
Victor,
We have 21 nodes in 3 DCs; the Spark DC has 3 nodes. The primary datacenter
nodes have 300 GB of data.

What num_tokens do you have in your prod cluster? Are you using the default 256?

On Wed, Dec 9, 2015 at 2:19 PM, Victor Chen <vi...@gmail.com> wrote:

> I have a 12 node cluster in prod using vnodes and C* version 2.18. I have
> never used rebuild, and instead prefer bootstrapping new nodes, even if it
> means there is additional shuffling of data and cleanup needed on the
> initial nodes in each DC, mostly b/c you can tell when bootstrapping is
> finished. w/ rebuild, like you have observed, there's really no way to be
> sure, apart from comparing load. I have no experience with vnodes and spark
> though, so I can't really comment on that. We are using secondary indexes
> though, and aren't seeing many issues. How much data do you have per node
> and in total and how many nodes?

Re: Switching to Vnodes

Posted by Victor Chen <vi...@gmail.com>.
I have a 12-node cluster in prod using vnodes and C* version 2.18. I have
never used rebuild and instead prefer bootstrapping new nodes, even if it
means additional shuffling of data and cleanup on the
initial nodes in each DC, mostly because you can tell when bootstrapping is
finished. With rebuild, as you have observed, there's really no way to be
sure, apart from comparing load. I have no experience with vnodes and Spark,
so I can't really comment on that. We are using secondary indexes,
though, and aren't seeing many issues. How much data do you have per node
and in total, and how many nodes?
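
A rough sketch of the "comparing load" check mentioned above, in Python. The
byte counts are made-up placeholders (in practice they would be read off the
Load column of nodetool status), and the 10% tolerance is an arbitrary
assumption, not a Cassandra default.

```python
def load_ratio(new_node_bytes, existing_node_bytes):
    """Return the new node's load as a fraction of the mean load of existing nodes."""
    if not existing_node_bytes:
        raise ValueError("need at least one existing node to compare against")
    mean_existing = sum(existing_node_bytes) / len(existing_node_bytes)
    if mean_existing == 0:
        raise ValueError("existing nodes report zero load")
    return new_node_bytes / mean_existing


def looks_caught_up(new_node_bytes, existing_node_bytes, tolerance=0.10):
    """Heuristic only: True when the new node's load is within `tolerance` of the mean."""
    return abs(load_ratio(new_node_bytes, existing_node_bytes) - 1.0) <= tolerance


GB = 1024 ** 3
existing = [300 * GB, 310 * GB, 290 * GB]   # placeholder loads, not real measurements
print(looks_caught_up(150 * GB, existing))  # False
print(looks_caught_up(295 * GB, existing))  # True
```

Load only roughly tracks the amount of data on a node (pending compactions
and differing replication settings skew it), so this is a sanity check at
best, not proof that a rebuild has finished.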


Re: Switching to Vnodes

Posted by cass savy <ca...@gmail.com>.
Thanks, Jeff, for the detailed clarifications.

We tried rebuilding data on the Spark DC nodes one node at a time again in May
but ran into issues. Prod has 3 DCs: DC1 (9 nodes) and DC2 (9 nodes) are C*
only, and DC3 has Spark with 3 nodes and vnodes enabled with num_tokens=32.

We also dropped a few unused indexes. Then we tried rebuilding data on Spark
node 1 from DC2, which takes less traffic, so that DC1 would not be impacted.
When we tried this approach in May, we got a latency hit on DC1. We stopped the
rebuild job and still saw the latency impact. The latency issue stopped once we
removed replication to the Spark nodes at the keyspace level.

We need to rebuild data in the new DC, which has Spark enabled. I am thinking of
this new approach to see whether it avoids the latency impact:

1. Enable replication for the keyspace with replication factor = 1 for the
   Analytics DC
2. Keep num_tokens at 32
3. Run nodetool rebuild on the first Spark DC node, using DC2 as the source

OR

1. Disable vnodes and use manual tokens for all 3 nodes in the Spark DC
2. Clean up all old data/logs
3. Enable replication factor = 1 for the keyspace for the Analytics DC
4. Rebuild data using DC2

I want to ensure that the new approach does not send traffic to DC1 at all.
Let me know if there are any other options or if we need to change any of
the above approaches.

Appreciate feedback.
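
The first option above could be sketched roughly as follows, in Python. The
keyspace name my_ks, the DC names DC1, DC2 and Analytics as spelled here, and
the replication factor of 3 for the two C*-only DCs are placeholders (the
thread does not give them). The helpers only build the statement and the
command line; they do not run anything.

```python
def alter_keyspace_cql(keyspace, dc_replication):
    """Build an ALTER KEYSPACE statement using NetworkTopologyStrategy.

    dc_replication maps datacenter name -> replication factor.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += ["'%s': %d" % (dc, rf) for dc, rf in dc_replication.items()]
    body = ", ".join(options)
    return "ALTER KEYSPACE " + keyspace + " WITH replication = {" + body + "};"


def rebuild_command(source_dc):
    """Argument list for nodetool rebuild, naming the DC to stream from."""
    return ["nodetool", "rebuild", source_dc]


# Placeholder names and replication factors, for illustration only.
print(alter_keyspace_cql("my_ks", {"DC1": 3, "DC2": 3, "Analytics": 1}))
print(" ".join(rebuild_command("DC2")))  # to be run on the Spark DC node being rebuilt
```

Naming a source DC only controls where rebuild streams from. It does not stop
normal writes from being replicated to the new DC once the keyspace includes
it, so this alone may not rule out every effect on DC1.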


On Wed, Dec 9, 2015 at 2:37 PM, Jeff Jirsa <je...@crowdstrike.com>
wrote:

> Streaming with vnodes is not always pleasant – rebuild uses streaming (as
> does bootstrap, repair, and decommission). The rebuild delay you see may or
> may not be related to that. It could also be that the streams timed out,
> and you don’t have a stream timeout set. Are you seeing data move? Are the
> new nodes busy compacting? Secondary indexes themselves may not cause
> problems, but there are cases where very large indexes (due to very large
> partitions or unusual cardinalities) may cause problems.
>
>
>    1. The other way is to backup your data, make a new vnode cluster, and
>    load your data in with sstableloader
>    2. Known issues are that streaming with vnodes creates a lot of small
>    tables and does a lot more work than streaming without vnodes
>    3. Not necessarily
>    4. See #2

Re: Switching to Vnodes

Posted by Jeff Jirsa <je...@crowdstrike.com>.
Streaming with vnodes is not always pleasant – rebuild uses streaming (as does bootstrap, repair, and decommission). The rebuild delay you see may or may not be related to that. It could also be that the streams timed out, and you don’t have a stream timeout set. Are you seeing data move? Are the new nodes busy compacting? Secondary indexes themselves may not cause problems, but there are cases where very large indexes (due to very large partitions or unusual cardinalities) may cause problems.

1. The other way is to back up your data, make a new vnode cluster, and load your data in with sstableloader
2. Known issues are that streaming with vnodes creates a lot of small tables and does a lot more work than streaming without vnodes
3. Not necessarily
4. See #2
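
To give a feel for the second point above (streaming with vnodes does a lot more work), here is a small Python sketch of how the number of token ranges grows with num_tokens. It assumes every node in a DC uses the same num_tokens; the node count of 3 is the Spark DC size mentioned elsewhere in this thread, and the token counts are the ones tried in the original post, plus 1 for the non-vnode case. It counts ranges only and does not model streaming itself.

```python
def token_ranges(nodes, num_tokens):
    """Total tokens (and therefore primary ranges) owned by `nodes` nodes."""
    if nodes < 1 or num_tokens < 1:
        raise ValueError("nodes and num_tokens must be positive")
    return nodes * num_tokens


# 1 token per node is the non-vnode case; the rest are values tried in the thread.
for num_tokens in (1, 32, 64, 256):
    print(num_tokens, token_ranges(3, num_tokens))
# prints, one pair per line: 1 3, 32 96, 64 192, 256 768
```

More ranges means more, smaller pieces to stream per node, which is in line with the "lot of small tables" described above.
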