Posted to user@cassandra.apache.org by Maxim Parkachov <la...@gmail.com> on 2020/01/30 15:05:26 UTC

How to reduce vnodes without downtime

Hi everyone,

with the discussion about reducing the default number of vnodes in version 4.0,
I would like to ask what the optimal procedure would be to reduce the vnodes
in an existing 3.11.x cluster that was set up with the default value of 256.
The cluster has 2 DCs with 5 nodes each and RF=3. There is one more
restriction: I can neither add more servers nor create an additional DC, as
everything is on physical hardware. This should be done without downtime.

My idea for such a procedure would be:

for each node:
- decommission the node
- set auto_bootstrap to true and num_tokens (vnodes) to 4
- start and wait until the node joins the cluster
- run cleanup on the rest of the nodes in the cluster
- run repair on the whole cluster (not sure if needed after cleanup)
- set auto_bootstrap to false
repeat for each node

rolling restart of cluster
cluster repair
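
In terms of actual commands, I imagine one iteration would look roughly like
this (service name and paths below are only examples, not necessarily our
exact setup):

    # rough per-node sketch, assuming a packaged install with default paths
    nodetool decommission                  # on the node being changed
    sudo systemctl stop cassandra
    # edit cassandra.yaml on that node:
    #   num_tokens: 4
    #   auto_bootstrap: true
    sudo rm -rf /var/lib/cassandra/data/*  # wipe old data so the node can rejoin
    sudo systemctl start cassandra         # wait until it shows UN in nodetool status
    nodetool cleanup                       # on each of the remaining nodes
    nodetool repair -pr                    # optionally, across the cluster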

Does this sound right? My concern is that after decommission, the node will
start on the same IP, which could create some confusion.

Regards,
Maxim.

Re: [EXTERNAL] How to reduce vnodes without downtime

Posted by Sergio <la...@gmail.com>.
Thanks Erick!

Best,

Sergio

On Sun, Feb 2, 2020, 10:07 PM Erick Ramirez <fl...@gmail.com> wrote:

> If you are after more details about the trade-offs between different sized
>> token values, please see the discussion on the dev mailing list: "[Discuss]
>> num_tokens default in Cassandra 4.0
>> <https://www.mail-archive.com/search?l=dev%40cassandra.apache.org&q=subject%3A%22%5C%5BDiscuss%5C%5D+num_tokens+default+in+Cassandra+4.0%22&o=oldest>
>> ".
>>
>> Regards,
>> Anthony
>>
>> On Sat, 1 Feb 2020 at 10:07, Sergio <la...@gmail.com> wrote:
>>
>>>
>>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html This
>>> is the article with 4 token recommendations.
>>> @Erick Ramirez. which is the dev thread for the default 32 tokens
>>> recommendation?
>>>
>>> Thanks,
>>> Sergio
>>>
>>
> Sergio, my apologies for not replying. For some reason, your reply went to
> my spam folder and I didn't see it.
>
> Thanks, Anthony, for responding. I was indeed referring to that dev
> thread. Cheers!
>
>

Re: [EXTERNAL] How to reduce vnodes without downtime

Posted by Maxim Parkachov <la...@gmail.com>.
Hi guys,

thanks a lot for the useful tips. I obviously underestimated the complexity of
such a change.

Thanks again,
Maxim.

>

Re: [EXTERNAL] How to reduce vnodes without downtime

Posted by Erick Ramirez <fl...@gmail.com>.
>
> If you are after more details about the trade-offs between different sized
> token values, please see the discussion on the dev mailing list: "[Discuss]
> num_tokens default in Cassandra 4.0
> <https://www.mail-archive.com/search?l=dev%40cassandra.apache.org&q=subject%3A%22%5C%5BDiscuss%5C%5D+num_tokens+default+in+Cassandra+4.0%22&o=oldest>
> ".
>
> Regards,
> Anthony
>
> On Sat, 1 Feb 2020 at 10:07, Sergio <la...@gmail.com> wrote:
>
>>
>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html This
>> is the article with 4 token recommendations.
>> @Erick Ramirez. which is the dev thread for the default 32 tokens
>> recommendation?
>>
>> Thanks,
>> Sergio
>>
>
Sergio, my apologies for not replying. For some reason, your reply went to
my spam folder and I didn't see it.

Thanks, Anthony, for responding. I was indeed referring to that dev thread.
Cheers!

Re: [EXTERNAL] How to reduce vnodes without downtime

Posted by Erick Ramirez <er...@datastax.com>.
>
> I am seeing some unbalancing and I was worried because I have 256 vnodes
> Weird stuff is related to this post where I don't find a match between the
> load and du -sh * for the node 10.1.31.60 and I was trying to figure out
> the reason, if it was due to the number of vnodes.


Out of curiosity, did you start with a smaller cluster and then add new nodes?
Just wondering if this is a case of not having run nodetool cleanup
post-expansion.
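
If that is the case, running something like the following on each of the
pre-existing nodes should drop the data they no longer own (a rough sketch;
the keyspace name is just an example):

    nodetool cleanup                 # all keyspaces
    # or limit it to one keyspace at a time:
    nodetool cleanup my_keyspace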

Does Cassandra keep a copy of the data per rack so if I need to keep the
> things balanced and I would have to add 3 racks at the time in a single
> Datacenter keep the things balanced?


The output you posted shows that all nodes are in the same rack but yes, C*
will place a replica in each rack so that each rack has a full copy.
Caveats apply such as RF=3 and 3 racks in the DC.

Is it better to keep a single Rack with a single Datacenter in 3 different
> availability zones with replication factor = 3 or to have for each
> Datacenter: 1 Rack and 1 Availability Zone and eventually redirect the
> client to a fallback Datacenter in case one of the availability zone is not
> reachable


If you have RF=3 and EC2 instances in 3 AZs, you can choose to allocate the
instances to a logical C* rack based on their AZ. However, you would only
do this if each AZ has an identical number of nodes. If the node count
in each rack is not identical, you will end up with some bloated nodes, for
the same reason that C* will keep a full copy of the data in each rack.
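
As a rough sketch of what that looks like (the DC, rack and keyspace names
below are examples only), each node advertises its AZ as its rack via
GossipingPropertyFileSnitch, and the keyspace keeps RF=3 in the DC:

    # /etc/cassandra/cassandra-rackdc.properties on a node in us-east-1a
    #   dc=us-east
    #   rack=us-east-1a
    cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 3};"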

Using racks also means that when you want to expand the cluster, you need
to provision instances in each AZ. As above, if you only provision
instances in 2 of 3 AZs (for example) then nodes in 1 AZ will be "fatter"
than nodes in the other 2 AZs.

Failing over to another DC isn't really necessary if only 1 AZ is
unreachable when using a CL of LOCAL_QUORUM, since the 2 remaining AZs are
sufficient to satisfy requests. But that really depends on your application's
business rules. Cheers!

>

Re: [EXTERNAL] How to reduce vnodes without downtime

Posted by Sergio <la...@gmail.com>.
Did you get a chance to take a look at this one?

On Mon, 3 Feb 2020 at 23:36, Sergio <la...@gmail.com> wrote:

> After reading this
>
> *I would only consider moving a cluster to 4 tokens if it is larger than
> 100 nodes. If you read through the paper that Erick mentioned, written
> by Joe Lynch & Josh Snyder, they show that the num_tokens impacts the
> availability of large scale clusters.*
>
> and
>
> With 16 tokens, that is vastly improved, but you still have up to 64 nodes
> each node needs to query against, so you're again, hitting every node
> unless you go above ~96 nodes in the cluster (assuming 3 racks / AZs).  I
> wouldn't use 16 here, and I doubt any of you would either.  I've advocated
> for 4 tokens because you'd have overlap with only 16 nodes, which works
> well for small clusters as well as large.  Assuming I was creating a new
> cluster for myself (in a hypothetical brand new application I'm building) I
> would put this in production.  I have worked with several teams where I
> helped them put 4 token clusters in prod and it has worked very well.  We
> didn't see any wild imbalance issues.
>
> from
> https://lists.apache.org/thread.html/r55d8e68483aea30010a4162ae94e92bc63ed74d486e6c642ee66f6ae%40%3Cuser.cassandra.apache.org%3E
>
> Sorry guys, but I am kinda confused now which should be the recommended
> approach for the number of *vnodes*.
> Right now I am handling a cluster with just 9 nodes and a data size of
> 100-200GB per node.
>
> I am seeing some unbalancing and I was worried because I have 256 vnodes
>
> --  Address      Load        Tokens  Owns  Host ID                               Rack
> UN  10.1.30.112  115.88 GiB  256     ?     e5108a8e-cc2f-4914-a86e-fccf770e3f0f  us-east-1b
> UN  10.1.24.146  127.42 GiB  256     ?     adf40fa3-86c4-42c3-bf0a-0f3ee1651696  us-east-1b
> UN  10.1.26.181  133.44 GiB  256     ?     0a8f07ba-a129-42b0-b73a-df649bd076ef  us-east-1b
> UN  10.1.29.202  113.33 GiB  256     ?     d260d719-eae3-48ab-8a98-ea5c7b8f6eb6  us-east-1b
> UN  10.1.31.60   183.63 GiB  256     ?     3647fcca-688a-4851-ab15-df36819910f4  us-east-1b
> UN  10.1.24.175  118.09 GiB  256     ?     bba1e80b-8156-4399-bd6a-1b5ccb47bddb  us-east-1b
> UN  10.1.29.223  137.24 GiB  256     ?     450fbb61-3817-419a-a4c6-4b652eb5ce01  us-east-1b
>
> Weird stuff is related to this post
> <https://lists.apache.org/thread.html/r92279215bb2e169848cc2b15d320b8a15bfcf1db2dae79d5662c97c5%40%3Cuser.cassandra.apache.org%3E>
> where I don't find a match between the load and du -sh * for the node
> 10.1.31.60 and I was trying to figure out the reason, if it was due to the
> number of vnodes.
>
> 2 Out-of-topic questions:
>
> 1)
> Does Cassandra keep a copy of the data per rack so if I need to keep the
> things balanced and I would have to add 3 racks at the time in a single
> Datacenter keep the things balanced?
>
> 2) Is it better to keep a single Rack with a single Datacenter in 3
> different availability zones with replication factor = 3 or to have for
> each Datacenter: 1 Rack and 1 Availability Zone and eventually redirect the
> client to a fallback Datacenter in case one of the availability zone is not
> reachable?
>
> Right now we are separating the Datacenter for reads from the one that
> handles the writes...
>
> Thanks for your help!
>
> Sergio
>
>
>
>
> On Sun, 2 Feb 2020 at 18:36, Anthony Grasso <anthony.grasso@gmail.com> wrote:
>
>> Hi Sergio,
>>
>> There is a misunderstanding here. My post makes no recommendation for the
>> value of num_tokens. Rather, it focuses on how to use
>> the allocate_tokens_for_keyspace setting when creating a new cluster.
>>
>> Whilst a value of 4 is used for num_tokens in the post, it was chosen for
>> demonstration purposes. Specifically it makes:
>>
>>    - the uneven token distribution in a small cluster very obvious,
>>    - identifying the endpoints displayed in nodetool ring easy, and
>>    - the initial_token setup less verbose and easier to follow.
>>
>> I will add an editorial note to the post with the above information
>> so there is no confusion about why 4 tokens were used.
>>
>> I would only consider moving a cluster to 4 tokens if it is larger than
>> 100 nodes. If you read through the paper that Erick mentioned, written
>> by Joe Lynch & Josh Snyder, they show that the num_tokens impacts the
>> availability of large scale clusters.
>>
>> If you are after more details about the trade-offs between different
>> sized token values, please see the discussion on the dev mailing list: "[Discuss]
>> num_tokens default in Cassandra 4.0
>> <https://www.mail-archive.com/search?l=dev%40cassandra.apache.org&q=subject%3A%22%5C%5BDiscuss%5C%5D+num_tokens+default+in+Cassandra+4.0%22&o=oldest>
>> ".
>>
>> Regards,
>> Anthony
>>
>> On Sat, 1 Feb 2020 at 10:07, Sergio <la...@gmail.com> wrote:
>>
>>>
>>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html This
>>> is the article with 4 token recommendations.
>>> @Erick Ramirez. which is the dev thread for the default 32 tokens
>>> recommendation?
>>>
>>> Thanks,
>>> Sergio
>>>
>>> On Fri, 31 Jan 2020 at 14:49, Erick Ramirez <flightctlr@gmail.com> wrote:
>>>
>>>> There's an active discussion going on right now in a separate dev
>>>> thread. The current "default recommendation" is 32 tokens. But there's a
>>>> push for 4 in combination with allocate_tokens_for_keyspace from Jon
>>>> Haddad & co (based on a paper from Joe Lynch & Josh Snyder).
>>>>
>>>> If you're satisfied with the results from your own testing, go with 4
>>>> tokens. And that's the key -- you must test, test, TEST! Cheers!
>>>>
>>>> On Sat, Feb 1, 2020 at 5:17 AM Arvinder Dhillon <dh...@gmail.com>
>>>> wrote:
>>>>
>>>>> What is recommended vnodes now? I read 8 in later cassandra 3.x
>>>>> Is the new recommendation 4 now even in version 3.x (asking for 3.11)?
>>>>> Thanks
>>>>>
>>>>> On Fri, Jan 31, 2020 at 9:49 AM Durity, Sean R <
>>>>> SEAN_R_DURITY@homedepot.com> wrote:
>>>>>
>>>>>> These are good clarifications and expansions.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sean Durity
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* Anthony Grasso <an...@gmail.com>
>>>>>> *Sent:* Thursday, January 30, 2020 7:25 PM
>>>>>> *To:* user <us...@cassandra.apache.org>
>>>>>> *Subject:* Re: [EXTERNAL] How to reduce vnodes without downtime
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Maxim,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Basically what Sean suggested is the way to do this without downtime.
>>>>>>
>>>>>>
>>>>>>
>>>>>> To clarify the, the *three* steps following the "Decommission each
>>>>>> node in the DC you are working on" step should be applied to *only*
>>>>>> the decommissioned nodes. So where it say "*all nodes*" or "*every
>>>>>> node*" it applies to only the decommissioned nodes.
>>>>>>
>>>>>>
>>>>>>
>>>>>> In addition, the step that says "Wipe data on all the nodes", I would
>>>>>> delete all files in the following directories on the decommissioned nodes.
>>>>>>
>>>>>>    - data (usually located in /var/lib/cassandra/data)
>>>>>>    - commitlogs (usually located in /var/lib/cassandra/commitlogs)
>>>>>>    - hints (usually located in /var/lib/casandra/hints)
>>>>>>    - saved_caches (usually located in
>>>>>>    /var/lib/cassandra/saved_caches)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Anthony
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, 31 Jan 2020 at 03:05, Durity, Sean R <
>>>>>> SEAN_R_DURITY@homedepot.com> wrote:
>>>>>>
>>>>>> Your procedure won’t work very well. On the first node, if you
>>>>>> switched to 4, you would end up with only a tiny fraction of the data
>>>>>> (because the other nodes would still be at 256). I updated a large cluster
>>>>>> (over 150 nodes – 2 DCs) to smaller number of vnodes. The basic outline was
>>>>>> this:
>>>>>>
>>>>>>
>>>>>>
>>>>>>    - Stop all repairs
>>>>>>    - Make sure the app is running against one DC only
>>>>>>    - Change the replication settings on keyspaces to use only 1 DC
>>>>>>    (basically cutting off the other DC)
>>>>>>    - Decommission each node in the DC you are working on. Because
>>>>>>    the replication setting are changed, no streaming occurs. But it releases
>>>>>>    the token assignments
>>>>>>    - Wipe data on all the nodes
>>>>>>    - Update configuration on every node to your new settings,
>>>>>>    including auto_bootstrap = false
>>>>>>    - Start all nodes. They will choose tokens, but not stream any
>>>>>>    data
>>>>>>    - Update replication factor for all keyspaces to include the new
>>>>>>    DC
>>>>>>    - I disabled binary on those nodes to prevent app connections
>>>>>>    - Run nodetool rebuild with -dc (other DC) on as many nodes as
>>>>>>    your system can safely handle until they are all rebuilt.
>>>>>>    - Re-enable binary (and app connections to the rebuilt DC)
>>>>>>    - Turn on repairs
>>>>>>    - Rest for a bit, then reverse the process for the remaining DCs
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sean Durity – Staff Systems Engineer, Cassandra
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* Maxim Parkachov <la...@gmail.com>
>>>>>> *Sent:* Thursday, January 30, 2020 10:05 AM
>>>>>> *To:* user@cassandra.apache.org
>>>>>> *Subject:* [EXTERNAL] How to reduce vnodes without downtime
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>>
>>>>>>
>>>>>> with discussion about reducing default vnodes in version 4.0 I would
>>>>>> like to ask, what would be optimal procedure to perform reduction of vnodes
>>>>>> in existing 3.11.x cluster which was set up with default value 256. Cluster
>>>>>> has 2 DC with 5 nodes each and RF=3. There is one more restriction, I could
>>>>>> not add more servers, nor to create additional DC, everything is physical.
>>>>>> This should be done without downtime.
>>>>>>
>>>>>>
>>>>>>
>>>>>> My idea for such procedure would be
>>>>>>
>>>>>>
>>>>>>
>>>>>> for each node:
>>>>>>
>>>>>> - decommission node
>>>>>>
>>>>>> - set auto_bootstrap to true and vnodes to 4
>>>>>>
>>>>>> - start and wait till node joins cluster
>>>>>>
>>>>>> - run cleanup on rest of nodes in cluster
>>>>>>
>>>>>> - run repair on whole cluster (not sure if needed after cleanup)
>>>>>>
>>>>>> - set auto_bootstrap to false
>>>>>>
>>>>>> repeat for each node
>>>>>>
>>>>>>
>>>>>>
>>>>>> rolling restart of cluster
>>>>>>
>>>>>> cluster repair
>>>>>>
>>>>>>
>>>>>>
>>>>>> Is this sounds right ? My concern is that after decommission, node
>>>>>> will start on the same IP which could create some confusion.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Maxim.
>>>>>>
>>>>>>
>>>>>

Re: [EXTERNAL] How to reduce vnodes without downtime

Posted by Sergio <la...@gmail.com>.
After reading this

*I would only consider moving a cluster to 4 tokens if it is larger than
100 nodes. If you read through the paper that Erick mentioned, written
by Joe Lynch & Josh Snyder, they show that the num_tokens impacts the
availability of large scale clusters.*

and

With 16 tokens, that is vastly improved, but you still have up to 64 nodes
each node needs to query against, so you're again, hitting every node
unless you go above ~96 nodes in the cluster (assuming 3 racks / AZs).  I
wouldn't use 16 here, and I doubt any of you would either.  I've advocated
for 4 tokens because you'd have overlap with only 16 nodes, which works
well for small clusters as well as large.  Assuming I was creating a new
cluster for myself (in a hypothetical brand new application I'm building) I
would put this in production.  I have worked with several teams where I
helped them put 4 token clusters in prod and it has worked very well.  We
didn't see any wild imbalance issues.

from
https://lists.apache.org/thread.html/r55d8e68483aea30010a4162ae94e92bc63ed74d486e6c642ee66f6ae%40%3Cuser.cassandra.apache.org%3E

Sorry guys, but I am kind of confused now about what the recommended
approach for the number of *vnodes* should be.
Right now I am handling a cluster with just 9 nodes and a data size of
100-200 GB per node.

I am seeing some imbalance and I was worried because I have 256 vnodes:

--  Address      Load        Tokens  Owns  Host ID                               Rack
UN  10.1.30.112  115.88 GiB  256     ?     e5108a8e-cc2f-4914-a86e-fccf770e3f0f  us-east-1b
UN  10.1.24.146  127.42 GiB  256     ?     adf40fa3-86c4-42c3-bf0a-0f3ee1651696  us-east-1b
UN  10.1.26.181  133.44 GiB  256     ?     0a8f07ba-a129-42b0-b73a-df649bd076ef  us-east-1b
UN  10.1.29.202  113.33 GiB  256     ?     d260d719-eae3-48ab-8a98-ea5c7b8f6eb6  us-east-1b
UN  10.1.31.60   183.63 GiB  256     ?     3647fcca-688a-4851-ab15-df36819910f4  us-east-1b
UN  10.1.24.175  118.09 GiB  256     ?     bba1e80b-8156-4399-bd6a-1b5ccb47bddb  us-east-1b
UN  10.1.29.223  137.24 GiB  256     ?     450fbb61-3817-419a-a4c6-4b652eb5ce01  us-east-1b

The weird thing is related to this post
<https://lists.apache.org/thread.html/r92279215bb2e169848cc2b15d320b8a15bfcf1db2dae79d5662c97c5%40%3Cuser.cassandra.apache.org%3E>
where I don't find a match between the load and du -sh * for the node
10.1.31.60, and I was trying to figure out whether the number of vnodes was
the reason.
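
For reference, this is roughly how I am comparing the two numbers (the data
path is the package default; ours may differ):

    nodetool status                  # the Load column shown above
    du -sh /var/lib/cassandra/data/*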

2 off-topic questions:

1)
Does Cassandra keep a copy of the data per rack, so that to keep things
balanced I would have to add 3 racks at a time within a single Datacenter?

2) Is it better to have a single Rack in a single Datacenter spanning 3
different availability zones with replication factor = 3, or to have, for
each Datacenter, 1 Rack and 1 Availability Zone and eventually redirect the
client to a fallback Datacenter in case one of the availability zones is not
reachable?

Right now we are separating the Datacenter for reads from the one that
handles the writes...

Thanks for your help!

Sergio




On Sun, 2 Feb 2020 at 18:36, Anthony Grasso <anthony.grasso@gmail.com> wrote:

> Hi Sergio,
>
> There is a misunderstanding here. My post makes no recommendation for the
> value of num_tokens. Rather, it focuses on how to use
> the allocate_tokens_for_keyspace setting when creating a new cluster.
>
> Whilst a value of 4 is used for num_tokens in the post, it was chosen for
> demonstration purposes. Specifically it makes:
>
>    - the uneven token distribution in a small cluster very obvious,
>    - identifying the endpoints displayed in nodetool ring easy, and
>    - the initial_token setup less verbose and easier to follow.
>
> I will add an editorial note to the post with the above information
> so there is no confusion about why 4 tokens were used.
>
> I would only consider moving a cluster to 4 tokens if it is larger than
> 100 nodes. If you read through the paper that Erick mentioned, written
> by Joe Lynch & Josh Snyder, they show that the num_tokens impacts the
> availability of large scale clusters.
>
> If you are after more details about the trade-offs between different sized
> token values, please see the discussion on the dev mailing list: "[Discuss]
> num_tokens default in Cassandra 4.0
> <https://www.mail-archive.com/search?l=dev%40cassandra.apache.org&q=subject%3A%22%5C%5BDiscuss%5C%5D+num_tokens+default+in+Cassandra+4.0%22&o=oldest>
> ".
>
> Regards,
> Anthony
>
> On Sat, 1 Feb 2020 at 10:07, Sergio <la...@gmail.com> wrote:
>
>>
>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html This
>> is the article with 4 token recommendations.
>> @Erick Ramirez. which is the dev thread for the default 32 tokens
>> recommendation?
>>
>> Thanks,
>> Sergio
>>
>> On Fri, 31 Jan 2020 at 14:49, Erick Ramirez <flightctlr@gmail.com> wrote:
>>
>>> There's an active discussion going on right now in a separate dev
>>> thread. The current "default recommendation" is 32 tokens. But there's a
>>> push for 4 in combination with allocate_tokens_for_keyspace from Jon
>>> Haddad & co (based on a paper from Joe Lynch & Josh Snyder).
>>>
>>> If you're satisfied with the results from your own testing, go with 4
>>> tokens. And that's the key -- you must test, test, TEST! Cheers!
>>>
>>> On Sat, Feb 1, 2020 at 5:17 AM Arvinder Dhillon <dh...@gmail.com>
>>> wrote:
>>>
>>>> What is recommended vnodes now? I read 8 in later cassandra 3.x
>>>> Is the new recommendation 4 now even in version 3.x (asking for 3.11)?
>>>> Thanks
>>>>
>>>> On Fri, Jan 31, 2020 at 9:49 AM Durity, Sean R <
>>>> SEAN_R_DURITY@homedepot.com> wrote:
>>>>
>>>>> These are good clarifications and expansions.
>>>>>
>>>>>
>>>>>
>>>>> Sean Durity
>>>>>
>>>>>
>>>>>
>>>>> *From:* Anthony Grasso <an...@gmail.com>
>>>>> *Sent:* Thursday, January 30, 2020 7:25 PM
>>>>> *To:* user <us...@cassandra.apache.org>
>>>>> *Subject:* Re: [EXTERNAL] How to reduce vnodes without downtime
>>>>>
>>>>>
>>>>>
>>>>> Hi Maxim,
>>>>>
>>>>>
>>>>>
>>>>> Basically what Sean suggested is the way to do this without downtime.
>>>>>
>>>>>
>>>>>
>>>>> To clarify the, the *three* steps following the "Decommission each
>>>>> node in the DC you are working on" step should be applied to *only*
>>>>> the decommissioned nodes. So where it say "*all nodes*" or "*every
>>>>> node*" it applies to only the decommissioned nodes.
>>>>>
>>>>>
>>>>>
>>>>> In addition, the step that says "Wipe data on all the nodes", I would
>>>>> delete all files in the following directories on the decommissioned nodes.
>>>>>
>>>>>    - data (usually located in /var/lib/cassandra/data)
>>>>>    - commitlogs (usually located in /var/lib/cassandra/commitlogs)
>>>>>    - hints (usually located in /var/lib/casandra/hints)
>>>>>    - saved_caches (usually located in /var/lib/cassandra/saved_caches)
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Anthony
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 31 Jan 2020 at 03:05, Durity, Sean R <
>>>>> SEAN_R_DURITY@homedepot.com> wrote:
>>>>>
>>>>> Your procedure won’t work very well. On the first node, if you
>>>>> switched to 4, you would end up with only a tiny fraction of the data
>>>>> (because the other nodes would still be at 256). I updated a large cluster
>>>>> (over 150 nodes – 2 DCs) to smaller number of vnodes. The basic outline was
>>>>> this:
>>>>>
>>>>>
>>>>>
>>>>>    - Stop all repairs
>>>>>    - Make sure the app is running against one DC only
>>>>>    - Change the replication settings on keyspaces to use only 1 DC
>>>>>    (basically cutting off the other DC)
>>>>>    - Decommission each node in the DC you are working on. Because the
>>>>>    replication setting are changed, no streaming occurs. But it releases the
>>>>>    token assignments
>>>>>    - Wipe data on all the nodes
>>>>>    - Update configuration on every node to your new settings,
>>>>>    including auto_bootstrap = false
>>>>>    - Start all nodes. They will choose tokens, but not stream any data
>>>>>    - Update replication factor for all keyspaces to include the new DC
>>>>>    - I disabled binary on those nodes to prevent app connections
>>>>>    - Run nodetool rebuild with -dc (other DC) on as many nodes as
>>>>>    your system can safely handle until they are all rebuilt.
>>>>>    - Re-enable binary (and app connections to the rebuilt DC)
>>>>>    - Turn on repairs
>>>>>    - Rest for a bit, then reverse the process for the remaining DCs
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Sean Durity – Staff Systems Engineer, Cassandra
>>>>>
>>>>>
>>>>>
>>>>> *From:* Maxim Parkachov <la...@gmail.com>
>>>>> *Sent:* Thursday, January 30, 2020 10:05 AM
>>>>> *To:* user@cassandra.apache.org
>>>>> *Subject:* [EXTERNAL] How to reduce vnodes without downtime
>>>>>
>>>>>
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>>
>>>>>
>>>>> with discussion about reducing default vnodes in version 4.0 I would
>>>>> like to ask, what would be optimal procedure to perform reduction of vnodes
>>>>> in existing 3.11.x cluster which was set up with default value 256. Cluster
>>>>> has 2 DC with 5 nodes each and RF=3. There is one more restriction, I could
>>>>> not add more servers, nor to create additional DC, everything is physical.
>>>>> This should be done without downtime.
>>>>>
>>>>>
>>>>>
>>>>> My idea for such procedure would be
>>>>>
>>>>>
>>>>>
>>>>> for each node:
>>>>>
>>>>> - decommission node
>>>>>
>>>>> - set auto_bootstrap to true and vnodes to 4
>>>>>
>>>>> - start and wait till node joins cluster
>>>>>
>>>>> - run cleanup on rest of nodes in cluster
>>>>>
>>>>> - run repair on whole cluster (not sure if needed after cleanup)
>>>>>
>>>>> - set auto_bootstrap to false
>>>>>
>>>>> repeat for each node
>>>>>
>>>>>
>>>>>
>>>>> rolling restart of cluster
>>>>>
>>>>> cluster repair
>>>>>
>>>>>
>>>>>
>>>>> Is this sounds right ? My concern is that after decommission, node
>>>>> will start on the same IP which could create some confusion.
>>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Maxim.
>>>>>
>>>>>
>>>>>
>>>>

Re: [EXTERNAL] How to reduce vnodes without downtime

Posted by Sergio <la...@gmail.com>.
Thanks Anthony!

I will read more about it

Best,

Sergio



On Sun, 2 Feb 2020 at 18:36, Anthony Grasso <anthony.grasso@gmail.com> wrote:

> Hi Sergio,
>
> There is a misunderstanding here. My post makes no recommendation for the
> value of num_tokens. Rather, it focuses on how to use
> the allocate_tokens_for_keyspace setting when creating a new cluster.
>
> Whilst a value of 4 is used for num_tokens in the post, it was chosen for
> demonstration purposes. Specifically it makes:
>
>    - the uneven token distribution in a small cluster very obvious,
>    - identifying the endpoints displayed in nodetool ring easy, and
>    - the initial_token setup less verbose and easier to follow.
>
> I will add an editorial note to the post with the above information
> so there is no confusion about why 4 tokens were used.
>
> I would only consider moving a cluster to 4 tokens if it is larger than
> 100 nodes. If you read through the paper that Erick mentioned, written
> by Joe Lynch & Josh Snyder, they show that the num_tokens impacts the
> availability of large scale clusters.
>
> If you are after more details about the trade-offs between different sized
> token values, please see the discussion on the dev mailing list: "[Discuss]
> num_tokens default in Cassandra 4.0
> <https://www.mail-archive.com/search?l=dev%40cassandra.apache.org&q=subject%3A%22%5C%5BDiscuss%5C%5D+num_tokens+default+in+Cassandra+4.0%22&o=oldest>
> ".
>
> Regards,
> Anthony
>
> On Sat, 1 Feb 2020 at 10:07, Sergio <la...@gmail.com> wrote:
>
>>
>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html This
>> is the article with 4 token recommendations.
>> @Erick Ramirez. which is the dev thread for the default 32 tokens
>> recommendation?
>>
>> Thanks,
>> Sergio
>>
>> On Fri, 31 Jan 2020 at 14:49, Erick Ramirez <flightctlr@gmail.com> wrote:
>>
>>> There's an active discussion going on right now in a separate dev
>>> thread. The current "default recommendation" is 32 tokens. But there's a
>>> push for 4 in combination with allocate_tokens_for_keyspace from Jon
>>> Haddad & co (based on a paper from Joe Lynch & Josh Snyder).
>>>
>>> If you're satisfied with the results from your own testing, go with 4
>>> tokens. And that's the key -- you must test, test, TEST! Cheers!
>>>
>>> On Sat, Feb 1, 2020 at 5:17 AM Arvinder Dhillon <dh...@gmail.com>
>>> wrote:
>>>
>>>> What is recommended vnodes now? I read 8 in later cassandra 3.x
>>>> Is the new recommendation 4 now even in version 3.x (asking for 3.11)?
>>>> Thanks
>>>>
>>>> On Fri, Jan 31, 2020 at 9:49 AM Durity, Sean R <
>>>> SEAN_R_DURITY@homedepot.com> wrote:
>>>>
>>>>> These are good clarifications and expansions.
>>>>>
>>>>>
>>>>>
>>>>> Sean Durity
>>>>>
>>>>>
>>>>>
>>>>> *From:* Anthony Grasso <an...@gmail.com>
>>>>> *Sent:* Thursday, January 30, 2020 7:25 PM
>>>>> *To:* user <us...@cassandra.apache.org>
>>>>> *Subject:* Re: [EXTERNAL] How to reduce vnodes without downtime
>>>>>
>>>>>
>>>>>
>>>>> Hi Maxim,
>>>>>
>>>>>
>>>>>
>>>>> Basically what Sean suggested is the way to do this without downtime.
>>>>>
>>>>>
>>>>>
>>>>> To clarify the, the *three* steps following the "Decommission each
>>>>> node in the DC you are working on" step should be applied to *only*
>>>>> the decommissioned nodes. So where it say "*all nodes*" or "*every
>>>>> node*" it applies to only the decommissioned nodes.
>>>>>
>>>>>
>>>>>
>>>>> In addition, the step that says "Wipe data on all the nodes", I would
>>>>> delete all files in the following directories on the decommissioned nodes.
>>>>>
>>>>>    - data (usually located in /var/lib/cassandra/data)
>>>>>    - commitlogs (usually located in /var/lib/cassandra/commitlogs)
>>>>>    - hints (usually located in /var/lib/casandra/hints)
>>>>>    - saved_caches (usually located in /var/lib/cassandra/saved_caches)
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Anthony
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 31 Jan 2020 at 03:05, Durity, Sean R <
>>>>> SEAN_R_DURITY@homedepot.com> wrote:
>>>>>
>>>>> Your procedure won’t work very well. On the first node, if you
>>>>> switched to 4, you would end up with only a tiny fraction of the data
>>>>> (because the other nodes would still be at 256). I updated a large cluster
>>>>> (over 150 nodes – 2 DCs) to smaller number of vnodes. The basic outline was
>>>>> this:
>>>>>
>>>>>
>>>>>
>>>>>    - Stop all repairs
>>>>>    - Make sure the app is running against one DC only
>>>>>    - Change the replication settings on keyspaces to use only 1 DC
>>>>>    (basically cutting off the other DC)
>>>>>    - Decommission each node in the DC you are working on. Because the
>>>>>    replication setting are changed, no streaming occurs. But it releases the
>>>>>    token assignments
>>>>>    - Wipe data on all the nodes
>>>>>    - Update configuration on every node to your new settings,
>>>>>    including auto_bootstrap = false
>>>>>    - Start all nodes. They will choose tokens, but not stream any data
>>>>>    - Update replication factor for all keyspaces to include the new DC
>>>>>    - I disabled binary on those nodes to prevent app connections
>>>>>    - Run nodetool rebuild with -dc (other DC) on as many nodes as
>>>>>    your system can safely handle until they are all rebuilt.
>>>>>    - Re-enable binary (and app connections to the rebuilt DC)
>>>>>    - Turn on repairs
>>>>>    - Rest for a bit, then reverse the process for the remaining DCs
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Sean Durity – Staff Systems Engineer, Cassandra
>>>>>
>>>>>
>>>>>
>>>>> *From:* Maxim Parkachov <la...@gmail.com>
>>>>> *Sent:* Thursday, January 30, 2020 10:05 AM
>>>>> *To:* user@cassandra.apache.org
>>>>> *Subject:* [EXTERNAL] How to reduce vnodes without downtime
>>>>>
>>>>>
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>>
>>>>>
>>>>> with discussion about reducing default vnodes in version 4.0 I would
>>>>> like to ask, what would be optimal procedure to perform reduction of vnodes
>>>>> in existing 3.11.x cluster which was set up with default value 256. Cluster
>>>>> has 2 DC with 5 nodes each and RF=3. There is one more restriction, I could
>>>>> not add more servers, nor to create additional DC, everything is physical.
>>>>> This should be done without downtime.
>>>>>
>>>>>
>>>>>
>>>>> My idea for such procedure would be
>>>>>
>>>>>
>>>>>
>>>>> for each node:
>>>>>
>>>>> - decommission node
>>>>>
>>>>> - set auto_bootstrap to true and vnodes to 4
>>>>>
>>>>> - start and wait till node joins cluster
>>>>>
>>>>> - run cleanup on rest of nodes in cluster
>>>>>
>>>>> - run repair on whole cluster (not sure if needed after cleanup)
>>>>>
>>>>> - set auto_bootstrap to false
>>>>>
>>>>> repeat for each node
>>>>>
>>>>>
>>>>>
>>>>> rolling restart of cluster
>>>>>
>>>>> cluster repair
>>>>>
>>>>>
>>>>>
>>>>> Is this sounds right ? My concern is that after decommission, node
>>>>> will start on the same IP which could create some confusion.
>>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Maxim.
>>>>>
>>>>>
>>>>>
>>>>

Re: [EXTERNAL] How to reduce vnodes without downtime

Posted by Anthony Grasso <an...@gmail.com>.
Hi Sergio,

There is a misunderstanding here. My post makes no recommendation for the
value of num_tokens. Rather, it focuses on how to use
the allocate_tokens_for_keyspace setting when creating a new cluster.
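
For anyone who has not used it, the relevant cassandra.yaml settings look
roughly like this when bootstrapping new nodes (the keyspace name is just an
example):

    num_tokens: 4
    allocate_tokens_for_keyspace: my_keyspace   # existing keyspace whose replication settings drive the allocation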

Whilst a value of 4 is used for num_tokens in the post, it was chosen for
demonstration purposes. Specifically it makes:

   - the uneven token distribution in a small cluster very obvious,
   - identifying the endpoints displayed in nodetool ring easy, and
   - the initial_token setup less verbose and easier to follow.

I will add an editorial note to the post with the above information
so there is no confusion about why 4 tokens were used.

I would only consider moving a cluster to 4 tokens if it is larger than 100
nodes. If you read through the paper that Erick mentioned, written by Joe
Lynch & Josh Snyder, they show that the num_tokens impacts the availability
of large scale clusters.

If you are after more details about the trade-offs between different sized
token values, please see the discussion on the dev mailing list: "[Discuss]
num_tokens default in Cassandra 4.0
<https://www.mail-archive.com/search?l=dev%40cassandra.apache.org&q=subject%3A%22%5C%5BDiscuss%5C%5D+num_tokens+default+in+Cassandra+4.0%22&o=oldest>
".

Regards,
Anthony

On Sat, 1 Feb 2020 at 10:07, Sergio <la...@gmail.com> wrote:

>
> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html This
> is the article with 4 token recommendations.
> @Erick Ramirez. which is the dev thread for the default 32 tokens
> recommendation?
>
> Thanks,
> Sergio
>
> On Fri, 31 Jan 2020 at 14:49, Erick Ramirez <flightctlr@gmail.com> wrote:
>
>> There's an active discussion going on right now in a separate dev thread.
>> The current "default recommendation" is 32 tokens. But there's a push for 4
>> in combination with allocate_tokens_for_keyspace from Jon Haddad & co
>> (based on a paper from Joe Lynch & Josh Snyder).
>>
>> If you're satisfied with the results from your own testing, go with 4
>> tokens. And that's the key -- you must test, test, TEST! Cheers!
>>
>> On Sat, Feb 1, 2020 at 5:17 AM Arvinder Dhillon <dh...@gmail.com>
>> wrote:
>>
>>> What is recommended vnodes now? I read 8 in later cassandra 3.x
>>> Is the new recommendation 4 now even in version 3.x (asking for 3.11)?
>>> Thanks
>>>
>>> On Fri, Jan 31, 2020 at 9:49 AM Durity, Sean R <
>>> SEAN_R_DURITY@homedepot.com> wrote:
>>>
>>>> These are good clarifications and expansions.
>>>>
>>>>
>>>>
>>>> Sean Durity
>>>>
>>>>
>>>>
>>>> *From:* Anthony Grasso <an...@gmail.com>
>>>> *Sent:* Thursday, January 30, 2020 7:25 PM
>>>> *To:* user <us...@cassandra.apache.org>
>>>> *Subject:* Re: [EXTERNAL] How to reduce vnodes without downtime
>>>>
>>>>
>>>>
>>>> Hi Maxim,
>>>>
>>>>
>>>>
>>>> Basically what Sean suggested is the way to do this without downtime.
>>>>
>>>>
>>>>
>>>> To clarify the, the *three* steps following the "Decommission each
>>>> node in the DC you are working on" step should be applied to *only*
>>>> the decommissioned nodes. So where it say "*all nodes*" or "*every
>>>> node*" it applies to only the decommissioned nodes.
>>>>
>>>>
>>>>
>>>> In addition, the step that says "Wipe data on all the nodes", I would
>>>> delete all files in the following directories on the decommissioned nodes.
>>>>
>>>>    - data (usually located in /var/lib/cassandra/data)
>>>>    - commitlogs (usually located in /var/lib/cassandra/commitlogs)
>>>>    - hints (usually located in /var/lib/casandra/hints)
>>>>    - saved_caches (usually located in /var/lib/cassandra/saved_caches)
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Anthony
>>>>
>>>>
>>>>
>>>> On Fri, 31 Jan 2020 at 03:05, Durity, Sean R <
>>>> SEAN_R_DURITY@homedepot.com> wrote:
>>>>
>>>> Your procedure won’t work very well. On the first node, if you switched
>>>> to 4, you would end up with only a tiny fraction of the data (because the
>>>> other nodes would still be at 256). I updated a large cluster (over 150
>>>> nodes – 2 DCs) to smaller number of vnodes. The basic outline was this:
>>>>
>>>>
>>>>
>>>>    - Stop all repairs
>>>>    - Make sure the app is running against one DC only
>>>>    - Change the replication settings on keyspaces to use only 1 DC
>>>>    (basically cutting off the other DC)
>>>>    - Decommission each node in the DC you are working on. Because the
>>>>    replication setting are changed, no streaming occurs. But it releases the
>>>>    token assignments
>>>>    - Wipe data on all the nodes
>>>>    - Update configuration on every node to your new settings,
>>>>    including auto_bootstrap = false
>>>>    - Start all nodes. They will choose tokens, but not stream any data
>>>>    - Update replication factor for all keyspaces to include the new DC
>>>>    - I disabled binary on those nodes to prevent app connections
>>>>    - Run nodetool rebuild with -dc (other DC) on as many nodes as your
>>>>    system can safely handle until they are all rebuilt.
>>>>    - Re-enable binary (and app connections to the rebuilt DC)
>>>>    - Turn on repairs
>>>>    - Rest for a bit, then reverse the process for the remaining DCs
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Sean Durity – Staff Systems Engineer, Cassandra
>>>>
>>>>
>>>>
>>>> *From:* Maxim Parkachov <la...@gmail.com>
>>>> *Sent:* Thursday, January 30, 2020 10:05 AM
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* [EXTERNAL] How to reduce vnodes without downtime
>>>>
>>>>
>>>>
>>>> Hi everyone,
>>>>
>>>>
>>>>
>>>> with discussion about reducing default vnodes in version 4.0 I would
>>>> like to ask, what would be optimal procedure to perform reduction of vnodes
>>>> in existing 3.11.x cluster which was set up with default value 256. Cluster
>>>> has 2 DC with 5 nodes each and RF=3. There is one more restriction, I could
>>>> not add more servers, nor to create additional DC, everything is physical.
>>>> This should be done without downtime.
>>>>
>>>>
>>>>
>>>> My idea for such procedure would be
>>>>
>>>>
>>>>
>>>> for each node:
>>>>
>>>> - decommission node
>>>>
>>>> - set auto_bootstrap to true and vnodes to 4
>>>>
>>>> - start and wait till node joins cluster
>>>>
>>>> - run cleanup on rest of nodes in cluster
>>>>
>>>> - run repair on whole cluster (not sure if needed after cleanup)
>>>>
>>>> - set auto_bootstrap to false
>>>>
>>>> repeat for each node
>>>>
>>>>
>>>>
>>>> rolling restart of cluster
>>>>
>>>> cluster repair
>>>>
>>>>
>>>>
>>>> Is this sounds right ? My concern is that after decommission, node will
>>>> start on the same IP which could create some confusion.
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Maxim.
>>>>
>>>>
>>>>
>>>

Re: [EXTERNAL] How to reduce vnodes without downtime

Posted by Sergio <la...@gmail.com>.
https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
This is the article with the 4-token recommendation.
@Erick Ramirez, which is the dev thread for the default 32 tokens
recommendation?

Thanks,
Sergio

On Fri, 31 Jan 2020 at 14:49, Erick Ramirez <fl...@gmail.com> wrote:

> There's an active discussion going on right now in a separate dev thread.
> The current "default recommendation" is 32 tokens. But there's a push for 4
> in combination with allocate_tokens_for_keyspace from Jon Haddad & co
> (based on a paper from Joe Lynch & Josh Snyder).
>
> If you're satisfied with the results from your own testing, go with 4
> tokens. And that's the key -- you must test, test, TEST! Cheers!
>
> On Sat, Feb 1, 2020 at 5:17 AM Arvinder Dhillon <dh...@gmail.com>
> wrote:
>
>> What is recommended vnodes now? I read 8 in later cassandra 3.x
>> Is the new recommendation 4 now even in version 3.x (asking for 3.11)?
>> Thanks
>>
>> On Fri, Jan 31, 2020 at 9:49 AM Durity, Sean R <
>> SEAN_R_DURITY@homedepot.com> wrote:
>>
>>> These are good clarifications and expansions.
>>>
>>>
>>>
>>> Sean Durity
>>>
>>>
>>>
>>> *From:* Anthony Grasso <an...@gmail.com>
>>> *Sent:* Thursday, January 30, 2020 7:25 PM
>>> *To:* user <us...@cassandra.apache.org>
>>> *Subject:* Re: [EXTERNAL] How to reduce vnodes without downtime
>>>
>>>
>>>
>>> Hi Maxim,
>>>
>>>
>>>
>>> Basically what Sean suggested is the way to do this without downtime.
>>>
>>>
>>>
>>> To clarify the, the *three* steps following the "Decommission each node
>>> in the DC you are working on" step should be applied to *only* the
>>> decommissioned nodes. So where it say "*all nodes*" or "*every node*"
>>> it applies to only the decommissioned nodes.
>>>
>>>
>>>
>>> In addition, the step that says "Wipe data on all the nodes", I would
>>> delete all files in the following directories on the decommissioned nodes.
>>>
>>>    - data (usually located in /var/lib/cassandra/data)
>>>    - commitlogs (usually located in /var/lib/cassandra/commitlogs)
>>>    - hints (usually located in /var/lib/casandra/hints)
>>>    - saved_caches (usually located in /var/lib/cassandra/saved_caches)
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Anthony
>>>
>>>
>>>
>>> On Fri, 31 Jan 2020 at 03:05, Durity, Sean R <
>>> SEAN_R_DURITY@homedepot.com> wrote:
>>>
>>> Your procedure won’t work very well. On the first node, if you switched
>>> to 4, you would end up with only a tiny fraction of the data (because the
>>> other nodes would still be at 256). I updated a large cluster (over 150
>>> nodes – 2 DCs) to smaller number of vnodes. The basic outline was this:
>>>
>>>
>>>
>>>    - Stop all repairs
>>>    - Make sure the app is running against one DC only
>>>    - Change the replication settings on keyspaces to use only 1 DC
>>>    (basically cutting off the other DC)
>>>    - Decommission each node in the DC you are working on. Because the
>>>    replication setting are changed, no streaming occurs. But it releases the
>>>    token assignments
>>>    - Wipe data on all the nodes
>>>    - Update configuration on every node to your new settings, including
>>>    auto_bootstrap = false
>>>    - Start all nodes. They will choose tokens, but not stream any data
>>>    - Update replication factor for all keyspaces to include the new DC
>>>    - I disabled binary on those nodes to prevent app connections
>>>    - Run nodetool rebuild with -dc (other DC) on as many nodes as your
>>>    system can safely handle until they are all rebuilt.
>>>    - Re-enable binary (and app connections to the rebuilt DC)
>>>    - Turn on repairs
>>>    - Rest for a bit, then reverse the process for the remaining DCs
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Sean Durity – Staff Systems Engineer, Cassandra
>>>
>>>
>>>
>>> *From:* Maxim Parkachov <la...@gmail.com>
>>> *Sent:* Thursday, January 30, 2020 10:05 AM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* [EXTERNAL] How to reduce vnodes without downtime
>>>
>>>
>>>
>>> Hi everyone,
>>>
>>>
>>>
>>> with discussion about reducing default vnodes in version 4.0 I would
>>> like to ask, what would be optimal procedure to perform reduction of vnodes
>>> in existing 3.11.x cluster which was set up with default value 256. Cluster
>>> has 2 DC with 5 nodes each and RF=3. There is one more restriction, I could
>>> not add more servers, nor to create additional DC, everything is physical.
>>> This should be done without downtime.
>>>
>>>
>>>
>>> My idea for such procedure would be
>>>
>>>
>>>
>>> for each node:
>>>
>>> - decommission node
>>>
>>> - set auto_bootstrap to true and vnodes to 4
>>>
>>> - start and wait till node joins cluster
>>>
>>> - run cleanup on rest of nodes in cluster
>>>
>>> - run repair on whole cluster (not sure if needed after cleanup)
>>>
>>> - set auto_bootstrap to false
>>>
>>> repeat for each node
>>>
>>>
>>>
>>> rolling restart of cluster
>>>
>>> cluster repair
>>>
>>>
>>>
>>> Is this sounds right ? My concern is that after decommission, node will
>>> start on the same IP which could create some confusion.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Maxim.
>>>
>>>
>>>
>>

Re: [EXTERNAL] How to reduce vnodes without downtime

Posted by Erick Ramirez <fl...@gmail.com>.
There's an active discussion going on right now in a separate dev thread.
The current "default recommendation" is 32 tokens. But there's a push for 4
in combination with allocate_tokens_for_keyspace from Jon Haddad & co
(based on a paper from Joe Lynch & Josh Snyder).
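
For illustration only, here is roughly what that combination looks like in
cassandra.yaml on 3.x; the keyspace name is a placeholder, not something from
this thread:

  # sketch of a 4-token setup, not a drop-in config
  num_tokens: 4
  # point this at a keyspace that already has the intended NetworkTopologyStrategy
  # replication for the local DC, so the allocation algorithm can optimise for it
  allocate_tokens_for_keyspace: my_keyspace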

If you're satisfied with the results from your own testing, go with 4
tokens. And that's the key -- you must test, test, TEST! Cheers!


Re: [EXTERNAL] How to reduce vnodes without downtime

Posted by Arvinder Dhillon <dh...@gmail.com>.
What is the recommended number of vnodes now? I read 8 for later Cassandra 3.x.
Is the new recommendation 4 now, even in version 3.x (asking about 3.11)?
Thanks


RE: [EXTERNAL] How to reduce vnodes without downtime

Posted by "Durity, Sean R" <SE...@homedepot.com>.
These are good clarifications and expansions.

Sean Durity


Re: [EXTERNAL] How to reduce vnodes without downtime

Posted by Anthony Grasso <an...@gmail.com>.
Hi Maxim,

Basically what Sean suggested is the way to do this without downtime.

To clarify, the *three* steps following the "Decommission each node in
the DC you are working on" step should be applied to *only* the
decommissioned nodes. So where it says "*all nodes*" or "*every node*", it
applies only to the decommissioned nodes.

In addition, for the step that says "Wipe data on all the nodes", I would
delete all files in the following directories on the decommissioned nodes
(a rough sketch of the commands follows the list).

   - data (usually located in /var/lib/cassandra/data)
   - commitlogs (usually located in /var/lib/cassandra/commitlogs)
   - hints (usually located in /var/lib/cassandra/hints)
   - saved_caches (usually located in /var/lib/cassandra/saved_caches)
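
A rough sketch of that wipe, purely as illustration -- run it only on a node
that has already been stopped and decommissioned, and check the paths against
data_file_directories, commitlog_directory, hints_directory and
saved_caches_directory in your cassandra.yaml first:

  # paths assumed from the list above; verify them before running
  sudo systemctl stop cassandra   # assuming a systemd service named "cassandra"
  sudo rm -rf /var/lib/cassandra/data/* \
              /var/lib/cassandra/commitlogs/* \
              /var/lib/cassandra/hints/* \
              /var/lib/cassandra/saved_caches/*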


Cheers,
Anthony


RE: [EXTERNAL] How to reduce vnodes without downtime

Posted by "Durity, Sean R" <SE...@homedepot.com>.
Your procedure won’t work very well. On the first node, if you switched to 4, you would end up with only a tiny fraction of the data (because the other nodes would still be at 256). I updated a large cluster (over 150 nodes – 2 DCs) to a smaller number of vnodes. The basic outline was this (a command-level sketch follows the list):


  *   Stop all repairs
  *   Make sure the app is running against one DC only
  *   Change the replication settings on keyspaces to use only 1 DC (basically cutting off the other DC)
  *   Decommission each node in the DC you are working on. Because the replication settings are changed, no streaming occurs, but it releases the token assignments
  *   Wipe data on all the nodes
  *   Update configuration on every node to your new settings, including auto_bootstrap = false
  *   Start all nodes. They will choose tokens, but not stream any data
  *   Update replication factor for all keyspaces to include the new DC
  *   I disabled binary on those nodes to prevent app connections
  *   Run nodetool rebuild with -dc (other DC) on as many nodes as your system can safely handle until they are all rebuilt.
  *   Re-enable binary (and app connections to the rebuilt DC)
  *   Turn on repairs
  *   Rest for a bit, then reverse the process for the remaining DCs
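
Purely as a sketch of the outline above (keyspace and DC names such as
my_keyspace, DC1 and DC2 are placeholders, and every command should be checked
against your own 3.11.x tooling before use):

  # 1. Cut the DC being rebuilt (here DC2) out of replication, per application keyspace:
  cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};"

  # 2. On each node in DC2: release its tokens, wipe it, and bring it back with
  #    num_tokens: 4 and auto_bootstrap: false set in cassandra.yaml:
  nodetool decommission
  # ... wipe data/commitlog/hints/saved_caches, edit cassandra.yaml, restart ...

  # 3. Add DC2 back to replication, then rebuild it from DC1 with clients kept off:
  cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"
  nodetool disablebinary
  nodetool rebuild -- DC1
  nodetool enablebinary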



Sean Durity – Staff Systems Engineer, Cassandra
