Posted to user@cassandra.apache.org by onmstester onmstester <on...@zoho.com> on 2018/10/01 05:16:27 UTC

how to configure the Token Allocation Algorithm

Since i failed to find a document on how to configure and use the Token Allocation Algorithm (to replace the random Algorithm), just wanted to be sure about the procedure i've done:
1. Using Apache Cassandra 3.11.2
2. Configured one of seed nodes with num_tokens=8 and started it.
3. Using Cqlsh created keyspace test with NetworkTopologyStrategy and RF=3.
4. Stopped the seed node.
5. add this line to cassandra.yaml of all nodes (all have num_tokens=8) and started the cluster: allocate_tokens_for_keyspace=test

My cluster Size won't go beyond 150 nodes, should i still use The Allocation Algorithm instead of random with 256 tokens (performance wise or load-balance wise)?
Is the Allocation Algorithm widely used and tested with the Community, and can we migrate all clusters with any size to use this Algorithm Safely?
Out of Curiosity, i wonder how people (i.e, in Apple) config and maintain token management of clusters with thousands of nodes?

Sent using Zoho Mail

Re: Re: Re: how to configure the Token Allocation Algorithm

Posted by Anthony Grasso <an...@gmail.com>.
Hi Jean,

Good question. I think that sentence is slightly confusing and here is why:

If the cluster's tokens are already evenly distributed and there are no
plans to expand the cluster, then applying the allocate_tokens_for_keyspace
setting has no real practical value.

If the cluster has tokens that are unevenly distributed and there are plans
to expand the cluster, then it may be worth using the
allocate_tokens_for_keyspace setting when adding a new node to the cluster.

Looking back on that sentence, I think it should probably read:

*"However, therein lies the problem, for existing clusters using this
> setting is easy, as a keyspace already exists"*


If you think that wording gives better clarification, I'll go back and
update the post when I have time. Let me know what you think.

Regards,
Anthony

On Mon, 29 Apr 2019 at 18:45, Jean Carlo <je...@gmail.com> wrote:

> Hello Anthony,
>
> Effectively I did not start the seed of every rack firsts. Thank you for
> the post. I believe this is something important to have as official
> documentation in cassandra.apache.org. This issues as many others are not
> documented properly.
>
> Of course I find the blog of last pickle very useful in this matters, but
> having a properly documentation of how to start a fresh new cluster
> cassandra is basic.
>
> I have one question about your post, when you mention
> "*However, therein lies the problem, for existing clusters updating this
> setting is easy, as a keyspace already exists*"
> What is the interest to use allocate_tokens_for_keyspace in a cluster
> with data if there tokens are already distributed? in the worst case
> scenario, the cluster is already unbalanced
>
>
> Cheers
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
>
> On Mon, Apr 29, 2019 at 2:45 AM Anthony Grasso <an...@gmail.com>
> wrote:
>
>> Hi Jean,
>>
>> It sounds like there are no nodes in one of the racks for the eu-west-3
>> datacenter. What does the output of nodetool status look like currently?
>>
>> Note, you will need to start a node in each rack before creating the
>> keyspace. I wrote a blog post with the procedure to set up a new cluster
>> using the predictive token allocation algorithm:
>> http://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>>
>> Regards,
>> Anthony
>>
>> On Fri, 26 Apr 2019 at 19:53, Jean Carlo <je...@gmail.com>
>> wrote:
>>
>>> Creating a fresh new cluster in aws using this procedure, I got this
>>> problem once I am bootstrapping the second rack of the cluster of 6
>>> machines with 3 racks and a keyspace of rf 3
>>>
>>> WARN  [main] 2019-04-26 11:37:43,845 TokenAllocation.java:63 - Selected
>>> tokens [-5106267594614944625, 623001446449719390, 7048665031315327212,
>>> 3265006217757525070, 5054577454645148534, 314677103601736696,
>>> 7660890915606146375, -5329427405842523680]
>>> ERROR [main] 2019-04-26 11:37:43,860 CassandraDaemon.java:749 - Fatal
>>> configuration error
>>> org.apache.cassandra.exceptions.ConfigurationException: Token allocation
>>> failed: the number of racks 2 in datacenter eu-west-3 is lower than its
>>> replication factor 3.
>>>
>>> Someone got this problem ?
>>>
>>> I am not quite sure why I have this, since my cluster has 3 racks.
>>>
>>> Cluster Information:
>>>     Name: test
>>>     Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
>>>     DynamicEndPointSnitch: enabled
>>>     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>>>     Schema versions:
>>>         3bf63440-fad7-3371-9c14-4855ad11ee83: [192.0.0.1, 192.0.0.2]
>>>
>>>
>>>
>>> Jean Carlo
>>>
>>> "The best way to predict the future is to invent it" Alan Kay
>>>
>>>
>>> On Thu, Jan 24, 2019 at 10:32 AM Ahmed Eljami <ah...@gmail.com>
>>> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> What about adding new keyspaces in the existing cluster, test_2 with
>>>> the same RF.
>>>>
>>>> It will use the same logic as the existing kesypace test ? Or I should
>>>> restart nodes and add the new keyspace to the cassandra.yaml ?
>>>>
>>>> Thanks.
>>>>
>>>> Le mar. 2 oct. 2018 à 10:28, Varun Barala <va...@gmail.com> a
>>>> écrit :
>>>>
>>>>> Hi,
>>>>>
>>>>> Managing `initial_token` by yourself will give you more control over
>>>>> scale-in and scale-out.
>>>>> Let's say you have three node cluster with `num_token: 1`
>>>>>
>>>>> And your initial range looks like:-
>>>>>
>>>>> Datacenter: datacenter1
>>>>> ==========
>>>>> Address    Rack        Status State   Load            Owns
>>>>>    Token
>>>>>
>>>>>                                    3074457345618258602
>>>>>
>>>>> 127.0.0.1  rack1       Up     Normal  98.96 KiB       66.67%
>>>>>    -9223372036854775808
>>>>> 127.0.0.2  rack1       Up     Normal  98.96 KiB       66.67%
>>>>>    -3074457345618258603
>>>>> 127.0.0.3  rack1       Up     Normal  98.96 KiB       66.67%
>>>>>    3074457345618258602
>>>>>
>>>>> Now let's say you want to scale out the cluster to twice the current
>>>>> throughput(means you are adding 3 more nodes)
>>>>>
>>>>> If you are using AWS EBS volumes then you can use the same volumes and
>>>>> spin three more nodes by selecting midpoints of existing ranges which means
>>>>> your new nodes are already having data.
>>>>> Once you have mounted volumes on your new nodes:-
>>>>> * You need to delete every system table except schema related tables.
>>>>> * You need to generate system/local table by yourself which has
>>>>> `Bootstrap state` as completed and schema-version same as other existing
>>>>> nodes.
>>>>> * You need to remove extra data on all the machines using cleanup
>>>>> commands
>>>>>
>>>>> This is how you can scale out Cassandra cluster in the minutes. In
>>>>> case you want to add nodes one by one then you need to write some small
>>>>> tool which will always figure out the bigger range in the existing cluster
>>>>> and will split it into the half.
>>>>>
>>>>> However, I never tested it thoroughly but this should work
>>>>> conceptually. So here we are taking advantage of the fact that we have
>>>>> volumes(data) for the new node beforehand so we no need to bootstrap them.
>>>>>
>>>>> Thanks & Regards,
>>>>> Varun Barala
>>>>>
>>>>> On Tue, Oct 2, 2018 at 2:31 PM onmstester onmstester <
>>>>> onmstester@zoho.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>>>>>
>>>>>>
>>>>>> ---- On Mon, 01 Oct 2018 18:36:03 +0330 *Alain RODRIGUEZ
>>>>>> <arodrime@gmail.com <ar...@gmail.com>>* wrote ----
>>>>>>
>>>>>> Hello again :),
>>>>>>
>>>>>> I thought a little bit more about this question, and I was actually
>>>>>> wondering if something like this would work:
>>>>>>
>>>>>> Imagine 3 node cluster, and create them using:
>>>>>> For the 3 nodes: `num_token: 4`
>>>>>> Node 1: `intial_token: -9223372036854775808, -4611686018427387905,
>>>>>> -2, 4611686018427387901`
>>>>>> Node 2: `intial_token: -7686143364045646507, -3074457345618258604,
>>>>>> 1537228672809129299, 6148914691236517202`
>>>>>> Node 3: `intial_token: -6148914691236517206, -1537228672809129303,
>>>>>> 3074457345618258600, 7686143364045646503`
>>>>>>
>>>>>>  If you know the initial size of your cluster, you can calculate the
>>>>>> total number of tokens: number of nodes * vnodes and use the
>>>>>> formula/python code above to get the tokens. Then use the first token for
>>>>>> the first node, move to the second node, use the second token and repeat.
>>>>>> In my case there is a total of 12 tokens (3 nodes, 4 tokens each)
>>>>>> ```
>>>>>> >>> number_of_tokens = 12
>>>>>> >>> [str(((2**64 / number_of_tokens) * i) - 2**63) for i in
>>>>>> range(number_of_tokens)]
>>>>>> ['-9223372036854775808', '-7686143364045646507',
>>>>>> '-6148914691236517206', '-4611686018427387905', '-3074457345618258604',
>>>>>> '-1537228672809129303', '-2', '1537228672809129299', '3074457345618258600',
>>>>>> '4611686018427387901', '6148914691236517202', '7686143364045646503']
>>>>>> ```
>>>>>>
>>>>>>
>>>>>> Using manual initial_token (your idea), how could i add a new node to
>>>>>> a long running cluster (the procedure)?
>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Cordialement;
>>>>
>>>> Ahmed ELJAMI
>>>>
>>>

Re: Re: Re: how to configure the Token Allocation Algorithm

Posted by Jean Carlo <je...@gmail.com>.
Hello Anthony,

Effectively I did not start the seed of every rack first. Thank you for
the post. I believe this is something important to have as official
documentation on cassandra.apache.org. This issue, like many others, is not
documented properly.

Of course I find The Last Pickle blog very useful in these matters, but
having proper documentation on how to start a fresh new Cassandra cluster
is essential.

I have one question about your post, when you mention
"*However, therein lies the problem, for existing clusters updating this
setting is easy, as a keyspace already exists*"
What is the point of using allocate_tokens_for_keyspace in a cluster with
data if the tokens are already distributed? In the worst case scenario,
the cluster is already unbalanced.


Cheers

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


On Mon, Apr 29, 2019 at 2:45 AM Anthony Grasso <an...@gmail.com>
wrote:

> Hi Jean,
>
> It sounds like there are no nodes in one of the racks for the eu-west-3
> datacenter. What does the output of nodetool status look like currently?
>
> Note, you will need to start a node in each rack before creating the
> keyspace. I wrote a blog post with the procedure to set up a new cluster
> using the predictive token allocation algorithm:
> http://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>
> Regards,
> Anthony
>
> On Fri, 26 Apr 2019 at 19:53, Jean Carlo <je...@gmail.com>
> wrote:
>
>> Creating a fresh new cluster in aws using this procedure, I got this
>> problem once I am bootstrapping the second rack of the cluster of 6
>> machines with 3 racks and a keyspace of rf 3
>>
>> WARN  [main] 2019-04-26 11:37:43,845 TokenAllocation.java:63 - Selected
>> tokens [-5106267594614944625, 623001446449719390, 7048665031315327212,
>> 3265006217757525070, 5054577454645148534, 314677103601736696,
>> 7660890915606146375, -5329427405842523680]
>> ERROR [main] 2019-04-26 11:37:43,860 CassandraDaemon.java:749 - Fatal
>> configuration error
>> org.apache.cassandra.exceptions.ConfigurationException: Token allocation
>> failed: the number of racks 2 in datacenter eu-west-3 is lower than its
>> replication factor 3.
>>
>> Someone got this problem ?
>>
>> I am not quite sure why I have this, since my cluster has 3 racks.
>>
>> Cluster Information:
>>     Name: test
>>     Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
>>     DynamicEndPointSnitch: enabled
>>     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>>     Schema versions:
>>         3bf63440-fad7-3371-9c14-4855ad11ee83: [192.0.0.1, 192.0.0.2]
>>
>>
>>
>> Jean Carlo
>>
>> "The best way to predict the future is to invent it" Alan Kay
>>
>>
>> On Thu, Jan 24, 2019 at 10:32 AM Ahmed Eljami <ah...@gmail.com>
>> wrote:
>>
>>> Hi folks,
>>>
>>> What about adding new keyspaces in the existing cluster, test_2 with
>>> the same RF.
>>>
>>> It will use the same logic as the existing kesypace test ? Or I should
>>> restart nodes and add the new keyspace to the cassandra.yaml ?
>>>
>>> Thanks.
>>>
>>> Le mar. 2 oct. 2018 à 10:28, Varun Barala <va...@gmail.com> a
>>> écrit :
>>>
>>>> Hi,
>>>>
>>>> Managing `initial_token` by yourself will give you more control over
>>>> scale-in and scale-out.
>>>> Let's say you have three node cluster with `num_token: 1`
>>>>
>>>> And your initial range looks like:-
>>>>
>>>> Datacenter: datacenter1
>>>> ==========
>>>> Address    Rack        Status State   Load            Owns
>>>>    Token
>>>>
>>>>                                  3074457345618258602
>>>>
>>>> 127.0.0.1  rack1       Up     Normal  98.96 KiB       66.67%
>>>>    -9223372036854775808
>>>> 127.0.0.2  rack1       Up     Normal  98.96 KiB       66.67%
>>>>    -3074457345618258603
>>>> 127.0.0.3  rack1       Up     Normal  98.96 KiB       66.67%
>>>>    3074457345618258602
>>>>
>>>> Now let's say you want to scale out the cluster to twice the current
>>>> throughput(means you are adding 3 more nodes)
>>>>
>>>> If you are using AWS EBS volumes then you can use the same volumes and
>>>> spin three more nodes by selecting midpoints of existing ranges which means
>>>> your new nodes are already having data.
>>>> Once you have mounted volumes on your new nodes:-
>>>> * You need to delete every system table except schema related tables.
>>>> * You need to generate system/local table by yourself which has
>>>> `Bootstrap state` as completed and schema-version same as other existing
>>>> nodes.
>>>> * You need to remove extra data on all the machines using cleanup
>>>> commands
>>>>
>>>> This is how you can scale out Cassandra cluster in the minutes. In case
>>>> you want to add nodes one by one then you need to write some small tool
>>>> which will always figure out the bigger range in the existing cluster and
>>>> will split it into the half.
>>>>
>>>> However, I never tested it thoroughly but this should work
>>>> conceptually. So here we are taking advantage of the fact that we have
>>>> volumes(data) for the new node beforehand so we no need to bootstrap them.
>>>>
>>>> Thanks & Regards,
>>>> Varun Barala
>>>>
>>>> On Tue, Oct 2, 2018 at 2:31 PM onmstester onmstester <
>>>> onmstester@zoho.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>>>>
>>>>>
>>>>> ---- On Mon, 01 Oct 2018 18:36:03 +0330 *Alain RODRIGUEZ
>>>>> <arodrime@gmail.com <ar...@gmail.com>>* wrote ----
>>>>>
>>>>> Hello again :),
>>>>>
>>>>> I thought a little bit more about this question, and I was actually
>>>>> wondering if something like this would work:
>>>>>
>>>>> Imagine 3 node cluster, and create them using:
>>>>> For the 3 nodes: `num_token: 4`
>>>>> Node 1: `intial_token: -9223372036854775808, -4611686018427387905, -2,
>>>>> 4611686018427387901`
>>>>> Node 2: `intial_token: -7686143364045646507, -3074457345618258604,
>>>>> 1537228672809129299, 6148914691236517202`
>>>>> Node 3: `intial_token: -6148914691236517206, -1537228672809129303,
>>>>> 3074457345618258600, 7686143364045646503`
>>>>>
>>>>>  If you know the initial size of your cluster, you can calculate the
>>>>> total number of tokens: number of nodes * vnodes and use the
>>>>> formula/python code above to get the tokens. Then use the first token for
>>>>> the first node, move to the second node, use the second token and repeat.
>>>>> In my case there is a total of 12 tokens (3 nodes, 4 tokens each)
>>>>> ```
>>>>> >>> number_of_tokens = 12
>>>>> >>> [str(((2**64 / number_of_tokens) * i) - 2**63) for i in
>>>>> range(number_of_tokens)]
>>>>> ['-9223372036854775808', '-7686143364045646507',
>>>>> '-6148914691236517206', '-4611686018427387905', '-3074457345618258604',
>>>>> '-1537228672809129303', '-2', '1537228672809129299', '3074457345618258600',
>>>>> '4611686018427387901', '6148914691236517202', '7686143364045646503']
>>>>> ```
>>>>>
>>>>>
>>>>> Using manual initial_token (your idea), how could i add a new node to
>>>>> a long running cluster (the procedure)?
>>>>>
>>>>>
>>>
>>> --
>>> Cordialement;
>>>
>>> Ahmed ELJAMI
>>>
>>

Re: Re: Re: how to configure the Token Allocation Algorithm

Posted by Anthony Grasso <an...@gmail.com>.
Hi Jean,

It sounds like there are no nodes in one of the racks for the eu-west-3
datacenter. What does the output of nodetool status look like currently?

Note, you will need to start a node in each rack before creating the
keyspace. I wrote a blog post with the procedure to set up a new cluster
using the predictive token allocation algorithm:
http://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html

Regards,
Anthony

On Fri, 26 Apr 2019 at 19:53, Jean Carlo <je...@gmail.com> wrote:

> Creating a fresh new cluster in aws using this procedure, I got this
> problem once I am bootstrapping the second rack of the cluster of 6
> machines with 3 racks and a keyspace of rf 3
>
> WARN  [main] 2019-04-26 11:37:43,845 TokenAllocation.java:63 - Selected
> tokens [-5106267594614944625, 623001446449719390, 7048665031315327212,
> 3265006217757525070, 5054577454645148534, 314677103601736696,
> 7660890915606146375, -5329427405842523680]
> ERROR [main] 2019-04-26 11:37:43,860 CassandraDaemon.java:749 - Fatal
> configuration error
> org.apache.cassandra.exceptions.ConfigurationException: Token allocation
> failed: the number of racks 2 in datacenter eu-west-3 is lower than its
> replication factor 3.
>
> Someone got this problem ?
>
> I am not quite sure why I have this, since my cluster has 3 racks.
>
> Cluster Information:
>     Name: test
>     Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
>     DynamicEndPointSnitch: enabled
>     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>     Schema versions:
>         3bf63440-fad7-3371-9c14-4855ad11ee83: [192.0.0.1, 192.0.0.2]
>
>
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
>
> On Thu, Jan 24, 2019 at 10:32 AM Ahmed Eljami <ah...@gmail.com>
> wrote:
>
>> Hi folks,
>>
>> What about adding new keyspaces in the existing cluster, test_2 with the
>> same RF.
>>
>> It will use the same logic as the existing kesypace test ? Or I should
>> restart nodes and add the new keyspace to the cassandra.yaml ?
>>
>> Thanks.
>>
>> Le mar. 2 oct. 2018 à 10:28, Varun Barala <va...@gmail.com> a
>> écrit :
>>
>>> Hi,
>>>
>>> Managing `initial_token` by yourself will give you more control over
>>> scale-in and scale-out.
>>> Let's say you have three node cluster with `num_token: 1`
>>>
>>> And your initial range looks like:-
>>>
>>> Datacenter: datacenter1
>>> ==========
>>> Address    Rack        Status State   Load            Owns
>>>  Token
>>>
>>>                                  3074457345618258602
>>>
>>> 127.0.0.1  rack1       Up     Normal  98.96 KiB       66.67%
>>>  -9223372036854775808
>>> 127.0.0.2  rack1       Up     Normal  98.96 KiB       66.67%
>>>  -3074457345618258603
>>> 127.0.0.3  rack1       Up     Normal  98.96 KiB       66.67%
>>>  3074457345618258602
>>>
>>> Now let's say you want to scale out the cluster to twice the current
>>> throughput(means you are adding 3 more nodes)
>>>
>>> If you are using AWS EBS volumes then you can use the same volumes and
>>> spin three more nodes by selecting midpoints of existing ranges which means
>>> your new nodes are already having data.
>>> Once you have mounted volumes on your new nodes:-
>>> * You need to delete every system table except schema related tables.
>>> * You need to generate system/local table by yourself which has
>>> `Bootstrap state` as completed and schema-version same as other existing
>>> nodes.
>>> * You need to remove extra data on all the machines using cleanup
>>> commands
>>>
>>> This is how you can scale out Cassandra cluster in the minutes. In case
>>> you want to add nodes one by one then you need to write some small tool
>>> which will always figure out the bigger range in the existing cluster and
>>> will split it into the half.
>>>
>>> However, I never tested it thoroughly but this should work conceptually.
>>> So here we are taking advantage of the fact that we have volumes(data) for
>>> the new node beforehand so we no need to bootstrap them.
>>>
>>> Thanks & Regards,
>>> Varun Barala
>>>
>>> On Tue, Oct 2, 2018 at 2:31 PM onmstester onmstester <
>>> onmstester@zoho.com> wrote:
>>>
>>>>
>>>>
>>>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>>>
>>>>
>>>> ---- On Mon, 01 Oct 2018 18:36:03 +0330 *Alain RODRIGUEZ
>>>> <arodrime@gmail.com <ar...@gmail.com>>* wrote ----
>>>>
>>>> Hello again :),
>>>>
>>>> I thought a little bit more about this question, and I was actually
>>>> wondering if something like this would work:
>>>>
>>>> Imagine 3 node cluster, and create them using:
>>>> For the 3 nodes: `num_token: 4`
>>>> Node 1: `intial_token: -9223372036854775808, -4611686018427387905, -2,
>>>> 4611686018427387901`
>>>> Node 2: `intial_token: -7686143364045646507, -3074457345618258604,
>>>> 1537228672809129299, 6148914691236517202`
>>>> Node 3: `intial_token: -6148914691236517206, -1537228672809129303,
>>>> 3074457345618258600, 7686143364045646503`
>>>>
>>>>  If you know the initial size of your cluster, you can calculate the
>>>> total number of tokens: number of nodes * vnodes and use the
>>>> formula/python code above to get the tokens. Then use the first token for
>>>> the first node, move to the second node, use the second token and repeat.
>>>> In my case there is a total of 12 tokens (3 nodes, 4 tokens each)
>>>> ```
>>>> >>> number_of_tokens = 12
>>>> >>> [str(((2**64 / number_of_tokens) * i) - 2**63) for i in
>>>> range(number_of_tokens)]
>>>> ['-9223372036854775808', '-7686143364045646507',
>>>> '-6148914691236517206', '-4611686018427387905', '-3074457345618258604',
>>>> '-1537228672809129303', '-2', '1537228672809129299', '3074457345618258600',
>>>> '4611686018427387901', '6148914691236517202', '7686143364045646503']
>>>> ```
>>>>
>>>>
>>>> Using manual initial_token (your idea), how could i add a new node to a
>>>> long running cluster (the procedure)?
>>>>
>>>>
>>
>> --
>> Cordialement;
>>
>> Ahmed ELJAMI
>>
>

Re: Re: Re: how to configure the Token Allocation Algorithm

Posted by Jean Carlo <je...@gmail.com>.
Creating a fresh new cluster in AWS using this procedure, I got this
problem once I started bootstrapping the second rack of a cluster of 6
machines with 3 racks and a keyspace with RF 3:

WARN  [main] 2019-04-26 11:37:43,845 TokenAllocation.java:63 - Selected
tokens [-5106267594614944625, 623001446449719390, 7048665031315327212,
3265006217757525070, 5054577454645148534, 314677103601736696,
7660890915606146375, -5329427405842523680]
ERROR [main] 2019-04-26 11:37:43,860 CassandraDaemon.java:749 - Fatal
configuration error
org.apache.cassandra.exceptions.ConfigurationException: Token allocation
failed: the number of racks 2 in datacenter eu-west-3 is lower than its
replication factor 3.

Has anyone had this problem?

I am not quite sure why I have this, since my cluster has 3 racks.

Cluster Information:
    Name: test
    Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
    DynamicEndPointSnitch: enabled
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        3bf63440-fad7-3371-9c14-4855ad11ee83: [192.0.0.1, 192.0.0.2]



Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


On Thu, Jan 24, 2019 at 10:32 AM Ahmed Eljami <ah...@gmail.com>
wrote:

> Hi folks,
>
> What about adding new keyspaces in the existing cluster, test_2 with the
> same RF.
>
> It will use the same logic as the existing kesypace test ? Or I should
> restart nodes and add the new keyspace to the cassandra.yaml ?
>
> Thanks.
>
> Le mar. 2 oct. 2018 à 10:28, Varun Barala <va...@gmail.com> a
> écrit :
>
>> Hi,
>>
>> Managing `initial_token` by yourself will give you more control over
>> scale-in and scale-out.
>> Let's say you have three node cluster with `num_token: 1`
>>
>> And your initial range looks like:-
>>
>> Datacenter: datacenter1
>> ==========
>> Address    Rack        Status State   Load            Owns
>>  Token
>>
>>                                3074457345618258602
>> 127.0.0.1  rack1       Up     Normal  98.96 KiB       66.67%
>>  -9223372036854775808
>> 127.0.0.2  rack1       Up     Normal  98.96 KiB       66.67%
>>  -3074457345618258603
>> 127.0.0.3  rack1       Up     Normal  98.96 KiB       66.67%
>>  3074457345618258602
>>
>> Now let's say you want to scale out the cluster to twice the current
>> throughput(means you are adding 3 more nodes)
>>
>> If you are using AWS EBS volumes then you can use the same volumes and
>> spin three more nodes by selecting midpoints of existing ranges which means
>> your new nodes are already having data.
>> Once you have mounted volumes on your new nodes:-
>> * You need to delete every system table except schema related tables.
>> * You need to generate system/local table by yourself which has
>> `Bootstrap state` as completed and schema-version same as other existing
>> nodes.
>> * You need to remove extra data on all the machines using cleanup commands
>>
>> This is how you can scale out Cassandra cluster in the minutes. In case
>> you want to add nodes one by one then you need to write some small tool
>> which will always figure out the bigger range in the existing cluster and
>> will split it into the half.
>>
>> However, I never tested it thoroughly but this should work conceptually.
>> So here we are taking advantage of the fact that we have volumes(data) for
>> the new node beforehand so we no need to bootstrap them.
>>
>> Thanks & Regards,
>> Varun Barala
>>
>> On Tue, Oct 2, 2018 at 2:31 PM onmstester onmstester <on...@zoho.com>
>> wrote:
>>
>>>
>>>
>>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>>
>>>
>>> ---- On Mon, 01 Oct 2018 18:36:03 +0330 *Alain RODRIGUEZ
>>> <arodrime@gmail.com <ar...@gmail.com>>* wrote ----
>>>
>>> Hello again :),
>>>
>>> I thought a little bit more about this question, and I was actually
>>> wondering if something like this would work:
>>>
>>> Imagine 3 node cluster, and create them using:
>>> For the 3 nodes: `num_token: 4`
>>> Node 1: `intial_token: -9223372036854775808, -4611686018427387905, -2,
>>> 4611686018427387901`
>>> Node 2: `intial_token: -7686143364045646507, -3074457345618258604,
>>> 1537228672809129299, 6148914691236517202`
>>> Node 3: `intial_token: -6148914691236517206, -1537228672809129303,
>>> 3074457345618258600, 7686143364045646503`
>>>
>>>  If you know the initial size of your cluster, you can calculate the
>>> total number of tokens: number of nodes * vnodes and use the
>>> formula/python code above to get the tokens. Then use the first token for
>>> the first node, move to the second node, use the second token and repeat.
>>> In my case there is a total of 12 tokens (3 nodes, 4 tokens each)
>>> ```
>>> >>> number_of_tokens = 12
>>> >>> [str(((2**64 / number_of_tokens) * i) - 2**63) for i in
>>> range(number_of_tokens)]
>>> ['-9223372036854775808', '-7686143364045646507', '-6148914691236517206',
>>> '-4611686018427387905', '-3074457345618258604', '-1537228672809129303',
>>> '-2', '1537228672809129299', '3074457345618258600', '4611686018427387901',
>>> '6148914691236517202', '7686143364045646503']
>>> ```
>>>
>>>
>>> Using manual initial_token (your idea), how could i add a new node to a
>>> long running cluster (the procedure)?
>>>
>>>
>
> --
> Cordialement;
>
> Ahmed ELJAMI
>

Re: Re: Re: how to configure the Token Allocation Algorithm

Posted by Ahmed Eljami <ah...@gmail.com>.
Hi folks,

What about adding new keyspaces to the existing cluster, e.g. test_2 with
the same RF?

Will they use the same logic as the existing keyspace test? Or should I
restart the nodes and add the new keyspace to cassandra.yaml?

Thanks.

On Tue, 2 Oct 2018 at 10:28, Varun Barala <va...@gmail.com> wrote:

> Hi,
>
> Managing `initial_token` by yourself will give you more control over
> scale-in and scale-out.
> Let's say you have three node cluster with `num_token: 1`
>
> And your initial range looks like:-
>
> Datacenter: datacenter1
> ==========
> Address    Rack        Status State   Load            Owns
>  Token
>
>                                3074457345618258602
> 127.0.0.1  rack1       Up     Normal  98.96 KiB       66.67%
>  -9223372036854775808
> 127.0.0.2  rack1       Up     Normal  98.96 KiB       66.67%
>  -3074457345618258603
> 127.0.0.3  rack1       Up     Normal  98.96 KiB       66.67%
>  3074457345618258602
>
> Now let's say you want to scale out the cluster to twice the current
> throughput(means you are adding 3 more nodes)
>
> If you are using AWS EBS volumes then you can use the same volumes and
> spin three more nodes by selecting midpoints of existing ranges which means
> your new nodes are already having data.
> Once you have mounted volumes on your new nodes:-
> * You need to delete every system table except schema related tables.
> * You need to generate system/local table by yourself which has `Bootstrap
> state` as completed and schema-version same as other existing nodes.
> * You need to remove extra data on all the machines using cleanup commands
>
> This is how you can scale out Cassandra cluster in the minutes. In case
> you want to add nodes one by one then you need to write some small tool
> which will always figure out the bigger range in the existing cluster and
> will split it into the half.
>
> However, I never tested it thoroughly but this should work conceptually.
> So here we are taking advantage of the fact that we have volumes(data) for
> the new node beforehand so we no need to bootstrap them.
>
> Thanks & Regards,
> Varun Barala
>
> On Tue, Oct 2, 2018 at 2:31 PM onmstester onmstester <on...@zoho.com>
> wrote:
>
>>
>>
>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>
>>
>> ---- On Mon, 01 Oct 2018 18:36:03 +0330 *Alain RODRIGUEZ
>> <arodrime@gmail.com <ar...@gmail.com>>* wrote ----
>>
>> Hello again :),
>>
>> I thought a little bit more about this question, and I was actually
>> wondering if something like this would work:
>>
>> Imagine 3 node cluster, and create them using:
>> For the 3 nodes: `num_token: 4`
>> Node 1: `intial_token: -9223372036854775808, -4611686018427387905, -2,
>> 4611686018427387901`
>> Node 2: `intial_token: -7686143364045646507, -3074457345618258604,
>> 1537228672809129299, 6148914691236517202`
>> Node 3: `intial_token: -6148914691236517206, -1537228672809129303,
>> 3074457345618258600, 7686143364045646503`
>>
>>  If you know the initial size of your cluster, you can calculate the
>> total number of tokens: number of nodes * vnodes and use the
>> formula/python code above to get the tokens. Then use the first token for
>> the first node, move to the second node, use the second token and repeat.
>> In my case there is a total of 12 tokens (3 nodes, 4 tokens each)
>> ```
>> >>> number_of_tokens = 12
>> >>> [str(((2**64 / number_of_tokens) * i) - 2**63) for i in
>> range(number_of_tokens)]
>> ['-9223372036854775808', '-7686143364045646507', '-6148914691236517206',
>> '-4611686018427387905', '-3074457345618258604', '-1537228672809129303',
>> '-2', '1537228672809129299', '3074457345618258600', '4611686018427387901',
>> '6148914691236517202', '7686143364045646503']
>> ```
>>
>>
>> Using manual initial_token (your idea), how could i add a new node to a
>> long running cluster (the procedure)?
>>
>>

-- 
Cordialement;

Ahmed ELJAMI

Re: Re: Re: how to configure the Token Allocation Algorithm

Posted by Varun Barala <va...@gmail.com>.
Hi,

Managing `initial_token` yourself will give you more control over
scale-in and scale-out.
Let's say you have a three node cluster with `num_tokens: 1`

And your initial range looks like:-

Datacenter: datacenter1
==========
Address    Rack   Status  State   Load       Owns     Token
127.0.0.1  rack1  Up      Normal  98.96 KiB  66.67%   -9223372036854775808
127.0.0.2  rack1  Up      Normal  98.96 KiB  66.67%   -3074457345618258603
127.0.0.3  rack1  Up      Normal  98.96 KiB  66.67%   3074457345618258602

Now let's say you want to scale out the cluster to twice the current
throughput (meaning you are adding 3 more nodes).

If you are using AWS EBS volumes then you can use the same volumes and spin
up three more nodes by selecting the midpoints of the existing ranges, which
means your new nodes already have data.
Once you have mounted the volumes on your new nodes:
* You need to delete every system table except the schema related tables.
* You need to generate the system.local table yourself, with `Bootstrap
state` set to completed and the schema version the same as on the other
existing nodes.
* You need to remove the extra data on all the machines using cleanup
commands.

This is how you can scale out a Cassandra cluster in minutes. In case you
want to add nodes one by one, you need to write some small tool which will
always figure out the biggest range in the existing cluster and split it in
half.

However, I never tested this thoroughly, but it should work conceptually. We
are taking advantage of the fact that we already have the volumes (data) for
the new nodes beforehand, so we do not need to bootstrap them.
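
To make the range-splitting idea concrete, here is a minimal sketch (not from
the original mail; Python 3, purely illustrative) that computes the midpoint
of each existing range; these midpoints could serve as the initial_token
values for the three new nodes in the example above:

```
# Illustrative sketch only: split each existing single-token range at its
# midpoint. Murmur3 tokens live in [-2**63, 2**63 - 1].
RING = 2**64
RING_MIN = -2**63

def range_midpoints(existing_tokens):
    tokens = sorted(existing_tokens)
    mids = []
    for i, start in enumerate(tokens):
        end = tokens[(i + 1) % len(tokens)]
        span = (end - start) % RING                       # wraps around the ring
        mid = ((start - RING_MIN) + span // 2) % RING + RING_MIN
        mids.append(mid)
    return mids

existing = [-9223372036854775808, -3074457345618258603, 3074457345618258602]
print(range_midpoints(existing))
# expected: [-6148914691236517206, -1, 6148914691236517205]
# (one initial_token per new node)
```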

Thanks & Regards,
Varun Barala

On Tue, Oct 2, 2018 at 2:31 PM onmstester onmstester <on...@zoho.com>
wrote:

>
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
> ---- On Mon, 01 Oct 2018 18:36:03 +0330 *Alain RODRIGUEZ
> <arodrime@gmail.com <ar...@gmail.com>>* wrote ----
>
> Hello again :),
>
> I thought a little bit more about this question, and I was actually
> wondering if something like this would work:
>
> Imagine 3 node cluster, and create them using:
> For the 3 nodes: `num_token: 4`
> Node 1: `intial_token: -9223372036854775808, -4611686018427387905, -2,
> 4611686018427387901`
> Node 2: `intial_token: -7686143364045646507, -3074457345618258604,
> 1537228672809129299, 6148914691236517202`
> Node 3: `intial_token: -6148914691236517206, -1537228672809129303,
> 3074457345618258600, 7686143364045646503`
>
>  If you know the initial size of your cluster, you can calculate the total
> number of tokens: number of nodes * vnodes and use the formula/python
> code above to get the tokens. Then use the first token for the first node,
> move to the second node, use the second token and repeat. In my case there
> is a total of 12 tokens (3 nodes, 4 tokens each)
> ```
> >>> number_of_tokens = 12
> >>> [str(((2**64 / number_of_tokens) * i) - 2**63) for i in
> range(number_of_tokens)]
> ['-9223372036854775808', '-7686143364045646507', '-6148914691236517206',
> '-4611686018427387905', '-3074457345618258604', '-1537228672809129303',
> '-2', '1537228672809129299', '3074457345618258600', '4611686018427387901',
> '6148914691236517202', '7686143364045646503']
> ```
>
>
> Using manual initial_token (your idea), how could i add a new node to a
> long running cluster (the procedure)?
>
>

Re: Re: Re: how to configure the Token Allocation Algorithm

Posted by onmstester onmstester <on...@zoho.com>.
Sent using Zoho Mail

---- On Mon, 01 Oct 2018 18:36:03 +0330 Alain RODRIGUEZ <ar...@gmail.com> wrote ----

Hello again :),

I thought a little bit more about this question, and I was actually wondering if something like this would work:

Imagine 3 node cluster, and create them using:
For the 3 nodes: `num_tokens: 4`
Node 1: `initial_token: -9223372036854775808, -4611686018427387905, -2, 4611686018427387901`
Node 2: `initial_token: -7686143364045646507, -3074457345618258604, 1537228672809129299, 6148914691236517202`
Node 3: `initial_token: -6148914691236517206, -1537228672809129303, 3074457345618258600, 7686143364045646503`

If you know the initial size of your cluster, you can calculate the total number of tokens: number of nodes * vnodes, and use the formula/python code above to get the tokens. Then use the first token for the first node, move to the second node, use the second token and repeat. In my case there is a total of 12 tokens (3 nodes, 4 tokens each)
```
>>> number_of_tokens = 12
>>> [str(((2**64 // number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]
['-9223372036854775808', '-7686143364045646507', '-6148914691236517206', '-4611686018427387905', '-3074457345618258604', '-1537228672809129303', '-2', '1537228672809129299', '3074457345618258600', '4611686018427387901', '6148914691236517202', '7686143364045646503']
```

Using manual initial_token (your idea), how could i add a new node to a long running cluster (the procedure)?

Re: Re: Re: how to configure the Token Allocation Algorithm

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hello again :),

I thought a little bit more about this question, and I was actually
wondering if something like this would work:

Imagine a 3 node cluster, and create the nodes using:
For the 3 nodes: `num_tokens: 4`
Node 1: `initial_token: -9223372036854775808, -4611686018427387905, -2,
4611686018427387901`
Node 2: `initial_token: -7686143364045646507, -3074457345618258604,
1537228672809129299, 6148914691236517202`
Node 3: `initial_token: -6148914691236517206, -1537228672809129303,
3074457345618258600, 7686143364045646503`

 If you know the initial size of your cluster, you can calculate the total
number of tokens: number of nodes * vnodes and use the formula/python code
above to get the tokens. Then use the first token for the first node, move
to the second node, use the second token and repeat. In my case there is a
total of 12 tokens (3 nodes, 4 tokens each)
```
>>> number_of_tokens = 12
>>> [str(((2**64 // number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]
['-9223372036854775808', '-7686143364045646507', '-6148914691236517206',
'-4611686018427387905', '-3074457345618258604', '-1537228672809129303',
'-2', '1537228672809129299', '3074457345618258600', '4611686018427387901',
'6148914691236517202', '7686143364045646503']
```
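
As a quick sketch (not part of the original mail, assuming Python 3), dealing
those 12 evenly spaced tokens out round-robin reproduces the three
initial_token lists above:

```
# Sketch only: evenly spaced Murmur3 tokens dealt round-robin to the nodes.
nodes, vnodes = 3, 4
total = nodes * vnodes
tokens = [((2**64 // total) * i) - 2**63 for i in range(total)]

per_node = [[] for _ in range(nodes)]
for i, tok in enumerate(tokens):
    per_node[i % nodes].append(tok)      # token i goes to node (i % nodes) + 1

for n, toks in enumerate(per_node, start=1):
    print("Node %d initial_token: %s" % (n, ", ".join(map(str, toks))))
```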

It actually works nicely, apparently. Here is a quick ccm test I have run
with the configuration above:

```

$ ccm node1 nodetool status tlp_lab


Datacenter: datacenter1

=======================

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address    Load       Tokens       Owns (effective)  Host ID
                    Rack

UN  127.0.0.1  82.47 KiB  4            66.7%
1ed8680b-7250-4088-988b-e4679514322f  rack1

UN  127.0.0.2  99.03 KiB  4            66.7%
ab3655b5-c380-496d-b250-51b53efb4c00  rack1

UN  127.0.0.3  82.36 KiB  4            66.7%
ad2b343e-5f6e-4b0d-b79f-a3dfc3ba3c79  rack1
```

Ownership is perfectly distributed, like it would be without vnodes. Tested
with C* 3.11.1 and CCM.

For my second test, I followed the procedure we were talking about, after
wiping out the data in my 3-node ccm cluster.
RF=2 for tlp_lab, the first node with initial_token defined, and the other
nodes using 'allocate_tokens_for_keyspace: tlp_lab':

$ ccm node1 nodetool status tlp_lab


Datacenter: datacenter1

=======================

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address    Load       Tokens       Owns (effective)  Host ID
                    Rack

UN  127.0.0.1  86.71 KiB  4            96.2%
6e4c0ce0-2e2e-48ff-b7e0-3653e76366a3  rack1

UN  127.0.0.2  65.63 KiB  4            54.2%
592cda85-5807-4e7a-aa3b-0d9ae54cfaf3  rack1

UN  127.0.0.3  99.04 KiB  4            49.7%
f2c4eccc-31cc-458c-a599-5373c1169d3c  rack1

This is not as good. I guess a fourth node would help, but it would still
not make it perfect.

I would still check what happens when you add a few more nodes with
'allocate_tokens_for_keyspace' afterward and without 'initial_token', to
avoid any surprises.
I have not seen anyone using this yet. Please take it as an idea to dig
into, and not as a recommendation :).

I also noticed I did not answer the second part of the mail:

My cluster Size won't go beyond 150 nodes, should i still use The
> Allocation Algorithm instead of random with 256 tokens (performance wise or
> load-balance wise)?
>

I would say yes. There is talk of changing this default (256 vnodes), which
is now probably always a bad idea since 'allocate_tokens_for_keyspace' was
added.

Is the Allocation Algorithm, widely used and tested with Community and can
> we migrate all clusters with any size to use this Algorithm Safely?
>

Here again, I would say yes. I am not sure that it is widely used yet, but
I think so. Also, you can always check the ownership with 'nodetool status
<keyspace>' after adding the nodes, and before adding data or traffic to
this data center, so there is probably no real risk if you check the
ownership distribution after adding nodes. If you don't like the
distribution, you can decommission the nodes, clean them, and try again; I
used to call it 'rolling the dice' back when I was still using the random
algorithm :). I mean, once the token range ownership has been distributed
to the nodes, it does not change anything during transactions. We don't
need this 'algorithm' after the bootstrap, I would say.


> Out of Curiosity, i wonder how people (i.e, in Apple) config and maintain
> token management of clusters with thousands of nodes?
>

I am not sure about Apple, but my understanding is that some of those
companies don't use vnodes and have a 'ring management tool' to perform the
necessary 'nodetool move' around the cluster relatively easily or
automatically. Others probably use a low number of vnodes (something
between 4 and 32) and 'allocate_tokens_for_keyspace'.

Also, my understanding is that it's very rare to have clusters with
thousands of nodes. You can then start having issues around gossip if I
remember correctly what I read/discussed. I would probably add a second
cluster when the first one is too big (hundreds of nodes) or split per
service/workflow, for example. In practice, the operational complexity is
reduced by automated operations and/or good tooling to operate
efficiently.



On Mon, 1 Oct 2018 at 12:37, onmstester onmstester <on...@zoho.com> wrote:

> Thanks Alex,
> You are right, that would be a mistake.
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
> ============ Forwarded message ============
> From : Oleksandr Shulgin <ol...@zalando.de>
> To : "User"<us...@cassandra.apache.org>
> Date : Mon, 01 Oct 2018 13:53:37 +0330
> Subject : Re: Re: how to configure the Token Allocation Algorithm
> ============ Forwarded message ============
>
> On Mon, Oct 1, 2018 at 12:18 PM onmstester onmstester <on...@zoho.com>
> wrote:
>
>
>
> What if instead of running that python and having one node with non-vnode
> config, i remove the first seed node and re-add it after cluster was fully
> up ? so the token ranges of first seed node would also be assigned by
> Allocation Alg
>
>
> I think this is tricky because the random allocation of the very first
> tokens from the first seed affects the choice of tokens made by the
> algorithm on the rest of the nodes: it basically tries to divide the token
> ranges in more or less equal parts.  If your very first 8 tokens resulted
> in really bad balance, you are not going to remove that imbalance by
> removing the node, it would still have the lasting effect on the rest of
> your cluster.
>
> --
> Alex
>
>
>
>

Fwd: Re: Re: how to configure the Token Allocation Algorithm

Posted by onmstester onmstester <on...@zoho.com>.
Thanks Alex,
You are right, that would be a mistake.

Sent using Zoho Mail

============ Forwarded message ============
From : Oleksandr Shulgin <ol...@zalando.de>
To : "User"<us...@cassandra.apache.org>
Date : Mon, 01 Oct 2018 13:53:37 +0330
Subject : Re: Re: how to configure the Token Allocation Algorithm
============ Forwarded message ============

On Mon, Oct 1, 2018 at 12:18 PM onmstester onmstester <on...@zoho.com> wrote:

What if instead of running that python and having one node with non-vnode config, i remove the first seed node and re-add it after cluster was fully up ? so the token ranges of first seed node would also be assigned by Allocation Alg

I think this is tricky because the random allocation of the very first tokens from the first seed affects the choice of tokens made by the algorithm on the rest of the nodes: it basically tries to divide the token ranges in more or less equal parts. If your very first 8 tokens resulted in really bad balance, you are not going to remove that imbalance by removing the node, it would still have the lasting effect on the rest of your cluster.

--
Alex

Re: Re: how to configure the Token Allocation Algorithm

Posted by Oleksandr Shulgin <ol...@zalando.de>.
On Mon, Oct 1, 2018 at 12:18 PM onmstester onmstester <on...@zoho.com>
wrote:

>
> What if instead of running that python and having one node with non-vnode
> config, i remove the first seed node and re-add it after cluster was fully
> up ? so the token ranges of first seed node would also be assigned by
> Allocation Alg
>

I think this is tricky because the random allocation of the very first
tokens from the first seed affects the choice of tokens made by the
algorithm on the rest of the nodes: it basically tries to divide the token
ranges in more or less equal parts.  If your very first 8 tokens resulted
in really bad balance, you are not going to remove that imbalance by
removing the node, it would still have the lasting effect on the rest of
your cluster.

--
Alex

Fwd: Re: how to configure the Token Allocation Algorithm

Posted by onmstester onmstester <on...@zoho.com>.
Thanks Alain,
What if instead of running that python and having one node with non-vnode config, i remove the first seed node and re-add it after cluster was fully up ? so the token ranges of first seed node would also be assigned by Allocation Alg

============ Forwarded message ============
From : Alain RODRIGUEZ <ar...@gmail.com>
To : "user cassandra.apache.org"<us...@cassandra.apache.org>
Date : Mon, 01 Oct 2018 13:14:21 +0330
Subject : Re: how to configure the Token Allocation Algorithm
============ Forwarded message ============

Hello,

Your process looks good to me :). Still a couple of comments to make it more efficient (hopefully).

- Improving step 2:

I believe you can actually get a slightly better distribution by picking the tokens for the (first) seed node. This is to prevent the node from randomly calculating its token ranges. You can calculate the token ranges using the following python code:

$ python  # Start the python shell
[...]
>>> number_of_tokens = 8
>>> [str(((2**64 // number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]
['-9223372036854775808', '-6917529027641081856', '-4611686018427387904', '-2305843009213693952', '0', '2305843009213693952', '4611686018427387904', '6917529027641081856']

Set 'initial_token' to the above list (comma separated) and the number of vnodes to 'num_tokens: 8'.

This technique proved to be way more efficient (especially for low token numbers / small number of nodes). Luckily it's also easy to test.

Re: how to configure the Token Allocation Algorithm

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hello,

Your process looks good to me :). Still a couple of comments to make it
more efficient (hopefully).

*- Improving step 2:*

I believe you can actually get a slightly better distribution by picking the
tokens for the (first) seed node. This is to prevent the node from randomly
calculating its token ranges. You can calculate the token ranges using the
following python code:

$ python  # Start the python shell
[...]
>>> number_of_tokens = 8
>>> [str(((2**64 // number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]
['-9223372036854775808', '-6917529027641081856',
'-4611686018427387904', '-2305843009213693952', '0',
'2305843009213693952', '4611686018427387904', '6917529027641081856']


Set 'initial_token' to the above list (comma separated) and the
number of vnodes to 'num_tokens: 8'.

This technique proved to be way more efficient (especially for low token
numbers / small number of nodes). Luckily it's also easy to test.
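
As a convenience, here is a small sketch (not from the original mail,
assuming Python 3) that prints the two cassandra.yaml lines ready to paste
for that first seed node:

```
number_of_tokens = 8
tokens = [((2**64 // number_of_tokens) * i) - 2**63 for i in range(number_of_tokens)]

# Print the yaml lines for the first seed node.
print("num_tokens: %d" % number_of_tokens)
print("initial_token: " + ", ".join(str(t) for t in tokens))
# num_tokens: 8
# initial_token: -9223372036854775808, -6917529027641081856, -4611686018427387904,
#                -2305843009213693952, 0, 2305843009213693952, 4611686018427387904,
#                6917529027641081856
```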

- *Step 4 might not be needed*

I don't see the need to stop/start the seed. The option
'allocate_tokens_for_keyspace' won't affect this seed node (already
initialized) in any way.

Also, do not forget to have more nodes become 'seeds', either after
bootstrap or by starting a couple more seeds after the first one, for
example.

C*heers,
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



On Mon, 1 Oct 2018 at 07:16, onmstester onmstester <on...@zoho.com> wrote:

> Since i failed to find a document on how to configure and use the Token
> Allocation Algorithm (to replace the random Algorithm), just wanted to be
> sure about the procedure i've done:
> 1. Using Apache Cassandra 3.11.2
> 2. Configured one of seed nodes with num_tokens=8 and started it.
> 3. Using Cqlsh created keyspace test with NetworkTopologyStrategy and RF=3.
> 4. Stopped the seed node.
> 5. add this line to cassandra.yaml of all nodes (all have num_tokens=8)
> and started the cluster:
> allocate_tokens_for_keyspace=test
>
> My cluster Size won't go beyond 150 nodes, should i still use The
> Allocation Algorithm instead of random with 256 tokens (performance wise or
> load-balance wise)?
> Is the Allocation Algorithm, widely used and tested with Community and can
> we migrate all clusters with any size to use this Algorithm Safely?
> Out of Curiosity, i wonder how people (i.e, in Apple) config and maintain
> token management of clusters with thousands of nodes?
>
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
>