Posted to user@cassandra.apache.org by Voytek Jarnot <vo...@gmail.com> on 2020/01/24 15:38:49 UTC

sstableloader & num_tokens change

Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different 4
node RF=3 cluster.

I've read that 256 is not an optimal default num_tokens value, and that 32
is likely a better option.

We have the "opportunity" to switch, as we're migrating environments and
will likely be using sstableloader to do so. I'm curious if there are any
gotchas with using sstableloader to restore snapshots taken from 256-token
nodes into a cluster with 32-token nodes (otherwise same # of nodes and
same RF).

Thanks in advance.

Re: sstableloader & num_tokens change

Posted by Jean Carlo <je...@gmail.com>.
Hello

Concerning the original question, I agree with @eric_ramirez:
sstableloader is transparent to the number of tokens allocated.

Just for info, @voytek, check this post out:
https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
You may be interested to know whether your cluster is well balanced with 32
tokens. 32 tokens seems to be the future default value, but changing the
default number of vnode tokens seems not to be so straightforward.

cheers

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


On Sat, Jan 25, 2020 at 5:05 AM Erick Ramirez <fl...@gmail.com> wrote:

> On the subject of DSBulk, sstableloader is the tool of choice for this
> scenario.
>
> +1 to Sergio and I'm confirming that DSBulk is designed as a bulk loader
> for CSV/JSON formats. Cheers!
>

Re: sstableloader & num_tokens change

Posted by Erick Ramirez <fl...@gmail.com>.
On the subject of DSBulk, sstableloader is the tool of choice for this
scenario.

+1 to Sergio and I'm confirming that DSBulk is designed as a bulk loader
for CSV/JSON formats. Cheers!

Re: sstableloader & num_tokens change

Posted by Voytek Jarnot <vo...@gmail.com>.
Why? Seems to me that the old Cassandra -> CSV/JSON and CSV/JSON -> new
Cassandra are unnecessary steps in my case.

On Fri, Jan 24, 2020 at 10:34 AM Nitan Kainth <ni...@gmail.com> wrote:

> Instead of sstableloader consider dsbulk by datastax.

Re: [EXTERNAL] Re: sstableloader & num_tokens change

Posted by Voytek Jarnot <vo...@gmail.com>.
Odd. Have you seen this behavior? I ran a test last week, loaded snapshots
from 4 nodes to 4 nodes (RF 3 on both ends) and did not notice a spike.
That's not to say that it didn't happen, but I think I'd have noticed as I
was loading approx 250GB x 4 (although sequentially rather than 4x
sstableloader in parallel).

Also, thanks to everyone for confirming no issue with num_tokens and
sstableloader; appreciate it.


On Mon, Jan 27, 2020 at 9:02 AM Durity, Sean R <SE...@homedepot.com>
wrote:

> I would suggest to be aware of potential data size expansion. If you load
> (for example) three copies of the data into a new cluster (because the RF
> of the origin cluster is 3), it will also get written to the RF of the new
> cluster (3 more times). So, you could see data expansion of 9x the original
> data size (or, origin RF * target RF), until compaction can run.

RE: [EXTERNAL] Re: sstableloader & num_tokens change

Posted by "Durity, Sean R" <SE...@homedepot.com>.
I would suggest being aware of potential data size expansion. If you load (for example) three copies of the data into a new cluster (because the RF of the origin cluster is 3), each copy will also be written to every replica in the new cluster (3 more times). So you could see data expansion of 9x the original data size (that is, origin RF * target RF) until compaction can run.
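
The expansion estimate above can be sketched as simple arithmetic. This is only an illustration, not a measurement: the function name and its parameters are hypothetical, the inputs reuse the roughly 250 GB x 4 nodes mentioned elsewhere in this thread, and it assumes that every source node's snapshot is loaded and that the snapshots together hold exactly RF copies of each row.

```python
def worst_case_footprint_gb(per_node_snapshot_gb, source_nodes, source_rf, target_rf):
    """Estimate on-disk size on the target cluster around a full snapshot load.

    Assumption: snapshots from all source nodes are loaded, so every row is
    streamed source_rf times, and each streamed copy lands on target_rf replicas.
    """
    streamed_gb = per_node_snapshot_gb * source_nodes   # already contains source_rf copies
    unique_gb = streamed_gb / source_rf                 # a single copy of the data
    peak_gb = streamed_gb * target_rf                   # before compaction merges duplicates
    steady_gb = unique_gb * target_rf                   # once duplicates are compacted away
    return unique_gb, peak_gb, steady_gb


unique, peak, steady = worst_case_footprint_gb(250, 4, source_rf=3, target_rf=3)
print(f"unique data:      {unique:.0f} GB")
print(f"peak on target:   {peak:.0f} GB ({peak / unique:.0f}x unique)")
print(f"after compaction: {steady:.0f} GB ({steady / unique:.0f}x unique)")
```

The peak is a worst case: compaction that runs while the load is still in progress would lower it, which may be one reason a sequential load does not show an obvious spike.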


Sean Durity – Staff Systems Engineer, Cassandra

From: Erick Ramirez <fl...@gmail.com>
Sent: Friday, January 24, 2020 11:03 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: sstableloader & num_tokens change


If I may just loop this back to the question at hand:

I'm curious if there are any gotchas with using sstableloader to restore snapshots taken from 256-token nodes into a cluster with 32-token (or your preferred number of tokens) nodes (otherwise same # of nodes and same RF).

No, there isn't. It will work as designed so you're good to go. Cheers!




Re: sstableloader & num_tokens change

Posted by Erick Ramirez <fl...@gmail.com>.
> If I may just loop this back to the question at hand:
>
> I'm curious if there are any gotchas with using sstableloader to restore
> snapshots taken from 256-token nodes into a cluster with 32-token (or your
> preferred number of tokens) nodes (otherwise same # of nodes and same RF).
>

No, there isn't. It will work as designed so you're good to go. Cheers!
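
One way to see why the source token count should not matter: the loader streams each partition to whichever nodes own it in the target ring, so only the target cluster's tokens are consulted. The sketch below is a deliberately simplified toy model (random tokens, SimpleStrategy-like clockwise walk, hypothetical node names and function names), not Cassandra's actual code.

```python
import bisect
import random


def build_ring(nodes, num_tokens, seed=0):
    """Give each node num_tokens random tokens; return sorted (token, node) pairs."""
    rng = random.Random(seed)
    ring = [(rng.randrange(-2**63, 2**63), node)
            for node in nodes for _ in range(num_tokens)]
    ring.sort()
    return ring


def replicas_for(ring, token, rf):
    """Walk clockwise from the token and collect the first rf distinct nodes."""
    tokens = [t for t, _ in ring]
    # The first ring token >= the partition token owns it; wrap past the end.
    start = bisect.bisect_left(tokens, token) % len(ring)
    owners = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in owners:
            owners.append(node)
        if len(owners) == rf:
            break
    return owners


# The source ring (256 tokens per node) plays no part in placement on the target.
source_ring = build_ring(["old1", "old2", "old3", "old4"], num_tokens=256, seed=1)
target_ring = build_ring(["new1", "new2", "new3", "new4"], num_tokens=32, seed=2)

partition_token = 1234567890  # hypothetical token of some partition key
print(replicas_for(target_ring, partition_token, rf=3))
```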



Re: sstableloader & num_tokens change

Posted by Voytek Jarnot <vo...@gmail.com>.
If I may just loop this back to the question at hand:

I'm curious if there are any gotchas with using sstableloader to restore
snapshots taken from 256-token nodes into a cluster with 32-token (or your
preferred number of tokens) nodes (otherwise same # of nodes and same RF).

On Fri, Jan 24, 2020 at 11:15 AM Sergio <la...@gmail.com> wrote:

> https://docs.datastax.com/en/dsbulk/doc/dsbulk/reference/dsbulkLoad.html
>
> Just skimming through the docs
>
> I see examples by loading from CSV / JSON
>
> Maybe there is some other command or doc page that I am missing

Re: sstableloader & num_tokens change

Posted by Sergio <la...@gmail.com>.
https://docs.datastax.com/en/dsbulk/doc/dsbulk/reference/dsbulkLoad.html

Just skimming through the docs

I see examples by loading from CSV / JSON

Maybe there is some other command or doc page that I am missing




On Fri, Jan 24, 2020, 9:10 AM Nitan Kainth <ni...@gmail.com> wrote:

> DSBulk works the same as sstableloader.

Re: sstableloader & num_tokens change

Posted by Nitan Kainth <ni...@gmail.com>.
DSBulk works the same as sstableloader.


Regards,
Nitan
Cell: 510 449 9629

> On Jan 24, 2020, at 10:40 AM, Sergio <la...@gmail.com> wrote:
> 
> 
> I was wondering if that improvement for token allocation would work even with just one rack. It should but I am not sure.
> 
> Does Dsbulk support migration cluster to cluster without CSV or JSON export?
> 
> Thanks and Regards

Re: sstableloader & num_tokens change

Posted by Sergio <la...@gmail.com>.
I was wondering if that improvement for token allocation would work even
with just one rack. It should but I am not sure.

Does Dsbulk support migration cluster to cluster without CSV or JSON export?

Thanks and Regards

On Fri, Jan 24, 2020, 8:34 AM Nitan Kainth <ni...@gmail.com> wrote:

> Instead of sstableloader consider dsbulk by datastax.

Re: sstableloader & num_tokens change

Posted by Nitan Kainth <ni...@gmail.com>.
Instead of sstableloader, consider DSBulk by DataStax.

On Fri, Jan 24, 2020 at 10:20 AM Reid Pinchback <rp...@tripadvisor.com>
wrote:

> Jon Haddad has previously made the case for num_tokens=4.  His Accelerate
> 2019 talk is available at:
>
>
>
> https://www.youtube.com/watch?v=swL7bCnolkU
>
>
>
> You might want to check that out.  Also I think the amount of effort you
> put into evening out the token distribution increases as vnode count
> shrinks.  The caveats are explored at:
>
>
>
>
> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html

Re: sstableloader & num_tokens change

Posted by Reid Pinchback <rp...@tripadvisor.com>.
Jon Haddad has previously made the case for num_tokens=4.  His Accelerate 2019 talk is available at:

https://www.youtube.com/watch?v=swL7bCnolkU

You might want to check that out.  Also I think the amount of effort you put into evening out the token distribution increases as vnode count shrinks.  The caveats are explored at:

https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
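
The point about evenness getting harder as the vnode count shrinks can be illustrated with a small simulation. This is a rough sketch under stated assumptions: tokens are assigned purely at random, ownership counts primary ranges only (replication is ignored), and no token allocation algorithm is modeled. All names here are hypothetical.

```python
import random

TOKEN_SPACE = 2**64


def ownership_spread(num_nodes, num_tokens, rng):
    """Return max/min primary-range ownership across nodes for one random ring."""
    ring = sorted((rng.randrange(TOKEN_SPACE), node)
                  for node in range(num_nodes) for _ in range(num_tokens))
    owned = [0] * num_nodes
    for i, (token, node) in enumerate(ring):
        prev = ring[i - 1][0]  # for i == 0 this wraps around to the last token
        owned[node] += (token - prev) % TOKEN_SPACE
    return max(owned) / min(owned)


def mean_spread(num_nodes, num_tokens, trials=200, seed=42):
    """Average the max/min ownership ratio over several random rings."""
    rng = random.Random(seed)
    total = sum(ownership_spread(num_nodes, num_tokens, rng) for _ in range(trials))
    return total / trials


for num_tokens in (4, 32, 256):
    print(num_tokens, round(mean_spread(4, num_tokens), 2))
```

A ratio close to 1 means the nodes own nearly equal shares of the ring. With random assignment the ratio drifts further from 1 as num_tokens drops, which is why low vnode counts tend to need deliberate token placement.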

