Posted to user@cassandra.apache.org by Jens Rantil <je...@tink.se> on 2014/08/29 16:09:39 UTC

Heterogenous cluster and vnodes

Hey,


I have a few VM host (bare-metal) machines with varying amounts of free hard drive space on them. For simplicity, let’s say I have three machines like so:
 * Machine 1:
  - Hard drive 1: 150 GB available.
 * Machine 2:
  - Hard drive 1: 150 GB available.
  - Hard drive 2: 150 GB available.
 * Machine 3:
  - Hard drive 1: 150 GB available.

I am setting up a Cassandra cluster between them and as I see it I have two options:


1. I set up one Cassandra node/VM per bare-metal machine. I assign all free hard drive space to each Cassandra node and balance the cluster by giving each node a number of vnodes proportional to its free hard drive space (CPU/RAM is not going to be a bottleneck here).


2. I set up four VMs, each running a Cassandra node with an equal amount of hard drive space and an equal number of vnodes. Machine 2 runs two VMs.



General question: Is either of these preferable to the other? I understand 1) yields lower availability (since nodes are on the same hardware).


Question about alternative 1: With a varying number of vnodes per node, can I always be sure that replicas are never put on the same virtual machine? Or is varying the number of vnodes really only useful/recommended when migrating between machines with varying hardware (as mentioned in [1])?


[1] http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2


Thanks,
Jens
———
Jens Rantil
Backend engineer
Tink AB

Email: jens.rantil@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se


Re: Heterogenous cluster and vnodes

Posted by Ben Bromhead <be...@instaclustr.com>.
> Hey,
> 
> I have a few VM host (bare-metal) machines with varying amounts of free hard drive space on them. For simplicity, let’s say I have three machines like so:
>  * Machine 1:
>   - Hard drive 1: 150 GB available.
>  * Machine 2:
>   - Hard drive 1: 150 GB available.
>   - Hard drive 2: 150 GB available.
>  * Machine 3:
>   - Hard drive 1: 150 GB available.
> 
> I am setting up a Cassandra cluster between them and as I see it I have two options:
> 
> 1. I set up one Cassandra node/VM per bare-metal machine. I assign all free hard drive space to each Cassandra node and balance the cluster by giving each node a number of vnodes proportional to its free hard drive space (CPU/RAM is not going to be a bottleneck here).
> 
> 2. I set up four VMs, each running a Cassandra node with an equal amount of hard drive space and an equal number of vnodes. Machine 2 runs two VMs.

This setup can create a situation where, if Machine 2 goes down, you lose two replicas at once, because the two VMs on Machine 2 might both hold replicas of the same key.
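
To illustrate the failure mode, here is a minimal Python sketch. It is not Cassandra's actual placement code: the ring, the token values, the node names (`vm1`, `vm2a`, `vm2b`, `vm3`) and the replication factor of 2 are all made up for illustration, with one token per node to keep it readable. It only assumes the usual ring rule that a key's replicas are the next distinct nodes found walking clockwise from the key's token.

```python
from bisect import bisect_left

# Hypothetical ring for option 2: (token, node) pairs, one token per node.
RING = sorted([(0, "vm1"), (25, "vm2a"), (50, "vm2b"), (75, "vm3")])

# Which physical machine each VM runs on. Cassandra is not told this here.
HOST = {"vm1": "machine1", "vm2a": "machine2", "vm2b": "machine2", "vm3": "machine3"}


def replicas(key_token, rf=2):
    """Walk the ring clockwise from key_token and pick rf distinct nodes."""
    tokens = [t for t, _ in RING]
    distinct_nodes = {node for _, node in RING}
    rf = min(rf, len(distinct_nodes))  # cannot have more replicas than nodes
    i = bisect_left(tokens, key_token) % len(RING)
    chosen = []
    while len(chosen) < rf:
        node = RING[i % len(RING)][1]
        if node not in chosen:
            chosen.append(node)
        i += 1
    return chosen


def replicas_on(machine, key_token, rf=2):
    """Replicas of a key that would disappear if `machine` went down."""
    return [n for n in replicas(key_token, rf) if HOST[n] == machine]


# A key whose token falls between vm1 and vm2a lands on both Machine 2 VMs.
print(replicas_on("machine2", key_token=10))
```

In this made-up layout, any key with a token between 1 and 25 has both of its replicas on Machine 2, so losing that one box makes the key unavailable even though two "nodes" held it.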

> 
> General question: Is either of these preferable to the other? I understand 1) yields lower availability (since nodes are on the same hardware).

Other way around (2 would potentially have lower availability)… Cassandra thinks the two VMs on Machine 2 are separate nodes when they in fact rely on the same underlying machine.

> 
> Question about alternative 1: With varying vnodes, can I always be sure that replicas are never put on the same virtual machine?

Yes… mostly. See https://issues.apache.org/jira/browse/CASSANDRA-4123

> Or is varying vnodes really only useful/recommended when migrating from machines with varying hardware (like mentioned in [1])?

Changing the number of vnodes changes the portion of the ring a node is responsible for. You can use it to account for different types of hardware; you can also use it to create awesome situations like hotspots if you aren't careful… YMMV.
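
As a rough illustration of that proportionality, here is a small Python sketch for option 1. The scaling rule and the baseline of 256 tokens for the smallest machine are assumptions made for the example, not a recommendation, and the machine names are hypothetical. With randomly assigned tokens the real ownership will only approximate these shares.

```python
def tokens_by_capacity(capacity_gb, base_tokens=256):
    """Scale each node's vnode count by its disk space relative to the smallest node.

    base_tokens is an assumed baseline for the smallest machine.
    """
    smallest = min(capacity_gb.values())
    return {name: round(base_tokens * gb / smallest) for name, gb in capacity_gb.items()}


def expected_share(num_tokens):
    """Expected fraction of the ring each node owns, given its vnode count."""
    total = sum(num_tokens.values())
    return {name: n / total for name, n in num_tokens.items()}


# Free disk space per bare-metal machine from the original question (GB).
capacity = {"machine1": 150, "machine2": 300, "machine3": 150}

tokens = tokens_by_capacity(capacity)
shares = expected_share(tokens)

for name in capacity:
    print(f"{name}: num_tokens={tokens[name]}, expected share={shares[name]:.0%}")
```

Each computed value would go into that node's `num_tokens` setting in cassandra.yaml. Note that the share is of primary token ranges only; how much data actually lands on each disk also depends on the replication factor, and the node with twice the tokens also takes roughly twice the requests, which is where the hotspots come from.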

At the end of the day I would throw out the extra hard drive / not use it / put more hard drives in the other machines. Why? Hard drives are cheap and your time as an admin for the cluster isn't. If you do add more hard drives, you can also split the commit log, etc., out onto different disks.

I would take fewer problems over trying to draw every last scrap of performance out of the available hardware any day of the year.


Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359