Posted to user@cassandra.apache.org by Edward Capriolo <ed...@gmail.com> on 2010/12/08 03:25:02 UTC

Running multiple instances on a single server --micrandra ??

I am quite ready to be stoned for this thread, but I have been
thinking about this for a while and I just wanted to bounce these
ideas off some gurus.

Cassandra does allow multiple data directories, but as far as I can
tell no one runs in this configuration. This is one of the big
differences between the HBase architecture and the Cassandra
architecture. HBase borrows the JBOD concept from Hadoop: many
smallish (~256 MB) regions managed with ZooKeeper. Cassandra has a
few (1 per node) large, node-sized token ranges managed by gossip
consensus.

Let's say a node has six 300 GB disks. You have the options of RAID5,
RAID6, RAID10, or RAID0. The problem I have found with these
configurations is that major compactions (or even large minor ones)
can take a long time. Even if your disks are not heavily utilized,
that is a lot of data to move through. Thus node joins take a long
time, and node moves take a long time.
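
For a rough back-of-envelope on the scale involved (the throughput
figure below is my assumption, not a measured number):

# Time to push a node's worth of data through a major compaction or a
# stream during a move. All numbers are hypothetical.
data_gb = 1200          # e.g. 6 x 300 GB disks, about 2/3 full
throughput_mb_s = 100   # assumed effective sequential throughput
hours = data_gb * 1024 / throughput_mb_s / 3600.0
print("%.1f hours" % hours)  # ~3.4 hours, ignoring read/write overlap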

The idea behind "micrandra" is, for a 6-disk system, to run 6
instances of Cassandra, one per disk. Use the RackAwareSnitch to make
sure no replicas live on the same node.
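
To make the layout concrete, here is a sketch of what the 6 instances
could look like (all IPs and paths below are made-up examples):

# Hypothetical per-instance layout: one data disk and one IP alias
# per Cassandra instance, all sharing the same storage port.
for i in range(6):
    print("instance %d: listen_address=10.0.0.%d "
          "data=/mnt/disk%d/cassandra commitlog=/mnt/disk%d/commitlog"
          % (i, 10 + i, i, i))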

The downsides
1) we would have to manage 6x the instances of cassandra
2) we would have some overhead for each JVM.

The upsides?
1) A disk/instance failure only degrades the overall performance by
1/6th (with RAID0 you lose the entire node; RAID5 still takes a hit
when down a disk)
2) Moves and joins have less work to do
3) Can scale up a single node by adding a single disk to an existing
system (assuming the RAM and CPU load is light)
4) OPP would make it "easier" to balance out hot spots (maybe not this
one; I am not an OPP user)

What does everyone think? Does it ever make sense to run this way?

Re: Running multiple instances on a single server --micrandra ??

Posted by Edward Capriolo <ed...@gmail.com>.
On Fri, Dec 10, 2010 at 11:39 PM, Edward Capriolo <ed...@gmail.com> wrote:
> On Thu, Dec 9, 2010 at 10:40 PM, Bill de hÓra <bi...@dehora.net> wrote:
>>
>>
>> On Tue, 2010-12-07 at 21:25 -0500, Edward Capriolo wrote:
>>
>> The idea behind "micrandra" is, for a 6-disk system, to run 6
>> instances of Cassandra, one per disk. Use the RackAwareSnitch to make
>> sure no replicas live on the same node.
>>
>> The downsides
>> 1) we would have to manage 6x the instances of cassandra
>> 2) we would have some overhead for each JVM.
>>
>> The upsides?
>> 1) A disk/instance failure only degrades the overall performance by
>> 1/6th (with RAID0 you lose the entire node; RAID5 still takes a hit
>> when down a disk)
>> 2) Moves and joins have less work to do
>> 3) Can scale up a single node by adding a single disk to an existing
>> system (assuming the RAM and CPU load is light)
>> 4) OPP would make it "easier" to balance out hot spots (maybe not this
>> one; I am not an OPP user)
>>
>> What does everyone think? Does it ever make sense to run this way?
>>
>> It might for read heavy loads.
>>
>> When I looked at this, it was pointed out to me it's simpler to run fewer
>> bigger coarser nodes and take the entire node/server out when something goes
>> wrong. Basically give each Cassandra a server.
>>
>> I wonder if it would be better to rethink compaction if that's what's
>> driving the idea. It seems to be what is biting everyone, along with GC.
>>
>> Bill
>
> Having 6 IPs on a machine would be a given in this setup. That is not
> an issue for me.
>
> It is not "biting" me. We all know that going from 10 to 20 nodes is
> pretty simple. However organic growth from 10 to 16, then a couple of
> months later from 16 to 22, can take some effort with 300-600 GB per
> node, since each join and cleanup can take a while. I am wondering if
> dividing a single large node into multiple smaller instances would
> make this type of growth easier.
>

To clearly explain the scenario: a 5-node cluster, each node owning
20% of the ring, each with 6 disks and ~200 GB of data.
Going to 10 nodes is easy: you can join one new node directly between
each pair of existing nodes.

However, going from say 5 -> 8 gets dicey. Do you calculate the ideal
ring positions for 10 nodes and place the three new nodes at three of
those positions?
20% | 20% | 10% | 10% | 10% | 10% | 10% | 10%  This results in three
joins and several cleanups. With this choice you save time, but you
hope you do not get to the point where the first two nodes get
overloaded.
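
For reference, a quick sketch of the token arithmetic (assuming
RandomPartitioner, where node i of N ideally sits at token
i * 2**127 / N):

# Ideal RandomPartitioner tokens for an N-node ring.
def ideal_tokens(n):
    return [i * (2 ** 127) // n for i in range(n)]

# The 10-node positions include tokens that bisect each 5-node range;
# placing 3 new nodes on three of those gives the layout above.
print(ideal_tokens(5))
print(ideal_tokens(10))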

If you decide to work with the ideal tokens for 8, you have many
moves and joins. That is, until we have:

https://issues.apache.org/jira/browse/CASSANDRA-1418
https://issues.apache.org/jira/browse/CASSANDRA-1427

Having 6 smaller instances on a node with 6 disks would make it
easier to stay close to balanced without having to double your
cluster size each time you grow, or doing a series of moves to get
balanced again.

Re: Running multiple instances on a single server --micrandra ??

Posted by Edward Capriolo <ed...@gmail.com>.
On Thu, Dec 9, 2010 at 10:40 PM, Bill de hÓra <bi...@dehora.net> wrote:
>
>
> On Tue, 2010-12-07 at 21:25 -0500, Edward Capriolo wrote:
>
> The idea behind "micrandra" is, for a 6-disk system, to run 6
> instances of Cassandra, one per disk. Use the RackAwareSnitch to make
> sure no replicas live on the same node.
>
> The downsides
> 1) we would have to manage 6x the instances of cassandra
> 2) we would have some overhead for each JVM.
>
> The upsides?
> 1) A disk/instance failure only degrades the overall performance by
> 1/6th (with RAID0 you lose the entire node; RAID5 still takes a hit
> when down a disk)
> 2) Moves and joins have less work to do
> 3) Can scale up a single node by adding a single disk to an existing
> system (assuming the RAM and CPU load is light)
> 4) OPP would make it "easier" to balance out hot spots (maybe not this
> one; I am not an OPP user)
>
> What does everyone think? Does it ever make sense to run this way?
>
> It might for read heavy loads.
>
> When I looked at this, it was pointed out to me it's simpler to run fewer
> bigger coarser nodes and take the entire node/server out when something goes
> wrong. Basically give each Cassandra a server.
>
> I wonder if it would be better to rethink compaction if that's what's
> driving the idea. It seems to be what is biting everyone, along with GC.
>
> Bill

Having 6 IPs on a machine would be a given in this setup. That is not
an issue for me.

It is not "biting" me. We all know that going from 10 to 20 nodes is
pretty simple. However organic growth from 10 to 16, then a couple of
months later from 16 to 22, can take some effort with 300-600 GB per
node, since each join and cleanup can take a while. I am wondering if
dividing a single large node into multiple smaller instances would
make this type of growth easier.

Re: Running multiple instances on a single server --micrandra ??

Posted by Bill de hÓra <bi...@dehora.net>.

On Tue, 2010-12-07 at 21:25 -0500, Edward Capriolo wrote:

> The idea behind "micrandra" is, for a 6-disk system, to run 6
> instances of Cassandra, one per disk. Use the RackAwareSnitch to make
> sure no replicas live on the same node.
> 
> The downsides
> 1) we would have to manage 6x the instances of cassandra
> 2) we would have some overhead for each JVM.
> 
> The upsides?
> 1) A disk/instance failure only degrades the overall performance by
> 1/6th (with RAID0 you lose the entire node; RAID5 still takes a hit
> when down a disk)
> 2) Moves and joins have less work to do
> 3) Can scale up a single node by adding a single disk to an existing
> system (assuming the RAM and CPU load is light)
> 4) OPP would make it "easier" to balance out hot spots (maybe not this
> one; I am not an OPP user)
> 
> What does everyone think? Does it ever make sense to run this way?


It might for read heavy loads.

When I looked at this, it was pointed out to me it's simpler to run
fewer bigger coarser nodes and take the entire node/server out when
something goes wrong. Basically give each Cassandra a server.

I wonder if it would be better to rethink compaction if that's what's
driving the idea. It seems to be what is biting everyone, along with GC.

Bill

Re: Running multiple instances on a single server --micrandra ??

Posted by Edward Capriolo <ed...@gmail.com>.
On Tue, Dec 14, 2010 at 8:52 AM, Gary Dusbabek <gd...@gmail.com> wrote:
> On Tue, Dec 7, 2010 at 20:25, Edward Capriolo <ed...@gmail.com> wrote:
>> I am quite ready to be stoned for this thread, but I have been
>> thinking about this for a while and I just wanted to bounce these
>> ideas off some gurus.
>>
>> ...
>>
>> The upsides?
>> 1) A disk/instance failure only degrades the overall performance by
>> 1/6th (with RAID0 you lose the entire node; RAID5 still takes a hit
>> when down a disk)
>> 2) Moves and joins have less work to do
>> 3) Can scale up a single node by adding a single disk to an existing
>> system (assuming the RAM and CPU load is light)
>> 4) OPP would make it "easier" to balance out hot spots (maybe not this
>> one; I am not an OPP user)
>>
>
> Sorry for chiming in so late, but another benefit is that it amortizes
> stop-the-world garbage collection across 6 JVMs.
>
>> What does everyone think? Does it ever make sense to run this way?
>>
>
> I think it would be a great way of utilizing CPU and memory, assuming
> you can come up with the IO bandwidth.
>
> Gary.
>

The biggest deal I see is that SSTables, bloom filters, and indexes
are less efficient with 4x 10 GB entities versus 1x 40 GB entity. On
the flip side of this, 6 independent disks will be able to seek
faster than a RAID0 set (because of queuing on the RAID0). This might
be a big win, since a person with more data than memory is doing much
more seeking.
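
For context on the bloom filter piece, the standard sizing formula
(the key count and false-positive rate below are assumed numbers,
purely for illustration):

import math

# Bits for a bloom filter holding n keys at false-positive rate p:
# m = -n * ln(p) / (ln 2)^2, i.e. memory grows linearly with keys.
def bloom_mib(n_keys, p):
    bits = -n_keys * math.log(p) / (math.log(2) ** 2)
    return bits / 8 / 2 ** 20

print("%.0f MiB" % bloom_mib(100 * 10 ** 6, 0.01))  # ~114 MiB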

Also, a cool flip side of a "micrandra" would be moving a hot-swap
disk between nodes; this would give you a blade-server-like effect,
again with much more flexibility than a big RAID set. (And of course
an actual blade server would provide this functionality. :)

Re: Running multiple instances on a single server --micrandra ??

Posted by Gary Dusbabek <gd...@gmail.com>.
On Tue, Dec 7, 2010 at 20:25, Edward Capriolo <ed...@gmail.com> wrote:
> I am quite ready to be stoned for this thread, but I have been
> thinking about this for a while and I just wanted to bounce these
> ideas off some gurus.
>
> ...
>
> The upsides?
> 1) A disk/instance failure only degrades the overall performance by
> 1/6th (with RAID0 you lose the entire node; RAID5 still takes a hit
> when down a disk)
> 2) Moves and joins have less work to do
> 3) Can scale up a single node by adding a single disk to an existing
> system (assuming the RAM and CPU load is light)
> 4) OPP would make it "easier" to balance out hot spots (maybe not this
> one; I am not an OPP user)
>

Sorry for chiming in so late, but another benefit is that it amortizes
stop-the-world garbage collection across 6 JVMs.

> What does everyone think? Does it ever make sense to run this way?
>

I think it would be a great way of utilizing CPU and memory, assuming
you can come up with the IO bandwidth.

Gary.

Re: Running multiple instances on a single server --micrandra ??

Posted by Anand Somani <me...@gmail.com>.
Interesting idea.

If it is like dividing the entire load on the system by 6, then even
with the effective load staying the same, using SSDs for the commit
volume we could get away with 1 commitlog SSD shared by all
instances. Even if these 6 instances can only handle 80% of the load
(compared to 1 instance on this machine), that might be acceptable.
Could that help?

I mean, the benefits of smaller Cassandra nodes do sound very
enticing. Sure, we would probably have to throw more memory/CPU at it
to get performance comparable to 1 instance on that box (or reduce
the load), but it still looks better than 6 boxes.

On Tue, Dec 7, 2010 at 10:00 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> The major downside is you're going to want to let each instance have
> its own dedicated commitlog spindle too, unless you just don't have
> many updates.
>
> On Tue, Dec 7, 2010 at 8:25 PM, Edward Capriolo <ed...@gmail.com>
> wrote:
> > I am quite ready to be stoned for this thread, but I have been
> > thinking about this for a while and I just wanted to bounce these
> > ideas off some gurus.
> >
> > Cassandra does allow multiple data directories, but as far as I can
> > tell no one runs in this configuration. This is one of the big
> > differences between the HBase architecture and the Cassandra
> > architecture. HBase borrows the JBOD concept from Hadoop: many
> > smallish (~256 MB) regions managed with ZooKeeper. Cassandra has a
> > few (1 per node) large, node-sized token ranges managed by gossip
> > consensus.
> >
> > Let's say a node has six 300 GB disks. You have the options of RAID5,
> > RAID6, RAID10, or RAID0. The problem I have found with these
> > configurations is that major compactions (or even large minor ones)
> > can take a long time. Even if your disks are not heavily utilized,
> > that is a lot of data to move through. Thus node joins take a long
> > time, and node moves take a long time.
> >
> > The idea behind "micrandra" is, for a 6-disk system, to run 6
> > instances of Cassandra, one per disk. Use the RackAwareSnitch to make
> > sure no replicas live on the same node.
> >
> > The downsides
> > 1) we would have to manage 6x the instances of cassandra
> > 2) we would have some overhead for each JVM.
> >
> > The upsides?
> > 1) A disk/instance failure only degrades the overall performance by
> > 1/6th (with RAID0 you lose the entire node; RAID5 still takes a hit
> > when down a disk)
> > 2) Moves and joins have less work to do
> > 3) Can scale up a single node by adding a single disk to an existing
> > system (assuming the RAM and CPU load is light)
> > 4) OPP would make it "easier" to balance out hot spots (maybe not this
> > one; I am not an OPP user)
> >
> > What does everyone think? Does it ever make sense to run this way?
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

Re: Running multiple instances on a single server --micrandra ??

Posted by Ryan King <ry...@twitter.com>.
Overall, I don't think this is a crazy idea, though I'd prefer
Cassandra to manage this setup itself.

The problem you will run into is that, because the storage port is
assumed to be the same across the cluster, you'll only be able to do
this if you can assign multiple IPs to each server (one for each
process). (I know this because I proposed something similar last year
:)).
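
A tiny sketch of that constraint (the addresses are made up; the
point is that each instance needs its own alias so they can all bind
the same storage port, 7000 by default):

import socket

# Check that each per-instance IP alias can bind the shared storage
# port; this raises unless the aliases are configured on the host.
for ip in ["10.0.0.%d" % (10 + i) for i in range(6)]:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((ip, 7000))
    s.close()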

-ryan

On Tue, Dec 7, 2010 at 10:00 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> The major downside is you're going to want to let each instance have
> its own dedicated commitlog spindle too, unless you just don't have
> many updates.
>
> On Tue, Dec 7, 2010 at 8:25 PM, Edward Capriolo <ed...@gmail.com> wrote:
>> I am quite ready to be stoned for this thread, but I have been
>> thinking about this for a while and I just wanted to bounce these
>> ideas off some gurus.
>>
>> Cassandra does allow multiple data directories, but as far as I can
>> tell no one runs in this configuration. This is one of the big
>> differences between the HBase architecture and the Cassandra
>> architecture. HBase borrows the JBOD concept from Hadoop: many
>> smallish (~256 MB) regions managed with ZooKeeper. Cassandra has a
>> few (1 per node) large, node-sized token ranges managed by gossip
>> consensus.
>>
>> Let's say a node has six 300 GB disks. You have the options of RAID5,
>> RAID6, RAID10, or RAID0. The problem I have found with these
>> configurations is that major compactions (or even large minor ones)
>> can take a long time. Even if your disks are not heavily utilized,
>> that is a lot of data to move through. Thus node joins take a long
>> time, and node moves take a long time.
>>
>> The idea behind "micrandra" is, for a 6-disk system, to run 6
>> instances of Cassandra, one per disk. Use the RackAwareSnitch to make
>> sure no replicas live on the same node.
>>
>> The downsides
>> 1) we would have to manage 6x the instances of cassandra
>> 2) we would have some overhead for each JVM.
>>
>> The upsides?
>> 1) A disk/instance failure only degrades the overall performance by
>> 1/6th (with RAID0 you lose the entire node; RAID5 still takes a hit
>> when down a disk)
>> 2) Moves and joins have less work to do
>> 3) Can scale up a single node by adding a single disk to an existing
>> system (assuming the RAM and CPU load is light)
>> 4) OPP would make it "easier" to balance out hot spots (maybe not this
>> one; I am not an OPP user)
>>
>> What does everyone think? Does it ever make sense to run this way?
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

Re: Running multiple instances on a single server --micrandra ??

Posted by Jonathan Ellis <jb...@gmail.com>.
The major downside is you're going to want to let each instance have
its own dedicated commitlog spindle too, unless you just don't have
many updates.

On Tue, Dec 7, 2010 at 8:25 PM, Edward Capriolo <ed...@gmail.com> wrote:
> I am quite ready to be stoned for this thread, but I have been
> thinking about this for a while and I just wanted to bounce these
> ideas off some gurus.
>
> Cassandra does allow multiple data directories, but as far as I can
> tell no one runs in this configuration. This is one of the big
> differences between the HBase architecture and the Cassandra
> architecture. HBase borrows the JBOD concept from Hadoop: many
> smallish (~256 MB) regions managed with ZooKeeper. Cassandra has a
> few (1 per node) large, node-sized token ranges managed by gossip
> consensus.
>
> Let's say a node has six 300 GB disks. You have the options of RAID5,
> RAID6, RAID10, or RAID0. The problem I have found with these
> configurations is that major compactions (or even large minor ones)
> can take a long time. Even if your disks are not heavily utilized,
> that is a lot of data to move through. Thus node joins take a long
> time, and node moves take a long time.
>
> The idea behind "micrandra" is, for a 6-disk system, to run 6
> instances of Cassandra, one per disk. Use the RackAwareSnitch to make
> sure no replicas live on the same node.
>
> The downsides
> 1) we would have to manage 6x the instances of cassandra
> 2) we would have some overhead for each JVM.
>
> The upsides?
> 1) A disk/instance failure only degrades the overall performance by
> 1/6th (with RAID0 you lose the entire node; RAID5 still takes a hit
> when down a disk)
> 2) Moves and joins have less work to do
> 3) Can scale up a single node by adding a single disk to an existing
> system (assuming the RAM and CPU load is light)
> 4) OPP would make it "easier" to balance out hot spots (maybe not this
> one; I am not an OPP user)
>
> What does everyone think? Does it ever make sense to run this way?
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com