Posted to user@cassandra.apache.org by Joe Hicks <jo...@gmail.com> on 2016/04/01 07:41:57 UTC

Cassandra Resource Planning

I am doing resource planning and could use some help. How many operations
people will I need to manage my Cassandra implementation for two sites with
10 nodes at each site? As my cluster grows, at what point will I need to
add another person?

Re: Cassandra Resource Planning

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi Joe,


> I am doing resource planning and could use some help.


I worked alone for 4 years operating a growing cluster (from 3 to 60+
nodes, from t1.micro to i2.2xlarge AWS instances, for the biggest
cluster), plus 2 other clusters, and handling MySQL too :'(. I have now
joined a team of Cassandra experts, so I have worked in both extreme
situations.

So, first thing, it is doable, a guy alone can do this. And I probably
could have handled more nodes (and I was also doing MySQL schema
management for new features).
Second thing, it is a real PITA for one guy to be on his own operating a
production Cassandra cluster, really.

I would say that having a guy working alone is a bad idea. First, because
when someone is digging into a Cassandra issue, an external point of view
is often very enlightening. It is a complex system and discussing possible
solutions is often worth it. The community can help with that (this list,
IRC, ...).
Also, anytime your operational guy is out, who will handle operations? What
if your operator leaves your company? This can happen, as a lot of people
want to recruit a good Cassandra operator.
My point is that using a Replication Factor of 2 or 3 for data (with the
extra cost induced) and of 1 for people will produce a 'sort of Single
Point of Failure' in Cassandra. Cassandra often needs to be 100% up (or
close to it), so what happens if Cassandra starts failing during your
operator's 3-week-long holidays?

I worked during nights, holidays, Christmas, ... in the past. If there had
been 2 of us, I would have done roughly half of the work, and not during
my time off. I might then have stayed longer at my previous company.
Be careful, having only one operator on a Cassandra cluster will probably
exhaust him quite quickly.

From my own experience, I would say you should probably have a second
person as soon as you can (they can do something other than Cassandra half
of the time if needed at the start). But I truly believe 2 people knowing
and able to act on Cassandra is a good number to reach asap. If you don't
want to do it, at least make sure to have some other people in your team
able to do first-level support (restart nodes, monitor, understand roughly
how Cassandra works, apply commands given by the operator - have him
prepare common troubleshooting steps) and/or consider using external
support when your operator is blocked, as he probably won't be able to
answer everything on his own.

Data is the beating heart of many businesses, and it is still often
under-provisioned (machines and people) as it is a cost, with no direct
income. Think about how important it is to keep data availability /
consistency / latency in a good state in your case and act accordingly :-).

Then, when there is a team of 2, you don't need to scale according to the
number of nodes (the Netflix C* team used to be 2 or 3 people and they had
1000+ servers, if I remember correctly). The whole thing is making sure
operators can, and are encouraged to, automate common actions and script
things as much as they think is useful. Then a few people can handle a lot
of nodes; the headcount is far from being linearly related to the number
of nodes.
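
To give an idea of the kind of scripting meant here, below is a minimal
sketch (not a definitive tool) that runs one command on every node, one
node at a time. It assumes key-based ssh access to each node; the function
names run_on_nodes and ssh_runner and the host addresses are hypothetical,
made up for this illustration.

```python
import subprocess
from typing import Callable, Dict, List


def ssh_runner(host: str, command: str) -> str:
    """Run `command` on `host` over ssh and return its stdout.

    Assumption: the local user has key-based ssh access to `host`.
    """
    result = subprocess.run(
        ["ssh", host, command],
        capture_output=True,
        text=True,
        check=True,   # raise if the remote command exits non-zero
        timeout=60,   # do not hang forever on an unreachable node
    )
    return result.stdout


def run_on_nodes(
    nodes: List[str],
    command: str,
    runner: Callable[[str, str], str] = ssh_runner,
) -> Dict[str, str]:
    """Run the same command on each node sequentially.

    A failure on one node is recorded and does not stop the loop.
    """
    results: Dict[str, str] = {}
    for host in nodes:
        try:
            results[host] = runner(host, command)
        except Exception as exc:
            results[host] = f"ERROR: {exc}"
    return results


# Hypothetical usage (addresses are placeholders):
# run_on_nodes(["10.0.0.1", "10.0.0.2"], "nodetool status")
```

Going node by node is deliberate in this sketch: for operations that are
heavy on a node, you usually do not want to hit the whole cluster at once.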

> How many operations people will I need to manage my Cassandra
> implementation for two sites with 10 nodes at each site? As my cluster
> grows, at what point will I need to add another person?


I would finally say that the number of operators needed might actually be
more related to the number of devs / tech team members you have. My team
had 60 devs, and me alone operating Cassandra, which I believe is
ridiculous and I don't recommend. More devs = more features, more modeling
work, more services hitting the database, etc. It also depends on the
management / automation systems in place: basically, if adding a node is a
5-minute operation for this guy or a 2-hour operation (just to have the
node prepared), you obviously don't need the same number of people there.
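
As a rough back-of-the-envelope illustration of that last point, here is a
small sketch comparing the two per-node figures above. The count of 20
nodes to prepare is only an assumption for the example (it matches the
size of the cluster in the question), and provisioning_hours is a
hypothetical helper name.

```python
def provisioning_hours(nodes_to_add: int, minutes_per_node: float) -> float:
    """Total operator time, in hours, to prepare `nodes_to_add` nodes."""
    return nodes_to_add * minutes_per_node / 60.0


# Assumption: 20 nodes to prepare, using the two per-node figures above.
automated = provisioning_hours(20, 5)    # 5 minutes per node
manual = provisioning_hours(20, 120)     # 2 hours per node

print(f"automated: {automated:.1f} h, manual: {manual:.1f} h")
# prints "automated: 1.7 h, manual: 40.0 h"
```

Under these assumptions the same job is either an afternoon task or a full
working week for one person.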

FWIW, here is a post I wrote that I believe might help your operator
handle your small cluster:
http://thelastpickle.com/blog/2016/03/21/running-commands-cluster-wide.html

Those are only personal thoughts and considerations based on my own
experience. Others might have other considerations, or maybe see things
from a perspective other than the Cassandra operator's (which is mine
here). I hope you will be kind to your operator and find him a friend to
talk with! I think it is better for both the company and for him.

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
