
Posted to user@cassandra.apache.org by Markus Jais <ma...@yahoo.de> on 2014/04/14 11:25:23 UTC

Replication Factor question

Hello,

I am currently reading "Practical Cassandra". In the section about replication factors, the book says:

"It is generally not recommended to set a replication factor of 3 if you have fewer than six nodes in a data center".

Why is that? What problems would arise if I had a replication factor of 3 and only 5 nodes?

Does that mean that for a replication factor of 4 I would need at least 8 nodes, and for a factor of 5 at least 10 nodes?

Not that I would use a factor of 5 and 10 nodes; I'm just curious about how this works.

All the best,

Markus

Re: Replication Factor question

Posted by Tupshin Harper <tu...@tupshin.com>.

With 3 nodes, and RF=3, you can always use CL=ALL if all nodes are up,
QUORUM if 1 node is down, and ONE if any two nodes are down.

The exact same thing is true if you have more nodes.
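As a rough illustration of the arithmetic above (not Cassandra code), here is a minimal Python sketch. It assumes the usual rule that QUORUM means floor(RF/2) + 1 replicas, and it only models the ALL, QUORUM and ONE levels; the function names are hypothetical. What matters is how many replicas of a given range are down, which is why adding nodes does not change the answer.

```python
from typing import Optional


def quorum(rf: int) -> int:
    # QUORUM needs a majority of the replicas: floor(RF/2) + 1
    return rf // 2 + 1


def strongest_cl(rf: int, replicas_down: int) -> Optional[str]:
    """Strongest of ALL, QUORUM, ONE that can still succeed for one range.

    Only the replicas that own the range matter, not the total number
    of nodes in the cluster.
    """
    live = rf - replicas_down
    required = {"ALL": rf, "QUORUM": quorum(rf), "ONE": 1}
    for level in ("ALL", "QUORUM", "ONE"):
        if live >= required[level]:
            return level
    return None  # no live replica for this range


for down in range(4):
    print(f"RF=3, {down} replica(s) down: {strongest_cl(3, down)}")
```

For RF=3 this gives ALL with no replicas down, QUORUM with one down, ONE with two down, and nothing with all three down.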

-Tupshin
On Apr 14, 2014 7:51 AM, "Markus Jais" <ma...@yahoo.de> wrote:

Re: Replication Factor question

Posted by Markus Jais <ma...@yahoo.de>.

Hi all,

thanks. Very helpful.

@Tupshin: With a 3-node cluster and RF 3, isn't it a problem if one node fails (due to hardware problems, for example)? According to the C* docs, writes fail if the number of nodes is smaller than the RF.
I agree that it will run fine as long as all nodes are up and can handle the load, but eventually hardware will fail.

Markus

Tupshin Harper <tu...@tupshin.com> wrote on Monday, 14 April 2014, 13:44:

Re: Replication Factor question

Posted by Tupshin Harper <tu...@tupshin.com>.

I do not agree with this advice. It can be perfectly reasonable to have
#nodes < 2*RF.

It is common to deploy a 3-node cluster with RF=3, and it works fine as long
as each node can handle 100% of your data and keep up with the workload.

-Tupshin
On Apr 14, 2014 5:25 AM, "Markus Jais" <ma...@yahoo.de> wrote:

Re: Replication Factor question

Posted by Robert Coli <rc...@eventbrite.com>.

On Wed, Apr 16, 2014 at 1:47 AM, Markus Jais <ma...@yahoo.de> wrote:

> thanks. How many nodes do you have running in those 5 racks and RF 5? Only
> 5 nodes or more?
>

While I haven't contemplated it too much, I'd think the absolute minimum
would be RF=N=5, sure. The "real minimum" with headroom would depend on
workload, but would probably be at least a few nodes greater than 5.

=Rob

Re: Replication Factor question

Posted by Markus Jais <ma...@yahoo.de>.

Hi Rob,

thanks. How many nodes do you have running in those 5 racks and RF 5? Only 5 nodes or more?

Markus

Robert Coli <rc...@eventbrite.com> wrote on Tuesday, 15 April 2014, 20:36:

Re: Replication Factor question

Posted by Tupshin Harper <tu...@tupshin.com>.

It is not common, but I know of multiple organizations running with RF=5,
in at least one DC, for HA reasons.

-Tupshin
On Apr 15, 2014 2:36 PM, "Robert Coli" <rc...@eventbrite.com> wrote:

Re: Replication Factor question

Posted by Robert Coli <rc...@eventbrite.com>.

On Tue, Apr 15, 2014 at 6:14 AM, Ken Hancock <ke...@schange.com> wrote:

> Keep in mind if you lose the wrong two, you can't satisfy quorum.  In a
> 5-node cluster with RF=3, it would be impossible to lose 2 nodes without
> affecting quorum for at least some of your data. In a 6 node cluster, once
> you've lost one node, if you were to lose another, you only have a 1-in-5
> chance of not affecting quorum for some of your data.
>

This is why the real highly available way to run Cassandra with QUORUM is
RF=5, with 5 "racks".

Briefly, any given node running a JVM-based distributed application should
be assumed to potentially become transiently unavailable for a short time,
for example during long GC pauses or rolling restarts. There is also a
chance of non-transient failure (hard down) at any time, and a much smaller
chance of two simultaneous non-transient failures. If you have RF=3 and
lose two nodes (one transient, the other non-transient) in a range, that
range is now unavailable because quorum is 2 and 3-2 is 1, which is less
than 2. If you have RF=5 and lose two nodes in the same way, quorum is 3
and 5-2 is 3, which is equal to 3.
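The same arithmetic as a minimal Python sketch, assuming QUORUM is floor(RF/2) + 1 replicas of a single range; the helper names are hypothetical, not part of any Cassandra API.

```python
def quorum(rf: int) -> int:
    # A majority of the replicas of one range: floor(RF/2) + 1
    return rf // 2 + 1


def quorum_available(rf: int, replicas_down: int) -> bool:
    """Can a range with `rf` replicas, `replicas_down` of them unavailable,
    still satisfy QUORUM?"""
    return rf - replicas_down >= quorum(rf)


# Scenario from the paragraph above: one transient outage (e.g. a long GC
# pause or a rolling restart) plus one hard failure in the same replica set.
for rf in (3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, live={rf - 2}, "
          f"QUORUM ok with 2 down: {quorum_available(rf, 2)}, "
          f"max replicas down for QUORUM: {rf - quorum(rf)}")
```

RF=3 tolerates one unavailable replica per range at QUORUM, RF=5 tolerates two.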

AFAICT, no one actually runs Cassandra in this way because keeping 5 copies
of your already denormalized data seems excessive and is difficult to
justify to management.

=Rob

Re: Replication Factor question

Posted by Markus Jais <ma...@yahoo.de>.

Hi Ken,

thanks. Good point. 

Markus
Ken Hancock <ke...@schange.com> wrote on Tuesday, 15 April 2014, 15:15:
 
Re: Replication Factor question

Posted by Ken Hancock <ke...@schange.com>.

Keep in mind that if you lose the wrong two, you can't satisfy quorum. In a
5-node cluster with RF=3, it would be impossible to lose 2 nodes without
affecting quorum for at least some of your data. In a 6-node cluster, once
you've lost one node, if you were to lose another, you only have a 1-in-5
chance of not affecting quorum for some of your data.

In much larger clusters, it becomes less probable that you will lose
multiple nodes within an RF group.
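To make the figures above concrete, here is a small Python sketch that enumerates second failures on a toy ring. It assumes one token per node (no vnodes), evenly spaced tokens, and SimpleStrategy-style placement in which each range is stored on RF consecutive nodes; the helper names are hypothetical. With vnodes or rack-aware NetworkTopologyStrategy placement the replica sets, and therefore the counts, will differ.

```python
from itertools import combinations  # noqa: F401  (handy for all-pairs variants)


def replica_sets(n_nodes: int, rf: int) -> list:
    # Assumption: range i is stored on nodes i, i+1, ..., i+rf-1 (mod n_nodes)
    return [frozenset((i + k) % n_nodes for k in range(rf)) for i in range(n_nodes)]


def quorum(rf: int) -> int:
    # Majority of the replicas: floor(RF/2) + 1
    return rf // 2 + 1


def keeps_quorum(down: set, n_nodes: int, rf: int) -> bool:
    """True if every range still has at least quorum(rf) live replicas."""
    return all(len(rs - down) >= quorum(rf) for rs in replica_sets(n_nodes, rf))


def safe_second_failures(n_nodes: int, rf: int, first_down: int = 0) -> tuple:
    """Given one node already down, count which second failures keep quorum."""
    others = [n for n in range(n_nodes) if n != first_down]
    safe = sum(keeps_quorum({first_down, n}, n_nodes, rf) for n in others)
    return safe, len(others)


for n in (5, 6, 12):
    safe, total = safe_second_failures(n, rf=3)
    print(f"N={n}, RF=3: {safe} of {total} second failures keep QUORUM everywhere")
```

Under these assumptions it reports 0 of 4 for N=5, 1 of 5 for N=6 and 7 of 11 for N=12, consistent with the point that a quorum-breaking pair becomes less likely as the cluster grows.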

On Tue, Apr 15, 2014 at 4:37 AM, Markus Jais <ma...@yahoo.de> wrote:

Re: Replication Factor question

Posted by Markus Jais <ma...@yahoo.de>.

Hi all,

thanks for your answers. Very helpful. We plan to use enough nodes so that the failure of 1 or 2 machines is no problem. E.g. for a workload that can be handled by 3 nodes all the time, we would use at least 5, better 6 nodes, to survive the failure of at least 2 nodes, even when the 2 nodes fail at the same time. This should allow the cluster to rebuild the missing nodes and still serve all requests with RF=3 and QUORUM reads.

All the best,

Markus

Tupshin Harper <tu...@tupshin.com> wrote on Monday, 14 April 2014, 21:23:
 
Re: Replication Factor question

Posted by Tupshin Harper <tu...@tupshin.com>.

tl;dr make sure you have enough capacity in the event of node failure. For
light workloads, that can be fulfilled with nodes=rf.

-Tupshin
On Apr 14, 2014 2:35 PM, "Robert Coli" <rc...@eventbrite.com> wrote:

Re: Replication Factor question

Posted by Robert Coli <rc...@eventbrite.com>.

On Mon, Apr 14, 2014 at 2:25 AM, Markus Jais <ma...@yahoo.de> wrote:

> "It is generally not recommended to set a replication factor of 3 if you
> have fewer than six nodes in a data center".
>

I have a detailed post about this somewhere in the archives of this list
(which I can't seem to find right now), but briefly, the "6-for-3" advice
relates to the percentage of capacity you have remaining when you have a
node down. It has become slightly less accurate over time because vnodes
reduce bootstrap time and there have been other improvements to node
startup time.

If you have fewer than 6 nodes with RF=3, you lose >1/6th of capacity when
you lose a single node, which is a significant percentage of total cluster
capacity. You then lose another meaningful percentage of your capacity when
your existing nodes participate in rebuilding the missing node. If you are
then unlucky enough to lose another node, you are missing a very
significant percentage of your cluster capacity and have to use a
relatively small fraction of it to rebuild the now two down nodes.
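A back-of-the-envelope Python sketch of this capacity argument. It assumes identical nodes with evenly balanced load, so each node is 1/N of cluster capacity, and it ignores the extra cost of streaming data to a replacement node, which would make the real numbers worse. The function name is hypothetical.

```python
from fractions import Fraction


def capacity_left(n_nodes: int, nodes_down: int) -> Fraction:
    """Fraction of total cluster capacity still available.

    Assumes identical, evenly balanced nodes, so each node contributes
    1/n_nodes of capacity. Rebuild/streaming overhead is not modelled.
    """
    if not 0 <= nodes_down <= n_nodes:
        raise ValueError("nodes_down must be between 0 and n_nodes")
    return Fraction(n_nodes - nodes_down, n_nodes)


for n in range(3, 9):
    one_node = 1 - capacity_left(n, 1)
    print(f"N={n}: one node = {float(one_node):.0%} of capacity, "
          f"left with 1 down = {float(capacity_left(n, 1)):.0%}, "
          f"with 2 down = {float(capacity_left(n, 2)):.0%}")
```

Below 6 nodes a single node is more than 1/6 of capacity, and with 3 nodes two failures leave only a third of the cluster to serve traffic and rebuild.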

I wouldn't generalize the rule of thumb as "don't run under N=RF*2", but
rather as "probably don't run RF=3 under about 6 nodes". IOW, in my view,
the most operationally sane initial number of nodes for RF=3 is likely
closer to 6 than 3.

=Rob

Re: Replication Factor question

Posted by Sergey Murylev <se...@gmail.com>.

Hi Markus,
> "It is generally not recommended to set a replication factor of 3 if
> you have fewer than six nodes in a data center".
Actually, you can create a cluster with 3 nodes and replication factor 3.
But in this case, if one of them fails, the cluster becomes inconsistent.
So the minimum reasonable number of nodes is 4 for replication factor 3;
in this case we can tolerate a single node failure. But in this situation
each node would contain 3/4 of all data, which is not very good. 6 nodes
are recommended because in this case each node contains 1/2 of all data,
which is quite an adequate overhead.
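The per-node data share follows from the fact that the cluster stores RF copies spread over N nodes. A minimal Python sketch, assuming evenly balanced tokens and at most one replica per node; the function name is hypothetical.

```python
from fractions import Fraction


def data_share_per_node(rf: int, n_nodes: int) -> Fraction:
    """Fraction of the total data set that each node stores.

    Assumes evenly balanced data and at most one replica per node,
    so the rf copies of every row are spread over n_nodes nodes.
    """
    if rf > n_nodes:
        raise ValueError("this sketch assumes rf <= n_nodes")
    return Fraction(rf, n_nodes)


for n in (3, 4, 6):
    print(f"RF=3, N={n}: each node holds {data_share_per_node(3, n)} of the data")
```

This prints 1, 3/4 and 1/2 for 3, 4 and 6 nodes respectively.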

Cassandra clusters typically don't have a big replication factor; usually
it is 3 (failure of any single node doesn't crash the cluster) or 5 (failure
of any two nodes doesn't crash the cluster).

For more details, take a look at this replication factor calculator:
<http://www.ecyrd.com/cassandracalculator/>.

--
Thanks,
Sergey

On 14/04/14 13:25, Markus Jais wrote: