Posted to user@cassandra.apache.org by Rakesh Kumar <ra...@gmail.com> on 2016/03/25 16:40:50 UTC

How many nodes do we require

We have two data centers. Our requirement is simple:

Assuming that we have an equal number of nodes in each DC, we should be able to run with the loss of one DC and the loss of at most one node in the surviving DC. Can this be achieved with 6 nodes (3 in each)? Obviously, for that, all data must be available on any two nodes. Any pointers on the replication factor?
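[Editor's note] The arithmetic behind this requirement can be sketched as follows, assuming NetworkTopologyStrategy with RF = 3 in each of the two DCs (the configuration the replies converge on):

```python
# Sketch of the failure math for this requirement (editor's illustration,
# assuming NetworkTopologyStrategy with RF = 3 in each of the 2 DCs).

RF_PER_DC = 3
NODES_PER_DC = 3

# LOCAL_QUORUM needs a majority of the replicas in one DC.
local_quorum = RF_PER_DC // 2 + 1  # 2 of 3

# Worst case from the question: one whole DC down, plus one node in the other.
live_in_surviving_dc = NODES_PER_DC - 1  # 2 nodes left, each holding all data

can_write_local_quorum = live_in_surviving_dc >= local_quorum
print(can_write_local_quorum)  # True: 2 live replicas meet the quorum of 2
```

So on paper, 6 nodes suffice for LOCAL_QUORUM writes in the surviving DC under this failure scenario.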

Thanks

--
Sent from mobile. 

Re: How many nodes do we require

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
@Rakesh:

> Are you suggesting SimpleStrategy with RF=3,
> or NetworkTopologyStrategy with RF=3?

Just always go with NetworkTopologyStrategy; I see no reason not to use it
nowadays, even on test clusters. With SimpleStrategy, all the machines are
treated as a single datacenter, no matter their location or configuration,
which can lead to a lot of issues.
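[Editor's note] A toy simulation of the difference (this is not Cassandra's actual placement code, and real NetworkTopologyStrategy also considers racks): SimpleStrategy walks the token ring and takes the next RF nodes wherever they live, so all replicas can land in one DC, while NetworkTopologyStrategy honors a per-DC replica count.

```python
# Toy replica placement sketch (editor's illustration, not Cassandra source).
ring = [("n1", "DC1"), ("n2", "DC1"), ("n3", "DC1"),
        ("n4", "DC2"), ("n5", "DC2"), ("n6", "DC2")]

def simple_strategy(ring, start, rf):
    # Next rf nodes clockwise from the token's position, DC-blind.
    return [ring[(start + i) % len(ring)][0] for i in range(rf)]

def network_topology(ring, start, rf_per_dc):
    # Walk the ring, but honor a per-DC replica count.
    picked, counts = [], {dc: 0 for dc in rf_per_dc}
    i = start
    while sum(counts.values()) < sum(rf_per_dc.values()):
        node, dc = ring[i % len(ring)]
        if counts[dc] < rf_per_dc[dc]:
            picked.append(node)
            counts[dc] += 1
        i += 1
    return picked

print(simple_strategy(ring, 0, 3))                      # ['n1', 'n2', 'n3'] -- all in DC1
print(network_topology(ring, 0, {"DC1": 2, "DC2": 1}))  # ['n1', 'n2', 'n4'] -- both DCs
```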

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


Re: How many nodes do we require

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi,

> Because if you lose a node, you risk losing some data forever if it was
> not yet replicated.

I think I get your point, but keep in mind that CL ONE (or LOCAL_ONE) will
not prevent the coordinator from sending the data to the 2 other replicas;
it will just wait for the first ack, and all the nodes are still supposed
to receive the write. There are also hinted handoff, read repair and full
repairs. If none of these work, the problem is probably bigger than just
using CL ONE.

I would advise using QUORUM if strong consistency is important to you. Use
RF = 2 to spare some space or to favor low latency over consistency. Be
careful then: losing consistency means that the returned data might change
depending on the node you hit. If some nodes lose information and entropy
gets high, the same query run twice at the same time can return 2 distinct
values depending on where they read from. Generally, RF = 3 with CL =
LOCAL_QUORUM for both reads and writes is the best (safest?) option, but
some people happily run with RF = 2 and CL = LOCAL_ONE.
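[Editor's note] The quorum arithmetic behind the RF = 2 vs RF = 3 trade-off, as a quick sketch:

```python
# Quorum arithmetic for the RF = 2 vs RF = 3 trade-off (editor's sketch).

def quorum(rf: int) -> int:
    return rf // 2 + 1

for rf in (2, 3):
    tolerated = rf - quorum(rf)  # replicas that can be down with QUORUM still served
    print(f"RF={rf}: quorum={quorum(rf)}, tolerates {tolerated} replica(s) down")

# RF=3: quorum=2, tolerates 1 replica down.
# RF=2: quorum=2, tolerates 0 down -- QUORUM behaves like ALL here, which is
# why RF=2 clusters tend to run at (LOCAL_)ONE instead.
```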

> we should be able to run with the loss of one DC and loss of at most one
> node in the surviving DC

You can configure that on the client side with modern drivers using the
native protocol, and there are consistency considerations there too.

Hope I am correct about all that :-).

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


Re: How many nodes do we require

Posted by Jack Krupansky <ja...@gmail.com>.
Maybe that's a great definition of a modern distributed cluster: each
person (node) has a different notion of priority.

I'll wait for the next user email complaining that their data is
"too stable" (missing updates).

-- Jack Krupansky

On Thu, Mar 31, 2016 at 12:04 PM, Jacques-Henri Berthemet <
jacques-henri.berthemet@genesys.com> wrote:

> You’re right. I meant with regard to data integrity; I understand it’s
> not everybody’s priority!

RE: How many nodes do we require

Posted by Jacques-Henri Berthemet <ja...@genesys.com>.
You’re right. I meant with regard to data integrity; I understand it’s not everybody’s priority!

--
Jacques-Henri Berthemet

From: Jonathan Haddad [mailto:jon@jonhaddad.com]
Sent: jeudi 31 mars 2016 17:48
To: user@cassandra.apache.org
Subject: Re: How many nodes do we require

Losing a write is very different from having a fragile cluster.  A fragile cluster implies that whole thing will fall apart, that it breaks easily.  Writing at CL=ONE gives you a pretty damn stable cluster at the potential risk of losing a write that hasn't replicated (but has been ack'ed) which for a lot of people is preferable to downtime.  CL=ONE gives you the *most stable* cluster you can have.

Re: How many nodes do we require

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Losing a write is very different from having a fragile cluster.  A fragile
cluster implies that whole thing will fall apart, that it breaks easily.
Writing at CL=ONE gives you a pretty damn stable cluster at the potential
risk of losing a write that hasn't replicated (but has been ack'ed) which
for a lot of people is preferable to downtime.  CL=ONE gives you the *most
stable* cluster you can have.
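[Editor's note] The trade-off both sides describe can be captured in a toy model (an illustration, not real driver behavior): a CL=ONE write is acked after one replica responds, but it survives only while some replica that actually received it is alive.

```python
# Toy model of the CL=ONE trade-off (editor's illustration, not driver code).

def write_acked(acks: int, cl: int) -> bool:
    # The coordinator reports success once `cl` replicas have acknowledged.
    return acks >= cl

def data_survives(replicas_with_copy: set, dead: set) -> bool:
    # The write survives as long as one replica that received it is alive.
    return bool(replicas_with_copy - dead)

# RF=3, CL=ONE: ack after a single replica; the other two get the write
# asynchronously (plus hints / read repair / repair, as noted earlier).
assert write_acked(acks=1, cl=1)                       # client sees success
assert not data_survives({"n1"}, dead={"n1"})          # rare loss: only copy died first
assert data_survives({"n1", "n2", "n3"}, dead={"n1"})  # normal case: replicas caught up
```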

On Tue, Mar 29, 2016 at 12:57 AM Jacques-Henri Berthemet <
jacques-henri.berthemet@genesys.com> wrote:

> Because if you lose a node, you risk losing some data forever if it was
> not yet replicated.

RE: How many nodes do we require

Posted by Jacques-Henri Berthemet <ja...@genesys.com>.
Because if you lose a node, you risk losing some data forever if it was not yet replicated.

--
Jacques-Henri Berthemet

From: Jonathan Haddad [mailto:jon@jonhaddad.com]
Sent: vendredi 25 mars 2016 19:37
To: user@cassandra.apache.org
Subject: Re: How many nodes do we require

Why would using CL-ONE make your cluster fragile? This isn't obvious to me. It's the most practical setting for high availability, which very much says "not fragile".

Re: How many nodes do we require

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Why would using CL-ONE make your cluster fragile? This isn't obvious to me.
It's the most practical setting for high availability, which very much says
"not fragile".
On Fri, Mar 25, 2016 at 10:44 AM Jacques-Henri Berthemet <
jacques-henri.berthemet@genesys.com> wrote:

> I found this calculator very convenient:
> http://www.ecyrd.com/cassandracalculator/
>
> Regardless of your other DCs you need RF=3 if you write at LOCAL_QUORUM,
> RF=2 if you write/read at ONE.
>
> Obviously using ONE as CL makes your cluster very fragile.
> --
> Jacques-Henri Berthemet

RE: How many nodes do we require

Posted by Jacques-Henri Berthemet <ja...@genesys.com>.
I found this calculator very convenient:
http://www.ecyrd.com/cassandracalculator/

Regardless of your other DCs, you need RF=3 if you write at LOCAL_QUORUM, or RF=2 if you write/read at ONE.

Obviously using ONE as CL makes your cluster very fragile.
--
Jacques-Henri Berthemet



Re: How many nodes do we require

Posted by Rakesh Kumar <ra...@gmail.com>.
On Fri, Mar 25, 2016 at 11:45 AM, Jack Krupansky
<ja...@gmail.com> wrote:
> It depends on how much data you have. A single node can store a lot of data,
> but the more data you have the longer a repair or node replacement will
> take. How long can you tolerate for a full repair or node replacement?

At this time, and for the foreseeable future, the size of the data will
not be significant, so we can safely disregard that as a decision factor.

>
> Generally, RF=3 is both sufficient and recommended.

Are you suggesting SimpleStrategy with RF=3,
or NetworkTopologyStrategy with RF=3?


taken from:

https://docs.datastax.com/en/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html

"
Three replicas in each data center: This configuration tolerates
either the failure of one node per replication group at a strong
consistency level of LOCAL_QUORUM or multiple node failures per data
center using consistency level ONE."

In our case, with only 3 nodes in each DC, wouldn't RF=3 effectively mean ALL?
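[Editor's note] A small check of why RF=3 on a 3-node DC does not force CL=ALL: every node holds every row, but the consistency level still decides how many must answer.

```python
# RF=3 on 3 nodes: every node stores all data, yet CL stays independent of RF.
rf = 3
required = {"ONE": 1, "LOCAL_QUORUM": rf // 2 + 1, "ALL": rf}

live = 3 - 1  # one node down in the DC
for cl, need in required.items():
    print(f"{cl}: needs {need}, ok with 1 node down: {live >= need}")
# ONE and LOCAL_QUORUM still succeed; only ALL fails.
```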

I will state our requirement clearly:

If we are going with six nodes (3 in each DC), we should be able to
write even with the loss of one DC and the loss of one node in the
surviving DC. I am open to hearing what compromises we have to make on
reads while a DC is down. For us, writes are critical, more than reads.

Maybe this is not possible with 6 nodes and requires more. Please advise.

Re: How many nodes do we require

Posted by Jack Krupansky <ja...@gmail.com>.
It depends on how much data you have. A single node can store a lot of
data, but the more data you have the longer a repair or node replacement
will take. How long can you tolerate for a full repair or node replacement?

Generally, RF=3 is both sufficient and recommended.

-- Jack Krupansky

On Fri, Mar 25, 2016 at 11:40 AM, Rakesh Kumar <ra...@gmail.com>
wrote:

> We have two data centers. Our requirement is simple
>
> Assuming that we have equal number of nodes in each DC we should be able
> to run with the loss of one DC and loss of at most one node in the
> surviving DC. Can this be achieved with 6 nodes (3 in each)? Obviously, for
> that, all data must be available on any two nodes. Any pointers on the
> replication factor?
>
> Thanks
>
> --
> Sent from mobile.