Posted to user@cassandra.apache.org by A J <s5...@gmail.com> on 2011/02/18 16:23:21 UTC

R and N

Questions about R and N (and W):
1. If I set R to Quorum and cassandra identifies a need for read
repair before returning, would the read repair happen on R nodes (I
mean subset of R that needs repair) or N nodes before the data is
delivered to the client ?
2. Also does the repair happen at level of row (key) or at level of column ?

3. During a write, if W is met but the remaining N-W replicas are not
written for some reason, would cassandra try to repair those N-W nodes
in the background as and when it can ? Or are they only repaired when a
read is issued ?

4. What is the significance of the 'primary' replica for writes from
usage point ? Writes to primary and non-primary replicas all happen
simultaneously. Ensuring W is decided irrespective of it being primary
or not. Ensuring R is decided by any of the R nodes out of N.
I know the tokens are divided per the primary replica. But other than
that, for read and write operations, does the primary replica play any
special role ?

Thanks.

Re: R and N

Posted by Aaron Morton <aa...@thelastpickle.com>.
My understanding:

1) Read repair involves the coordinator sending a full data read to CL nodes, resolving the differences, and sending writes back. For CL ONE this happens after returning to the client; for higher CLs it happens before. (My understanding of the internals of RR is a little rough, though.)

2) Not sure.

3) RR is not used on writes; hinted handoff is.

4) The node responsible for the key is often the node asked for the full data of the request; the other nodes are asked for a digest of their response. However, the dynamic snitch can re-order the nodes based on load. It's also the starting point when the partitioner is working out which nodes replicas should be stored on. It's not a point of failure.

5) The partitioner knows where the data was written to. See http://thelastpickle.com/2011/02/07/Introduction-to-Cassandra/
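
As a rough illustration of the data/digest split described in point 4, the sketch below has a coordinator compare the full data from one replica against digests from the others. The function names and the choice of MD5 are assumptions made for the example, not Cassandra's actual API; a mismatch is simply the signal that the replicas disagree and a repair is needed.

```python
import hashlib
from typing import List


def digest(payload: bytes) -> str:
    """Hash of a replica's response (MD5 is an assumption for this sketch)."""
    return hashlib.md5(payload).hexdigest()


def replicas_agree(full_data: bytes, peer_digests: List[str]) -> bool:
    """True when every digest matches the digest of the full data response."""
    expected = digest(full_data)
    return all(d == expected for d in peer_digests)


if __name__ == "__main__":
    data = b"row-42:v2"
    # One peer is up to date, the other still holds an older version.
    print(replicas_agree(data, [digest(b"row-42:v2")]))
    print(replicas_agree(data, [digest(b"row-42:v2"), digest(b"row-42:v1")]))
```

Sending digests rather than full rows keeps the common case (all replicas agree) cheap; the full data only has to be fetched from the other replicas when a mismatch shows up.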

Aaron

On 19/02/2011, at 6:28 AM, Anthony John <ch...@gmail.com> wrote:

> K - let me state the facts first (as I know them)
> - I do not know the inner workings, so interpret my response with that caveat. Although, at an architectural level, one should be able to keep detailed implementation at bay
> - Quorum is (N+1)/2, rounded up, where N is the Replication Factor (RF)
> - And consistency is a guarantee if R(ead) + W(rite) > RF (Which Quorum gives you, but can be achieved via other permutations, depending on whether Read or Write performance is desired)
> 
> Now getting to your questions:
> 1. If Read at Q is nondeterministic, it would likely have to read the other (RF-Q) nodes to achieve Quorum on a deterministic value. At which point - sync'ing all with writes should not be that expensive. But at what point precisely the read is returned - do not know - you will have to look at the code. IMO - at this level it should not matter.
> 2. Should be at the granularity of data divergence
> 3. Read Repair or Nodetool (whichever comes first)
> 4. All peer - there is no primary. There might be a connected node - but no special role/privileges
> 5. Tries to Q - returns on deterministic read. If not - see (1)
> 6. Writer supplies timestamp value - can be any value that makes sense within the scope of data/application.
> 
> HTH,
> 
> -JA
> 
> On Fri, Feb 18, 2011 at 10:28 AM, A J <s5...@gmail.com> wrote:
> Couple of more related questions:
> 
> 5. For reads, does Cassandra first read N nodes or just the R nodes it
> selects ? I am thinking unless it reads all the N nodes, how will it
> know which node has the latest write.
> 
> 6. Who decides the timestamp that gets inserted into the timestamp
> field of every column. I would guess the coordinator node picks up its
> system's timestamp.  If that is true, the clocks on all the nodes
> should be synchronized, right ? Otherwise conflict resolution cannot
> be done correctly.
> For a distributed system, this is not always possible. How do folks
> get around this issue ?
> 
> Thanks.
> 
> 
> 
> On Fri, Feb 18, 2011 at 10:23 AM, A J <s5...@gmail.com> wrote:
> > Questions about R and N (and W):
> > 1. If I set R to Quorum and cassandra identifies a need for read
> > repair before returning, would the read repair happen on R nodes (I
> > mean subset of R that needs repair) or N nodes before the data is
> > delivered to the client ?
> > 2. Also does the repair happen at level of row (key) or at level of column ?
> >
> > 3. During write, if W is met but N-W is not met for some reason; would
> > cassandra try to repair N-W nodes in the background as and when it
> > can. Or the N-W are only repaired when a read is issued ?
> >
> > 4. What is the significance of the 'primary' replica for writes from
> > usage point ? Writes to primary and non-primary replicas all happen
> > simultaneously. Ensuring W is decided irrespective of it being primary
> > or not. Ensuring R is decided by any of the R nodes out of N.
> > I know the tokens are divided per the primary replica. But other than
> > that, for read and write operations, do the primary replica play any
> > special role ?
> >
> > Thanks.
> >
> 

Re: R and N

Posted by Anthony John <ch...@gmail.com>.
K - let me state the facts first (as I know them)
- I do not know the inner workings, so interpret my response with that
caveat. Although, at an architectural level, one should be able to keep
detailed implementation at bay
- Quorum is (N+1)/2, rounded up, where N is the Replication Factor (RF)
- And consistency is a guarantee if R(ead) + W(rite) > RF (Which Quorum
gives you, but can be achieved via other permutations, depending on whether
Read or Write performance is desired)
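
The arithmetic behind these two points can be sketched in a few lines of Python. This is only an illustration: quorum is taken as a strict majority of RF, and the function names are made up for the example.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a strict majority of RF."""
    return replication_factor // 2 + 1


def reads_see_latest_write(r: int, w: int, replication_factor: int) -> bool:
    """True when any R replicas must overlap any W replicas (R + W > RF)."""
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    print("quorum:", q)
    print("QUORUM/QUORUM:", reads_see_latest_write(q, q, rf))
    print("ONE/ONE:", reads_see_latest_write(1, 1, rf))
    print("ONE/ALL:", reads_see_latest_write(1, rf, rf))
```

With RF = 3, reading and writing at quorum (2 + 2 > 3) overlaps on at least one replica, as does reading at ONE and writing at ALL (1 + 3 > 3); reading and writing at ONE (1 + 1) does not.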

Now getting to your questions:
1. If Read at Q is nondeterministic, it would likely have to read the other
(RF-Q) nodes to achieve Quorum on a deterministic value. At which point -
sync'ing all with writes should not be that expensive. But at what point
precisely the read is returned - do not know - you will have to look at the
code. IMO - at this level it should not matter.
2. Should be at the granularity of data divergence
3. Read Repair or Nodetool (whichever comes first)
4. All peer - there is no primary. There might be a connected node - but no
special role/privileges
5. Tries to Q - returns on deterministic read. If not - see (1)
6. Writer supplies timestamp value - can be any value that makes sense
within the scope of data/application.
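
To make points 2 and 6 concrete, here is a minimal sketch of timestamp-based reconciliation at column granularity. It assumes each replica returns a row as a mapping of column name to (value, timestamp), with the timestamp supplied by the writer; the types and function names are hypothetical, and tie-breaking on equal timestamps is simplified to "first seen wins".

```python
from typing import Dict, Tuple

# One replica's view of a row: column name -> (value, writer-supplied timestamp).
Row = Dict[str, Tuple[str, int]]


def resolve(*replica_rows: Row) -> Row:
    """Merge replica responses, keeping the highest-timestamp version of each column."""
    merged: Row = {}
    for row in replica_rows:
        for name, (value, ts) in row.items():
            if name not in merged or ts > merged[name][1]:
                merged[name] = (value, ts)
    return merged


def repairs_needed(merged: Row, replica_row: Row) -> Row:
    """Columns that a replica is missing or holds in an older version."""
    return {
        name: cell
        for name, cell in merged.items()
        if name not in replica_row or replica_row[name][1] < cell[1]
    }


if __name__ == "__main__":
    a = {"email": ("old@example.com", 100), "name": ("AJ", 100)}
    b = {"email": ("new@example.com", 200)}
    merged = resolve(a, b)
    print(merged)
    print(repairs_needed(merged, a))
    print(repairs_needed(merged, b))
```

Because the comparison is per column, replica `a` only needs the newer `email` and replica `b` only needs the missing `name`. It also shows why the timestamps have to be comparable across writers: the merge trusts them completely.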

HTH,

-JA

On Fri, Feb 18, 2011 at 10:28 AM, A J <s5...@gmail.com> wrote:

> Couple of more related questions:
>
> 5. For reads, does Cassandra first read N nodes or just the R nodes it
> selects ? I am thinking unless it reads all the N nodes, how will it
> know which node has the latest write.
>
> 6. Who decides the timestamp that gets inserted into the timestamp
> field of every column. I would guess the coordinator node picks up its
> system's timestamp.  If that is true, the clocks on all the nodes
> should be synchronized, right ? Otherwise conflict resolution cannot
> be done correctly.
> For a distributed system, this is not always possible. How do folks
> get around this issue ?
>
> Thanks.
>
>
>
> On Fri, Feb 18, 2011 at 10:23 AM, A J <s5...@gmail.com> wrote:
> > Questions about R and N (and W):
> > 1. If I set R to Quorum and cassandra identifies a need for read
> > repair before returning, would the read repair happen on R nodes (I
> > mean subset of R that needs repair) or N nodes before the data is
> > delivered to the client ?
> > 2. Also does the repair happen at level of row (key) or at level of
> column ?
> >
> > 3. During write, if W is met but N-W is not met for some reason; would
> > cassandra try to repair N-W nodes in the background as and when it
> > can. Or the N-W are only repaired when a read is issued ?
> >
> > 4. What is the significance of the 'primary' replica for writes from
> > usage point ? Writes to primary and non-primary replicas all happen
> > simultaneously. Ensuring W is decided irrespective of it being primary
> > or not. Ensuring R is decided by any of the R nodes out of N.
> > I know the tokens are divided per the primary replica. But other than
> > that, for read and write operations, do the primary replica play any
> > special role ?
> >
> > Thanks.
> >
>

Re: R and N

Posted by A J <s5...@gmail.com>.
A couple more related questions:

5. For reads, does Cassandra first read N nodes or just the R nodes it
selects ? I am thinking unless it reads all the N nodes, how will it
know which node has the latest write.

6. Who decides the timestamp that gets inserted into the timestamp
field of every column. I would guess the coordinator node picks up its
system's timestamp.  If that is true, the clocks on all the nodes
should be synchronized, right ? Otherwise conflict resolution cannot
be done correctly.
For a distributed system, this is not always possible. How do folks
get around this issue ?

Thanks.



On Fri, Feb 18, 2011 at 10:23 AM, A J <s5...@gmail.com> wrote:
> Questions about R and N (and W):
> 1. If I set R to Quorum and cassandra identifies a need for read
> repair before returning, would the read repair happen on R nodes (I
> mean subset of R that needs repair) or N nodes before the data is
> delivered to the client ?
> 2. Also does the repair happen at level of row (key) or at level of column ?
>
> 3. During write, if W is met but N-W is not met for some reason; would
> cassandra try to repair N-W nodes in the background as and when it
> can. Or the N-W are only repaired when a read is issued ?
>
> 4. What is the significance of the 'primary' replica for writes from
> usage point ? Writes to primary and non-primary replicas all happen
> simultaneously. Ensuring W is decided irrespective of it being primary
> or not. Ensuring R is decided by any of the R nodes out of N.
> I know the tokens are divided per the primary replica. But other than
> that, for read and write operations, do the primary replica play any
> special role ?
>
> Thanks.
>