Posted to user@cassandra.apache.org by mcasandra <mo...@gmail.com> on 2011/02/11 02:36:19 UTC

Basic Cassandra Architecture questions

I am reading an interesting white paper about Dynamo. I might have to read it
again :) but I have a simple question: when a request comes in, which node
handles the request first, and how does it determine which node has the
key/value? Also, how does Cassandra ensure that reads/writes are always O(1)
complexity? Can someone please explain that in detail?

http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
-- 
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Basic-Cassandra-Architecture-questions-tp6014218p6014218.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: Basic Cassandra Architecture questions

Posted by Deming Shi <de...@gmail.com>.
The client has to specify a node or a set of nodes in the cluster to connect
to. Whichever node the client connects to handles the request first.
Nodes in the cluster gossip with each other about their state, so every
node knows which node has a given key/value.

Stanley


Re: Basic Cassandra Architecture questions

Posted by Aaron Morton <aa...@thelastpickle.com>.
You can get consistency by using QUORUM for both reads and writes, or by writing at ALL and reading at ONE, or by writing at ONE and reading at ALL.

Start with QUORUM.

If you read at ONE, read repair will work in the background to fix the data, but the result returned to your client may be inconsistent.
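
As a rough illustration of why those three combinations work, the sketch below computes how many replicas each consistency level waits for and checks the R + W > N rule. The function names (`replicas_required`, `is_strongly_consistent`) are made up for this example and are not part of any Cassandra client API.

```python
def replicas_required(level: str, n: int) -> int:
    """Replicas that must respond for a consistency level, given replication factor n."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return n // 2 + 1  # strict majority of the replicas
    if level == "ALL":
        return n
    raise ValueError(f"unknown consistency level: {level}")


def is_strongly_consistent(write_level: str, read_level: str, n: int) -> bool:
    """True when every read set must overlap every write set, i.e. R + W > N."""
    w = replicas_required(write_level, n)
    r = replicas_required(read_level, n)
    return r + w > n


if __name__ == "__main__":
    n = 3  # replication factor assumed for the demo
    pairs = [("QUORUM", "QUORUM"), ("ALL", "ONE"), ("ONE", "ALL"), ("ONE", "ONE")]
    for w_level, r_level in pairs:
        print(w_level, r_level, is_strongly_consistent(w_level, r_level, n))
```

With a replication factor of 3, only the ONE/ONE pair fails the check, which matches the advice above.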

Aaron

On 12/02/2011, at 7:17 AM, mcasandra <mo...@gmail.com> wrote:

> 
> What's the best practice in terms of consistency? I am assuming R+W > N
> should be the best practice.
> 
> I thought even if R+W=N then there is some version level reconciliation that
> kicks off if in case older version of the key/value is read. But to think of
> it may not be possible. But then if R+W <= N it means there is a high chance
> of getting wrong version of the key/value. Is that true?

RE: Basic Cassandra Architecture questions

Posted by mcasandra <mo...@gmail.com>.
What's the best practice in terms of consistency? I am assuming R+W > N
should be the best practice.

I thought that even if R+W=N there is some version-level reconciliation that
kicks off in case an older version of the key/value is read. But come to think
of it, that may not be possible. But then if R+W <= N, it means there is a high
chance of getting the wrong version of the key/value. Is that true?

RE: Basic Cassandra Architecture questions

Posted by Shu Zhang <sz...@mediosystems.com>.
"So if Key A is supposed to go to Node, 1,2,3 then the commit log for Key A will be on each of these nodes?"

There isn't a commit log per key, just one for each node, tracking what's been written to that node. If node1 determines that node2 or node3 should handle a request it received, it'll route the request to one of them and not append anything to its own commit log.
________________________________________
From: Ryan King [ryan@twitter.com]
Sent: Friday, February 11, 2011 9:46 AM
To: user@cassandra.apache.org
Cc: cassandra-user@incubator.apache.org
Subject: Re: Basic Cassandra Architecture questions

On Fri, Feb 11, 2011 at 9:37 AM, mcasandra <mo...@gmail.com> wrote:
>
> Is commit log file maintained on every node that's responsible to keep key
> ranges? So if Key A is supposed to go to Node, 1,2,3 then the commit log for
> Key A will be on each of these nodes? Is this commit log like redo log of
> oracle, which is used in case of failure to roll forward/back the writes?

Sorta, the commitlog is used to rebuild the memtables when a machine restarts.

> I am trying to think why R + W > N is said to be consistent and not R + W =
> N?

You get consistency when you have a guarantee of overlap between the
read and write sets. R + W > N is just another way of saying "there
are some hosts that are in both the successful read and write sets".

-ryan

Re: Basic Cassandra Architecture questions

Posted by Ryan King <ry...@twitter.com>.
On Fri, Feb 11, 2011 at 9:37 AM, mcasandra <mo...@gmail.com> wrote:
>
> Is commit log file maintained on every node that's responsible to keep key
> ranges? So if Key A is supposed to go to Node, 1,2,3 then the commit log for
> Key A will be on each of these nodes? Is this commit log like redo log of
> oracle, which is used in case of failure to roll forward/back the writes?

Sorta, the commitlog is used to rebuild the memtables when a machine restarts.
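
To make the "rebuild the memtables on restart" idea concrete, here is a minimal sketch, assuming a node-local log with one JSON record per line. The file format and the names `append_write`, `replay` and `demo` are invented for illustration; the real commit log format is different.

```python
import json
import os
import tempfile


def append_write(log_path: str, key: str, value: str) -> None:
    """Append one write to the node-local commit log (one JSON record per line)."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")


def replay(log_path: str) -> dict:
    """Rebuild the in-memory table by re-applying every logged write in order."""
    memtable = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            memtable[record["key"]] = record["value"]  # later writes win
    return memtable


def demo() -> dict:
    """Log three writes, pretend the in-memory state was lost, then recover it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "commitlog.jsonl")
        append_write(path, "A", "1")
        append_write(path, "B", "2")
        append_write(path, "A", "3")
        return replay(path)


if __name__ == "__main__":
    print(demo())
```

Note that the log is per node, not per key: every write the node accepts goes into the same file.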

> I am trying to think why R + W > N is said to be consistent and not R + W =
> N?

You get consistency when you have a guarantee of overlap between the
read and write sets. R + W > N is just another way of saying "there
are some hosts that are in both the successful read and write sets".

-ryan

Re: Basic Cassandra Architecture questions

Posted by Anthony John <ch...@gmail.com>.
>I am trying to think why R + W > N is said to be consistent and not R + W =
N?

E.g., with an RF of 4 and R = W = 2: the write goes to nodes 1 and 2 and, in
the R+W=N case, the reads could happen from nodes 3 and 4. Thus your write
could be missed!
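
The same point can be checked by brute force. The sketch below enumerates every possible write set and read set for a given replication factor and reports whether they are always forced to share a replica. The helper names are hypothetical, and replicas are simply numbered 1..N.

```python
from itertools import combinations


def overlap_guaranteed(n: int, w: int, r: int) -> bool:
    """True if every write set of size w shares a replica with every read set of size r."""
    replicas = range(1, n + 1)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


def missed_example(n: int, w: int, r: int):
    """Return one (write_set, read_set) pair with no replica in common, or None."""
    replicas = range(1, n + 1)
    for write_set in combinations(replicas, w):
        for read_set in combinations(replicas, r):
            if not set(write_set) & set(read_set):
                return write_set, read_set
    return None


if __name__ == "__main__":
    print(overlap_guaranteed(4, 2, 2))  # R + W = N
    print(missed_example(4, 2, 2))
    print(overlap_guaranteed(4, 3, 2))  # R + W > N
```

For N=4, W=2, R=2 the search finds the write set (1, 2) and the read set (3, 4), which is exactly the case described above.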

HTH,

-JA

On Fri, Feb 11, 2011 at 11:37 AM, mcasandra <mo...@gmail.com> wrote:

>
> Is commit log file maintained on every node that's responsible to keep key
> ranges? So if Key A is supposed to go to Node, 1,2,3 then the commit log
> for
> Key A will be on each of these nodes? Is this commit log like redo log of
> oracle, which is used in case of failure to roll forward/back the writes?
>
> I am trying to think why R + W > N is said to be consistent and not R + W =
> N?
>
>

RE: Basic Cassandra Architecture questions

Posted by mcasandra <mo...@gmail.com>.
Is a commit log file maintained on every node that's responsible for keeping key
ranges? So if Key A is supposed to go to nodes 1, 2 and 3, then the commit log for
Key A will be on each of these nodes? Is this commit log like the redo log in
Oracle, which is used in case of failure to roll the writes forward/back?

I am trying to understand why R + W > N is said to be consistent and not R + W =
N?



RE: Basic Cassandra Architecture questions

Posted by Shu Zhang <sz...@mediosystems.com>.
"when a request comes in which node handles the request first"

You (i.e., the Cassandra client) always specify the exact node to send requests to. While most higher-level clients let you specify configuration for a whole cluster, that's usually for their own basic load balancing. Each request any of them sends through Cassandra's core Thrift client is sent explicitly to a single Cassandra node.

"how does it determine which node has the key/value"

Every node is assigned a token which determines which keys it stores. Every node has access to that information, so any node can take a key for any request and figure out which node(s) should take the request.
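
A minimal sketch of that lookup, assuming an MD5-based hash onto a ring of size 2**127 and a simple "next nodes clockwise" replica placement. Both are simplifications chosen for this example, and `token_for`, `replicas_for` and the node names are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space assumed for this sketch


def token_for(key: str) -> int:
    """Hash a key onto the ring (MD5-based, for illustration only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for(key: str, ring: dict, replication_factor: int) -> list:
    """Return the nodes responsible for a key.

    ring maps token -> node name. The first replica is the node with the
    smallest token >= the key's token (wrapping around the ring); the
    remaining replicas are the next nodes clockwise.
    """
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, token_for(key)) % len(tokens)
    count = min(replication_factor, len(tokens))
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(count)]


if __name__ == "__main__":
    # Four hypothetical nodes with evenly spaced tokens.
    ring = {i * RING_SIZE // 4: f"node{i + 1}" for i in range(4)}
    print(replicas_for("Key A", ring, 3))
```

Because every node holds the same token-to-node map, any node can run this calculation for any key, which is why a request can be sent to whichever node the client picked.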

"how cassandra ensures that read/write always O(1) complexity"

Cassandra has tunable consistency that trades off against performance. For the lowest consistency level, writes return as soon as an append is made to the commit log file; an append is obviously O(1). For higher consistency levels, writes return after writing to the in-memory table; that's not O(1), but it's in memory and extremely fast. The on-disk representation is updated asynchronously.

Reads are not O(1). Cassandra's basic algorithm for read and write is detailed here: http://wiki.apache.org/cassandra/ArchitectureOverview
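
As a toy model of the write and read paths described above: the `Node` class below is a hypothetical single-node sketch, with plain Python lists and dicts standing in for the commit log file, the in-memory table and the on-disk tables. The flush runs inline here, whereas the description above says the on-disk representation is updated asynchronously.

```python
class Node:
    """Toy single-node write path: commit log append, memtable update, later flush."""

    def __init__(self, flush_threshold: int = 3):
        self.commit_log = []   # stands in for the append-only log file on disk
        self.memtable = {}     # in-memory table of recent writes
        self.sstables = []     # immutable sorted snapshots standing in for on-disk files
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # constant-time append
        self.memtable[key] = value            # in-memory update
        if len(self.memtable) >= self.flush_threshold:
            self.flush()  # inline for simplicity; assumed to be asynchronous in practice

    def flush(self) -> None:
        """Write the memtable out as a sorted, immutable table and start a new one."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need replaying

    def read(self, key: str):
        """Check the memtable first, then the flushed tables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None
```

The read method shows why reads cost more than writes in this model: a write touches one log and one dict, while a read may have to look through several flushed tables.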

Re: Basic Cassandra Architecture questions

Posted by Aaron Morton <aa...@thelastpickle.com>.
Take a look at the introduction here http://thelastpickle.com/2011/02/07/Introduction-to-Cassandra/ 

I've tried to cover the basic "how does the cluster work" questions. Let me know if you have any suggestions on how I can improve it.

Short answer: a gossip protocol is used, so every node knows about every other node in the cluster. They share this information with a set number of other nodes around the cluster every second.
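
A rough simulation of that idea, assuming a push-only exchange where each node sends everything it knows to a fixed number of randomly chosen peers per round. This is a deliberately simplified model rather than Cassandra's actual gossip implementation, and the function names and the `fanout` parameter are invented for the example.

```python
import random


def gossip_round(views: dict, fanout: int, rng: random.Random) -> None:
    """Each node pushes everything it knows to `fanout` randomly chosen peers."""
    nodes = list(views)
    for node in nodes:
        peers = rng.sample([n for n in nodes if n != node], fanout)
        for peer in peers:
            views[peer] |= views[node]


def rounds_until_converged(node_count: int, fanout: int = 1,
                           seed: int = 0, max_rounds: int = 1000) -> int:
    """Run rounds until every node knows about every other node; return the round count."""
    rng = random.Random(seed)
    nodes = [f"node{i}" for i in range(node_count)]
    views = {n: {n} for n in nodes}  # initially each node knows only itself
    everyone = set(nodes)
    for round_number in range(1, max_rounds + 1):
        gossip_round(views, fanout, rng)
        if all(view == everyone for view in views.values()):
            return round_number
    raise RuntimeError("did not converge within max_rounds")


if __name__ == "__main__":
    print(rounds_until_converged(8, fanout=1))
```

Each round here corresponds to one of the once-a-second exchanges mentioned above; even with a fanout of 1, knowledge of the whole cluster spreads in a handful of rounds.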

Aaron
