You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Peter Dijkshoorn <pe...@adyen.com> on 2012/01/23 16:47:11 UTC

architectural understanding of write operation node flow

Hi guys,

I got an architectural question about how a write operation flows
through the nodes.

As far as I understand now, a client sends its write operation to
whatever node it was set to use and if that node does not contain the
data for this key K, then this node forwards the operation to the first
node given by the hash function. This first node having key K then
contacts the replication nodes depending on the selected consistency level.

This means that in the unlucky event you always have a network call
sequence depth of 2 (consistency level one), or 3 (assumed that the
replication nodes are contacted in parallel)

This is more than I expected, so I am not sure whether this is correct?
can someone help me out?

At first I thought that the receiver was the coordinator, and thus doing
all further calls in parallel, the depth as described above would always
be 2. But I just discovered that I was wrong and that it should be
something like above.

Another possibility would be that the client learnt the layout of the
cluster at connection time and thereby tries per request to contact the
coordinator directly, but I never read or see something like this happening.

Remembering the picture of Dean about network and hard disk latencies,
is this 3-sequential-network-call still faster?

Thanks for any thoughts :)

Peter

-- 
Peter Dijkshoorn
Adyen - Payments Made Easy
www.adyen.com

Visiting Address:                 Mail Address:
Stationsplein 57 - 4th floor      P.O. Box 10095
1012 AB Amsterdam                 1001 EB Amsterdam
The Netherlands                   The Netherlands

Office +31.20.240.1240
Email peter.dijkshoorn@adyen.com

Re: architectural understanding of write operation node flow

Posted by Sylvain Lebresne <sy...@datastax.com>.

On Tue, Jan 24, 2012 at 9:57 AM, Peter Dijkshoorn
<pe...@adyen.com> wrote:
> yeah, well main question remains then, is the node receiving the request
> from the client called the coordinator (even if it is not responsible
> for that key)?

Yes.

> Or will that node forward the call to the first responsible node who
> does the coordinating stuff? (as the cassandra and dynamo paper state)
>
> In case of that forwarding, is the client told to connect to another
> node, or does the node receiving the call act as a proxy?
>
> so, is it a 3-deep or a 2-deep network call?

2-deep (except for counter increments which have a slightly different protocol).

> I want to explain this in a small literature part of my thesis to
> distinguish the internal structure against BigTable and various kinds of
> RDBMS replication schemes, that's why I want to know precisely :)
>
> Thanks,
>
> Peter Dijkshoorn
> Adyen - Payments Made Easy
> www.adyen.com
>
> Visiting Address:                 Mail Address:
> Stationsplein 57 - 4th floor      P.O. Box 10095
> 1012 AB Amsterdam                 1001 EB Amsterdam
> The Netherlands                   The Netherlands
>
> Office +31.20.240.1240
> Email peter.dijkshoorn@adyen.com
>
>
> On 01/23/2012 06:59 PM, Daniel Doubleday wrote:
>> Ouch :-) you were asking write ...
>>
>> Well kind of similar
>>
>> 1. Coordinator calculates all nodes
>> 2. If not enough (according to CL) nodes are alive it throughs unavailable
>> 3. If nodes are down it writes and hh is enabled it writes a hint for that row
>> 4. It sends write request to all nodes (including itself / shortcutting messaging)
>> 5. If it receives enough (according to CL) acks before timeout everything is fine otherwise it throughs unavailable
>>
>> errm .. I'm more confident in the read path though especially concerning hh handling so I'm happy to be corrected here. I.e. I'm not sure if hints are written when request time out but CL is reached.
>>
>> On Jan 23, 2012, at 6:47 PM, Daniel Doubleday wrote:
>>
>>> Your first thought was pretty much correct:
>>>
>>> 1. The node which is called by the client is the coordinator
>>> 2. The coordinator determines the nodes in the ring which can handle the request ordered by expected latency (via snitch). The coordinator may or may not be part of these nodes
>>> 3. Given the consistency level and read repair chance the coordinator calculates the min amount of node to ask and sends read requests to them
>>> 4. As soon as the minimum count (according to consistency) of responses is collected the coordinator will respond to the request. Mismatches will lead to repair write requests to the corresponding nodes
>>>
>>> Thus the minimal depth is one (CL = 1 and coordinator can handle the request itself) or two otherwise.
>>>
>>> Hope that helps
>>>
>>> On Jan 23, 2012, at 4:47 PM, Peter Dijkshoorn wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I got an architectural question about how a write operation flows
>>>> through the nodes.
>>>>
>>>> As far as I understand now, a client sends its write operation to
>>>> whatever node it was set to use and if that node does not contain the
>>>> data for this key K, then this node forwards the operation to the first
>>>> node given by the hash function. This first node having key K then
>>>> contacts the replication nodes depending on the selected consistency level.
>>>>
>>>> This means that in the unlucky event you always have a network call
>>>> sequence depth of 2 (consistency level one), or 3 (assumed that the
>>>> replication nodes are contacted in parallel)
>>>>
>>>> This is more than I expected, so I am not sure whether this is correct?
>>>> can someone help me out?
>>>>
>>>> At first I thought that the receiver was the coordinator, and thus doing
>>>> all further calls in parallel, the depth as described above would always
>>>> be 2. But I just discovered that I was wrong and that it should be
>>>> something like above.
>>>>
>>>> Another possibility would be that the client learnt the layout of the
>>>> cluster at connection time and thereby tries per request to contact the
>>>> coordinator directly, but I never read or see something like this happening.
>>>>
>>>> Remembering the picture of Dean about network and hard disk latencies,
>>>> is this 3-sequential-network-call still faster?
>>>>
>>>> Thanks for any thoughts :)
>>>>
>>>> Peter
>>>>
>>>> --
>>>> Peter Dijkshoorn
>>>> Adyen - Payments Made Easy
>>>> www.adyen.com
>>>>
>>>> Visiting Address:                 Mail Address:
>>>> Stationsplein 57 - 4th floor      P.O. Box 10095
>>>> 1012 AB Amsterdam                 1001 EB Amsterdam
>>>> The Netherlands                   The Netherlands
>>>>
>>>> Office +31.20.240.1240
>>>> Email peter.dijkshoorn@adyen.com
>>>>

Re: architectural understanding of write operation node flow

Posted by Peter Dijkshoorn <pe...@adyen.com>.

yeah, well main question remains then, is the node receiving the request
from the client called the coordinator (even if it is not responsible
for that key)?
Or will that node forward the call to the first responsible node who
does the coordinating stuff? (as the cassandra and dynamo paper state)

In case of that forwarding, is the client told to connect to another
node, or does the node receiving the call act as a proxy?

so, is it a 3-deep or a 2-deep network call?

I want to explain this in a small literature part of my thesis to
distinguish the internal structure against BigTable and various kinds of
RDBMS replication schemes, that's why I want to know precisely :)

Thanks,

Peter Dijkshoorn
Adyen - Payments Made Easy
www.adyen.com

Visiting Address:                 Mail Address:
Stationsplein 57 - 4th floor      P.O. Box 10095
1012 AB Amsterdam                 1001 EB Amsterdam
The Netherlands                   The Netherlands

Office +31.20.240.1240
Email peter.dijkshoorn@adyen.com


On 01/23/2012 06:59 PM, Daniel Doubleday wrote:
> Ouch :-) you were asking write ...
>
> Well kind of similar 
>
> 1. Coordinator calculates all nodes
> 2. If not enough (according to CL) nodes are alive it throughs unavailable
> 3. If nodes are down it writes and hh is enabled it writes a hint for that row
> 4. It sends write request to all nodes (including itself / shortcutting messaging)
> 5. If it receives enough (according to CL) acks before timeout everything is fine otherwise it throughs unavailable
>
> errm .. I'm more confident in the read path though especially concerning hh handling so I'm happy to be corrected here. I.e. I'm not sure if hints are written when request time out but CL is reached.
>
> On Jan 23, 2012, at 6:47 PM, Daniel Doubleday wrote:
>
>> Your first thought was pretty much correct:
>>
>> 1. The node which is called by the client is the coordinator
>> 2. The coordinator determines the nodes in the ring which can handle the request ordered by expected latency (via snitch). The coordinator may or may not be part of these nodes
>> 3. Given the consistency level and read repair chance the coordinator calculates the min amount of node to ask and sends read requests to them
>> 4. As soon as the minimum count (according to consistency) of responses is collected the coordinator will respond to the request. Mismatches will lead to repair write requests to the corresponding nodes
>>
>> Thus the minimal depth is one (CL = 1 and coordinator can handle the request itself) or two otherwise.
>>
>> Hope that helps
>>
>> On Jan 23, 2012, at 4:47 PM, Peter Dijkshoorn wrote:
>>
>>> Hi guys,
>>>
>>> I got an architectural question about how a write operation flows
>>> through the nodes.
>>>
>>> As far as I understand now, a client sends its write operation to
>>> whatever node it was set to use and if that node does not contain the
>>> data for this key K, then this node forwards the operation to the first
>>> node given by the hash function. This first node having key K then
>>> contacts the replication nodes depending on the selected consistency level.
>>>
>>> This means that in the unlucky event you always have a network call
>>> sequence depth of 2 (consistency level one), or 3 (assumed that the
>>> replication nodes are contacted in parallel)
>>>
>>> This is more than I expected, so I am not sure whether this is correct?
>>> can someone help me out?
>>>
>>> At first I thought that the receiver was the coordinator, and thus doing
>>> all further calls in parallel, the depth as described above would always
>>> be 2. But I just discovered that I was wrong and that it should be
>>> something like above.
>>>
>>> Another possibility would be that the client learnt the layout of the
>>> cluster at connection time and thereby tries per request to contact the
>>> coordinator directly, but I never read or see something like this happening.
>>>
>>> Remembering the picture of Dean about network and hard disk latencies,
>>> is this 3-sequential-network-call still faster?
>>>
>>> Thanks for any thoughts :)
>>>
>>> Peter
>>>
>>> -- 
>>> Peter Dijkshoorn
>>> Adyen - Payments Made Easy
>>> www.adyen.com
>>>
>>> Visiting Address:                 Mail Address:
>>> Stationsplein 57 - 4th floor      P.O. Box 10095
>>> 1012 AB Amsterdam                 1001 EB Amsterdam
>>> The Netherlands                   The Netherlands
>>>
>>> Office +31.20.240.1240
>>> Email peter.dijkshoorn@adyen.com
>>>

Re: architectural understanding of write operation node flow

Posted by Daniel Doubleday <da...@gmx.net>.

Ouch :-) you were asking write ...

Well kind of similar 

1. Coordinator calculates all nodes
2. If not enough (according to CL) nodes are alive it throughs unavailable
3. If nodes are down it writes and hh is enabled it writes a hint for that row
4. It sends write request to all nodes (including itself / shortcutting messaging)
5. If it receives enough (according to CL) acks before timeout everything is fine otherwise it throughs unavailable

errm .. I'm more confident in the read path though especially concerning hh handling so I'm happy to be corrected here. I.e. I'm not sure if hints are written when request time out but CL is reached.

On Jan 23, 2012, at 6:47 PM, Daniel Doubleday wrote:

> Your first thought was pretty much correct:
> 
> 1. The node which is called by the client is the coordinator
> 2. The coordinator determines the nodes in the ring which can handle the request ordered by expected latency (via snitch). The coordinator may or may not be part of these nodes
> 3. Given the consistency level and read repair chance the coordinator calculates the min amount of node to ask and sends read requests to them
> 4. As soon as the minimum count (according to consistency) of responses is collected the coordinator will respond to the request. Mismatches will lead to repair write requests to the corresponding nodes
> 
> Thus the minimal depth is one (CL = 1 and coordinator can handle the request itself) or two otherwise.
> 
> Hope that helps
> 
> On Jan 23, 2012, at 4:47 PM, Peter Dijkshoorn wrote:
> 
>> 
>> Hi guys,
>> 
>> I got an architectural question about how a write operation flows
>> through the nodes.
>> 
>> As far as I understand now, a client sends its write operation to
>> whatever node it was set to use and if that node does not contain the
>> data for this key K, then this node forwards the operation to the first
>> node given by the hash function. This first node having key K then
>> contacts the replication nodes depending on the selected consistency level.
>> 
>> This means that in the unlucky event you always have a network call
>> sequence depth of 2 (consistency level one), or 3 (assumed that the
>> replication nodes are contacted in parallel)
>> 
>> This is more than I expected, so I am not sure whether this is correct?
>> can someone help me out?
>> 
>> At first I thought that the receiver was the coordinator, and thus doing
>> all further calls in parallel, the depth as described above would always
>> be 2. But I just discovered that I was wrong and that it should be
>> something like above.
>> 
>> Another possibility would be that the client learnt the layout of the
>> cluster at connection time and thereby tries per request to contact the
>> coordinator directly, but I never read or see something like this happening.
>> 
>> Remembering the picture of Dean about network and hard disk latencies,
>> is this 3-sequential-network-call still faster?
>> 
>> Thanks for any thoughts :)
>> 
>> Peter
>> 
>> -- 
>> Peter Dijkshoorn
>> Adyen - Payments Made Easy
>> www.adyen.com
>> 
>> Visiting Address:                 Mail Address:
>> Stationsplein 57 - 4th floor      P.O. Box 10095
>> 1012 AB Amsterdam                 1001 EB Amsterdam
>> The Netherlands                   The Netherlands
>> 
>> Office +31.20.240.1240
>> Email peter.dijkshoorn@adyen.com
>> 
>

Re: architectural understanding of write operation node flow

Posted by Daniel Doubleday <da...@gmx.net>.

Your first thought was pretty much correct:

1. The node which is called by the client is the coordinator
2. The coordinator determines the nodes in the ring which can handle the request ordered by expected latency (via snitch). The coordinator may or may not be part of these nodes
3. Given the consistency level and read repair chance the coordinator calculates the min amount of node to ask and sends read requests to them
4. As soon as the minimum count (according to consistency) of responses is collected the coordinator will respond to the request. Mismatches will lead to repair write requests to the corresponding nodes

Thus the minimal depth is one (CL = 1 and coordinator can handle the request itself) or two otherwise.

Hope that helps

On Jan 23, 2012, at 4:47 PM, Peter Dijkshoorn wrote:

> 
> Hi guys,
> 
> I got an architectural question about how a write operation flows
> through the nodes.
> 
> As far as I understand now, a client sends its write operation to
> whatever node it was set to use and if that node does not contain the
> data for this key K, then this node forwards the operation to the first
> node given by the hash function. This first node having key K then
> contacts the replication nodes depending on the selected consistency level.
> 
> This means that in the unlucky event you always have a network call
> sequence depth of 2 (consistency level one), or 3 (assumed that the
> replication nodes are contacted in parallel)
> 
> This is more than I expected, so I am not sure whether this is correct?
> can someone help me out?
> 
> At first I thought that the receiver was the coordinator, and thus doing
> all further calls in parallel, the depth as described above would always
> be 2. But I just discovered that I was wrong and that it should be
> something like above.
> 
> Another possibility would be that the client learnt the layout of the
> cluster at connection time and thereby tries per request to contact the
> coordinator directly, but I never read or see something like this happening.
> 
> Remembering the picture of Dean about network and hard disk latencies,
> is this 3-sequential-network-call still faster?
> 
> Thanks for any thoughts :)
> 
> Peter
> 
> -- 
> Peter Dijkshoorn
> Adyen - Payments Made Easy
> www.adyen.com
> 
> Visiting Address:                 Mail Address:
> Stationsplein 57 - 4th floor      P.O. Box 10095
> 1012 AB Amsterdam                 1001 EB Amsterdam
> The Netherlands                   The Netherlands
> 
> Office +31.20.240.1240
> Email peter.dijkshoorn@adyen.com
>