Posted to user@cassandra.apache.org by "Ryan, Brent" <BR...@cvent.com> on 2013/12/31 15:01:44 UTC

Cassandra client alternatives to mimic Couchbase sharding ???

Assuming that you have a 3-node Cassandra cluster with a replication factor of 3 (so all nodes have the data)…

Does a Cassandra client exist that would let a Cassandra cluster behave like a Couchbase cluster, where for a given RowKey X the client always communicates with the same node in the cluster?  Essentially, it would provide sharding at the client tier by RowKey.  The main reason for doing this would be to avoid some of the issues you run into with eventual consistency and with letting the cluster resolve conflicts using server-side timestamps.

I’m not sure this would work exactly the way I’d want, but I thought it might be an interesting use case.  You might even be able to extend this behavior further into the client if the client is aware of the sharding algorithm applied to the cluster, so that you always communicate with a shard that has the data for a given row key.

Thoughts?



Thanks,

Brent

Re: Cassandra client alternatives to mimic Couchbase sharding ???

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Dec 31, 2013 at 7:46 AM, Hannu Kröger <hk...@gmail.com> wrote:

> Normally you can have full consistency by doing CL=QUORUM reads and
> CL=QUORUM writes. That way you have high availability and faster reads than
> in CL=ALL case. No matter what the consistency level is, you can get wrong
> results if writes get serialized in wrong order because of the clock drift.
>

Here are some related details:

https://issues.apache.org/jira/browse/CASSANDRA-6147
https://issues.apache.org/jira/browse/CASSANDRA-6123
The two issues above were prompted by:
http://aphyr.com/posts/294-call-me-maybe-cassandra/

=Rob

Re: Cassandra client alternatives to mimic Couchbase sharding ???

Posted by Hannu Kröger <hk...@gmail.com>.
If you are worried about server-side clock drift (which usually should
not be a problem, right?), you can also generate the timestamp on the
client side, provided that clock is more accurate.
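
As a rough sketch of client-side timestamps: CQL accepts an explicit write timestamp (in microseconds since the epoch) through USING TIMESTAMP. The helper names, the table layout (columns key and value) and the %s placeholder style below are assumptions made for illustration, not part of any particular driver.

```python
import time


def client_timestamp_micros(now=None):
    """Return a write timestamp in microseconds since the Unix epoch.

    `now` (seconds, float) is injectable so the function can be tested.
    """
    seconds = time.time() if now is None else now
    return int(seconds * 1_000_000)


def insert_with_timestamp(table, key, value, ts_micros):
    """Build a CQL INSERT that carries an explicit write timestamp.

    The column names (key, value) are hypothetical. The timestamp is
    inlined as an integer; the values stay as bind parameters.
    """
    cql = (
        f"INSERT INTO {table} (key, value) VALUES (%s, %s) "
        f"USING TIMESTAMP {int(ts_micros)}"
    )
    return cql, (key, value)


statement, params = insert_with_timestamp(
    "users", "X", "hello", client_timestamp_micros()
)
print(statement, params)
```

This only helps if all writers to a given key share a clock that is better synchronized than the server clocks; with several independent clients, the drift problem simply moves to the client tier.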

If you think that server clock drift will cause you problems (the first
write "overwriting" the second because, due to drift, the first write's
timestamp is later than the second's), then maybe the TokenAwarePolicy
will help you. Writes to the same key should at least get their timestamps
from the same coordinator's clock, as long as the coordinators are all up
and running smoothly.
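
To make the drift scenario concrete, here is a small simulation of last-write-wins resolution. It is a toy model under stated assumptions: the node names, the 500-microsecond offset and the tie rule (keep the existing cell) are made up for the example, and Cassandra's own tie-breaking differs.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp: int  # microseconds, assigned by the coordinator that took the write


def lww_merge(current, incoming):
    """Last-write-wins: keep the cell with the larger timestamp."""
    if current is None or incoming.timestamp > current.timestamp:
        return incoming
    return current


def apply_writes(writes, clock_offsets):
    """Apply (real_time, coordinator, value) writes in real-time order.

    Each coordinator stamps a write with real_time plus its own clock offset.
    """
    cell = None
    for real_time, coordinator, value in sorted(writes):
        stamped = Cell(value, real_time + clock_offsets[coordinator])
        cell = lww_merge(cell, stamped)
    return cell.value


offsets = {"node1": 500, "node2": 0}  # node1's clock runs 500 us ahead

# Two writes 100 us apart, taken by different coordinators:
# the older value survives because its timestamp is "in the future".
print(apply_writes([(1000, "node1", "first"), (1100, "node2", "second")], offsets))

# The same writes, both routed to node1: the newer value wins despite the skew.
print(apply_writes([(1000, "node1", "first"), (1100, "node1", "second")], offsets))
```

The first call prints "first" and the second prints "second", which is the whole argument for pinning a key to one coordinator: the absolute clock error no longer matters, only the ordering on a single clock.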

Normally you can have full consistency by doing CL=QUORUM reads and
CL=QUORUM writes. That way you get high availability and faster reads than
with CL=ALL. No matter what the consistency level is, you can get wrong
results if writes are serialized in the wrong order because of clock drift.
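
The QUORUM reasoning comes down to simple arithmetic: a read is guaranteed to touch at least one replica that acknowledged the latest write when the read and write replica counts add up to more than the replication factor. A minimal sketch (the function names are mine):

```python
def quorum(replication_factor):
    """Number of replicas that make up a quorum."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must intersect every acknowledged write set."""
    return write_replicas + read_replicas > replication_factor


rf = 3
q = quorum(rf)  # 2 replicas out of 3

print(read_overlaps_write(rf, q, q))    # QUORUM write + QUORUM read -> True
print(read_overlaps_write(rf, 1, 1))    # ONE write + ONE read -> False
print(read_overlaps_write(rf, rf, 1))   # ALL write + ONE read -> True

# QUORUM still works with rf - q = 1 replica down; ALL tolerates none.
print(rf - q)
```

Note that this overlap says nothing about which write wins; that is still decided by timestamps, which is why clock drift matters at every consistency level.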

Interesting topic. Are there other ways to handle that, anyone?

Hannu



Re: Cassandra client alternatives to mimic Couchbase sharding ???

Posted by "Ryan, Brent" <BR...@cvent.com>.
Thanks Hannu!

I wasn’t aware of the TokenAwarePolicy, and that’s exactly what I was talking about, so I’ll give it a try.  Setting the consistency level doesn’t really work unless I set it to ALL, and I really don’t want to pay that sort of performance penalty.  I’d rather stick with QUORUM writes and use the TokenAwarePolicy, which should avoid situations where RowKey X is written to replica node 1 for request 1 and then to replica node 2 for request 2.  This normally isn’t an issue unless the clock drift across the cluster exceeds the time between 2 writes to the same row key.


Brent



Re: Cassandra client alternatives to mimic Couchbase sharding ???

Posted by Hannu Kröger <hk...@gmail.com>.
Hi,

The DataStax Cassandra Java Driver can choose the
coordinator node based on the partition key (TokenAwarePolicy); however,
that probably does not solve the consistency problem you are thinking about:
http://www.datastax.com/dev/blog/ideology-and-testing-of-a-resilient-driver
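
To illustrate the idea of token-aware routing (this is not the driver's implementation), here is a toy ring that maps a partition key to a token and always picks the first replica as coordinator. Everything in it is an assumption for the example: the class and function names, the node tokens, one token per node, and MD5 as a stand-in hash. The real driver may also spread requests across the replicas of a key rather than always choosing the same one.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Map a key to a 64-bit token (MD5 used purely as a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}; each node owns the range
        # that ends at its token (one token per node in this sketch).
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, partition_key, replication_factor):
        """Walk clockwise from the key's token and collect the next nodes."""
        size = len(self._ring)
        start = bisect.bisect_left(self._tokens, token_for(partition_key)) % size
        count = min(replication_factor, size)
        return [self._ring[(start + i) % size][1] for i in range(count)]

    def coordinator(self, partition_key, replication_factor):
        """Always pick the first replica, so a key maps to one fixed node."""
        return self.replicas(partition_key, replication_factor)[0]


ring = TokenRing({"node1": 2**62, "node2": 2**63, "node3": 2**64 - 1})
print(ring.coordinator("RowKey X", 3))  # same node on every call for this key
print(ring.replicas("RowKey X", 3))     # all three nodes when RF equals cluster size
```

With a fixed coordinator per key, every server-side timestamp for that key comes from one clock, but only while that node is up; on failover the next replica's clock takes over and the drift problem can reappear.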

If you really want full consistency, you should read this and tune the
consistency level accordingly, if you haven't already:
http://www.datastax.com/docs/1.1/dml/data_consistency

Cheers,
Hannu

