Posted to user@cassandra.apache.org by Paul Prescod <pa...@prescod.net> on 2010/04/04 21:47:38 UTC

Memcached protocol?

Many Cassandra implementations seem to be memcached+X migrations, and some
might be replacing memcached alone. Has anyone considered making a protocol
handler or proxy that would allow Cassandra to talk the memcached binary
protocol?

jmemcached + Cassandra = easy migration?

I have barely started to consider the impedance mismatch issues, but the
most glaring one is that the memcached namespace is flat, whereas
Cassandra's has several levels of nesting. I think that this could be
managed through configuration files. Either the user could map all Memcached
stuff to a single ColumnFamily, or they could define a convention for
splitting their keys based on special namespace characters like ":" or "_".
The user could say how to interpret keys without enough parts (i.e. whether
to treat the missing part as the keyspace or the columnfamily).
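Purely as a sketch of the kind of mapping convention I mean (Python;
the separator, defaults, and function name here are all invented):

    # Hypothetical sketch: map a flat memcached key onto Cassandra's
    # keyspace / column family / row key hierarchy.
    DEFAULT_KEYSPACE = "Keyspace1"
    DEFAULT_CF = "Standard1"

    def map_memcached_key(key, sep=":"):
        parts = key.split(sep)
        if len(parts) >= 3:
            return parts[0], parts[1], sep.join(parts[2:])
        if len(parts) == 2:
            # Configuration would decide whether the missing part is
            # the keyspace or the column family; assume the keyspace.
            return DEFAULT_KEYSPACE, parts[0], parts[1]
        return DEFAULT_KEYSPACE, DEFAULT_CF, key

    # map_memcached_key("app:users:123") -> ("app", "users", "123")
    # map_memcached_key("users:123") -> ("Keyspace1", "users", "123")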

 Paul Prescod

Re: Memcached protocol?

Posted by Paul Prescod <pr...@gmail.com>.
On Sun, Apr 4, 2010 at 1:16 PM, Joe Stump <jo...@joestump.net> wrote:

> Seems like this would be pretty easy to build on top of the proxy stuff
> that was recently mentioned.
>

I'm new to the list: is it easy for you to dig up a subject line or
message-id that I can google?

I was poking around to see how Avro was handled and noticed that it seems
you must choose between Avro and Thrift at startup. I was expecting that you
could do both at once as you can do (e.g.) FTP and HTTP at the same time in
Apache.


> I don't see a reason why you couldn't just store key/blob-in-column to get
> running quickly.
>

Right, agree.


> Might make for a pretty interesting clustered queue system, which
> has been mentioned before on the list as well.
>

Yes, I was thinking about that too...need to think carefully about eventual
consistency issues versus once-and-only-once delivery though...


> In other words, Cassandra is quickly becoming the hammer to everyone's
> cluster nails. :)
>
> --Joe
>
> On Apr 4, 2010, at 12:47 PM, Paul Prescod wrote:
>
> Many Cassandra implementations seem to be memcached+X migrations, and some
> might be replacing memcached alone. Has anyone considered making a protocol
> handler or proxy that would allow Cassandra to talk the memcached binary
> protocol?
>
> jmemcached + Cassandra = easy migration?
>
> I have barely started to consider the impedance mismatch issues, but the
> most glaring one is that the memcached namespace is flat, whereas
> Cassandra's has several levels of nesting. I think that this could be
> managed through configuration files. Either the user could map all Memcached
> stuff to a single ColumnFamily, or they could define a convention for
> splitting their keys based on special namespace characters like ":" or "_".
> The user could say how to interpret keys without enough parts (i.e. whether
> to treat the missing part as the keyspace or the columnfamily).
>
>  Paul Prescod
>
>
>

Re: Memcached protocol?

Posted by Joe Stump <jo...@joestump.net>.
Seems like this would be pretty easy to build on top of the proxy stuff that was recently mentioned. I don't see a reason why you couldn't just store key/blob-in-column to get running quickly. Might make for a pretty interesting clustered queue system, which has been mentioned before on the list as well.

In other words, Cassandra is quickly becoming the hammer to everyone's cluster nails. :)

--Joe

On Apr 4, 2010, at 12:47 PM, Paul Prescod wrote:

> Many Cassandra implementations seem to be memcached+X migrations, and some might be replacing memcached alone. Has anyone considered making a protocol handler or proxy that would allow Cassandra to talk the memcached binary protocol?
> 
> jmemcached + Cassandra = easy migration?
> 
> I have barely started to consider the impedance mismatch issues, but the most glaring one is that the memcached namespace is flat, whereas Cassandra's has several levels of nesting. I think that this could be managed through configuration files. Either the user could map all Memcached stuff to a single ColumnFamily, or they could define a convention for splitting their keys based on special namespace characters like ":" or "_". The user could say how to interpret keys without enough parts (i.e. whether to treat the missing part as the keyspace or the columnfamily).
> 
>  Paul Prescod
> 


Re: Memcached protocol?

Posted by Paul Prescod <pa...@prescod.net>.
On Sun, Apr 4, 2010 at 8:48 PM, Benjamin Black <b...@b3k.us> wrote:
> ...
>
> It gives vector clocks, but that does not mean you have a global
> counter you can use as you are describing.  In particular, the "read
> after write to trigger read repair" in cases where read repair is
> actually required is most likely to result in a counter being updated
> several times, with the last value seen by multiple clients, rather
> than each client getting a unique value along the way (unless I am
> misunderstanding what you are describing).  Is that the behavior you
> want?

Yes, as long as all of them converge on a "correct" number eventually,
I think that's the goal. Imagine you are counting site visitors.
Writes will come in left and right, and then the conflict resolution
will do the summation to really get the right count. I don't care if
any particular client sees intermediate values, nor that they see
unique values.

 Paul Prescod

Re: Memcached protocol?

Posted by Benjamin Black <b...@b3k.us>.
On Sun, Apr 4, 2010 at 8:42 PM, Paul Prescod <pr...@gmail.com> wrote:
> On Sun, Apr 4, 2010 at 5:06 PM, Benjamin Black <b...@b3k.us> wrote:
>> ...
>>
>> Are you suggesting this would give you counter semantics?
>
> Yes: My understanding of cassandra-580 is that it gives you increment
> and decrement which are the basis of counters.
>

It gives vector clocks, but that does not mean you have a global
counter you can use as you are describing.  In particular, the "read
after write to trigger read repair" in cases where read repair is
actually required is most likely to result in a counter being updated
several times, with the last value seen by multiple clients, rather
than each client getting a unique value along the way (unless I am
misunderstanding what you are describing).  Is that the behavior you
want?


b

Re: Memcached protocol?

Posted by Paul Prescod <pa...@ayogo.com>.
On Mon, Apr 5, 2010 at 1:02 AM, David Strauss <da...@fourkitchens.com> wrote:
> ...
>
> But your "write then read" model lacks the atomicity of the memcached
> API. It's possible for two clients to read the same value.

Do you have an example application where this particular side effect
of eventual consistency is problematic? Obviously memcached and
Cassandra are different because of eventual consistency. The question
is whether they are different enough to break an inconvenient number
of real applications. Do you depend on add returning a unique number
to each client in an application you've deployed? I have always
imagined it as being primarily for simple counters.

 Paul Prescod

Re: Memcached protocol?

Posted by Jonathan Ellis <jb...@gmail.com>.
On Mon, Apr 5, 2010 at 6:48 PM, Tatu Saloranta <ts...@gmail.com> wrote:
> I would think that there is also possibility of losing some
> increments, or perhaps getting duplicate increments?
> It is not just isolation but also correctness that is hard to
> maintain. This can be more easily worked around in cases where there
> is additional data that can be used to resolve potentially ambiguous
> changes (like inferring which shopping cart additions are real and
> which are duplicates).
> With more work I am sure it is possible to get things mostly working,
> it's just a question of cost/benefit for specific use cases.

Let me inject a couple useful references:

http://pl.atyp.us/wordpress/?p=2601
http://blog.basho.com/2010/04/05/why-vector-clocks-are-hard/

Re: Memcached protocol?

Posted by Tatu Saloranta <ts...@gmail.com>.
On Mon, Apr 5, 2010 at 5:10 PM, Paul Prescod <pa...@ayogo.com> wrote:
> On Mon, Apr 5, 2010 at 4:48 PM, Tatu Saloranta <ts...@gmail.com> wrote:
>> ...
>>
>> I would think that there is also possibility of losing some
>> increments, or perhaps getting duplicate increments?
>
> I believe that with vector clocks in Cassandra 0.7 you won't lose
> anything. The conflict resolver will do the summation for you
> properly.
>
> If I'm wrong, I'd love to hear more, though.

I think the key is that this is not automatic -- there is no general
mechanism for aggregating distinct modifications. Vector clocks let
you pick one among competing versions, but they don't tell you how to
combine concurrent modifications. So what is done instead is to have
an application-specific resolution strategy that uses the semantics of
the operations to combine such concurrent modifications into the
"correct" answer. I don't know that this is trivial for the case of
counter increments: two concurrent increments produce the same new
value, yet the correct combined result is one higher than either
(both used the base and added one).
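A toy illustration of the problem (plain Python, nothing
Cassandra-specific):

    # Two clients read the same base value and both write base + 1.
    # Picking the "latest" write keeps 11, silently losing one
    # increment; the intended total is 12.
    x = 10
    a = x + 1            # client A's write
    b = x + 1            # client B's write
    last_write_wins = b  # 11 -- one increment lost

    # Semantic reconciliation keeps the deltas, not the results:
    deltas = [1, 1]
    reconciled = x + sum(deltas)  # 12, the correct combined count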

That is to say, my understanding was that vector clocks would be
required but not sufficient for reconciliation of concurrent value
updates.

I may be off here; apologies if I have misunderstood some crucial piece.

-+ Tatu +-

Re: What is loadbalance supposed to do? 0.6.0RC1

Posted by Rob Coli <rc...@digg.com>.
On 4/7/10 7:39 AM, Mark Jones wrote:
> Also, if the data is pushed out to the other nodes before the bootstrapping, why has data been lost?  Does this mean that decommissioning a node results in data loss?

As I understand it, in the following scenario :

1) Node A has Keys 0-10.

2) Add Node B as a bootstrapping node, Node A is loadbalanced, sheds 
keys 5-10 to Node B.

Keys 5-10 are not actually removed from the SSTables on Node A until a 
"cleanup compaction" is run. A "cleanup compaction" is a "major 
compaction" which also checks to see whether keys still "belong" on this 
host.
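You can trigger one by hand with nodetool, run against the node that 
shed the range (assuming I have the 0.6 invocation right):

    bin/nodetool --host 192.168.1.12 cleanup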

I don't know whether you have actually experienced data loss, but based 
on the above, it should not be possible for you to have.

=Rob


RE: What is loadbalance supposed to do? 0.6.0RC1

Posted by Mark Jones <MJ...@imagehawk.com>.
The log said Bootstrapping @ 07:34 (since it was 08:35, I assumed it wasn't doing anything; also, CPU usage was < 10%)

Turns out, when I restarted the node, it claimed the time was 7:35 rather than 8:35.  Why would log4j be off by one hour?  We are on CDT here, and have been for more than a week.  The date command returns the appropriate time (Wed Apr  7 09:24:50 CDT 2010), I see no evidence of a TZ variable and /etc/timezone shows "America/Chicago"

If it were off by 6 hours instead of 1, I could understand this, but it's only off by one hour.

System.getProperties() reports the timezone as blank

Also, if the data is pushed out to the other nodes before the bootstrapping, why has data been lost?  Does this mean that decommissioning a node results in data loss?



-----Original Message-----
From: Sylvain Lebresne [mailto:sylvain@yakaz.com]
Sent: Wednesday, April 07, 2010 9:07 AM
To: user@cassandra.apache.org
Subject: Re: What is loadbalance supposed to do? 0.6.0RC1

> It shouldn't remove a node from the ring should it?  (appears it did)

It does. As explained here: http://wiki.apache.org/cassandra/Operations,
loadbalance 'decommissions' the node and then adds it back as a
bootstrapping node (roughly).

So the node disappearing is expected, and it is supposed to come back.
But this is not a quick operation (and certainly not one you want to do
every other day). You apparently restarted Cassandra while it was doing
its stuff.

Not sure the loss of data is to be expected though.

> It shouldn't remove data from db, should it?  (data size appears to grow, but records are now missing)
>
> Loaded 38 million "rows" and the ring looked like this:
>
>  mark@ec2:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.116 ring
>  Address       Status     Load          Range                                      Ring
>                                         167730615856220406399741259265091647472
>  192.168.1.116 Up         4.81 GB       54880762918591020775962843965839761529     |<--|
>  192.168.1.119 Up         12.96 GB      160455137948102479104219052453775170160    |   |
>  192.168.1.12  Up         8.98 GB       167730615856220406399741259265091647472    |--
>
> So I did this:
>  mark@record:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.12 loadbalance
>
> And this happened (even though Cassandra was still running):
>
>  mark@record:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.12 ring
>  Address       Status     Load          Range                                      Ring
>                                         160455137948102479104219052453775170160
>  192.168.1.116 Up         12.71 GB      54880762918591020775962843965839761529     |<--|
>  192.168.1.119 Up         13.47 GB      160455137948102479104219052453775170160    |-->|
>
> After restarting Cassandra on .12
>
>  mark@record:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.12 ring
>  Address       Status     Load          Range                                      Ring
>                                         160455137948102479104219052453775170160
>  192.168.1.116 Up         12.71 GB      54880762918591020775962843965839761529     |<--|
>  192.168.1.12  Up         8.98 GB       107669873051407416105654071439122680093    |   |
>  192.168.1.119 Up         13.47 GB      160455137948102479104219052453775170160    |-->|
>
> Now I have more data, but nearly 50% of my queries are failing (not found).  This data was checked before the load balance was done.
>

Re: What is loadbalance supposed to do? 0.6.0RC1

Posted by Sylvain Lebresne <sy...@yakaz.com>.
> It shouldn't remove a node from the ring should it?  (appears it did)

It does. As explained here: http://wiki.apache.org/cassandra/Operations,
loadbalance 'decommissions' the node and then adds it back as a
bootstrapping node (roughly).

So the node disappearing is expected, and it is supposed to come back.
But this is not a quick operation (and certainly not one you want to do
every other day). You apparently restarted Cassandra while it was doing
its stuff.

Not sure the loss of data is to be expected though.

> It shouldn't remove data from db, should it?  (data size appears to grow, but records are now missing)
>
> Loaded 38 million "rows" and the ring looked like this:
>
>  mark@ec2:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.116 ring
>  Address       Status     Load          Range                                      Ring
>                                         167730615856220406399741259265091647472
>  192.168.1.116 Up         4.81 GB       54880762918591020775962843965839761529     |<--|
>  192.168.1.119 Up         12.96 GB      160455137948102479104219052453775170160    |   |
>  192.168.1.12  Up         8.98 GB       167730615856220406399741259265091647472    |--
>
> So I did this:
>  mark@record:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.12 loadbalance
>
> And this happened (even though Cassandra was still running):
>
>  mark@record:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.12 ring
>  Address       Status     Load          Range                                      Ring
>                                         160455137948102479104219052453775170160
>  192.168.1.116 Up         12.71 GB      54880762918591020775962843965839761529     |<--|
>  192.168.1.119 Up         13.47 GB      160455137948102479104219052453775170160    |-->|
>
> After restarting Cassandra on .12
>
>  mark@record:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.12 ring
>  Address       Status     Load          Range                                      Ring
>                                         160455137948102479104219052453775170160
>  192.168.1.116 Up         12.71 GB      54880762918591020775962843965839761529     |<--|
>  192.168.1.12  Up         8.98 GB       107669873051407416105654071439122680093    |   |
>  192.168.1.119 Up         13.47 GB      160455137948102479104219052453775170160    |-->|
>
> Now I have more data, but nearly 50% of my queries are failing (not found).  This data was checked before the load balance was done.
>

What is loadbalance supposed to do? 0.6.0RC1

Posted by Mark Jones <MJ...@imagehawk.com>.
It shouldn't remove a node from the ring should it?  (appears it did)
It shouldn't remove data from db, should it?  (data size appears to grow, but records are now missing)

Loaded 38 million "rows" and the ring looked like this:

  mark@ec2:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.116 ring
  Address       Status     Load          Range                                      Ring
                                         167730615856220406399741259265091647472
  192.168.1.116 Up         4.81 GB       54880762918591020775962843965839761529     |<--|
  192.168.1.119 Up         12.96 GB      160455137948102479104219052453775170160    |   |
  192.168.1.12  Up         8.98 GB       167730615856220406399741259265091647472    |--

So I did this:
  mark@record:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.12 loadbalance

And this happened (even though Cassandra was still running):

  mark@record:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.12 ring
  Address       Status     Load          Range                                      Ring
                                         160455137948102479104219052453775170160
  192.168.1.116 Up         12.71 GB      54880762918591020775962843965839761529     |<--|
  192.168.1.119 Up         13.47 GB      160455137948102479104219052453775170160    |-->|

After restarting Cassandra on .12

  mark@record:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --host 192.168.1.12 ring
  Address       Status     Load          Range                                      Ring
                                         160455137948102479104219052453775170160
  192.168.1.116 Up         12.71 GB      54880762918591020775962843965839761529     |<--|
  192.168.1.12  Up         8.98 GB       107669873051407416105654071439122680093    |   |
  192.168.1.119 Up         13.47 GB      160455137948102479104219052453775170160    |-->|

Now I have more data, but nearly 50% of my queries are failing (not found).  This data was checked before the load balance was done.

Re: Memcached protocol?

Posted by gabriele renzi <rf...@gmail.com>.
On Tue, Apr 6, 2010 at 2:10 AM, Paul Prescod <pa...@ayogo.com> wrote:
> On Mon, Apr 5, 2010 at 4:48 PM, Tatu Saloranta <ts...@gmail.com> wrote:
>> ...
>>
>> I would think that there is also possibility of losing some
>> increments, or perhaps getting duplicate increments?
>
> I believe that with vector clocks in Cassandra 0.7 you won't lose
> anything. The conflict resolver will do the summation for you
> properly.
>
> If I'm wrong, I'd love to hear more, though.

I keep reading this in the list, but why would vector clocks allow
consistent counters in a conflicting update?
Say we have nodes A, B, C, where A and B get concurrent updates. If we
do read-and-set, this does not seem useful, as we'd end up with a
vector <A:x+1,B:x+1>; but why would x+1 be the correct value rather
than x+2?

Or are we imagining spreading pairs <key,INCR>, <key,DECR> in which we
assume the writer client did not look at the existing value?


-- 
blog en: http://www.riffraff.info
blog it: http://riffraff.blogsome.com

Re: Memcached protocol?

Posted by Paul Prescod <pa...@ayogo.com>.
On Mon, Apr 5, 2010 at 4:48 PM, Tatu Saloranta <ts...@gmail.com> wrote:
> ...
>
> I would think that there is also possibility of losing some
> increments, or perhaps getting duplicate increments?

I believe that with vector clocks in Cassandra 0.7 you won't lose
anything. The conflict resolver will do the summation for you
properly.

If I'm wrong, I'd love to hear more, though.

 Paul Prescod

Re: Memcached protocol?

Posted by Tatu Saloranta <ts...@gmail.com>.
On Mon, Apr 5, 2010 at 1:46 PM, Paul Prescod <pa...@ayogo.com> wrote:
> On Mon, Apr 5, 2010 at 1:35 PM, Mike Malone <mi...@simplegeo.com> wrote:
>>> That's useful information Mike. I am a bit curious about what the most
>>> common use cases are for atomic increment/decrement. I'm familiar with
>>> atomic add as a sort of locking mechanism.
>>
>> They're useful for caching denormalized counts of things. Especially things
>> that change rapidly. Instead of invalidating the counter whenever an event
>> occurs that would incr/decr the counter, you can incr/decr the cached count
>> too.
>
> Do you think that a future cassandra increment/decrement would be
> incompatible with those use cases?
>
> It seems to me that in that use case, an eventually consistent counter
> is as useful as any other eventually consistent datum. In other words,
> there is no problem incrementing from 12 to 13 and getting back 15 as
> the return value (due to coinciding increments). 15 is the current
> correct value. It's arguably more correct than a memcached value which
> other processes are trying to update but cannot because of locking.
> Benjamin seemed to think that there were applications that depended on
> the result always being 13.

I would think that there is also possibility of losing some
increments, or perhaps getting duplicate increments?
It is not just isolation but also correctness that is hard to
maintain. This can be more easily worked around in cases where there
is additional data that can be used to resolve potentially ambiguous
changes (like inferring which shopping cart additions are real and
which are duplicates).
With more work I am sure it is possible to get things mostly working,
it's just a question of cost/benefit for specific use cases.

I think distributed counters are useful, but difficulty depends on
what are expected levels of concurrency/correctness/isolation.
There are many use cases where "about right" (or at least only losing
additions, or only getting extra ones) is enough. For example, when
calculating charges for usage, it is probably ok to lose some usage
charges, but not to add bogus ones. If a mostly consistent result can
be achieved cheaply, there is no point in implementing a more complex
system for a minor improvement (preventing the loss of, say, 2% of
otherwise uncounted requests).

-+ Tatu +-

Re: Memcached protocol?

Posted by Mike Malone <mi...@simplegeo.com>.
On Mon, Apr 5, 2010 at 1:46 PM, Paul Prescod <pa...@ayogo.com> wrote:

> On Mon, Apr 5, 2010 at 1:35 PM, Mike Malone <mi...@simplegeo.com> wrote:
> >> That's useful information Mike. I am a bit curious about what the most
> >> common use cases are for atomic increment/decrement. I'm familiar with
> >> atomic add as a sort of locking mechanism.
> >
> > They're useful for caching denormalized counts of things. Especially
> things
> > that change rapidly. Instead of invalidating the counter whenever an
> event
> > occurs that would incr/decr the counter, you can incr/decr the cached
> count
> > too.
>
> Do you think that a future cassandra increment/decrement would be
> incompatible with those use cases?
>
> It seems to me that in that use case, an eventually consistent counter
> is as useful as any other eventually consistent datum.


An eventually consistent count operation in Cassandra would be great, and it
would satisfy all of the use cases I would typically use counts for in
memcached. It's just a matter of reconciling inconsistencies with a more
sophisticated operation than "latest write wins" (specifically, the
reconciliation operation should apply all incr/decr ops).
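A rough sketch of what I mean by such a reconciliation operation
(Python; not any real Cassandra API, just the shape of the idea):

    # Instead of picking the latest value, replay every incr/decr
    # delta seen across replicas against the agreed base value.
    def reconcile(base, ops):
        # ops: list of ("incr", n) or ("decr", n) tuples
        value = base
        for op, n in ops:
            value += n if op == "incr" else -n
        return value

    # reconcile(10, [("incr", 1), ("incr", 1)]) -> 12, where a
    # latest-write-wins merge would have settled on 11.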

Mike

Re: Memcached protocol?

Posted by Paul Prescod <pa...@ayogo.com>.
On Mon, Apr 5, 2010 at 1:35 PM, Mike Malone <mi...@simplegeo.com> wrote:
>> That's useful information Mike. I am a bit curious about what the most
>> common use cases are for atomic increment/decrement. I'm familiar with
>> atomic add as a sort of locking mechanism.
>
> They're useful for caching denormalized counts of things. Especially things
> that change rapidly. Instead of invalidating the counter whenever an event
> occurs that would incr/decr the counter, you can incr/decr the cached count
> too.

Do you think that a future cassandra increment/decrement would be
incompatible with those use cases?

It seems to me that in that use case, an eventually consistent counter
is as useful as any other eventually consistent datum. In other words,
there is no problem incrementing from 12 to 13 and getting back 15 as
the return value (due to coinciding increments). 15 is the current
correct value. It's arguably more correct than a memcached value which
other processes are trying to update but cannot because of locking.
Benjamin seemed to think that there were applications that depended on
the result always being 13.

I'm trying to understand whether a future cassandra "eventually
consistent" increment/decrement feature based on vector clocks would
have semantics that are incompatible with most deployed uses of
memcached increment/decrement.

 Paul Prescod

Re: Memcached protocol?

Posted by Mike Malone <mi...@simplegeo.com>.
>
> That's useful information Mike. I am a bit curious about what the most
> common use cases are for atomic increment/decrement. I'm familiar with
> atomic add as a sort of locking mechanism.
>

They're useful for caching denormalized counts of things. Especially things
that change rapidly. Instead of invalidating the counter whenever an event
occurs that would incr/decr the counter, you can incr/decr the cached count
too.

In the case of Cassandra, they're useful for keeping counts of things in
general, since there's no efficient way to perform count operations with
Cassandra.
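The pattern looks roughly like this (Python; assumes a memcached-style
client whose incr() returns None on a miss, and
count_followers_from_db() is a made-up stand-in for the real count):

    # Keep a denormalized count fresh instead of invalidating it and
    # recounting from scratch on every change.
    def on_follow(cache, user_id):
        key = "follower_count:%s" % user_id
        if cache.incr(key) is None:   # not cached yet; seed it
            cache.set(key, count_followers_from_db(user_id))

    def on_unfollow(cache, user_id):
        cache.decr("follower_count:%s" % user_id)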

Mike

Re: Memcached protocol?

Posted by Paul Prescod <pa...@prescod.net>.
On Mon, Apr 5, 2010 at 10:45 AM, Mike Malone <mi...@simplegeo.com> wrote:
> ...
>
> FWIW, I added the atomic increment/decrement operations to the Django cache
> interface (and wrote that documentation) because the functionality was
> useful for large scale apps. I didn't implement atomic increment/decrement
> or atomic add for backends that didn't natively support it because, in my
> opinion (and in the opinion of the other Django contributors) any site that
> requires that sort of functionality should be running memcached as their
> cache backend. So I guess what I'm saying is that the functionality _is_
> useful. However, there probably are some users who would find the subset of
> the memcache protocol that you _can_ implement on top of Cassandra useful.

That's useful information Mike. I am a bit curious about what the most
common use cases are for atomic increment/decrement. I'm familiar with
atomic add as a sort of locking mechanism.

 Paul Prescod

Re: Memcached protocol?

Posted by Mike Malone <mi...@simplegeo.com>.
>
> Here are a couple of example projects for info.
>
> Django:
>
> http://docs.djangoproject.com/en/dev/topics/cache/
>
> It says of "increment/decrement": "incr()/decr() methods are not
> guaranteed to be atomic. On those backends that support atomic
> increment/decrement (most notably, the memcached backend), increment
> and decrement operations will be atomic. However, if the backend
> doesn't natively provide an increment/decrement operation, it will be
> implemented using a two-step retrieve/update."
>
> add() is implied to be atomic.
>
> Django itself does use add() in exactly one line of code that I can
> find. I believe it is just an optimization (don't bother saving this
> object if it already exists) and is not semantically meaningful. In
> fact, I don't believe that there is a code path to the add() call but
> I'm really not investigating very deeply.
>

FWIW, I added the atomic increment/decrement operations to the Django cache
interface (and wrote that documentation) because the functionality was
useful for large scale apps. I didn't implement atomic increment/decrement
or atomic add for backends that didn't natively support it because, in my
opinion (and in the opinion of the other Django contributors) any site that
requires that sort of functionality should be running memcached as their
cache backend. So I guess what I'm saying is that the functionality _is_
useful. However, there probably are some users who would find the subset of
the memcache protocol that you _can_ implement on top of Cassandra useful.

Meh.

Mike

Re: Memcached protocol?

Posted by Paul Prescod <pa...@prescod.net>.
On Mon, Apr 5, 2010 at 10:19 AM, Ryan Daum <ry...@thimbleware.com> wrote:
> Are these applications using memcached for caching or for something else?
> I don't see the point in putting Cassandra in as a level 1 or 2 cache
> replacement, especially given that it does not support any reasonable
> expiration policy that would be of use in those circumstances.
> Ryan

You're right that without cache expiration, it's of questionable value
for page/fragment caches. I was just curious about what methods are
used out in the real world, so I looked at some big apps that I know
use memcached.

As far as client libraries go, I can attest that in Ruby at least, the
memcached client library is vastly faster than the thrift one. I don't
know about avro. In my tests with Ruby, the marshalling was dominating
the networking in Cassandra performance. 25% of the time in my
benchmark was used by a function called "write_byte" (which is
implemented in Ruby!). I would be happy to hear that I'm Doing
Something Wrong, but I think it's just a consequence of the thrift
protocol and the client implementation.

I have no idea whether Avro is better. I'm not sure if it works well
enough to be tested yet...

 Paul Prescod

Re: Memcached protocol?

Posted by Ryan Daum <ry...@thimbleware.com>.
Are these applications using memcached for caching or for something else?

I don't see the point in putting Cassandra in as a level 1 or 2 cache
replacement, especially given that it does not support any reasonable
expiration policy that would be of use in those circumstances.

Ryan

On Mon, Apr 5, 2010 at 1:08 PM, Paul Prescod <pr...@gmail.com> wrote:

> On Mon, Apr 5, 2010 at 5:29 AM, Ryan Daum <ry...@thimbleware.com> wrote:
> > It seems pretty clear to me that the full memcached protocol is not
> > appropriate for Cassandra. The question is whether some subset of it is
> of
> > any use to anybody. The only advantage I can see is that there are a
> large
> > number of clients out there that can speak it already; but any app that
> is
> > making extensive use of it is probably doing so in a way that would
> preclude
> > Cassandra+Jmemcached from being a "drop-in" addition.
>
> Here are a couple of example projects for info.
>
> Django:
>
> http://docs.djangoproject.com/en/dev/topics/cache/
>
> It says of "increment/decrement": "incr()/decr() methods are not
> guaranteed to be atomic. On those backends that support atomic
> increment/decrement (most notably, the memcached backend), increment
> and decrement operations will be atomic. However, if the backend
> doesn't natively provide an increment/decrement operation, it will be
> implemented using a two-step retrieve/update."
>
> add() is implied to be atomic.
>
> Django itself does use add() in exactly one line of code that I can
> find. I believe it is just an optimization (don't bother saving this
> object if it already exists) and is not semantically meaningful. In
> fact, I don't believe that there is a code path to the add() call but
> I'm really not investigating very deeply.
>
> Rails:
>
>
> http://github.com/rails/rails/blob/master/actionpack/lib/action_controller/caching/actions.rb
>
> Here is the complete usage of the cache_store object in Rails.
>
> actionpack/lib/action_controller/caching/fragments.rb
> 44:          cache_store.write(key, content, options)
> 55:          result = cache_store.read(key, options)
> 66:          cache_store.exist?(key, options)
> 94:            cache_store.delete_matched(key, options)
> 96:            cache_store.delete(key, options)
>
> actionpack/lib/action_controller/caching.rb
> 79:        cache_store.fetch(ActiveSupport::Cache.expand_cache_key(key,
> :controller), options, &block)
>
> Fetch is an abstraction on top of read. delete_matched is not
> supported by the memcached plugin and not used by Rails.
>
> So as far as I can see, Rails only uses write, read, exist? and delete.
>
> It does expose more functions to the actual application, but the Rails
> framework does not use them. Most of them (including
> increment/decrement) are not even documented, and not supported with
> most cache stores.
>
>  *
> http://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html#M001029
>
> I checked a few of my own apps. They use get/set/add/delete, but the
> add is almost always used as an optimization.
>
>  Paul Prescod
>

Re: Memcached protocol?

Posted by Paul Prescod <pr...@gmail.com>.
On Mon, Apr 5, 2010 at 5:29 AM, Ryan Daum <ry...@thimbleware.com> wrote:
> It seems pretty clear to me that the full memcached protocol is not
> appropriate for Cassandra. The question is whether some subset of it is of
> any use to anybody. The only advantage I can see is that there are a large
> number of clients out there that can speak it already; but any app that is
> making extensive use of it is probably doing so in a way that would preclude
> Cassandra+Jmemcached from being a "drop-in" addition.

Here are a couple of example projects for info.

Django:

http://docs.djangoproject.com/en/dev/topics/cache/

It says of "increment/decrement": "incr()/decr() methods are not
guaranteed to be atomic. On those backends that support atomic
increment/decrement (most notably, the memcached backend), increment
and decrement operations will be atomic. However, if the backend
doesn't natively provide an increment/decrement operation, it will be
implemented using a two-step retrieve/update."

add() is implied to be atomic.
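For reference, exercising those guarantees through the standard Django
cache API looks like this (toy key names):

    from django.core.cache import cache

    cache.add("hits", 0)    # set-if-not-present; atomic on memcached
    n = cache.incr("hits")  # atomic on the memcached backend, a
                            # two-step retrieve/update elsewhere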

Django itself does use add() in exactly one line of code that I can
find. I believe it is just an optimization (don't bother saving this
object if it already exists) and is not semantically meaningful. In
fact, I don't believe that there is a code path to the add() call but
I'm really not investigating very deeply.

Rails:

http://github.com/rails/rails/blob/master/actionpack/lib/action_controller/caching/actions.rb

Here is the complete usage of the cache_store object in Rails.

actionpack/lib/action_controller/caching/fragments.rb
44:          cache_store.write(key, content, options)
55:          result = cache_store.read(key, options)
66:          cache_store.exist?(key, options)
94:            cache_store.delete_matched(key, options)
96:            cache_store.delete(key, options)

actionpack/lib/action_controller/caching.rb
79:        cache_store.fetch(ActiveSupport::Cache.expand_cache_key(key,
:controller), options, &block)

Fetch is an abstraction on top of read. delete_matched is not
supported by the memcached plugin and not used by Rails.

So as far as I can see, Rails only uses write, read, exist? and delete.

It does expose more functions to the actual application, but the Rails
framework does not use them. Most of them (including
increment/decrement) are not even documented, and not supported with
most cache stores.

 * http://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html#M001029

I checked a few of my own apps. They use get/set/add/delete, but the
add is almost always used as an optimization.

 Paul Prescod

Re: Memcached protocol?

Posted by Ryan Daum <ry...@thimbleware.com>.
It seems pretty clear to me that the full memcached protocol is not
appropriate for Cassandra. The question is whether some subset of it is of
any use to anybody. The only advantage I can see is that there are a large
number of clients out there that can speak it already; but any app that is
making extensive use of it is probably doing so in a way that would preclude
Cassandra+Jmemcached from being a "drop-in" addition.

Ryan

On Mon, Apr 5, 2010 at 9:02 AM, David Strauss <da...@fourkitchens.com>wrote:

> On 2010-04-05 07:47, Paul Prescod wrote:
> > On Mon, Apr 5, 2010 at 12:01 AM, David Strauss <da...@fourkitchens.com>
> wrote:
> >> On 2010-04-05 03:42, Paul Prescod wrote:
> >> ...
> >>
> >> There is a difference between Cassandra allowing inc/dec on values and
> >> actually *knowing* the resultant value at the time of the write. It's
> >> likely that inc/dec support will still feature blind writes if at all
> >> possible. The memcached protocol returns a resultant value from inc/dec.
> >
> > Right. That's why I said that the proxy layer would need to read the
> > result with an appropriate consistency level before returning to the
> > memcached client application. The client application would need to
> > declare its consistency preference using a configuration file.
>
> But your "write then read" model lacks the atomicity of the memcached
> API. It's possible for two clients to read the same value.
>
> --
> David Strauss
>   | david@fourkitchens.com
> Four Kitchens
>   | http://fourkitchens.com
>   | +1 512 454 6659 [office]
>   | +1 512 870 8453 [direct]
>
>

Re: Memcached protocol?

Posted by David Strauss <da...@fourkitchens.com>.
On 2010-04-05 07:47, Paul Prescod wrote:
> On Mon, Apr 5, 2010 at 12:01 AM, David Strauss <da...@fourkitchens.com> wrote:
>> On 2010-04-05 03:42, Paul Prescod wrote:
>> ...
>>
>> There is a difference between Cassandra allowing inc/dec on values and
>> actually *knowing* the resultant value at the time of the write. It's
>> likely that inc/dec support will still feature blind writes if at all
>> possible. The memcached protocol returns a resultant value from inc/dec.
> 
> Right. That's why I said that the proxy layer would need to read the
> result with an appropriate consistency level before returning to the
> memcached client application. The client application would need to
> declare its consistency preference using a configuration file.

But your "write then read" model lacks the atomicity of the memcached
API. It's possible for two clients to read the same value.

-- 
David Strauss
   | david@fourkitchens.com
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]


Re: Memcached protocol?

Posted by Paul Prescod <pr...@gmail.com>.
On Mon, Apr 5, 2010 at 12:01 AM, David Strauss <da...@fourkitchens.com> wrote:
> On 2010-04-05 03:42, Paul Prescod wrote:
>...
>
> There is a difference between Cassandra allowing inc/dec on values and
> actually *knowing* the resultant value at the time of the write. It's
> likely that inc/dec support will still feature blind writes if at all
> possible. The memcached protocol returns a resultant value from inc/dec.

Right. That's why I said that the proxy layer would need to read the
result with an appropriate consistency level before returning to the
memcached client application. The client application would need to
declare its consistency preference using a configuration file.

 Paul Prescod

Re: Memcached protocol?

Posted by David Strauss <da...@fourkitchens.com>.
On 2010-04-05 03:42, Paul Prescod wrote:
> On Sun, Apr 4, 2010 at 5:06 PM, Benjamin Black <b...@b3k.us> wrote:
>> ...
>>
>> Are you suggesting this would give you counter semantics?
> 
> Yes: My understanding of cassandra-580 is that it gives you increment
> and decrement which are the basis of counters.

There is a difference between Cassandra allowing inc/dec on values and
actually *knowing* the resultant value at the time of the write. It's
likely that inc/dec support will still feature blind writes if at all
possible. The memcached protocol returns a resultant value from inc/dec.

-- 
David Strauss
   | david@fourkitchens.com
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]


Re: Memcached protocol?

Posted by Paul Prescod <pr...@gmail.com>.
On Sun, Apr 4, 2010 at 5:06 PM, Benjamin Black <b...@b3k.us> wrote:
> ...
>
> Are you suggesting this would give you counter semantics?

Yes: My understanding of cassandra-580 is that it gives you increment
and decrement which are the basis of counters.

 Paul Prescod

Re: Memcached protocol?

Posted by Benjamin Black <b...@b3k.us>.
On Sun, Apr 4, 2010 at 4:52 PM, Paul Prescod <pa...@prescod.net> wrote:
>
> In order to strictly implement Memcached behaviour (where the result
> is returned immediately), you'd need to do a READ just after your
> WRITE, to force the conflict engine to detect and resolve the
> conflict.
>

Are you suggesting this would give you counter semantics?


b

Re: Memcached protocol?

Posted by Paul Prescod <pa...@prescod.net>.
On Sun, Apr 4, 2010 at 2:13 PM, Ryan Daum <ry...@thimbleware.com> wrote:
>
> I'm the author/maintainer of jmemcached; I'd be willing to do this and it'd be quite easy to do, but Cassandra is missing a number of things, which means we could only support a subset of the memcache protocol.

Yes, I had presumed that I would need to give up on the various
functions that depended upon the previous value being available. The
only one of these that I use in my applications is "add", to use
Memcached as a hacky lock server. I'm willing to give that up though:
there are many other components like Zookeeper and MySQL that can do
locking. As soon as you have more than one Memcached server and a risk
of partition, you already start to run into issues with depending on
previous memcached values being "correct".
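The hacky lock pattern I mean is just this (Python sketch; assumes a
memcached-style client whose add() returns true only when the key did
not already exist, and run_report() is an invented placeholder):

    # add() is set-if-not-present, so exactly one client "wins" the
    # key and holds the lock until the TTL expires.
    def try_lock(cache, name, ttl=30):
        return cache.add("lock:%s" % name, "1", ttl)

    if try_lock(cache, "nightly-report"):
        run_report()  # the critical section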

So I'd propose you implement the subset for now. I have another idea
about how to handle the longer term issue, though.

My understanding of http://issues.apache.org/jira/browse/CASSANDRA-580
is that it will allow writes that are meant to be "merged" with other
writes, like appends, increments and conditional sets. If I understand
it correctly, you would register six "handlers" for increment,
decrement, append text, prepend text, set-if-nonexistent,
set-if-old-value-is-the-same.
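Speculating freely about what those handlers might look like (Python;
CASSANDRA-580 defines no such API, this is only the shape of the idea):

    # A pluggable resolver merges an operation into the stored value
    # instead of blindly overwriting it.
    HANDLERS = {
        "incr":    lambda old, arg: (old or 0) + arg,
        "decr":    lambda old, arg: (old or 0) - arg,
        "append":  lambda old, arg: (old or "") + arg,
        "prepend": lambda old, arg: arg + (old or ""),
        "add":     lambda old, arg: old if old is not None else arg,
        # arg = (expected_old_value, new_value)
        "cas":     lambda old, arg: arg[1] if old == arg[0] else old,
    }

    def resolve(old, op, arg):
        return HANDLERS[op](old, arg)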

Vector clocks are slated to be implemented in Cassandra 0.7.

In order to strictly implement Memcached behaviour (where the result
is returned immediately), you'd need to do a READ just after your
WRITE, to force the conflict engine to detect and resolve the
conflict.

A configuration file would probably allow the end-user to determine
how slow/consistent this read should be:

 * http://wiki.apache.org/cassandra/API#ConsistencyLevel
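So the proxy's increment path might look roughly like this
(pseudo-Python over a purely hypothetical client object; the real
Thrift API differs):

    # Blind, mergeable write, then a read at the configured consistency
    # level to produce the return value memcached clients expect. The
    # read is what forces conflict detection/resolution to run.
    def memcached_incr(client, key, delta, consistency="QUORUM"):
        client.write_increment(key, delta)    # hypothetical call
        return client.read(key, consistency)  # hypothetical call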

If you use memcached's clustering technique, rather than Cassandra's,
then all consistency levels would be equivalent.

If there were a race condition in a multi-node situation (two writes
before the nodes were consistent), probably both clients would have
their writes rejected. They could either continue on that basis or
retry with exponential back-off.

> ...
>
> That said, if people see a use case for this, I would do it.

I personally think that it would hit a nice 80/20 point, and once
vector clocks are implemented it might be easy to get to 99% memcached
compatibility.

 Paul Prescod

Re: Memcached protocol?

Posted by Ryan Daum <ry...@thimbleware.com>.
I'm the author/maintainer of jmemcached; I'd be willing to do this and it'd
be quite easy to do, but Cassandra is missing a number of things, which
means we could only support a subset of the memcache protocol. Memcache
has:

set-if-not-present ("add")
atomic increment / decrement
compare-and-set
string append / prepend

That said, if people see a use case for this, I would do it. My
implementation of the memcached protocol (built over netty) supports both
its binary and text dialects, and is fast. When run against a
concurrent-linked-hashmap based back end on my box it can do about 40k
ops/second vs. native C memcached's roughly 50-60k ops/second (measured
using the memslap benchmark tool).

Ryan

On Sun, Apr 4, 2010 at 8:47 PM, Paul Prescod <pa...@prescod.net> wrote:

> Many Cassandra implementations seem to be memcached+X migrations, and some
> might be replacing memcached alone. Has anyone considered making a protocol
> handler or proxy that would allow Cassandra to talk the memcached binary
> protocol?
>
> jmemcached + Cassandra = easy migration?
>
> I have barely started to consider the impedance mismatch issues, but the
> most glaring one is that the memcached namespace is flat, whereas
> Cassandra's has several levels of nesting. I think that this could be
> managed through configuration files. Either the user could map all Memcached
> stuff to a single ColumnFamily, or they could define a convention for
> splitting their keys based on special namespace characters like ":" or "_".
> The user could say how to interpret keys without enough parts (i.e. whether
> to treat the missing part as the keyspace or the columnfamily).
>
>  Paul Prescod
>
>