Posted to user@cassandra.apache.org by Dmitry Olshansky <dm...@gridnine.com> on 2013/07/01 13:54:00 UTC

Re: Cassandra as storage for cache data

Hello,

thanks to all for your answers and comments.

What we've done:
- increased the Java heap to 6 GB
- changed replication factor to 1
- set durable_writes to false
- set memtable_total_space_in_mb to 5000
- set commitlog_total_space_in_mb to 6000

If I understand correctly, the last parameter doesn't matter since we set 
durable_writes to false.
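
For reference, roughly where each of these knobs lives (the keyspace name 
is illustrative):

    # cassandra-env.sh
    MAX_HEAP_SIZE="6G"

    # cassandra.yaml
    memtable_total_space_in_mb: 5000
    commitlog_total_space_in_mb: 6000

    # per keyspace, e.g. via cqlsh
    ALTER KEYSPACE cache_ks
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
      AND durable_writes = false;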

Now the overall performance is much better, but still not outstanding. We 
continue to observe quite frequent compactions on every node.

According to OpsCenter's graphs, the Java heap never grows above 3.5 GB, so 
there is enough memory to keep the memtables. Why are they still flushed to 
disk, triggering compactions?

--
Best regards,
Dmitry Olshansky

Re: Cassandra as storage for cache data

Posted by Terje Marthinussen <tm...@gmail.com>.
If this is a tombstone problem, as suggested by some, and it is OK to turn off replication, as suggested by others, it may be worth adding an optimization in Cassandra along the lines of:

if replication_factor <= 1:
   do not create tombstones


Terje 


On Jul 2, 2013, at 11:11 PM, Dmitry Olshansky <dm...@gridnine.com> wrote:

> In our case we have a continuous flow of data to be cached. Every second we're receiving tens of PUT requests. Each request has a payload of about 500 KB on average and a TTL of about 20 minutes.
> 
> On the other side we have a similar flow of GET requests. Every GET request is translated into a "get by key" query against Cassandra.
> 
> This is a very simple and straightforward solution:
> - one CF
> - one key that directly corresponds to the cache entry key
> - one value of type bytes that holds the cache entry payload
> 
> To be honest, I don't see how we can switch this solution to a multi-CF scheme built around time-based snapshots.
> 
> Today this solution crashed again with overload symptoms:
> - almost non-stop compactions on every node in the cluster
> - high I/O wait on the system
> - clients failing with timeout exceptions
> 
> At the same time we see that Cassandra uses only half of the Java heap. How can we force it to use all available resources (namely, RAM)?
> 
> Best regards,
> Dmitry Olshansky


Re: Cassandra as storage for cache data

Posted by Dmitry Olshansky <dm...@gridnine.com>.
In our case we have a continuous flow of data to be cached. Every second 
we're receiving tens of PUT requests. Each request has a payload of about 
500 KB on average and a TTL of about 20 minutes.

On the other side we have a similar flow of GET requests. Every GET 
request is translated into a "get by key" query against Cassandra.

This is a very simple and straightforward solution (sketched below):
- one CF
- one key that directly corresponds to the cache entry key
- one value of type bytes that holds the cache entry payload
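
Roughly, it looks like this (sketched with the DataStax Python driver; the 
keyspace/table names and the 1200-second TTL are illustrative):

    from cassandra.cluster import Cluster

    cluster = Cluster(['127.0.0.1'])  # contact point is illustrative
    session = cluster.connect()

    # one CF: opaque key -> opaque payload, RF 1, no durable writes
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS cache_ks
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
        AND durable_writes = false""")
    session.execute("""
        CREATE TABLE IF NOT EXISTS cache_ks.entries (
            key blob PRIMARY KEY,
            payload blob)""")

    put = session.prepare(
        "INSERT INTO cache_ks.entries (key, payload) VALUES (?, ?) USING TTL 1200")
    get = session.prepare(
        "SELECT payload FROM cache_ks.entries WHERE key = ?")

    # PUT: ~500 KB payload that expires after ~20 minutes
    session.execute(put, (b'entry-42', b'\x00' * 500000))

    # GET: single-key lookup
    row = session.execute(get, (b'entry-42',)).one()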

To be honest, I don't see how we can switch this solution to a multi-CF 
scheme built around time-based snapshots.

Today this solution crashed again with overload symptoms:
- almost non-stop compactions on every node in the cluster
- high I/O wait on the system
- clients failing with timeout exceptions

At the same time we see that Cassandra uses only half of the Java heap. 
How can we force it to use all available resources (namely, RAM)?

Best regards,
Dmitry Olshansky

Re: Cassandra as storage for cache data

Posted by Robert Coli <rc...@eventbrite.com>.
The most effective way to deal with obsolete Tombstones in the short-lived
cache case seems to be to drop them on the floor en masse... :D

a) have two column families that the application alternates between, modulo
time_period
b) truncate and populate the cold one
c) read from the hot one
d) clear snapshots frequently

This avoids the downsides of dealing with Tombstones entirely, with only
the cost of increased complexity to manage snapshots. One could (NOT
RECOMMENDED) also disable automatic snapshotting on truncate...
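
A minimal sketch of the alternation in Python (table names, the period 
length, and the driver session are illustrative):

    import time

    PERIOD_SECONDS = 20 * 60            # e.g. roughly the cache TTL
    TABLES = ("cache_a", "cache_b")     # the two alternating column families

    def hot_and_cold(now=None):
        """Pick which table is 'hot' (read from) and which is 'cold'
        (truncated and repopulated), alternating every PERIOD_SECONDS."""
        period = int((time.time() if now is None else now) // PERIOD_SECONDS)
        return TABLES[period % 2], TABLES[(period + 1) % 2]

    def rotate(session):
        """Run once at each period boundary: empty the cold table before it
        is repopulated, then clear the snapshot that TRUNCATE leaves behind
        (e.g. with `nodetool clearsnapshot`) to reclaim disk space."""
        _, cold = hot_and_cold()
        session.execute("TRUNCATE %s" % cold)

    def put(session, key, payload):
        _, cold = hot_and_cold()           # writes populate the cold table
        session.execute(
            "INSERT INTO %s (key, payload) VALUES (%%s, %%s)" % cold,
            (key, payload))

    def get(session, key):
        hot, _ = hot_and_cold()            # reads are served from the hot table
        return session.execute(
            "SELECT payload FROM %s WHERE key = %%s" % hot, (key,)).one()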

=Rob
PS - apparently in the past this would have resulted in the schema CF growing
without bound, but that is no longer the case...