Posted to user@cassandra.apache.org by Dmitry Olshansky <dm...@gridnine.com> on 2013/06/25 14:31:28 UTC

Cassandra as storage for cache data

Hello,

we are using Cassandra as data storage for our caching system. Our 
application generates about 20 put and get requests per second. The 
average size of one cache item is about 500 KB.

Cache items are placed into one column family with a TTL of 20 to 60 
minutes. Keys and values are bytes (not UTF-8 strings). The compaction 
strategy is SizeTieredCompactionStrategy.

We set up a Cassandra 1.2.6 cluster of 4 nodes. The replication factor 
is 2. Each node has 10 GB of RAM and enough space on HDD.

Now when we put this cluster under load, it quickly fills with our 
runtime data (about 5 GB on every node) and we start observing 
performance degradation with frequent timeouts on the client side.

We see that on each node compaction starts very frequently and takes 
several minutes to complete. It seems that each node is usually busy 
with the compaction process.

Here are the questions:

What is the recommended configuration for our use case?

Does it make sense to somehow tell Cassandra to keep all data in memory 
(memtables) to avoid flushing it to disk (SSTables), thus decreasing the 
number of compactions? How can we achieve this behavior?

Cassandra is started with the default shell script, which produces the 
following command line:

jsvc.exec -user cassandra -home 
/usr/lib/jvm/java-6-openjdk-amd64/jre/bin/../ -pidfile 
/var/run/cassandra.pid -errfile &1 -outfile 
/var/log/cassandra/output.log -cp <CLASSPATH_SKIPPED> 
-Dlog4j.configuration=log4j-server.properties 
-Dlog4j.defaultInitOverride=true 
-XX:HeapDumpPath=/var/lib/cassandra/java_1371805844.hprof 
-XX:ErrorFile=/var/lib/cassandra/hs_err_1371805844.log -ea 
-javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms2500M -Xmx2500M 
-Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled 
-XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-XX:+UseTLAB -Djava.net.preferIPv4Stack=true 
-Dcom.sun.management.jmxremote.port=7199 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false 
org.apache.cassandra.service.CassandraDaemon

-- 
Best regards,
Dmitry Olshansky


Re: Cassandra as storage for cache data

Posted by aaron morton <aa...@thelastpickle.com>.
> https://issues.apache.org/jira/browse/CASSANDRA-2958
Thanks

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/06/2013, at 6:30 AM, Robert Coli <rc...@eventbrite.com> wrote:

> [snip]


Re: Cassandra as storage for cache data

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Jun 26, 2013 at 9:51 PM, aaron morton <aa...@thelastpickle.com> wrote:
> WARNING: disabling durable_writes means that writes are only in memory and
> will not be committed to disk until the CF's are flushed. You should
> *always* use nodetool drain before shutting down a node in this case.

While a rational and informed observer would infer the need for drain
from the normally tight relationship of memtables to the contents of
the commit log, clean shutdown actually explicitly calls flush when
durable_writes is disabled.

https://issues.apache.org/jira/browse/CASSANDRA-2958

=Rob

Re: Cassandra as storage for cache data

Posted by aaron morton <aa...@thelastpickle.com>.
I'll also add that you are probably running into some memory issues; 2.5 GB is a low heap size:

> -Xms2500M -Xmx2500M -Xmn400M

If you really do have a cache and want to reduce disk activity, disable durable_writes on the keyspace. That will stop writes from going to the commit log, which is one reason memtables are flushed to disk. The other reason is that memory usage approaches the memtable_total_space_in_mb setting. Modern (1.2) releases are very good at managing that memory, provided the jamm meter is working. With this approach and the other tips below you should be able to get better performance. 

WARNING: disabling durable_writes means that writes are only in memory and will not be committed to disk until the CFs are flushed. You should *always* run nodetool drain before shutting down a node in this case. 
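A minimal sketch of the CQL for this, assuming the keyspace is named cache_ks (the name is hypothetical):

```sql
-- Stop writes to this keyspace from hitting the commit log.
-- Data then lives only in memtables until they are flushed to SSTables.
ALTER KEYSPACE cache_ks WITH durable_writes = false;

-- Before shutting a node down, flush memtables explicitly from a shell:
--   nodetool drain
```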

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 26/06/2013, at 8:52 AM, sankalp kohli <ko...@gmail.com> wrote:

> [snip]
> 


Re: Cassandra as storage for cache data

Posted by sankalp kohli <ko...@gmail.com>.
Apart from what Jeremy said, you can try these:
1) Use a replication factor of 1. It is cache data and you don't need
persistence.
2) Try playing with the memtable size.
3) Use the Netflix client library (Astyanax), as it will save one hop: it
chooses the node holding the data as the coordinator.
4) Work on your schema. You might want to have fewer columns in each row.
With fatter rows, the bloom filter will flag more SSTables as eligible
for a read.
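
Point 1 could be sketched in CQL as follows (cache_ks is a hypothetical
keyspace name):

```sql
-- Keep a single copy of each cache item; acceptable here because
-- the data is a cache and can be regenerated if a node is lost.
CREATE KEYSPACE cache_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
```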

-Sankalp


On Tue, Jun 25, 2013 at 9:04 AM, Jeremy Hanna <je...@gmail.com> wrote:

> [snip]

Re: Cassandra as storage for cache data

Posted by Jeremy Hanna <je...@gmail.com>.
If you have rapidly expiring data, then tombstones are probably filling your disk and your heap (depending on how you order the data on disk).  To check whether your queries are affected by tombstones, you might try the query tracing that's built into 1.2.
See:
http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets  -- has an example of tracing where you can see tombstones affecting the query
http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
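
In cqlsh this looks something like the following (the table and key below
are hypothetical):

```sql
-- Enable tracing for subsequent statements in this cqlsh session.
TRACING ON;

-- The trace output shows, per step, how many tombstones the read touched.
SELECT value FROM cache_ks.cache_items WHERE key = 0x6b6579;
```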

You'll want to consider reducing the gc_grace period from the default of 10 days for those column families, with an understanding of why gc_grace exists in the first place; see http://wiki.apache.org/cassandra/DistributedDeletes .  Once the gc_grace period has passed, the tombstones will still stay around until they are compacted away.  So there are two options currently to compact them away more quickly:
1) Use leveled compaction - see http://www.datastax.com/dev/blog/when-to-use-leveled-compaction .  Leveled compaction only requires 10% headroom (as opposed to 50% for size-tiered compaction) for the amount of disk that needs to be kept free.
2) If option 1 doesn't work and you're still seeing performance degrade and tombstones aren't getting cleared out fast enough, you might consider keeping size-tiered compaction but performing regular major compactions to get rid of expired data.
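
Both knobs can be set per column family in CQL; the sketch below assumes a
hypothetical table cache_ks.cache_items and a 1-hour grace period:

```sql
-- Let tombstones become collectible one hour after expiry, and switch
-- to leveled compaction (only safe with an understanding of the
-- distributed-deletes caveats discussed here).
ALTER TABLE cache_ks.cache_items
  WITH gc_grace_seconds = 3600
  AND compaction = {'class': 'LeveledCompactionStrategy'};
```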

Keep in mind, though, that if you use a gc_grace of 0 and do any manual deletes outside of TTLs, you probably want to perform the deletes at ConsistencyLevel.ALL; otherwise, if a node goes down and then comes back up, there's a chance that deleted data may be resurrected.  That only applies to non-TTL data that you delete manually.  See the explanation of distributed deletes for more information.

 
On 25 Jun 2013, at 13:31, Dmitry Olshansky <dm...@gridnine.com> wrote:

> [snip]