Posted to user@cassandra.apache.org by buddhasystem <po...@bnl.gov> on 2011/02/05 17:59:21 UTC

How bad is the impact of compaction on performance?

Just wanted to see if someone with experience in running an actual service
can advise me:

how often do you run nodetool compact on your nodes? Do you stagger it in
time for each node? How badly is performance affected?

I know this all seems too generic, but then again no two clusters are
created equal anyhow. Just wanted to get a feel.

Thanks,
Maxim


Re: How bad is the impact of compaction on performance?

Posted by Edward Capriolo <ed...@gmail.com>.
On Sat, Feb 5, 2011 at 12:48 PM, buddhasystem <po...@bnl.gov> wrote:
>
> Thanks Edward. In our usage scenario there is never any downtime; it's a
> global 24/7 operation.
>
> What is impacted worst, reads or writes?
>
> How does a node handle compaction when there is a spike of writes coming
> to it?

It does not have to be downtime; it just has to be a slow time. Use
your traffic graphs to run the major compaction at the slowest time, so
it has the least impact on performance.
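
For example, an untested sketch along these lines (the file name and the
hour,requests CSV format are just assumptions about how you might export
your traffic graphs) can pick the quietest hour to schedule it:

#!/usr/bin/env python
# Sketch: pick the quietest hour from hourly request counts exported from
# the traffic graphs. File name and CSV format are illustrative only.
import csv
from collections import defaultdict

totals = defaultdict(int)
with open("hourly_requests.csv") as f:
    for hour, requests in csv.reader(f):
        totals[int(hour)] += int(requests)

quietest = min(totals, key=totals.get)
print("Schedule the major compaction around %02d:00" % quietest)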

Compaction does not generally affect writes or bursts of writes,
especially if your writes go to a separate commit log disk.

In the best-case scenario compaction may not affect your performance
at all. An example of this would be a use case where nearly 100% of
reads are serviced by the row cache, so disk is not a factor.

Generally speaking, if you have good, fast hard disks and only a single
node is compacting at a given time, the cluster absorbs this. In 0.7.0
the dynamic snitch should help re-route traffic away from slower nodes
for even less impact. In other words, making compaction "non-impacting"
is all about capacity.

Re: How bad is the impact of compaction on performance?

Posted by buddhasystem <po...@bnl.gov>.
Thanks Edward. In our usage scenario there is never any downtime; it's a
global 24/7 operation.

What is impacted worst, reads or writes?

How does a node handle compaction when there is a spike of writes coming
to it?




Re: How bad is the impact of compaction on performance?

Posted by Edward Capriolo <ed...@gmail.com>.

This is an interesting topic. Cassandra can now remove tombstones on
non-major compactions. For some use cases you may not have to trigger
nodetool compact yourself to remove tombstones. Use cases that do not
perform many updates or deletes may have the least need to run compaction
yourself.
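
As a rough illustration (my own sketch, not something from this mail), the
timing rule behind that is simply that a tombstone only becomes eligible
for removal once GCGraceSeconds have passed since the delete; the 3 days
here matches the "gc time 3 days" figure mentioned further down:

# Sketch: when may a compaction purge a tombstone?
import time

GC_GRACE_SECONDS = 3 * 24 * 3600   # GCGraceSeconds of 3 days (assumed)

def purgeable(delete_time, now=None):
    """True once the tombstone is old enough for a compaction to drop it."""
    # Real compactions also require that no older version of the row lives
    # in an SSTable outside the compaction; this sketch ignores that.
    now = time.time() if now is None else now
    return now - delete_time >= GC_GRACE_SECONDS

print(purgeable(time.time() - 4 * 24 * 3600))   # deleted 4 days ago -> True
print(purgeable(time.time() - 1 * 24 * 3600))   # deleted 1 day ago -> False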

However: if you have smaller SSTables, or fewer SSTables, your read
operations will be more efficient.
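
The reason, as a toy model of my own (not code from Cassandra), is that a
row's columns can be spread across several SSTables, so a read has to
consult and merge every one of them; fewer, already-compacted SSTables
mean fewer files to open and seek:

# Toy model: one lookup (and potentially one disk seek) per SSTable.
sstables = [
    {"row1": {"a": 1}},            # oldest SSTable
    {"row1": {"b": 2}},
    {"row1": {"a": 9, "c": 3}},    # newest; wins on overlapping columns
]

def read_row(row_key):
    merged = {}
    for table in sstables:
        merged.update(table.get(row_key, {}))
    return merged

print(read_row("row1"))   # columns a, b, c merged after touching 3 tables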

If you have downtime, such as from 1 AM to 6 AM, going through a major
compaction might shrink your dataset significantly, and that will make
reads better.

Compaction can be more or less intensive. The largest factor is row
size. Users with large rows probably see faster compaction, while those
with smaller rows see it take a long time. You can lower the priority of
the compaction thread for experimentation.

As to performance, you want to get your cluster to a state where
it is not compacting often. This may mean you need more nodes to
handle the writes.

I graph the compaction information from JMX
http://www.jointhegrid.com/cassandra/cassandra-cacti-m6.jsp
to get a feel for how often a node is compacting on average. I also
cross-reference the compaction activity with the read latency and I/O
graphs I have, to see what impact compaction has on reads.
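
If you do not have Cacti wired up, even an untested sketch like this (the
host IPs are made up, and it assumes your Cassandra version's nodetool
offers the compactionstats command) will give you a rough log to line up
against your read latency and I/O graphs:

#!/usr/bin/env python
# Sketch: periodically record what each node is compacting.
# Host IPs are illustrative; replace with your own ring members.
import subprocess
import time

NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # your Cassandra nodes

while True:
    for node in NODES:
        proc = subprocess.Popen(["nodetool", "-h", node, "compactionstats"],
                                stdout=subprocess.PIPE)
        out = proc.communicate()[0].decode()
        first = out.splitlines()[0] if out else "no output"
        print("%s %s: %s" % (time.strftime("%H:%M:%S"), node, first))
    time.sleep(60)   # sample once a minute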

Forcing a major compaction also lowers the chance that a compaction will
happen during the day at peak time. I major compact a few cluster
nodes each night through cron (GC grace time of 3 days). This has been
good for keeping our data on disk as small as possible. Forcing the major
compaction at night uses I/O, but I find it saves I/O over the course of
the day because each read seeks less on disk.
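
A rough, untested sketch of that kind of nightly rotation (host names and
the per-night count are made up; run it from cron at your quiet hour, e.g.
"0 3 * * *") could look like:

#!/usr/bin/env python
# Sketch: major compact a different small subset of nodes each night so
# only a couple of nodes are compacting at any one time.
import datetime
import subprocess

NODES = ["cass01", "cass02", "cass03", "cass04", "cass05", "cass06"]
PER_NIGHT = 2   # how many nodes to major compact per night

# Rotate through the ring: each night handles the next slice of nodes.
night = datetime.date.today().toordinal()
start = (night * PER_NIGHT) % len(NODES)
tonight = [NODES[(start + i) % len(NODES)] for i in range(PER_NIGHT)]

for node in tonight:
    print("Running nodetool compact against %s" % node)
    subprocess.call(["nodetool", "-h", node, "compact"])   # blocks until done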