Posted to user@cassandra.apache.org by rajasekhar kommineni <ra...@gmail.com> on 2018/09/18 04:38:53 UTC

Compaction Strategy

Hello Folks,

I need advice on choosing a compaction strategy for my C* cluster. Multiple jobs load the data with few inserts and many updates, but no deletes. I am currently using size-tiered compaction (STCS), but I see automatic compactions start once the data load kicks in, along with read timeouts during compaction.

Can anyone suggest a good compaction strategy for my cluster that will reduce the timeouts?


Thanks,


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org

Re: Compaction Strategy

Posted by Nitan Kainth <ni...@gmail.com>.

It’s not recommended to disable compaction; you will end up with hundreds to thousands of SSTables and increased read latency. If your data is immutable, meaning no updates or deletes, disabling it will have the least impact.

Decreasing compaction throughput will free up resources for the application, but don’t let too many pending compaction tasks accumulate.
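
One way to watch the backlog while throttling is to sample the output of nodetool compactionstats and read its "pending tasks" summary line. The following is a minimal illustrative sketch, not something posted in this thread: the function names and the sample output (including the keyspace and table names) are hypothetical, and it assumes the summary line has the form "pending tasks: N", which may differ between Cassandra versions.

```python
import re

# Assumed format of the summary line printed by `nodetool compactionstats`.
PENDING_RE = re.compile(r"^pending tasks:\s*(\d+)", re.MULTILINE)

# Hypothetical sample output; keyspace and table names are made up.
SAMPLE_OUTPUT = "pending tasks: 3\n- shop.customer_history: 3\n"


def parse_pending_tasks(output: str) -> int:
    """Return the pending compaction count found in the given text."""
    match = PENDING_RE.search(output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def is_backlog_growing(samples) -> bool:
    """True if each sample is strictly larger than the one before it."""
    samples = list(samples)
    if len(samples) < 2:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))
```

In practice the text could come from subprocess.run(["nodetool", "compactionstats"], capture_output=True, text=True).stdout, sampled every few minutes during the load. If the count keeps climbing, the throttle is probably too low; it can be changed at runtime with nodetool setcompactionthroughput.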

Re: Compaction Strategy

Posted by Ali Hubail <Al...@petrolink.com>.

I suspect that you are CPU-bound rather than I/O-bound. There are a lot of
areas to look into, but I would start with a few.
I could not tell much from the results you shared, since no writes were
happening at the time. Switching to a different compaction strategy
will most likely make things worse for you. As of now, you only use 1 SSTable
per read, and STCS is the least expensive compaction strategy.

For starters:

1) Review cassandra.yaml for the common disk settings, i.e., concurrent_reads,
concurrent_writes, etc.

https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html

2) Ensure that you optimize your OS for C*
https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/config/configRecommendedSettings.html

What I would do next is monitor the system. The bottleneck you
described is triggered by clients and is out of your control. So:
3) Monitor system resources.
If you have DSE, then use OpsCenter. Otherwise, you can use dstat;
something like 'dstat -taf' would do it. You will have to run this for a
long period of time, until the timeouts occur.
That should give you a general idea of which resources are saturating.

4) If this is CPU-bound, then reduce contention by setting
concurrent_compactors to 1 in cassandra.yaml.

5) Monitor GC. There are a lot of tools that you can use to do so.
Most of the time, it's the GC that is not tuned well. If you are not using
G1GC, then you might want to switch to it.
You can read about GC briefly here:
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsTuneJVM.html
https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/gcPauses.html

6) This sounds naive, but check the logs to see if there is anything
interesting there; you can see the GC pauses there as well.
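
As a rough illustration of step 6, a few lines of Python can pull long GC pauses out of the Cassandra system log. This is a sketch under assumptions, not part of the original advice: it assumes GC events are logged by GCInspector in the form "<collector> GC in <N>ms", the 200 ms threshold is arbitrary, and the sample lines are fabricated for illustration rather than taken from this cluster.

```python
import re

# Assumed shape of a GCInspector log entry: "... - <collector> GC in <N>ms. ..."
GC_LINE_RE = re.compile(r"GCInspector.*? - (?P<collector>.+?) GC in (?P<ms>\d+)ms")

# Fabricated sample lines, for illustration only.
SAMPLE_LOG = [
    "INFO  [Service Thread] 2018-09-20 13:01:02,123 GCInspector.java:284 - "
    "G1 Young Generation GC in 245ms.  G1 Eden Space: 1000 -> 0",
    "WARN  [Service Thread] 2018-09-20 13:05:10,456 GCInspector.java:282 - "
    "G1 Old Generation GC in 1520ms.  G1 Old Gen: 5000 -> 3000",
    "INFO  [CompactionExecutor:12] 2018-09-20 13:05:11,000 "
    "CompactionTask.java:255 - Compacted 4 sstables",
    "INFO  [Service Thread] 2018-09-20 13:06:00,789 GCInspector.java:284 - "
    "G1 Young Generation GC in 80ms.",
]


def find_long_gc_pauses(lines, threshold_ms=200):
    """Return (collector, pause_ms) for every GC entry at or above the threshold."""
    pauses = []
    for line in lines:
        match = GC_LINE_RE.search(line)
        if match and int(match.group("ms")) >= threshold_ms:
            pauses.append((match.group("collector"), int(match.group("ms"))))
    return pauses
```

Passing an open log file instead of SAMPLE_LOG works the same way, since the function only iterates over lines. Comparing the timestamps of long pauses with the times of the read timeouts would show whether GC is a likely contributor.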

Ali Hubail

Petrolink International Ltd.

Confidentiality warning: This message and any attachments are intended
only for the persons to whom this message is addressed, are confidential,
and may be privileged. If you are not the intended recipient, you are
hereby notified that any review, retransmission, conversion to hard copy,
copying, modification, circulation or other use of this message and any
attachments is strictly prohibited. If you receive this message in error,
please notify the sender immediately by return email, and delete this
message and any attachments from your system. Petrolink International
Limited its subsidiaries, holding companies and affiliates disclaims all
responsibility from and accepts no liability whatsoever for the
consequences of any unauthorized person acting, or refraining from acting,
on any information contained in this message. For security purposes, staff
training, to assist in resolving complaints and to improve our customer
service, email communications may be monitored and telephone calls may be
recorded.

Re: Compaction Strategy

Posted by rajasekhar kommineni <ra...@gmail.com>.

Hi Ali,

Please find my answers below.

1) The table holds customer history data. We receive transaction data every day for multiple vendors, and a batch job updates the data for any customer who made transactions that day and inserts a new row for each new customer.
Reads happen when a customer visits, to calculate the relevancy of items based on that customer's past transactions. I attached the tablestats and tablehistograms output as a file.

2) RAM: 30 GB, CPU: 4, hard drive: Amazon EBS

3) Output attached as a file.

Thanks,



Re: Compaction Strategy

Posted by Ali Hubail <Al...@petrolink.com>.

Hello Rajasekhar,

It's not really clear to me what your workload is. As I understand it, you 
do heavy writes, but what about reads?
So, could you:

1) Run:
nodetool tablestats 
nodetool tablehistograms
nodetool compactionstats

From these we should be able to see the latency, the workload type, and the
number of SSTables used per read.

2) Specify your hardware specs, i.e., memory size, CPU, number of drives (for
data SSTables), and type of hard drives (SSD/HDD).
3) Share your cassandra.yaml (make sure to sanitize it).
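
For step 1, the three nodetool reports can be gathered in one pass so they are captured at the same moment, ideally while the load is running. The helper below is a hypothetical sketch: the function names are invented, the keyspace and table are placeholders to fill in, and it assumes nodetool is on the PATH and that tablehistograms takes the keyspace and table as separate arguments, as it does in the 3.x line.

```python
import subprocess


def build_diagnostic_commands(keyspace, table):
    """Build the three nodetool invocations for one table."""
    return [
        ["nodetool", "tablestats", f"{keyspace}.{table}"],
        ["nodetool", "tablehistograms", keyspace, table],
        ["nodetool", "compactionstats"],
    ]


def collect(commands, run=subprocess.run):
    """Run each command and return its stdout, keyed by the command line.

    `run` is injectable so the function can be exercised without a cluster.
    """
    reports = {}
    for cmd in commands:
        result = run(cmd, capture_output=True, text=True, check=False)
        reports[" ".join(cmd)] = result.stdout
    return reports
```

Calling collect(build_diagnostic_commands("my_keyspace", "my_table")) on a node returns a dictionary of report text that can be written to a file and attached to a reply.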

You have a lot of updates, and your data is most likely scattered across
different SSTables. Size-tiered compaction strategy (STCS) is much less
expensive than leveled compaction strategy (LCS).

Stopping background compaction should be approached with caution. I
think your problem has more to do with why STCS compaction is taking more
resources than you expect.

Regards,

Ali Hubail

Petrolink International Ltd

Re: Compaction Strategy

Posted by rajasekhar kommineni <ra...@gmail.com>.

Hello,

Can anyone respond to my questions? Is it a good idea to disable automatic compaction and schedule it to run every 3 days? I am unable to control compaction, and it is causing timeouts.

Also, will reducing or increasing compaction_throughput_mb_per_sec eliminate the timeouts?

Thanks,

