Posted to user@cassandra.apache.org by Ali Hubail <Al...@petrolink.com> on 2018/09/21 20:30:28 UTC

Re: Newsletter / Marketing: Re: Compaction Strategy

I suspect that you are CPU bound rather than IO bound. There are a lot of
areas to look into, but I would start with a few.
I could not tell much from the results you shared because no writes were
happening at the time they were taken. Switching to a different compaction
strategy will most likely make things worse for you. As of now, you only
touch 1 sstable per read, and STCS is the least expensive compaction
strategy.

For starters,

1) Review cassandra.yaml for the common disk settings, e.g.,
concurrent_reads, concurrent_writes, etc.

https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html

2) Ensure that you optimize your OS for C*
https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/config/configRecommendedSettings.html

What I would do next is monitor the system. The bottleneck you
described is triggered by clients, so it's out of your control. So
3) Monitor system resources.
If you have DSE, then use OpsCenter. Otherwise, you can use dstat;
something like 'dstat -taf' would do it. You will have to run it for a
long period of time, until the timeouts occur.
That should give you a general idea of which resources are saturating.
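
To make "which resources are saturating" a bit more concrete, here is a
minimal sketch of how I would read the CPU columns once they are collected.
It does not parse dstat output itself. The CpuSample fields mirror dstat's
usr/sys/idl/wai columns, and the function name and both thresholds are
illustrative assumptions of mine, not recommended values.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CpuSample:
    usr: float  # % of time spent in user space
    sys: float  # % of time spent in the kernel
    idl: float  # % of time idle
    wai: float  # % of time waiting on I/O


def classify(samples, busy_threshold=80.0, wait_threshold=20.0):
    """Return 'io', 'cpu', or 'neither' for a list of CpuSample.

    Both thresholds are hypothetical starting points; tune them for
    your own hardware.
    """
    avg_busy = mean(s.usr + s.sys for s in samples)
    avg_wait = mean(s.wai for s in samples)
    # High I/O wait is checked first: the CPU is stalled on the disk.
    if avg_wait >= wait_threshold:
        return "io"
    if avg_busy >= busy_threshold:
        return "cpu"
    return "neither"
```

Feed it the samples taken around the time of the timeouts, not the average
over the whole run, otherwise the quiet periods will hide the spike.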

4) If this is CPU bound, then reduce contention by setting
concurrent_compactors to 1 in cassandra.yaml

5) Monitor GC. There are a lot of tools that you can use to do so.
Most of the time, it's the GC that is not tuned well. If you are not using
G1GC, then you might want to switch to it.
You can read about GC briefly here:
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsTuneJVM.html
https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/gcPauses.html

6) This sounds naive, but check the logs to see if there is something
interesting there. You can also see the GC pauses there.
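
As a rough sketch of how to pull the GC pauses out of the log: the snippet
below assumes the pauses are reported by GCInspector in lines shaped like
"GCInspector.java:NNN - <collector> GC in <N>ms". That line shape, the
function name and the 200 ms cutoff are assumptions on my side, so check
them against your own system.log before relying on the result.

```python
import re

# Assumed line shape (verify against your own log), e.g.:
#   INFO  [Service Thread] ... GCInspector.java:284 - G1 Young Generation GC in 245ms.  ...
GC_LINE = re.compile(
    r"GCInspector\.java:\d+ - (?P<collector>.+?) GC in (?P<ms>\d+)ms"
)


def long_gc_pauses(lines, min_ms=200):
    """Return (collector, pause_ms) for every GC line at or above min_ms.

    min_ms is a hypothetical cutoff, not a Cassandra default.
    """
    pauses = []
    for line in lines:
        match = GC_LINE.search(line)
        if match and int(match.group("ms")) >= min_ms:
            pauses.append((match.group("collector"), int(match.group("ms"))))
    return pauses
```

It takes any iterable of lines, so you can pass it an open log file and
compare the timestamps of the long pauses with the time of the timeouts.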

Ali Hubail

Petrolink International Ltd.

Confidentiality warning: This message and any attachments are intended
only for the persons to whom this message is addressed, are confidential,
and may be privileged. If you are not the intended recipient, you are
hereby notified that any review, retransmission, conversion to hard copy,
copying, modification, circulation or other use of this message and any
attachments is strictly prohibited. If you receive this message in error,
please notify the sender immediately by return email, and delete this
message and any attachments from your system. Petrolink International
Limited its subsidiaries, holding companies and affiliates disclaims all
responsibility from and accepts no liability whatsoever for the
consequences of any unauthorized person acting, or refraining from acting,
on any information contained in this message. For security purposes, staff
training, to assist in resolving complaints and to improve our customer
service, email communications may be monitored and telephone calls may be
recorded.

rajasekhar kommineni <ra...@gmail.com>
09/20/2018 01:14 PM
Please respond to
user@cassandra.apache.org

To
user@cassandra.apache.org,
cc

Subject
Newsletter / Marketing: Re: Compaction Strategy

Hi Ali,

Please find my answers

1) The table holds customer history data. We receive transaction data
every day for multiple vendors, and a batch job is executed which updates
the data if the customer made any transactions that day, or inserts a row
if the customer is new.
Reads happen when the customer visits, to calculate the relevancy of
items based on the transactions the customer has made. I attached the
tablestats & tablehistograms output as a file.

2) RAM : 30GB, CPU:4, hard drive : Amazon EBS

3) Attached output to file

Thanks,

On Sep 20, 2018, at 10:53 AM, Ali Hubail <Al...@petrolink.com> wrote:

Hello Rajasekhar,

It's not really clear to me what your workload is. As I understand it, you
do heavy writes, but what about reads?
So, could you:

1) execute
nodetool tablestats
nodetool tablehistograms
nodetool compactionstats

We should be able to see the latency, workload type, and the # of sstables
used for reads.

2) specify your hardware specs. i.e., memory size, cpu, # of drives (for
data sstables), and type of harddrives (ssd/hdd)
3) cassandra.yaml (make sure to sanitize it)
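
If it is easier to collect the output of step 1 in one go, something along
these lines would do. It is only a sketch: the function names and the
keyspace/table arguments are placeholders I made up, and it assumes
nodetool is on the PATH of the node where you run it.

```python
import subprocess


def nodetool_commands(keyspace, table):
    """Build the three nodetool invocations from step 1.

    keyspace and table are placeholders for your own names.
    """
    return [
        ["nodetool", "tablestats", f"{keyspace}.{table}"],
        ["nodetool", "tablehistograms", keyspace, table],
        ["nodetool", "compactionstats"],
    ]


def collect(keyspace, table, run=subprocess.run):
    """Run each command and return {command string: captured stdout}."""
    output = {}
    for cmd in nodetool_commands(keyspace, table):
        # check=False: keep going even if one command exits non-zero.
        result = run(cmd, capture_output=True, text=True, check=False)
        output[" ".join(cmd)] = result.stdout
    return output
```

Run it while the load job and the compaction are active, otherwise
compactionstats will most likely show nothing pending.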

You have a lot of updates, and your data is most likely scattered across
different sstables. Size-tiered compaction strategy (STCS) is much less
expensive than leveled compaction strategy (LCS).

Stopping the background compaction should be approached with caution. I
think your problem has more to do with why STCS compaction is taking more
resources than you expect.

Regards,

Ali Hubail

Petrolink International Ltd

rajasekhar kommineni <ra...@gmail.com>
09/19/2018 04:44 PM

Please respond to
user@cassandra.apache.org

To
user@cassandra.apache.org,
cc

Subject
Re: Compaction Strategy

Hello,

Can anyone respond to my questions? Is it a good idea to disable auto
compaction and schedule it every 3 days? I am unable to control compaction
and it is causing timeouts.

Also, will reducing or increasing compaction_throughput_mb_per_sec
eliminate the timeouts?

Thanks,

> On Sep 17, 2018, at 9:38 PM, rajasekhar kommineni <ra...@gmail.com> wrote:
>
> Hello Folks,
>
> I need advice in deciding the compaction strategy for my C* cluster.
> There are multiple jobs that load the data with fewer inserts and more
> updates but no deletes. Currently I am using Size Tiered compaction, but
> I am seeing auto compactions after the data load kicks in, and also read
> timeouts during compaction.
>
> Can anyone suggest a good compaction strategy for my cluster which will
> reduce the timeouts?
>
>
> Thanks,
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org