Posted to user@cassandra.apache.org by Shalom Sagges <sh...@liveperson.com> on 2016/11/10 08:36:59 UTC

Can a Select Count(*) Affect Writes in Cassandra?

Hi There!

I'm using C* 2.0.14.
I experienced a scenario where a "select count(*)" that ran every minute on
a table with practically no results limit (yes, this should definitely be
avoided), caused a huge increase in Cassandra writes to around 150 thousand
writes per second for that particular table.

Can anyone explain this behavior? Why would a Select query significantly
increase write count in Cassandra?

Thanks!

Shalom Sagges

<http://www.linkedin.com/company/164748> <http://twitter.com/liveperson>
<http://www.facebook.com/LivePersonInc> We Create Meaningful Connections
<https://engage.liveperson.com/idc-mobile-first-consumer/?utm_medium=email&utm_source=mkto&utm_campaign=idcsig>

-- 
This message may contain confidential and/or privileged information. 
If you are not the addressee or authorized to receive this on behalf of the 
addressee you must not use, copy, disclose or take action based on this 
message or any information herein. 
If you have received this message in error, please advise the sender 
immediately by reply email and delete this message. Thank you.

Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Alexander Dejanovski <al...@thelastpickle.com>.

Could you check the write count on a per-table basis to see which
specific table is actually receiving the writes?
Check the OneMinuteRate attribute of this MBean:
org.apache.cassandra.metrics:type=ColumnFamily,keyspace=keyspace1,scope=standard1,name=WriteLatency
(Make sure you replace the keyspace and table names with your own.)
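
As a rough illustration of the check above, here is a minimal Python sketch. It builds the MBean name for a given keyspace and table, and turns two samples of a monotonically increasing write counter (for example the write count printed by nodetool cfstats) into a writes-per-second figure. The function names and the sample numbers are made up for the example; only the MBean name pattern comes from the message above.

```python
def table_metric_mbean(keyspace: str, table: str, metric: str = "WriteLatency") -> str:
    """Build the per-table JMX MBean name, following the pattern quoted above."""
    return (
        "org.apache.cassandra.metrics:type=ColumnFamily,"
        f"keyspace={keyspace},scope={table},name={metric}"
    )


def writes_per_second(count_before: int, count_after: int, interval_seconds: float) -> float:
    """Average write rate between two samples of a cumulative write counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if count_after < count_before:
        # A counter that goes backwards usually means the node restarted.
        raise ValueError("counter went backwards; was the node restarted?")
    return (count_after - count_before) / interval_seconds


if __name__ == "__main__":
    print(table_metric_mbean("keyspace1", "standard1"))
    # Hypothetical samples taken 60 seconds apart on one node.
    print(writes_per_second(1_000, 751_000, 60.0))
```

Sampling the counter on each node separately makes it easier to tell whether the extra writes are spread evenly or concentrated on a few replicas.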

Also, check whether tracing is turned on, as it does generate writes to the
system_traces sessions and events tables for every traced query:
https://docs.datastax.com/en/cql/3.1/cql/cql_reference/tracing_r.html
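
To get a feel for how much write traffic tracing could add, here is a back-of-the-envelope Python sketch. It assumes each traced query writes one row to the sessions table plus some number of rows to the events table; the events-per-query figure and the function name are assumptions for illustration, not measured values.

```python
def estimated_trace_writes_per_second(
    queries_per_second: float,
    trace_probability: float,
    events_per_query: int,
) -> float:
    """Rough estimate of rows written to the tracing tables each second.

    Simplification: one sessions row plus `events_per_query` events rows
    for every traced query.
    """
    if not 0.0 <= trace_probability <= 1.0:
        raise ValueError("trace_probability must be between 0 and 1")
    traced_queries = queries_per_second * trace_probability
    return traced_queries * (1 + events_per_query)


if __name__ == "__main__":
    # Hypothetical workload: 100 queries/s, half of them traced, 9 events each.
    print(estimated_trace_writes_per_second(100.0, 0.5, 9))
```

Note that these writes would land in the tracing tables rather than in the queried table, which is why the per-table check matters.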

Cheers,

On Thu, Nov 10, 2016 at 3:11 PM Shalom Sagges <sh...@liveperson.com>
wrote:

> Hi Alexander,
>
> I'm referring to Writes Count generated from JMX:
> [image: Inline image 1]
>
> The higher curve shows the total write count per second for all nodes in
> the cluster and the lower curve is the average write count per second per
> node.
> The drop at the end is the result of shutting down one application node
> that performed this kind of query (we still haven't removed the query
> itself in this cluster).
>
>
> On a different cluster, where we already removed the "select count(*)"
> query completely, we can see that the issue was resolved (we also verified
> this by running nodetool cfstats a few times and checking the write count
> difference):
> [image: Inline image 2]
>
>
> Naturally, I asked how a select query can affect the write count of a node
> but, weird as it seems, the issue was resolved once the query was removed
> from the code.
>
> Another side note: one of our developers who wrote the query in the code
> thought it would be nice to limit the query results to 560,000,000.
> Perhaps the ridiculously high limit might have caused this?
>
> Thanks!
>
>
>
> Shalom Sagges
>
> On Thu, Nov 10, 2016 at 3:21 PM, Alexander Dejanovski <
> alex@thelastpickle.com> wrote:
>
> Hi Shalom,
>
> Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs; they actually
> have nothing to do with flushes. A flush is the operation of moving data
> from memory (memtable) to disk (SSTable).
>
> The Cassandra write path and read path are two different things and, as
> far as I know, there is no way for a select count(*) to increase your write
> count (if you are indeed talking about actual Cassandra writes, and not I/O
> operations).
>
> Cheers,
>
> On Thu, Nov 10, 2016 at 1:21 PM Shalom Sagges <sh...@liveperson.com>
> wrote:
>
> Yes, I know it's obsolete, but unfortunately this takes time.
> We're in the process of upgrading to 2.2.8 and 3.0.9 in our clusters.
>
> Thanks!
>
>
>
> Shalom Sagges
>
> On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin <vl...@winguzone.com>
> wrote:
>
> As I said, I'm not sure about it, but it would be interesting to check the
> heap memory state with any JMX tool, e.g.
> https://github.com/patric-r/jvmtop
>
> By the way, why Cassandra 2.0.14? It's quite an old and unsupported version.
> Even in the 2.0 branch, 2.0.17 is available.
>
> Best regards, Vladimir Yudovin,
>
>
> ---- On Thu, 10 Nov 2016 05:47:37 -0500, Shalom Sagges
> <sh...@liveperson.com> wrote ----
>
> Thanks for the quick reply, Vladimir.
> Is it really possible that ~12,500 writes per second (per node in a 12-node
> DC) are caused by memory flushes?
>
> Shalom Sagges
>
> On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin <vl...@winguzone.com>
> wrote:
>
> Hi Shalom,
>
> I'm not sure, but excessive memory consumption by this SELECT probably
> causes C* to flush tables to free memory.
>
> Best regards, Vladimir Yudovin,
-- 
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Chris Lohfink <cl...@gmail.com>.

Basically what Benjamin said...

On Thu, Nov 10, 2016 at 11:08 AM, Chris Lohfink <cl...@gmail.com>
wrote:

> I actually read this completely wrong. Can you check the
> org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBackground and
> org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking metrics?
> Perhaps reading all the data to service the count(*) is causing a lot of
> read repairs if your data is inconsistent.
>
> Chris
>
> On Thu, Nov 10, 2016 at 9:33 AM, Benjamin Roth <be...@jaumo.com>
> wrote:
>
>> Or read repair probability with a lot of out of syncs?
>>
>> On 10.11.2016 at 14:42, "Alexander Dejanovski" <alex@thelastpickle.com> wrote:
>>
>>> Shalom,
>>>
>>> you may have a high trace probability, which could explain what you're
>>> observing:
>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsSetTraceProbability.html
>>>
>>> On Thu, Nov 10, 2016 at 3:37 PM Chris Lohfink <cl...@gmail.com>
>>> wrote:
>>>
>>>> count(*) actually pages through all the data. So a select count(*) without
>>>> a limit would be expected to cause a lot of load on the system. The hit is
>>>> more than just IO load and CPU, it also creates a lot of garbage that can
>>>> cause pauses slowing down the entire JVM. Some details here:
>>>> http://www.datastax.com/dev/blog/counting-keys-in-cassandra
>>>> <http://planetcassandra.org/blog/counting-key-in-cassandra/>
>>>>
>>>> You may want to consider maintaining the count yourself, using Spark,
>>>> or if you just want a ball park number you can grab it from JMX.
>>>>
>>>> > Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
>>>> actually has nothing to do with flushes. A flush is the operation of moving
>>>> data from memory (memtable) to disk (SSTable).
>>>>
>>>> FWIW, in 2.0 that's not completely accurate. Before 2.1, memtable flushing
>>>> acquired a switchlock that blocks mutations during the flush (the
>>>> "pending task" metric measures how many mutations are blocked by this
>>>> lock).
>>>>
>>>> Chris

Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Chris Lohfink <cl...@gmail.com>.

I actually read this completely wrong. Can you check the
org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBackground and
org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking metrics?
Perhaps reading all the data to service the count(*) is causing a lot of
read repairs if your data is inconsistent.
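
To make the read-repair idea concrete, here is a toy Python model (not Cassandra code). It assumes every read consults all replicas and that a stale or missing copy is fixed by writing the newest version back to that replica; in a real cluster this depends on the consistency level and the table's read repair settings, and deletions are not modelled. All names are invented for the illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Cell:
    value: str
    timestamp: int  # write timestamp; the highest one wins


Replica = Dict[str, Cell]  # toy replica: partition key -> latest cell it holds


def read_with_repair(replicas: List[Replica], key: str) -> Tuple[Optional[str], int]:
    """Read one key from every replica and repair the stale copies.

    Returns the newest value and the number of repair writes issued.
    """
    versions = [replica.get(key) for replica in replicas]
    present = [cell for cell in versions if cell is not None]
    if not present:
        return None, 0
    newest = max(present, key=lambda cell: cell.timestamp)
    repair_writes = 0
    for replica, seen in zip(replicas, versions):
        if seen is None or seen.timestamp < newest.timestamp:
            # This write-back is the mutation a read repair sends to a replica.
            replica[key] = Cell(newest.value, newest.timestamp)
            repair_writes += 1
    return newest.value, repair_writes


def count_all(replicas: List[Replica]) -> Tuple[int, int]:
    """Count distinct keys by reading each one; also total the repair writes."""
    keys = set()
    for replica in replicas:
        keys.update(replica.keys())
    repair_writes = 0
    for key in sorted(keys):
        _, writes = read_with_repair(replicas, key)
        repair_writes += writes
    return len(keys), repair_writes
```

In this model a full count over inconsistent replicas produces one write per stale copy, and a second pass over the same data produces none, which is roughly the pattern the two ReadRepair metrics would help confirm or rule out.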

Chris

On Thu, Nov 10, 2016 at 9:33 AM, Benjamin Roth <be...@jaumo.com>
wrote:

> Or read repair probability with a lot of out of syncs?

Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Benjamin Roth <be...@jaumo.com>.

Or read repair probability with a lot of out of syncs?

On 10.11.2016 at 14:42, "Alexander Dejanovski" <al...@thelastpickle.com> wrote:

> Shalom,
>
> you may have a high trace probability, which could explain what you're
> observing:
> https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsSetTraceProbability.html
>
> On Thu, Nov 10, 2016 at 3:37 PM Chris Lohfink <cl...@gmail.com>
> wrote:
>
>> count(*) actually pages through all the data. So a select count(*) without
>> a limit would be expected to cause a lot of load on the system. The hit is
>> more than just IO load and CPU, it also creates a lot of garbage that can
>> cause pauses slowing down the entire JVM. Some details here:
>> http://www.datastax.com/dev/blog/counting-keys-in-cassandra
>> <http://planetcassandra.org/blog/counting-key-in-cassandra/>
>>
>> You may want to consider maintaining the count yourself, using Spark, or
>> if you just want a ball park number you can grab it from JMX.
>>
>> > Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
>> actually has nothing to do with flushes. A flush is the operation of moving
>> data from memory (memtable) to disk (SSTable).
>>
>> FWIW in 2.0 thats not completely accurate. Before 2.1 the process of
>> memtable flushing acquired a switchlock on that blocks mutations during the
>> flush (the "pending task" metric is the measure of how many mutations are
>> blocked by this lock).
>>
>> Chris
>>
>> On Thu, Nov 10, 2016 at 8:10 AM, Shalom Sagges <sh...@liveperson.com>
>> wrote:
>>
>> Hi Alexander,
>>
>> I'm referring to Writes Count generated from JMX:
>> [image: Inline image 1]
>>
>> The higher curve shows the total write count per second for all nodes in
>> the cluster and the lower curve is the average write count per second per
>> node.
>> The drop in the end is the result of shutting down one application node
>> that performed this kind of query (we still haven't removed the query
>> itself in this cluster).
>>
>>
>> On a different cluster, where we already removed the "select count(*)"
>> query completely, we can see that the issue was resolved (also verified
>> this with running nodetool cfstats a few times and checked the write count
>> difference):
>> [image: Inline image 2]
>>
>>
>> Naturally I asked how can a select query affect the write count of a node
>> but weird as it seems, the issue was resolved once the query was removed
>> from the code.
>>
>> Another side note.. One of our developers that wrote the query in the
>> code, thought it would be nice to limit the query results to 560,000,000.
>> Perhaps the ridiculously high limit might have caused this?
>>
>> Thanks!
>>
>>
>>
>> Shalom Sagges
>> DBA
>> T: +972-74-700-4035
>> <http://www.linkedin.com/company/164748> <http://twitter.com/liveperson>
>> <http://www.facebook.com/LivePersonInc> We Create Meaningful Connections
>>
>> <https://engage.liveperson.com/idc-mobile-first-consumer/?utm_medium=email&utm_source=mkto&utm_campaign=idcsig>
>>
>>
>> On Thu, Nov 10, 2016 at 3:21 PM, Alexander Dejanovski <
>> alex@thelastpickle.com> wrote:
>>
>> Hi Shalom,
>>
>> Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it actually
>> has nothing to do with flushes. A flush is the operation of moving data
>> from memory (memtable) to disk (SSTable).
>>
>> The Cassandra write path and read path are two different things and, as
>> far as I know, I see no way for a select count(*) to increase your write
>> count (if you are indeed talking about actual Cassandra writes, and not I/O
>> operations).
>>
>> Cheers,
>>
>> On Thu, Nov 10, 2016 at 1:21 PM Shalom Sagges <sh...@liveperson.com>
>> wrote:
>>
>> Yes, I know it's obsolete, but unfortunately this takes time.
>> We're in the process of upgrading to 2.2.8 and 3.0.9 in our clusters.
>>
>> Thanks!
>>
>>
>>
>> Shalom Sagges
>> DBA
>> T: +972-74-700-4035 <+972%2074-700-4035>
>> <http://www.linkedin.com/company/164748> <http://twitter.com/liveperson>
>> <http://www.facebook.com/LivePersonInc> We Create Meaningful Connections
>>
>> <https://engage.liveperson.com/idc-mobile-first-consumer/?utm_medium=email&utm_source=mkto&utm_campaign=idcsig>
>>
>>
>> On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin <vl...@winguzone.com>
>> wrote:
>>
>> As I said, I'm not sure about it, but it would be interesting to check the
>> heap memory state with any JMX tool, e.g.
>> https://github.com/patric-r/jvmtop
>>
>> By the way, why Cassandra 2.0.14? It's quite an old and unsupported version.
>> Even in the 2.0 branch there is 2.0.17 available.
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>> Launch your cluster in minutes.*
>>
>>
>> ---- On Thu, 10 Nov 2016 05:47:37 -0500*Shalom Sagges
>> <shaloms@liveperson.com <sh...@liveperson.com>>* wrote ----
>>
>> Thanks for the quick reply Vladimir.
>> Is it really possible that ~12,500 writes per second (per node in a 12
>> nodes DC) are caused by memory flushes?
>>
>>
>>
>>
>>
>>
>> Shalom Sagges
>> DBA
>> T: +972-74-700-4035
>>
>>
>>
>> On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin <vl...@winguzone.com>
>> wrote:
>>
>>
>>
>>
>> Hi Shalom,
>>
>> I'm not sure, but excessive memory consumption by this SELECT probably
>> causes C* to flush tables to free memory.
>>
>> Best regards, Vladimir Yudovin,
>>
>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>> Launch your cluster in minutes.*
>>
>>
>> ---- On Thu, 10 Nov 2016 03:36:59 -0500*Shalom Sagges
>> <shaloms@liveperson.com <sh...@liveperson.com>>* wrote ----
>>
>> Hi There!
>>
>> I'm using C* 2.0.14.
>> I experienced a scenario where a "select count(*)" that ran every minute
>> on a table with practically no results limit (yes, this should definitely
>> be avoided), caused a huge increase in Cassandra writes to around 150
>> thousand writes per second for that particular table.
>>
>> Can anyone explain this behavior? Why would a Select query significantly
>> increase write count in Cassandra?
>>
>> Thanks!
>>
>>
>> Shalom Sagges
>>
>>
>>
>>
>>
>> --
>> -----------------
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>>
>>
>>
>>
>> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>

Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Shalom Sagges <sh...@liveperson.com>.

Got it. Thanks!!


Shalom Sagges
DBA
T: +972-74-700-4035


On Thu, Nov 10, 2016 at 8:06 PM, Chris Lohfink <cl...@gmail.com> wrote:

> It can be a tad confusing...
>
> The background metric counts digest mismatches that were detected after the
> read had already completed, outside of the client read. This happens when
> the nodes queried for the requested consistency level were not all of the
> replicas, so the check was kicked off after the read (this is based on the
> read repair chance, and is the "attempted" metric).
>
> The blocking metric counts the number of times there was a digest mismatch
> within the requested consistency level, so a full data read was started
> within the client read.
>
> The two combined show that you have a lot of read repairs happening,
> possibly just due to latency in the initial mutations (not really a
> problem). If that's the case and you have faith in your offline repairs, you
> could set the read repair chance to 0 to reduce the load from resending
> mutations that will become consistent eventually anyway.
>
> Chris
>


Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Chris Lohfink <cl...@gmail.com>.

It can be a tad confusing...

The background metric counts digest mismatches that were detected after the
read had already completed, outside of the client read. This happens when the
nodes queried for the requested consistency level were not all of the
replicas, so the check was kicked off after the read (this is based on the
read repair chance, and is the "attempted" metric).

The blocking metric counts the number of times there was a digest mismatch
within the requested consistency level, so a full data read was started
within the client read.

The two combined show that you have a lot of read repairs happening, possibly
just due to latency in the initial mutations (not really a problem). If
that's the case and you have faith in your offline repairs, you could set the
read repair chance to 0 to reduce the load from resending mutations that
will become consistent eventually anyway.
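
A minimal sketch of that last suggestion, assuming the 2.x/3.x table options
read_repair_chance and dclocal_read_repair_chance. The keyspace and table
names are hypothetical placeholders, and the helper function is only for
illustration: it builds the CQL text and does not connect to a cluster.

```python
import re

# Hypothetical names for illustration; substitute the real keyspace and table.
KEYSPACE = "my_keyspace"
TABLE = "my_table"

# Accept only plain, unquoted CQL identifiers so the statement stays well formed.
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def read_repair_off_statement(keyspace: str, table: str) -> str:
    """Build a CQL statement that sets both read repair chances to 0."""
    for name in (keyspace, table):
        if not _IDENT.match(name):
            raise ValueError(f"not a plain CQL identifier: {name!r}")
    return (
        f"ALTER TABLE {keyspace}.{table} "
        "WITH read_repair_chance = 0.0 "
        "AND dclocal_read_repair_chance = 0.0;"
    )


if __name__ == "__main__":
    # Paste the printed statement into cqlsh.
    print(read_repair_off_statement(KEYSPACE, TABLE))
```

The statement only changes the table's schema options. Regular repairs
(nodetool repair) are still needed to keep the replicas in sync.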

Chris

On Thu, Nov 10, 2016 at 11:52 AM, Shalom Sagges <sh...@liveperson.com>
wrote:

> Thanks a lot for helping on this one.
> Just one more question... I'm not familiar with the above read repair
> metrics.
> Can you please explain what caught your eye there?
>
>
> Shalom Sagges
> DBA
> T: +972-74-700-4035
>
>
> On Thu, Nov 10, 2016 at 7:47 PM, Benjamin Roth <be...@jaumo.com>
> wrote:
>
>> ... sorry for the short reply. To be a bit more detailed:
>>
>> 1. You can lower the read repair probability on that table to avoid the
>> writes. But be aware that inconsistencies then also won't be repaired on
>> reads.
>> 2. Maybe you should run a repair on that table to get it in sync and
>> reduce the impact of read repairs. By the way, you should run repairs on a
>> regular basis anyway, but that is a different, very extensive topic that
>> is documented in many places.
>>
>> 2016-11-10 17:44 GMT+00:00 Benjamin Roth <be...@jaumo.com>:
>>
>>> There you go :)
>>>
>>> 2016-11-10 17:24 GMT+00:00 Shalom Sagges <sh...@liveperson.com>:
>>>
>>>> That's a possibility I didn't think of...
>>>>
>>>> This is what I see from
>>>> org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBackground
>>>> [image: Inline image 1]
>>>>
>>>>
>>>> and from
>>>> org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking:
>>>> [image: Inline image 2]
>>>>
>>>>
>>>> Shalom Sagges
>>>> DBA
>>>> T: +972-74-700-4035
>>>>
>>>>
>>>> On Thu, Nov 10, 2016 at 7:16 PM, Shalom Sagges <sh...@liveperson.com>
>>>> wrote:
>>>>
>>>>> Yes, it's occurring on the table that receives the count(*) query.
>>>>>
>>>>>
>>>>> Shalom Sagges
>>>>> DBA
>>>>> T: +972-74-700-4035
>>>>>
>>>>>
>>>>> On Thu, Nov 10, 2016 at 5:05 PM, Alexander Dejanovski <
>>>>> alex@thelastpickle.com> wrote:
>>>>>
>>>>>> So the huge write count is occurring on the table that receives the
>>>>>> count(*) query or another table ?
>>>>>>
>>>>>> On Thu, Nov 10, 2016 at 4:02 PM Shalom Sagges <sh...@liveperson.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Tracing is off and so is TracingProbability.
>>>>>>> Just to elaborate, the huge write count occurs on only a single column
>>>>>>> family, which is not part of the system_traces keyspace.
>>>>>>>
>>>>>>> I also want to thank you guys for your persistent help, regardless of
>>>>>>> whether the root cause is found or not. You're the best!!
>>>>>>>
>>>>>>>
>>>>>>> Shalom Sagges
>>>>>>> DBA
>>>>>>> T: +972-74-700-4035 <+972%2074-700-4035>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 10, 2016 at 4:41 PM, Alexander Dejanovski <
>>>>>>> alex@thelastpickle.com> wrote:
>>>>>>>
>>>>>>> Shalom,
>>>>>>>
>>>>>>> you may have a high trace probability which could explain what
>>>>>>> you're observing:
>>>>>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsSetTraceProbability.html
>>>>>>>
>>>>>>> On Thu, Nov 10, 2016 at 3:37 PM Chris Lohfink <cl...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> count(*) actually pages through all the data. So a select count(*) without
>>>>>>> a limit would be expected to cause a lot of load on the system. The hit is
>>>>>>> more than just IO load and CPU, it also creates a lot of garbage that can
>>>>>>> cause pauses slowing down the entire JVM. Some details here:
>>>>>>> http://www.datastax.com/dev/blog/counting-keys-in-cassandra
>>>>>>> <http://planetcassandra.org/blog/counting-key-in-cassandra/>
>>>>>>>
>>>>>>> You may want to consider maintaining the count yourself, using
>>>>>>> Spark, or if you just want a ball park number you can grab it from JMX.
>>>>>>>
>>>>>>> > Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
>>>>>>> actually has nothing to do with flushes. A flush is the operation of moving
>>>>>>> data from memory (memtable) to disk (SSTable).
>>>>>>>
>>>>>>> FWIW, in 2.0 that's not completely accurate. Before 2.1, the process of
>>>>>>> memtable flushing acquired a switchlock that blocks mutations during the
>>>>>>> flush (the "pending task" metric is the measure of how many mutations are
>>>>>>> blocked by this lock).
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> On Thu, Nov 10, 2016 at 8:10 AM, Shalom Sagges <
>>>>>>> shaloms@liveperson.com> wrote:
>>>>>>>
>>>>>>> Hi Alexander,
>>>>>>>
>>>>>>> I'm referring to Writes Count generated from JMX:
>>>>>>> [image: Inline image 1]
>>>>>>>
>>>>>>> The higher curve shows the total write count per second for all
>>>>>>> nodes in the cluster and the lower curve is the average write count per
>>>>>>> second per node.
>>>>>>> The drop in the end is the result of shutting down one application
>>>>>>> node that performed this kind of query (we still haven't removed the query
>>>>>>> itself in this cluster).
>>>>>>>
>>>>>>>
>>>>>>> On a different cluster, where we already removed the "select
>>>>>>> count(*)" query completely, we can see that the issue was resolved (also
>>>>>>> verified this with running nodetool cfstats a few times and checked the
>>>>>>> write count difference):
>>>>>>> [image: Inline image 2]
>>>>>>>
>>>>>>>
>>>>>>> Naturally I asked how can a select query affect the write count of a
>>>>>>> node but weird as it seems, the issue was resolved once the query was
>>>>>>> removed from the code.
>>>>>>>
>>>>>>> Another side note.. One of our developers that wrote the query in
>>>>>>> the code, thought it would be nice to limit the query results to
>>>>>>> 560,000,000. Perhaps the ridiculously high limit might have caused this?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Shalom Sagges
>>>>>>> DBA
>>>>>>> T: +972-74-700-4035
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 10, 2016 at 3:21 PM, Alexander Dejanovski <
>>>>>>> alex@thelastpickle.com> wrote:
>>>>>>>
>>>>>>> Hi Shalom,
>>>>>>>
>>>>>>> Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
>>>>>>> actually has nothing to do with flushes. A flush is the operation of moving
>>>>>>> data from memory (memtable) to disk (SSTable).
>>>>>>>
>>>>>>> The Cassandra write path and read path are two different things and,
>>>>>>> as far as I know, I see no way for a select count(*) to increase your write
>>>>>>> count (if you are indeed talking about actual Cassandra writes, and not I/O
>>>>>>> operations).
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> On Thu, Nov 10, 2016 at 1:21 PM Shalom Sagges <
>>>>>>> shaloms@liveperson.com> wrote:
>>>>>>>
>>>>>>> Yes, I know it's obsolete, but unfortunately this takes time.
>>>>>>> We're in the process of upgrading to 2.2.8 and 3.0.9 in our
>>>>>>> clusters.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Shalom Sagges
>>>>>>> DBA
>>>>>>> T: +972-74-700-4035 <+972%2074-700-4035>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin <
>>>>>>> vladyu@winguzone.com> wrote:
>>>>>>>
>>>>>>> As I said, I'm not sure about it, but it would be interesting to check
>>>>>>> the heap memory state with any JMX tool, e.g.
>>>>>>> https://github.com/patric-r/jvmtop
>>>>>>>
>>>>>>> By the way, why Cassandra 2.0.14? It's quite an old and unsupported
>>>>>>> version. Even in the 2.0 branch there is 2.0.17 available.
>>>>>>>
>>>>>>> Best regards, Vladimir Yudovin,
>>>>>>>
>>>>>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra
>>>>>>> Launch your cluster in minutes.*
>>>>>>>
>>>>>>>
>>>>>>> ---- On Thu, 10 Nov 2016 05:47:37 -0500*Shalom Sagges
>>>>>>> <shaloms@liveperson.com <sh...@liveperson.com>>* wrote ----
>>>>>>>
>>>>>>> Thanks for the quick reply Vladimir.
>>>>>>> Is it really possible that ~12,500 writes per second (per node in a
>>>>>>> 12 nodes DC) are caused by memory flushes?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Shalom Sagges
>>>>>>> DBA
>>>>>>> T: +972-74-700-4035
>>>>>>> <http://www.linkedin.com/company/164748>
>>>>>>> <http://twitter.com/liveperson>
>>>>>>> <http://www.facebook.com/LivePersonInc>
>>>>>>> We Create Meaningful Connections
>>>>>>>
>>>>>>> <https://engage.liveperson.com/idc-mobile-first-consumer/?utm_medium=email&utm_source=mkto&utm_campaign=idcsig>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin <
>>>>>>> vladyu@winguzone.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This message may contain confidential and/or privileged information.
>>>>>>> If you are not the addressee or authorized to receive this on behalf
>>>>>>> of the addressee you must not use, copy, disclose or take action based on
>>>>>>> this message or any information herein.
>>>>>>> If you have received this message in error, please advise the sender
>>>>>>> immediately by reply email and delete this message. Thank you.
>>>>>>>
>>>>>>>
>>>>>>> Hi Shalom,
>>>>>>>
>>>>>>> so not sure, but probably excessive memory consumption by this
>>>>>>> SELECT causes C* to flush tables to free memory.
>>>>>>>
>>>>>>> Best regards, Vladimir Yudovin,
>>>>>>>
>>>>>>> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud
>>>>>>> CassandraLaunch your cluster in minutes.*
>>>>>>>
>>>>>>>
>>>>>>> ---- On Thu, 10 Nov 2016 03:36:59 -0500*Shalom Sagges
>>>>>>> <shaloms@liveperson.com <sh...@liveperson.com>>* wrote ----
>>>>>>>
>>>>>>> Hi There!
>>>>>>>
>>>>>>> I'm using C* 2.0.14.
>>>>>>> I experienced a scenario where a "select count(*)" that ran every
>>>>>>> minute on a table with practically no results limit (yes, this should
>>>>>>> definitely be avoided), caused a huge increase in Cassandra writes to
>>>>>>> around 150 thousand writes per second for that particular table.
>>>>>>>
>>>>>>> Can anyone explain this behavior? Why would a Select query
>>>>>>> significantly increase write count in Cassandra?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>> Shalom Sagges
>>>>>>>
>>>>>>> <http://www.linkedin.com/company/164748>
>>>>>>> <http://twitter.com/liveperson>
>>>>>>> <http://www.facebook.com/LivePersonInc>
>>>>>>> We Create Meaningful Connections
>>>>>>>
>>>>>>> <https://engage.liveperson.com/idc-mobile-first-consumer/?utm_medium=email&utm_source=mkto&utm_campaign=idcsig>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This message may contain confidential and/or privileged information.
>>>>>>> If you are not the addressee or authorized to receive this on behalf
>>>>>>> of the addressee you must not use, copy, disclose or take action based on
>>>>>>> this message or any information herein.
>>>>>>> If you have received this message in error, please advise the sender
>>>>>>> immediately by reply email and delete this message. Thank you.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This message may contain confidential and/or privileged information.
>>>>>>> If you are not the addressee or authorized to receive this on behalf
>>>>>>> of the addressee you must not use, copy, disclose or take action based on
>>>>>>> this message or any information herein.
>>>>>>> If you have received this message in error, please advise the sender
>>>>>>> immediately by reply email and delete this message. Thank you.
>>>>>>>
>>>>>>> --
>>>>>>> -----------------
>>>>>>> Alexander Dejanovski
>>>>>>> France
>>>>>>> @alexanderdeja
>>>>>>>
>>>>>>> Consultant
>>>>>>> Apache Cassandra Consulting
>>>>>>> http://www.thelastpickle.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This message may contain confidential and/or privileged information.
>>>>>>> If you are not the addressee or authorized to receive this on behalf
>>>>>>> of the addressee you must not use, copy, disclose or take action based on
>>>>>>> this message or any information herein.
>>>>>>> If you have received this message in error, please advise the sender
>>>>>>> immediately by reply email and delete this message. Thank you.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -----------------
>>>>>>> Alexander Dejanovski
>>>>>>> France
>>>>>>> @alexanderdeja
>>>>>>>
>>>>>>> Consultant
>>>>>>> Apache Cassandra Consulting
>>>>>>> http://www.thelastpickle.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This message may contain confidential and/or privileged information.
>>>>>>> If you are not the addressee or authorized to receive this on behalf
>>>>>>> of the addressee you must not use, copy, disclose or take action based on
>>>>>>> this message or any information herein.
>>>>>>> If you have received this message in error, please advise the sender
>>>>>>> immediately by reply email and delete this message. Thank you.
>>>>>>>
>>>>>> --
>>>>>> -----------------
>>>>>> Alexander Dejanovski
>>>>>> France
>>>>>> @alexanderdeja
>>>>>>
>>>>>> Consultant
>>>>>> Apache Cassandra Consulting
>>>>>> http://www.thelastpickle.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>> This message may contain confidential and/or privileged information.
>>>> If you are not the addressee or authorized to receive this on behalf of
>>>> the addressee you must not use, copy, disclose or take action based on this
>>>> message or any information herein.
>>>> If you have received this message in error, please advise the sender
>>>> immediately by reply email and delete this message. Thank you.
>>>>
>>>
>>>
>>>
>>> --
>>> Benjamin Roth
>>> Prokurist
>>>
>>> Jaumo GmbH · www.jaumo.com
>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>
>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>
> This message may contain confidential and/or privileged information.
> If you are not the addressee or authorized to receive this on behalf of
> the addressee you must not use, copy, disclose or take action based on this
> message or any information herein.
> If you have received this message in error, please advise the sender
> immediately by reply email and delete this message. Thank you.
>

Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Shalom Sagges <sh...@liveperson.com>.

Thanks a lot for helping with this one.
Just one more question: I'm not familiar with the read repair metrics I
posted (RepairedBackground and RepairedBlocking).
Can you please explain what caught your eye there?
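
A minimal sketch of how two samples of a cumulative counter can be turned
into a per-second rate, which is one way to compare the read repair counters
with the table's write count. It assumes the metric exposes a monotonically
increasing count; the Sample type, the function name and the numbers are
hypothetical, chosen only so that the result matches the ~12,500 writes per
second per node mentioned earlier in the thread.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """One reading of a cumulative counter (hypothetical helper type)."""
    timestamp_s: float  # when the counter was read, in seconds
    count: int          # cumulative value reported at that time


def rate_per_second(earlier: Sample, later: Sample) -> float:
    """Average increase per second between two counter readings."""
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (later.count - earlier.count) / elapsed


# Hypothetical readings taken 60 seconds apart on one node.
first = Sample(timestamp_s=0.0, count=1_000_000)
second = Sample(timestamp_s=60.0, count=1_750_000)

per_node = rate_per_second(first, second)
print(per_node)        # 12500.0
print(per_node * 12)   # 150000.0 across a 12-node DC
```

If the read repair rate computed this way tracks the write rate of the table,
that would be consistent with read repair being the source of the writes.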


Shalom Sagges
DBA




Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Benjamin Roth <be...@jaumo.com>.

... sorry for the short reply. To be a bit more detailed:

1. You can lower the read repair probability on that table to avoid the
writes. But be aware that inconsistencies then won't be repaired on reads
either.
2. Maybe you should run a repair on that table to get it in sync and reduce
the impact of read repairs. By the way, you should run repairs on a regular
basis anyway, but that is a different, very extensive topic that is
documented in many places.
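
The two suggestions above could be sketched as follows. This is only an
illustration: it builds the statement and the command as strings and runs
nothing. read_repair_chance and dclocal_read_repair_chance are the per-table
CQL options in the 2.x line; the keyspace and table names, the helper names
and the value 0.0 are hypothetical and should be adapted to the cluster.

```python
from typing import List


def read_repair_cql(keyspace: str, table: str,
                    chance: float = 0.0,
                    dc_local_chance: float = 0.0) -> str:
    """Build an ALTER TABLE statement that sets both read repair chances."""
    for value in (chance, dc_local_chance):
        if not 0.0 <= value <= 1.0:
            raise ValueError("a read repair chance must be between 0 and 1")
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH read_repair_chance = {chance} "
        f"AND dclocal_read_repair_chance = {dc_local_chance};"
    )


def repair_command(keyspace: str, table: str) -> List[str]:
    """Build the nodetool invocation that repairs a single table."""
    return ["nodetool", "repair", keyspace, table]


# Hypothetical names, for illustration only.
print(read_repair_cql("my_keyspace", "my_table"))
print(" ".join(repair_command("my_keyspace", "my_table")))
```

The statement would be run once through cqlsh, while the repair command is
run per node. As noted in point 1, a lower chance stops the repair writes but
also leaves inconsistencies in place until a repair runs.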

2016-11-10 17:44 GMT+00:00 Benjamin Roth <be...@jaumo.com>:

> There you go :)
>
> 2016-11-10 17:24 GMT+00:00 Shalom Sagges <sh...@liveperson.com>:
>
>> That's a possibility I didn't think of...
>>
>> This is what I see from
>> org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBackground:
>> [image: Inline image 1]
>>
>>
>> and from
>> org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking:
>> [image: Inline image 2]
>>
>>
>>
>> On Thu, Nov 10, 2016 at 7:16 PM, Shalom Sagges <sh...@liveperson.com>
>> wrote:
>>
>>> Yes, it's occurring on the table that receives the count(*) query.
>>>
>>>
>>>
>>> On Thu, Nov 10, 2016 at 5:05 PM, Alexander Dejanovski <
>>> alex@thelastpickle.com> wrote:
>>>
>>>> So is the huge write count occurring on the table that receives the
>>>> count(*) query, or on another table?
>>>>
>>>> On Thu, Nov 10, 2016 at 4:02 PM Shalom Sagges <sh...@liveperson.com>
>>>> wrote:
>>>>
>>>>> Tracing is off and so is TracingProbability.
>>>>> Just to elaborate, the huge write count occurs on only a single column
>>>>> family, which is not in the system_traces keyspace.
>>>>>
>>>>> I also want to thank you guys for your persistent help, regardless of
>>>>> whether the root cause is found or not. You're the best!!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 10, 2016 at 4:41 PM, Alexander Dejanovski <
>>>>> alex@thelastpickle.com> wrote:
>>>>>
>>>>> Shalom,
>>>>>
>>>>> you may have a high trace probability, which could explain what you're
>>>>> observing:
>>>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsSetTraceProbability.html
>>>>>
>>>>> On Thu, Nov 10, 2016 at 3:37 PM Chris Lohfink <cl...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> count(*) actually pages through all the data. So a select count(*) without
>>>>> a limit would be expected to cause a lot of load on the system. The hit is
>>>>> more than just IO load and CPU, it also creates a lot of garbage that can
>>>>> cause pauses slowing down the entire JVM. Some details here:
>>>>> http://www.datastax.com/dev/blog/counting-keys-in-cassandra
>>>>> <http://planetcassandra.org/blog/counting-key-in-cassandra/>
>>>>>
>>>>> You may want to consider maintaining the count yourself, using Spark,
>>>>> or if you just want a ball park number you can grab it from JMX.
>>>>>
>>>>> > Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
>>>>> > actually has nothing to do with flushes. A flush is the operation of moving
>>>>> > data from memory (memtable) to disk (SSTable).
>>>>>
>>>>> FWIW, in 2.0 that's not completely accurate. Before 2.1, the process of
>>>>> memtable flushing acquired a switch lock that blocks mutations during the
>>>>> flush (the "pending task" metric is the measure of how many mutations are
>>>>> blocked by this lock).
>>>>>
>>>>> Chris
>>>>>
>>>>> On Thu, Nov 10, 2016 at 8:10 AM, Shalom Sagges <
>>>>> shaloms@liveperson.com> wrote:
>>>>>
>>>>> Hi Alexander,
>>>>>
>>>>> I'm referring to Writes Count generated from JMX:
>>>>> [image: Inline image 1]
>>>>>
>>>>> The higher curve shows the total write count per second for all nodes
>>>>> in the cluster, and the lower curve is the average write count per second
>>>>> per node.
>>>>> The drop at the end is the result of shutting down one application
>>>>> node that performed this kind of query (we still haven't removed the query
>>>>> itself in this cluster).
>>>>>
>>>>>
>>>>> On a different cluster, where we already removed the "select count(*)"
>>>>> query completely, we can see that the issue was resolved (I also verified
>>>>> this by running nodetool cfstats a few times and checking the write count
>>>>> difference):
>>>>> [image: Inline image 2]
>>>>>
>>>>>
>>>>> Naturally, I asked how a select query can affect the write count of a
>>>>> node, but weird as it seems, the issue was resolved once the query was
>>>>> removed from the code.
>>>>>
>>>>> Another side note: one of our developers, who wrote the query in the
>>>>> code, thought it would be nice to limit the query results to 560,000,000.
>>>>> Could the ridiculously high limit have caused this?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 10, 2016 at 3:21 PM, Alexander Dejanovski <
>>>>> alex@thelastpickle.com> wrote:
>>>>>
>>>>> Hi Shalom,
>>>>>
>>>>> Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
>>>>> actually has nothing to do with flushes. A flush is the operation of moving
>>>>> data from memory (memtable) to disk (SSTable).
>>>>>
>>>>> The Cassandra write path and read path are two different things and,
>>>>> as far as I know, I see no way for a select count(*) to increase your write
>>>>> count (if you are indeed talking about actual Cassandra writes, and not I/O
>>>>> operations).
>>>>>
>>>>> Cheers,
>>>>>
>>>>> On Thu, Nov 10, 2016 at 1:21 PM Shalom Sagges <sh...@liveperson.com>
>>>>> wrote:
>>>>>
>>>>> Yes, I know it's obsolete, but unfortunately this takes time.
>>>>> We're in the process of upgrading to 2.2.8 and 3.0.9 in our clusters.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin <
>>>>> vladyu@winguzone.com> wrote:
>>>>>
>>>>> As I said, I'm not sure about it, but it would be interesting to check
>>>>> the heap memory state with any JMX tool, e.g.
>>>>> https://github.com/patric-r/jvmtop
>>>>>
>>>>> By the way, why Cassandra 2.0.14? It's quite an old and unsupported
>>>>> version. Even in the 2.0 branch, 2.0.17 is available.
>>>>>
>>>>> Best regards, Vladimir Yudovin,
>>>>>
>>>>>
>>>>>
>>>>> ---- On Thu, 10 Nov 2016 05:47:37 -0500*Shalom Sagges
>>>>> <shaloms@liveperson.com <sh...@liveperson.com>>* wrote ----
>>>>>
>>>>> Thanks for the quick reply Vladimir.
>>>>> Is it really possible that ~12,500 writes per second (per node in a
>>>>> 12-node DC) are caused by memory flushes?
>>>>>
>>>>> On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin <
>>>>> vladyu@winguzone.com> wrote:
>>>>>
>>>>>
>>>>> Hi Shalom,
>>>>>
>>>>> I'm not sure, but excessive memory consumption by this SELECT probably
>>>>> causes C* to flush tables to free memory.
>>>>>
>>>>> Best regards, Vladimir Yudovin,
>>>>>
>>>>>
>>>>>
>>>>> ---- On Thu, 10 Nov 2016 03:36:59 -0500*Shalom Sagges
>>>>> <shaloms@liveperson.com <sh...@liveperson.com>>* wrote ----
>>>>>
>>>>> Hi There!
>>>>>
>>>>> I'm using C* 2.0.14.
>>>>> I experienced a scenario where a "select count(*)" that ran every
>>>>> minute on a table with practically no results limit (yes, this should
>>>>> definitely be avoided), caused a huge increase in Cassandra writes to
>>>>> around 150 thousand writes per second for that particular table.
>>>>>
>>>>> Can anyone explain this behavior? Why would a Select query
>>>>> significantly increase write count in Cassandra?
>>>>>
>>>>> Thanks!
>>>>>
>>>> --
>>>> -----------------
>>>> Alexander Dejanovski
>>>> France
>>>> @alexanderdeja
>>>>
>>>> Consultant
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>
>>>
>>>
>>
>> This message may contain confidential and/or privileged information.
>> If you are not the addressee or authorized to receive this on behalf of
>> the addressee you must not use, copy, disclose or take action based on this
>> message or any information herein.
>> If you have received this message in error, please advise the sender
>> immediately by reply email and delete this message. Thank you.
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Benjamin Roth <be...@jaumo.com>.

There you go :)

2016-11-10 17:24 GMT+00:00 Shalom Sagges <sh...@liveperson.com>:

> That's a possibility I didn't think of...
>
> This is what I see from
> org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBackground:
> [image: Inline image 1]
>
>
> and from org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking:
> [image: Inline image 2]
>
>
> On Thu, Nov 10, 2016 at 7:16 PM, Shalom Sagges <sh...@liveperson.com>
> wrote:
>
>> Yes, it's occurring on the table that receives the count(*) query.
>>
>>
>> On Thu, Nov 10, 2016 at 5:05 PM, Alexander Dejanovski <
>> alex@thelastpickle.com> wrote:
>>
>>> So is the huge write count occurring on the table that receives the
>>> count(*) query, or on another table?
>>>
>>> On Thu, Nov 10, 2016 at 4:02 PM Shalom Sagges <sh...@liveperson.com>
>>> wrote:
>>>
>>>> Tracing is off and so is TracingProbability.
>>>> Just to elaborate, the huge write count occurs on only a single column
>>>> family, which is not in the system_traces keyspace.
>>>>
>>>> I also want to thank you guys for your persistent help, regardless of
>>>> whether the root cause is found or not. You're the best!!
>>>>
>>>>
>>>>
>>>> On Thu, Nov 10, 2016 at 4:41 PM, Alexander Dejanovski <
>>>> alex@thelastpickle.com> wrote:
>>>>
>>>> Shalom,
>>>>
>>>> you may have a high trace probability, which could explain what you're
>>>> observing:
>>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsSetTraceProbability.html
>>>>
>>>> On Thu, Nov 10, 2016 at 3:37 PM Chris Lohfink <cl...@gmail.com>
>>>> wrote:
>>>>
>>>> count(*) actually pages through all the data. So a select count(*) without
>>>> a limit would be expected to cause a lot of load on the system. The hit is
>>>> more than just IO load and CPU, it also creates a lot of garbage that can
>>>> cause pauses slowing down the entire JVM. Some details here:
>>>> http://www.datastax.com/dev/blog/counting-keys-in-cassandra
>>>> <http://planetcassandra.org/blog/counting-key-in-cassandra/>
>>>>
>>>> You may want to consider maintaining the count yourself, using Spark,
>>>> or if you just want a ballpark number you can grab it from JMX.
>>>>
>>>> > Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
>>>> > actually has nothing to do with flushes. A flush is the operation of
>>>> > moving data from memory (memtable) to disk (SSTable).
>>>>
>>>> FWIW, in 2.0 that's not completely accurate. Before 2.1, the process of
>>>> memtable flushing acquired a switchlock that blocks mutations during the
>>>> flush (the "pending task" metric is the measure of how many mutations are
>>>> blocked by this lock).
>>>>
>>>> Chris
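
To make the contrast in Chris's message concrete, here is a minimal sketch in plain Python. It does not use a Cassandra driver: `ToyTable`, `PAGE_SIZE`, and the method names are hypothetical stand-ins chosen for illustration. It compares a count obtained by paging through every row with a counter that is maintained at write time.

```python
from typing import Iterator, List

PAGE_SIZE = 1000  # hypothetical page size, for illustration only


class ToyTable:
    """In-memory stand-in for a table; this is not a real Cassandra API."""

    def __init__(self) -> None:
        self.rows: List[int] = []
        self.maintained_count = 0  # kept up to date on every insert
        self.rows_scanned = 0      # how many rows reads have touched so far

    def insert(self, value: int) -> None:
        self.rows.append(value)
        self.maintained_count += 1

    def pages(self) -> Iterator[List[int]]:
        """Yield the table one page at a time, as a full scan would."""
        for start in range(0, len(self.rows), PAGE_SIZE):
            page = self.rows[start:start + PAGE_SIZE]
            self.rows_scanned += len(page)
            yield page

    def count_by_scanning(self) -> int:
        """Count rows by paging through all of them."""
        return sum(len(page) for page in self.pages())


table = ToyTable()
for i in range(2500):
    table.insert(i)

scanned = table.count_by_scanning()
print(scanned, table.rows_scanned, table.maintained_count)
```

In this toy model the scan touches every row each time it runs, while reading `maintained_count` touches none, which is the trade-off behind the "maintain the count yourself" suggestion.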
>>>>
>>>> On Thu, Nov 10, 2016 at 8:10 AM, Shalom Sagges <sh...@liveperson.com>
>>>> wrote:
>>>>
>>>> Hi Alexander,
>>>>
>>>> I'm referring to Writes Count generated from JMX:
>>>> [image: Inline image 1]
>>>>
>>>> The higher curve shows the total write count per second for all nodes
>>>> in the cluster and the lower curve is the average write count per second
>>>> per node.
>>>> The drop at the end is the result of shutting down one application node
>>>> that performed this kind of query (we still haven't removed the query
>>>> itself in this cluster).
>>>>
>>>>
>>>> On a different cluster, where we already removed the "select count(*)"
>>>> query completely, we can see that the issue was resolved (I also verified
>>>> this by running nodetool cfstats a few times and checking the write count
>>>> difference):
>>>> [image: Inline image 2]
>>>>
>>>>
>>>> Naturally I asked how a select query can affect the write count of a
>>>> node, but weird as it seems, the issue was resolved once the query was
>>>> removed from the code.
>>>>
>>>> Another side note: one of our developers, who wrote the query in the
>>>> code, thought it would be nice to limit the query results to 560,000,000.
>>>> Perhaps the ridiculously high limit might have caused this?
>>>>
>>>> Thanks!
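
The check described above (sampling the table's write count a few times and comparing the values) amounts to a simple rate calculation. The sketch below assumes the reported write count is a cumulative counter; the function name and the two sample readings are hypothetical, and only the 150,000 writes per second and 12 nodes come from the thread.

```python
def writes_per_second(count_before: int, count_after: int, seconds: float) -> float:
    """Average write rate between two samples of a cumulative write counter."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (count_after - count_before) / seconds


# Hypothetical readings of one node's write count, taken 60 seconds apart.
rate = writes_per_second(1_000_000, 1_750_000, 60.0)
print(rate)

# Figures from the thread: about 150,000 writes/s across a 12-node DC.
per_node = 150_000 / 12
print(per_node)
```

Both values come out to 12,500 writes per second, matching the per-node figure quoted elsewhere in the thread.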
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Nov 10, 2016 at 3:21 PM, Alexander Dejanovski <
>>>> alex@thelastpickle.com> wrote:
>>>>
>>>> Hi Shalom,
>>>>
>>>> Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs; they
>>>> actually have nothing to do with flushes. A flush is the operation of
>>>> moving data from memory (memtable) to disk (SSTable).
>>>>
>>>> The Cassandra write path and read path are two different things and, as
>>>> far as I know, I see no way for a select count(*) to increase your write
>>>> count (if you are indeed talking about actual Cassandra writes, and not I/O
>>>> operations).
>>>>
>>>> Cheers,
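
The distinction drawn in this message can be sketched with a toy model. This is not Cassandra code: `ToyColumnFamily` and its fields are hypothetical, and the model ignores the 2.0 flush locking behaviour mentioned elsewhere in the thread. It only shows that, under this description, a flush moves data from the memtable to an SSTable without adding to the write (mutation) count.

```python
from typing import Dict, List


class ToyColumnFamily:
    """Toy model: mutations go to a memtable; a flush moves them to an SSTable."""

    def __init__(self) -> None:
        self.memtable: Dict[str, int] = {}
        self.sstables: List[Dict[str, int]] = []
        self.write_count = 0  # counts mutations only

    def write(self, key: str, value: int) -> None:
        """An INSERT/UPDATE-style mutation: counted as a write."""
        self.memtable[key] = value
        self.write_count += 1

    def flush(self) -> None:
        """Move the memtable contents to a new SSTable; not counted as a write."""
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}


cf = ToyColumnFamily()
cf.write("a", 1)
cf.write("b", 2)
cf.flush()
print(cf.write_count, len(cf.sstables), cf.memtable)
```

After the flush, the write count is still 2, there is one SSTable, and the memtable is empty.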
>>>>
>>>> On Thu, Nov 10, 2016 at 1:21 PM Shalom Sagges <sh...@liveperson.com>
>>>> wrote:
>>>>
>>>> Yes, I know it's obsolete, but unfortunately this takes time.
>>>> We're in the process of upgrading to 2.2.8 and 3.0.9 in our clusters.
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Nov 10, 2016 at 1:31 PM, Vladimir Yudovin
>>>> <vladyu@winguzone.com> wrote:
>>>>
>>>> As I said, I'm not sure about it, but it would be interesting to check the
>>>> heap memory state with any JMX tool, e.g.
>>>> https://github.com/patric-r/jvmtop
>>>>
>>>> By the way, why Cassandra 2.0.14? It's quite an old and unsupported
>>>> version. Even in the 2.0 branch, 2.0.17 is available.
>>>>
>>>> Best regards, Vladimir Yudovin,
>>>>
>>>>
>>>>
>>>> ---- On Thu, 10 Nov 2016 05:47:37 -0500 Shalom Sagges
>>>> <shaloms@liveperson.com> wrote ----
>>>>
>>>> Thanks for the quick reply Vladimir.
>>>> Is it really possible that ~12,500 writes per second (per node in a
>>>> 12-node DC) are caused by memory flushes?
>>>>
>>>>
>>>> On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin <
>>>> vladyu@winguzone.com> wrote:
>>>>
>>>> Hi Shalom,
>>>>
>>>> I'm not sure, but excessive memory consumption by this SELECT probably
>>>> causes C* to flush tables to free memory.
>>>>
>>>> Best regards, Vladimir Yudovin,
>>>>



-- 
Benjamin Roth
Prokurist

Jaumo GmbH · www.jaumo.com
Wehrstraße 46 · 73035 Göppingen · Germany
Phone +49 7161 304880-6 · Fax +49 7161 304880-1
AG Ulm · HRB 731058 · Managing Director: Jens Kammerer

Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Shalom Sagges <sh...@liveperson.com>.

That's a possibility I didn't think of...

This is what I see
from org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBackground
[image: Inline image 1]


and from org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking:
[image: Inline image 2]
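
The ReadRepair metrics above point at read repair as the possibility being discussed: when a read finds replicas that disagree, the newest value is written back to the stale ones. The toy sketch below illustrates that idea only. `ToyReplica`, `read_with_repair`, and the timestamps are hypothetical, and it assumes repair writes are counted like ordinary writes on the table being read, which the thread does not confirm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Cell = Tuple[int, str]  # (write timestamp, value)


@dataclass
class ToyReplica:
    """Toy replica holding timestamped cells and counting applied writes."""
    data: Dict[str, Cell] = field(default_factory=dict)
    write_count: int = 0

    def apply(self, key: str, cell: Cell) -> None:
        self.data[key] = cell
        self.write_count += 1


def read_with_repair(replicas: List[ToyReplica], key: str) -> str:
    """Return the newest value and write it back to any stale replica."""
    cells = [r.data[key] for r in replicas if key in r.data]
    if not cells:
        raise KeyError(key)
    newest = max(cells)  # highest timestamp wins
    for replica in replicas:
        if replica.data.get(key) != newest:
            replica.apply(key, newest)  # the read triggers a write here
    return newest[1]


replicas = [ToyReplica(), ToyReplica(), ToyReplica()]
replicas[0].apply("k", (2, "new"))
replicas[1].apply("k", (1, "old"))
# The third replica missed the write entirely.

before = sum(r.write_count for r in replicas)
value = read_with_repair(replicas, "k")
after = sum(r.write_count for r in replicas)
print(value, after - before)
```

In this model a single read of an inconsistent key produces two extra writes, so a query that scans a whole table with many inconsistent rows could plausibly generate a large number of them.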


Shalom Sagges
DBA
T: +972-74-700-4035



Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Shalom Sagges <sh...@liveperson.com>.

Yes, it's occurring on the table that receives the count(*) query.


Shalom Sagges
DBA
T: +972-74-700-4035


On Thu, Nov 10, 2016 at 5:05 PM, Alexander Dejanovski <
alex@thelastpickle.com> wrote:

> So is the huge write count occurring on the table that receives the
> count(*) query, or on another table?
>
> On Thu, Nov 10, 2016 at 4:02 PM Shalom Sagges <sh...@liveperson.com>
> wrote:
>
>> Tracing is off and so is TracingProbability.
>> Just to elaborate, the huge write count occurs only a single column
>> family which is not one of the system_traces keyspace.
>>
>> I also want to thank you guys for your persistent help regardless if the
>> root cause will be found or not.. You're the best!!
>>
>>
>> Shalom Sagges
>> DBA
>> T: +972-74-700-4035 <+972%2074-700-4035>
>> <http://www.linkedin.com/company/164748> <http://twitter.com/liveperson>
>> <http://www.facebook.com/LivePersonInc> We Create Meaningful Connections
>>
>> <https://engage.liveperson.com/idc-mobile-first-consumer/?utm_medium=email&utm_source=mkto&utm_campaign=idcsig>
>>
>>
>> On Thu, Nov 10, 2016 at 4:41 PM, Alexander Dejanovski <
>> alex@thelastpickle.com> wrote:
>>
>> Shalom,
>>
>> you may have a high trace probability which could explain what you're
>> observing : https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/
>> toolsSetTraceProbability.html
>>
>> On Thu, Nov 10, 2016 at 3:37 PM Chris Lohfink <cl...@gmail.com>
>> wrote:
>>
>> count(*) actually pages through all the data. So a select count(*) without
>> a limit would be expected to cause a lot of load on the system. The hit is
>> more than just IO load and CPU, it also creates a lot of garbage that can
>> cause pauses slowing down the entire JVM. Some details here:
>> http://www.datastax.com/dev/blog/counting-keys-in-cassandra
>> <http://planetcassandra.org/blog/counting-key-in-cassandra/>
>>
>> You may want to consider maintaining the count yourself, using Spark, or
>> if you just want a ball park number you can grab it from JMX.
>>
>> > Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
>> actually has nothing to do with flushes. A flush is the operation of moving
>> data from memory (memtable) to disk (SSTable).
>>
>> FWIW in 2.0 thats not completely accurate. Before 2.1 the process of
>> memtable flushing acquired a switchlock on that blocks mutations during the
>> flush (the "pending task" metric is the measure of how many mutations are
>> blocked by this lock).
>>
>> Chris


Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Alexander Dejanovski <al...@thelastpickle.com>.

So is the huge write count occurring on the table that receives the
count(*) query, or on another table?
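
One way to answer this is to sample the cumulative per-table write counters
twice (for example from nodetool cfstats, or from the WriteLatency MBean
mentioned earlier in the thread) and compare the deltas. Below is a minimal
Python sketch, assuming the counters have already been collected into
dictionaries. The table names, the counts, and the helper name write_rates
are made up for illustration and are not part of any Cassandra tool:

```python
def write_rates(before, after, interval_seconds):
    """Return (table, writes_per_second) pairs, busiest table first.

    `before` and `after` map table names to cumulative write counts
    sampled `interval_seconds` apart (a hypothetical input format).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for table, end_count in after.items():
        start_count = before.get(table, 0)
        # Counters restart from zero when a node restarts, so treat a
        # drop as a fresh start rather than a negative delta.
        delta = end_count - start_count if end_count >= start_count else end_count
        rates[table] = delta / interval_seconds
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Illustrative numbers only, not measurements from the cluster in this thread.
before = {"ks.counted_table": 1_000_000, "ks.other_table": 500_000}
after = {"ks.counted_table": 1_750_000, "ks.other_table": 500_600}

for table, rate in write_rates(before, after, interval_seconds=60):
    print(f"{table}: {rate:.1f} writes/s")
```

If one table dominates the list, that is the one to look at first.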

On Thu, Nov 10, 2016 at 4:02 PM Shalom Sagges <sh...@liveperson.com>
wrote:

> Tracing is off and so is TracingProbability.
> Just to elaborate, the huge write count occurs on only a single column
> family, which is not in the system_traces keyspace.
>
> I also want to thank you guys for your persistent help, regardless of
> whether the root cause is found. You're the best!!
-- 
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Shalom Sagges <sh...@liveperson.com>.

Tracing is off and so is TracingProbability.
Just to elaborate, the huge write count occurs on only a single column
family, which is not in the system_traces keyspace.

I also want to thank you guys for your persistent help, regardless of
whether the root cause is found. You're the best!!


Shalom Sagges
DBA
T: +972-74-700-4035


On Thu, Nov 10, 2016 at 4:41 PM, Alexander Dejanovski <
alex@thelastpickle.com> wrote:

> Shalom,
>
> You may have a high trace probability set, which could explain what you're
> observing:
> https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsSetTraceProbability.html


Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Alexander Dejanovski <al...@thelastpickle.com>.

Shalom,

You may have a high trace probability set, which could explain what you're
observing:
https://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsSetTraceProbability.html
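
To gauge whether tracing could plausibly account for a write spike, a rough
estimate helps. Each traced request writes one row to system_traces.sessions
plus a number of rows to system_traces.events. The sketch below is plain
arithmetic in Python. The request rate and the events-per-trace figure are
assumptions chosen for illustration, not measurements from this cluster, and
the function name is hypothetical:

```python
def expected_trace_writes(requests_per_sec, trace_probability, events_per_trace):
    """Estimate rows written per second to system_traces by probabilistic tracing."""
    if not 0.0 <= trace_probability <= 1.0:
        raise ValueError("trace_probability must be between 0 and 1")
    traced_requests = requests_per_sec * trace_probability
    # One sessions row plus `events_per_trace` events rows per traced request.
    return traced_requests * (1 + events_per_trace)


# Assumed workload: 1000 requests/s, 20 trace events per traced request.
print(expected_trace_writes(1000, 0.0, 20))   # tracing effectively off
print(expected_trace_writes(1000, 0.25, 20))  # a quarter of requests traced
```

Note that these writes land in the system_traces keyspace, so they would show
up on its tables rather than on an application table.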

On Thu, Nov 10, 2016 at 3:37 PM Chris Lohfink <cl...@gmail.com> wrote:

> count(*) actually pages through all the data. So a select count(*) without
> a limit would be expected to cause a lot of load on the system. The hit is
> more than just IO load and CPU, it also creates a lot of garbage that can
> cause pauses slowing down the entire JVM. Some details here:
> http://www.datastax.com/dev/blog/counting-keys-in-cassandra
> <http://planetcassandra.org/blog/counting-key-in-cassandra/>
>
> You may want to consider maintaining the count yourself, using Spark, or,
> if you just want a ballpark number, you can grab it from JMX.
>
> > Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it
> > actually has nothing to do with flushes. A flush is the operation of moving
> > data from memory (memtable) to disk (SSTable).
>
> FWIW, in 2.0 that's not completely accurate. Before 2.1, the process of
> memtable flushing acquired a switchlock that blocks mutations during the
> flush (the "pending task" metric is the measure of how many mutations are
> blocked by this lock).
>
> Chris
>
> --
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Chris Lohfink <cl...@gmail.com>.

count(*) actually pages through all the data, so a select count(*) without
a limit would be expected to cause a lot of load on the system. The hit is
more than just I/O and CPU load; it also creates a lot of garbage that can
cause pauses, slowing down the entire JVM. Some details here:
http://www.datastax.com/dev/blog/counting-keys-in-cassandra
<http://planetcassandra.org/blog/counting-key-in-cassandra/>
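
To illustrate the paging point, here is a minimal Python sketch. It is not
Cassandra code: fetch_page, the in-memory list of rows and the page size are
hypothetical stand-ins. It only shows that a count has to touch every row,
one page at a time, so the work grows with the table size.

```python
from typing import List, Optional, Tuple


def fetch_page(rows: List[int], start: int, page_size: int) -> Tuple[List[int], Optional[int]]:
    """Hypothetical stand-in for one paged read: returns a page and the next offset."""
    page = rows[start:start + page_size]
    next_start = start + len(page)
    # None signals that there are no more pages to fetch.
    return page, (next_start if next_start < len(rows) else None)


def count_all(rows: List[int], page_size: int) -> Tuple[int, int]:
    """Count rows by walking every page; returns (row_count, pages_fetched)."""
    total, pages = 0, 0
    start: Optional[int] = 0
    while start is not None:
        page, start = fetch_page(rows, start, page_size)
        total += len(page)
        pages += 1
    return total, pages


if __name__ == "__main__":
    # 10,500 toy rows read 1,000 at a time.
    print(count_all(list(range(10_500)), page_size=1_000))
```

Every page that is read is materialized and then thrown away, which is
where the garbage mentioned above would come from in a real system.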

You may want to consider maintaining the count yourself, using Spark, or if
you just want a ball park number you can grab it from JMX.

> Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs, it actually
> has nothing to do with flushes. A flush is the operation of moving data
> from memory (memtable) to disk (SSTable).

FWIW, in 2.0 that's not completely accurate. Before 2.1, the process of
memtable flushing acquired a switchlock that blocks mutations during the
flush (the "pending task" metric is the measure of how many mutations are
blocked by this lock).

Chris


Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Shalom Sagges <sh...@liveperson.com>.

Hi Alexander,

I'm referring to Writes Count generated from JMX:
[image: Inline image 1]

The higher curve shows the total write count per second for all nodes in
the cluster and the lower curve is the average write count per second per
node.
The drop at the end is the result of shutting down one application node
that performed this kind of query (we still haven't removed the query
itself in this cluster).


On a different cluster, where we already removed the "select count(*)"
query completely, we can see that the issue was resolved (I also verified
this by running nodetool cfstats a few times and checking the write count
difference):
[image: Inline image 2]
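
The cfstats check described above amounts to sampling a cumulative write
counter twice and dividing the difference by the elapsed time. A minimal
Python sketch, assuming the two counter values have already been read by
hand; the function name and the sample numbers are hypothetical and are not
taken from the graphs:

```python
def writes_per_second(count_before: int, count_after: int, seconds_between: float) -> float:
    """Average write rate between two samples of a cumulative write counter."""
    if seconds_between <= 0:
        raise ValueError("seconds_between must be positive")
    if count_after < count_before:
        # A cumulative counter should only grow; a drop suggests it was reset.
        raise ValueError("counter went backwards between samples")
    return (count_after - count_before) / seconds_between


if __name__ == "__main__":
    # Hypothetical samples taken 60 seconds apart.
    print(writes_per_second(1_000_000, 1_750_000, 60))
```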


Naturally, I asked how a select query can affect the write count of a node,
but weird as it seems, the issue was resolved once the query was removed
from the code.

Another side note: one of our developers who wrote the query in the code
thought it would be nice to limit the query results to 560,000,000. Perhaps
the ridiculously high limit might have caused this?

Thanks!



Shalom Sagges
DBA
T: +972-74-700-4035

Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Alexander Dejanovski <al...@thelastpickle.com>.

Hi Shalom,

Cassandra writes (mutations) are INSERTs, UPDATEs or DELETEs; they actually
have nothing to do with flushes. A flush is the operation of moving data
from memory (memtable) to disk (SSTable).

The Cassandra write path and read path are two different things and, as far
as I know, I see no way for a select count(*) to increase your write count
(if you are indeed talking about actual Cassandra writes, and not I/O
operations).
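
The distinction can be sketched with a toy model in Python. This is an
illustration only, not Cassandra's implementation: the ToyTable class and
all of its fields are made up. Only write() bumps the write counter, while
flush() and count() just move or read data.

```python
class ToyTable:
    """Toy model of a table with a memtable and flushed SSTables (illustration only)."""

    def __init__(self) -> None:
        self.memtable: dict = {}   # in-memory data
        self.sstables: list = []   # flushed, "on-disk" data
        self.write_count = 0       # counts mutations only

    def write(self, key, value) -> None:
        """A mutation: the only operation that increments write_count."""
        self.memtable[key] = value
        self.write_count += 1

    def flush(self) -> None:
        """Move the memtable contents to a new SSTable; not counted as a write."""
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable.clear()

    def count(self) -> int:
        """A read: walks all the data but leaves write_count untouched."""
        keys = set(self.memtable)
        for sstable in self.sstables:
            keys.update(sstable)
        return len(keys)


if __name__ == "__main__":
    table = ToyTable()
    for i in range(3):
        table.write(i, "v")
    table.flush()
    print(table.count(), table.write_count)
```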

Cheers,

-- 
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Shalom Sagges <sh...@liveperson.com>.

Yes, I know it's obsolete, but unfortunately this takes time.
We're in the process of upgrading to 2.2.8 and 3.0.9 in our clusters.

Thanks!



Shalom Sagges
DBA
T: +972-74-700-4035

Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Vladimir Yudovin <vl...@winguzone.com>.

As I said, I'm not sure about it, but it would be interesting to check the heap memory state with any JMX tool, e.g. https://github.com/patric-r/jvmtop

By the way, why Cassandra 2.0.14? It's quite an old and unsupported version. Even in the 2.0 branch there is 2.0.17 available.

Best regards, Vladimir Yudovin,

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.


Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Shalom Sagges <sh...@liveperson.com>.

Thanks for the quick reply Vladimir.
Is it really possible that ~12,500 writes per second (per node in a 12-node
DC) are caused by memory flushes?
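
As a sanity check on the numbers in this thread: about 150,000 writes per
second, spread evenly over 12 nodes, comes to about 12,500 per node. A tiny
Python sketch; the function name is made up for illustration, and it
assumes the 150,000 figure is the cluster-wide total:

```python
def per_node_average(cluster_writes_per_sec: float, node_count: int) -> float:
    """Average writes per second per node, assuming an even spread across nodes."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return cluster_writes_per_sec / node_count


if __name__ == "__main__":
    print(per_node_average(150_000, 12))
```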





Shalom Sagges
DBA
T: +972-74-700-4035

Re: Can a Select Count(*) Affect Writes in Cassandra?

Posted by Vladimir Yudovin <vl...@winguzone.com>.

Hi Shalom,

I'm not sure, but excessive memory consumption by this SELECT probably causes C* to flush tables to free memory.


Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.




