Posted to user@cassandra.apache.org by Vasileios Vlachos <va...@gmail.com> on 2015/12/11 17:28:25 UTC

Thousands of pending compactions using STCS

Hello,

We use Nagios and MX4J for the majority of the monitoring we do for
Cassandra (version: 2.0.16). For compactions we hit the following URL:

http://${cassandra_host}:8081/mbean?objectname=org.apache.cassandra.db%3Atype%3DCompactionManager

and check the PendingTasks counter's value.
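
A minimal sketch of a check built on that counter is below, assuming the
MX4J response can be parsed with a simple regex; the parsing and the
warning/critical thresholds are illustrative assumptions only, and the
exact markup returned by the MX4J HTTP adaptor varies by version.

#!/usr/bin/env python3
# Sketch: poll the MX4J mbean URL above and extract PendingTasks.
# The regex-based parsing and the thresholds are illustrative assumptions.
import re
import sys
from urllib.request import urlopen

MX4J_URL = ("http://{host}:8081/mbean?objectname="
            "org.apache.cassandra.db%3Atype%3DCompactionManager")

def pending_tasks(host):
    page = urlopen(MX4J_URL.format(host=host), timeout=10).read()
    page = page.decode("utf-8", "replace")
    # Assumed markup: the attribute name appears shortly before its value.
    match = re.search(r"PendingTasks\D{0,200}?(\d+)", page)
    if not match:
        raise ValueError("PendingTasks not found in MX4J response")
    return int(match.group(1))

if __name__ == "__main__":
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    warn, crit = 50, 200  # hypothetical thresholds
    tasks = pending_tasks(host)
    if tasks >= crit:
        print("CRITICAL - {0} pending compactions".format(tasks))
        sys.exit(2)
    if tasks >= warn:
        print("WARNING - {0} pending compactions".format(tasks))
        sys.exit(1)
    print("OK - {0} pending compactions".format(tasks))
    sys.exit(0)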

We have noticed that occasionally one or more nodes will report back that
they have thousands of pending compactions. We have 11 KS in the cluster
and a total of 109 *Data.db files under /var/lib/cassandra/data, which gives
approximately 10 SSTables per KS. Thousands of pending compactions therefore
seems unrealistic given the number of SSTables we see at any given time in
each KS/CF directory. The logs show a lot of flush and compaction activity,
but we don't think that's unusual. Also, each CF is configured with
min_compaction_threshold = 2 and max_compaction_threshold = 32. The two
screenshots below show a cluster-wide view of pending compactions. Attached
you can find the XML files which contain the data from the MX4J console.

[image: Inline image 2]

And this is from the same graph, but I've selected the time period after
14:00 in order to show what the real compaction activity looks like when
not skewed by the incredibly high number of pending compactions as shown
above:
[image: Inline image 3]

Has anyone else experienced something similar? Is there something else we
can do to see if this is something wrong with our cluster?

Thanks in advance for any help!

Vasilis

Re: Thousands of pending compactions using STCS

Posted by Vasileios Vlachos <va...@gmail.com>.
Anuj, Jeff, thank you both,

Although it's harmless, it sounds like it's time for an upgrade. The ticket
suggests that 2.0.17 is not affected.

Thank you guys!

On Fri, Dec 11, 2015 at 5:25 PM, Jeff Jirsa <je...@crowdstrike.com>
wrote:

> Same bug also affects 2.0.16 -
> https://issues.apache.org/jira/browse/CASSANDRA-9662

Re: Thousands of pending compactions using STCS

Posted by Jeff Jirsa <je...@crowdstrike.com>.
Same bug also affects 2.0.16 - https://issues.apache.org/jira/browse/CASSANDRA-9662


Re: Thousands of pending compactions using STCS

Posted by Anuj Wadehra <an...@yahoo.co.in>.
Sorry, I missed the version in your mail. You are on 2.0.16, so it can't be the coldness issue.


Anuj 

Sent from Yahoo Mail on Android

From:"Anuj Wadehra" <an...@yahoo.co.in>
Date:Fri, 11 Dec, 2015 at 10:48 pm
Subject:Re: Thousands of pending compactions using STCS

There was a JIRA that cold sstables are not compacted leading to thousands of sstables. Issue got fixed in 2.0.4. Which version of Cassandra are you using?


Anuj

Sent from Yahoo Mail on Android

From:"Jeff Jirsa" <je...@crowdstrike.com>
Date:Fri, 11 Dec, 2015 at 10:42 pm
Subject:Re: Thousands of pending compactions using STCS

There were a few buggy versions in 2.1 (2.1.7, 2.1.8, I believe) that showed this behavior. The number of pending compactions was artificially high, and not meaningful. As long as they number of –Data.db sstables remains normal, compaction is keeping up and you’re fine.


- Jeff


From: Vasileios Vlachos
Reply-To: "user@cassandra.apache.org"
Date: Friday, December 11, 2015 at 8:28 AM
To: "user@cassandra.apache.org"
Subject: Thousands of pending compactions using STCS



Re: Thousands of pending compactions using STCS

Posted by Anuj Wadehra <an...@yahoo.co.in>.
There was a JIRA about cold SSTables not being compacted, leading to thousands of SSTables. The issue was fixed in 2.0.4. Which version of Cassandra are you using?


Anuj

Sent from Yahoo Mail on Android

From:"Jeff Jirsa" <je...@crowdstrike.com>
Date:Fri, 11 Dec, 2015 at 10:42 pm
Subject:Re: Thousands of pending compactions using STCS

There were a few buggy versions in 2.1 (2.1.7, 2.1.8, I believe) that showed this behavior. The number of pending compactions was artificially high, and not meaningful. As long as they number of –Data.db sstables remains normal, compaction is keeping up and you’re fine.


- Jeff


From: Vasileios Vlachos
Reply-To: "user@cassandra.apache.org"
Date: Friday, December 11, 2015 at 8:28 AM
To: "user@cassandra.apache.org"
Subject: Thousands of pending compactions using STCS



Re: Thousands of pending compactions using STCS

Posted by Jeff Jirsa <je...@crowdstrike.com>.
There were a few buggy versions in 2.1 (2.1.7, 2.1.8, I believe) that showed this behavior. The number of pending compactions was artificially high, and not meaningful. As long as the number of -Data.db sstables remains normal, compaction is keeping up and you're fine.

- Jeff
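
A minimal sketch of that kind of check is below: it counts the *-Data.db
files per keyspace so the totals can be tracked over time. The default
/var/lib/cassandra/data layout from the original post is assumed; adjust
DATA_DIR if data_file_directories points elsewhere.

#!/usr/bin/env python3
# Sketch: count *-Data.db SSTable files per keyspace under the default
# Cassandra 2.0 data directory layout (data/<keyspace>/<column_family>/).
import os
from collections import Counter

DATA_DIR = "/var/lib/cassandra/data"  # assumed default; adjust as needed

counts = Counter()
for dirpath, _dirnames, filenames in os.walk(DATA_DIR):
    for name in filenames:
        if name.endswith("-Data.db"):
            # The first path component below DATA_DIR is the keyspace name.
            keyspace = os.path.relpath(dirpath, DATA_DIR).split(os.sep)[0]
            counts[keyspace] += 1

for keyspace in sorted(counts):
    print("{0:<30} {1} SSTables".format(keyspace, counts[keyspace]))
print("total: {0} SSTables in {1} keyspaces".format(sum(counts.values()), len(counts)))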
