Posted to user@cassandra.apache.org by Dave Galbraith <da...@gmail.com> on 2015/03/26 08:51:48 UTC

Disastrous profusion of SSTables

Hey! So I'm running Cassandra 2.1.2 and using the
SizeTieredCompactionStrategy. I'm doing about 3k writes/sec on a single
node. My read performance is terrible, all my queries just time out. So I
do nodetool cfstats:

    Read Count: 42071
    Read Latency: 67.47804242827601 ms.
    Write Count: 131964300
    Write Latency: 0.011721604274792501 ms.
    Pending Flushes: 0
        Table: metrics16513
        SSTable count: 641
        Space used (live): 6366740812
        Space used (total): 6366740812
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.25272488401992765
        Memtable cell count: 0
        Memtable data size: 0
        Memtable switch count: 1016
        Local read count: 42071
        Local read latency: 67.479 ms
        Local write count: 131964300
        Local write latency: 0.012 ms
        Pending flushes: 0
        Bloom filter false positives: 994
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 37840376
        Compacted partition minimum bytes: 104
        Compacted partition maximum bytes: 24601
        Compacted partition mean bytes: 255
        Average live cells per slice (last five minutes): 111.67243951154147
        Maximum live cells per slice (last five minutes): 1588.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

and nodetool cfhistograms:

Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%            46.00              6.99         154844.95               149                 1
75%           430.00              8.53        3518837.53               179                 1
95%           430.00             11.32        7252897.25               215                 2
98%           430.00             15.54       22103886.34               215                 3
99%           430.00             29.86       22290608.19              1597                50
Min             0.00              1.66             26.91               104                 0
Max           430.00         269795.38       27311364.89             24601               924

Gross!! There are 641 SSTables in there, and all my reads are hitting
hundreds of them and timing out. How could this possibly have happened, and
what can I do about it? Nodetool compactionstats says pending tasks: 0, by
the way. Thanks!

Re: Disastrous profusion of SSTables

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Mar 26, 2015 at 12:51 AM, Dave Galbraith <david92galbraith@gmail.com> wrote:

> Hey! So I'm running Cassandra 2.1.2 and using the
> SizeTieredCompactionStrategy. I'm doing about 3k writes/sec on a single
> node. My read performance is terrible, all my queries just time out. So I
> do nodetool cfstats:
>

For the record, the cassandra download page currently advises using 2.0.x
as the stable production version.

http://cassandra.apache.org/download/
"
The most stable release of Apache Cassandra is 2.0.13 (released on
2015-03-16). If you are in production or planning to be soon, download this
one.
"

=Rob

Re: Disastrous profusion of SSTables

Posted by Dave Galbraith <da...@gmail.com>.
It looks like it was CASSANDRA-8860: setting the cold_reads_to_omit compaction
option down to zero took my SSTable count from 641 to 1 and made all my queries
work. Thank you!!

On Thu, Mar 26, 2015 at 4:55 AM, graham sanderson <gr...@vast.com> wrote:

> you may be seeing
>
> https://issues.apache.org/jira/browse/CASSANDRA-8860
> https://issues.apache.org/jira/browse/CASSANDRA-8635
>
> related issues (which end up producing excessive numbers of sstables)
>
> we applied
>
> diff --git a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java
> index fbd715c..cbb8c8b 100644
> --- a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java
> +++ b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java
> @@ -118,7 +118,11 @@ public class SizeTieredCompactionStrategy extends AbstractCompactionStrategy
>      static List<SSTableReader> filterColdSSTables(List<SSTableReader> sstables, double coldReadsToOmit, int minThreshold)
>      {
>          if (coldReadsToOmit == 0.0)
> +        {
> +            if (!sstables.isEmpty())
> +                logger.debug("Skipping cold sstable filter for list sized {} containing {}", sstables.size(), sstables.get(0).getFilename());
>              return sstables;
> +        }
>
>          // Sort the sstables by hotness (coldest-first). We first build a map because the hotness may change during the sort.
>          final Map<SSTableReader, Double> hotnessSnapshot = getHotnessMap(sstables);
> diff --git a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java
> index 84e7d61..c6c5f1b 100644
> --- a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java
> +++ b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java
> @@ -26,7 +26,7 @@ public final class SizeTieredCompactionStrategyOptions
>      protected static final long DEFAULT_MIN_SSTABLE_SIZE = 50L * 1024L * 1024L;
>      protected static final double DEFAULT_BUCKET_LOW = 0.5;
>      protected static final double DEFAULT_BUCKET_HIGH = 1.5;
> -    protected static final double DEFAULT_COLD_READS_TO_OMIT = 0.05;
> +    protected static final double DEFAULT_COLD_READS_TO_OMIT = 0.0;
>      protected static final String MIN_SSTABLE_SIZE_KEY = "min_sstable_size";
>      protected static final String BUCKET_LOW_KEY = "bucket_low";
>      protected static final String BUCKET_HIGH_KEY = "bucket_high";
> to our 2.1.3, though the entire coldReadsToOmit mechanism is removed in 2.1.4.
>
> Note you don’t have to patch your code; you can set the value on each
> table (we just have a lot of tables, including dynamically generated ones).
> Basically, try setting coldReadsToOmit back to 0, which was the default in 2.0.x.
>
> On Mar 26, 2015, at 3:56 AM, Anishek Agarwal <an...@gmail.com> wrote:
>
> Are you frequently updating the same rows? What is the memtable flush size?
> Can you post the table creation query here, please?
>
> On Thu, Mar 26, 2015 at 1:21 PM, Dave Galbraith <david92galbraith@gmail.com> wrote:
>
>> Hey! So I'm running Cassandra 2.1.2 and using the
>> SizeTieredCompactionStrategy. I'm doing about 3k writes/sec on a single
>> node. My read performance is terrible, all my queries just time out. So I
>> do nodetool cfstats:
>>
>>     Read Count: 42071
>>     Read Latency: 67.47804242827601 ms.
>>     Write Count: 131964300
>>     Write Latency: 0.011721604274792501 ms.
>>     Pending Flushes: 0
>>         Table: metrics16513
>>         SSTable count: 641
>>         Space used (live): 6366740812
>>         Space used (total): 6366740812
>>         Space used by snapshots (total): 0
>>         SSTable Compression Ratio: 0.25272488401992765
>>         Memtable cell count: 0
>>         Memtable data size: 0
>>         Memtable switch count: 1016
>>         Local read count: 42071
>>         Local read latency: 67.479 ms
>>         Local write count: 131964300
>>         Local write latency: 0.012 ms
>>         Pending flushes: 0
>>         Bloom filter false positives: 994
>>         Bloom filter false ratio: 0.00000
>>         Bloom filter space used: 37840376
>>         Compacted partition minimum bytes: 104
>>         Compacted partition maximum bytes: 24601
>>         Compacted partition mean bytes: 255
>>         Average live cells per slice (last five minutes):
>> 111.67243951154147
>>         Maximum live cells per slice (last five minutes): 1588.0
>>         Average tombstones per slice (last five minutes): 0.0
>>         Maximum tombstones per slice (last five minutes): 0.0
>>
>> and nodetool cfhistograms:
>>
>> Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
>>                               (micros)          (micros)           (bytes)
>> 50%            46.00              6.99         154844.95               149                 1
>> 75%           430.00              8.53        3518837.53               179                 1
>> 95%           430.00             11.32        7252897.25               215                 2
>> 98%           430.00             15.54       22103886.34               215                 3
>> 99%           430.00             29.86       22290608.19              1597                50
>> Min             0.00              1.66             26.91               104                 0
>> Max           430.00         269795.38       27311364.89             24601               924
>>
>> Gross!! There are 641 SSTables in there, and all my reads are hitting
>> hundreds of them and timing out. How could this possibly have happened, and
>> what can I do about it? Nodetool compactionstats says pending tasks: 0, by
>> the way. Thanks!
>>
>
>
>

Re: Disastrous profusion of SSTables

Posted by graham sanderson <gr...@vast.com>.
you may be seeing

https://issues.apache.org/jira/browse/CASSANDRA-8860
https://issues.apache.org/jira/browse/CASSANDRA-8635

related issues (which end up producing excessive numbers of sstables)

we applied

diff --git a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java
index fbd715c..cbb8c8b 100644
--- a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java
@@ -118,7 +118,11 @@ public class SizeTieredCompactionStrategy extends AbstractCompactionStrategy
     static List<SSTableReader> filterColdSSTables(List<SSTableReader> sstables, double coldReadsToOmit, int minThreshold)
     {
         if (coldReadsToOmit == 0.0)
+        {
+            if (!sstables.isEmpty())
+                logger.debug("Skipping cold sstable filter for list sized {} containing {}", sstables.size(), sstables.get(0).getFilename());
             return sstables;
+        }
 
         // Sort the sstables by hotness (coldest-first). We first build a map because the hotness may change during the sort.
         final Map<SSTableReader, Double> hotnessSnapshot = getHotnessMap(sstables);
diff --git a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java
index 84e7d61..c6c5f1b 100644
--- a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java
+++ b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategyOptions.java
@@ -26,7 +26,7 @@ public final class SizeTieredCompactionStrategyOptions
     protected static final long DEFAULT_MIN_SSTABLE_SIZE = 50L * 1024L * 1024L;
     protected static final double DEFAULT_BUCKET_LOW = 0.5;
     protected static final double DEFAULT_BUCKET_HIGH = 1.5;
-    protected static final double DEFAULT_COLD_READS_TO_OMIT = 0.05;
+    protected static final double DEFAULT_COLD_READS_TO_OMIT = 0.0;
     protected static final String MIN_SSTABLE_SIZE_KEY = "min_sstable_size";
     protected static final String BUCKET_LOW_KEY = "bucket_low";
     protected static final String BUCKET_HIGH_KEY = "bucket_high";

to our 2.1.3, though the entire coldReadsToOmit mechanism is removed in 2.1.4.

Note you don’t have to patch your code; you can set the value on each table (we just have a lot of tables, including dynamically generated ones). Basically, try setting coldReadsToOmit back to 0, which was the default in 2.0.x.
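For example, a per-table override on 2.1.x can be applied with a CQL statement along these lines. This is only a sketch: the keyspace name "mykeyspace" is an assumption (the table name is taken from the cfstats output above), and since ALTER TABLE replaces the whole compaction map, any other non-default compaction suboptions you rely on would need to be re-specified alongside it:

    -- "mykeyspace" is hypothetical; metrics16513 is the table from the cfstats output above.
    -- Setting cold_reads_to_omit to 0.0 disables the cold-sstable filter for this table.
    ALTER TABLE mykeyspace.metrics16513
      WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                         'cold_reads_to_omit': 0.0};

Once the option is lowered, compaction should begin merging the previously "cold" sstables; progress can be watched with nodetool compactionstats and the result confirmed with nodetool cfstats.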

> On Mar 26, 2015, at 3:56 AM, Anishek Agarwal <an...@gmail.com> wrote:
> 
> Are you frequently updating the same rows? What is the memtable flush size? Can you post the table creation query here, please?
> 
> On Thu, Mar 26, 2015 at 1:21 PM, Dave Galbraith <david92galbraith@gmail.com> wrote:
> Hey! So I'm running Cassandra 2.1.2 and using the SizeTieredCompactionStrategy. I'm doing about 3k writes/sec on a single node. My read performance is terrible, all my queries just time out. So I do nodetool cfstats:
> 
>     Read Count: 42071
>     Read Latency: 67.47804242827601 ms.
>     Write Count: 131964300
>     Write Latency: 0.011721604274792501 ms.
>     Pending Flushes: 0
>         Table: metrics16513
>         SSTable count: 641
>         Space used (live): 6366740812
>         Space used (total): 6366740812
>         Space used by snapshots (total): 0
>         SSTable Compression Ratio: 0.25272488401992765
>         Memtable cell count: 0
>         Memtable data size: 0
>         Memtable switch count: 1016
>         Local read count: 42071
>         Local read latency: 67.479 ms
>         Local write count: 131964300
>         Local write latency: 0.012 ms
>         Pending flushes: 0
>         Bloom filter false positives: 994
>         Bloom filter false ratio: 0.00000
>         Bloom filter space used: 37840376
>         Compacted partition minimum bytes: 104
>         Compacted partition maximum bytes: 24601
>         Compacted partition mean bytes: 255
>         Average live cells per slice (last five minutes): 111.67243951154147
>         Maximum live cells per slice (last five minutes): 1588.0
>         Average tombstones per slice (last five minutes): 0.0
>         Maximum tombstones per slice (last five minutes): 0.0
> 
> and nodetool cfhistograms:
> 
> Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
>                               (micros)          (micros)           (bytes)                  
> 50%            46.00              6.99         154844.95               149                 1
> 75%           430.00              8.53        3518837.53               179                 1
> 95%           430.00             11.32        7252897.25               215                 2
> 98%           430.00             15.54       22103886.34               215                 3
> 99%           430.00             29.86       22290608.19              1597                50
> Min             0.00              1.66             26.91               104                 0
> Max           430.00         269795.38       27311364.89             24601               924
> 
> Gross!! There are 641 SSTables in there, and all my reads are hitting hundreds of them and timing out. How could this possibly have happened, and what can I do about it? Nodetool compactionstats says pending tasks: 0, by the way. Thanks!
> 


Re: Disastrous profusion of SSTables

Posted by Anishek Agarwal <an...@gmail.com>.
Are you frequently updating the same rows? What is the memtable flush size?
Can you post the table creation query here, please?

On Thu, Mar 26, 2015 at 1:21 PM, Dave Galbraith <da...@gmail.com> wrote:

> Hey! So I'm running Cassandra 2.1.2 and using the
> SizeTieredCompactionStrategy. I'm doing about 3k writes/sec on a single
> node. My read performance is terrible, all my queries just time out. So I
> do nodetool cfstats:
>
>     Read Count: 42071
>     Read Latency: 67.47804242827601 ms.
>     Write Count: 131964300
>     Write Latency: 0.011721604274792501 ms.
>     Pending Flushes: 0
>         Table: metrics16513
>         SSTable count: 641
>         Space used (live): 6366740812
>         Space used (total): 6366740812
>         Space used by snapshots (total): 0
>         SSTable Compression Ratio: 0.25272488401992765
>         Memtable cell count: 0
>         Memtable data size: 0
>         Memtable switch count: 1016
>         Local read count: 42071
>         Local read latency: 67.479 ms
>         Local write count: 131964300
>         Local write latency: 0.012 ms
>         Pending flushes: 0
>         Bloom filter false positives: 994
>         Bloom filter false ratio: 0.00000
>         Bloom filter space used: 37840376
>         Compacted partition minimum bytes: 104
>         Compacted partition maximum bytes: 24601
>         Compacted partition mean bytes: 255
>         Average live cells per slice (last five minutes):
> 111.67243951154147
>         Maximum live cells per slice (last five minutes): 1588.0
>         Average tombstones per slice (last five minutes): 0.0
>         Maximum tombstones per slice (last five minutes): 0.0
>
> and nodetool cfhistograms:
>
> Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
>                               (micros)          (micros)           (bytes)
> 50%            46.00              6.99         154844.95               149                 1
> 75%           430.00              8.53        3518837.53               179                 1
> 95%           430.00             11.32        7252897.25               215                 2
> 98%           430.00             15.54       22103886.34               215                 3
> 99%           430.00             29.86       22290608.19              1597                50
> Min             0.00              1.66             26.91               104                 0
> Max           430.00         269795.38       27311364.89             24601               924
>
> Gross!! There are 641 SSTables in there, and all my reads are hitting
> hundreds of them and timing out. How could this possibly have happened, and
> what can I do about it? Nodetool compactionstats says pending tasks: 0, by
> the way. Thanks!
>