You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Nenad (JIRA)" <ji...@apache.org> on 2017/01/26 11:59:24 UTC
[jira] [Comment Edited] (CASSANDRA-13120) Trace and Histogram output misleading

    [ https://issues.apache.org/jira/browse/CASSANDRA-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839613#comment-15839613 ] 

Nenad edited comment on CASSANDRA-13120 at 1/26/17 11:59 AM:
-------------------------------------------------------------

I wanted to share some more details from digging and testing. First thing that I tried is LCS on clean node, to rule out TWCS as a problem. After some data loaded via `cassandra-stress` and some manually I managed to have same behavior.

{code:title=getsstables|borderStyle=solid}
vagrant@cassandra:/var/lib/cassandra/data/rts/lcs_test-2be45680e3b111e6b03e5fb4fc6cee63$ nodetool getsstables rts lcs_test 1eea2a12-e3ad-11e6-bf01-fe55135034f3;
/var/lib/cassandra/data/rts/lcs_test-2be45680e3b111e6b03e5fb4fc6cee63/mb-993-big-Data.db
{code}

So UUID is in single SSTable. Than I went to `cqlsh` and tried tracing and select and here is outcome:
```
cqlsh> SELECT * FROM rts.lcs_test WHERE id = 1eea2a12-e3ad-11e6-bf01-fe55135034f3;

 id                                   | tp_id | ex_uuid
--------------------------------------+-------+----------
 1eea2a12-e3ad-11e6-bf01-fe55135034f3 |     3 | asddddad

(1 rows)

Tracing session: aad45de0-e3bc-11e6-8de0-89a4196d20ae

 activity                                                                                                  | timestamp                  | source        | source_elapsed
-----------------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------
                                                                                        Execute CQL3 query | 2017-01-26 11:43:34.078000 | 192.168.34.20 |              0
 Parsing SELECT * FROM rts.lcs_test WHERE id = 1eea2a12-e3ad-11e6-bf01-fe55135034f3; [SharedPool-Worker-1] | 2017-01-26 11:43:34.078000 | 192.168.34.20 |            199
                                                                 Preparing statement [SharedPool-Worker-1] | 2017-01-26 11:43:34.079000 | 192.168.34.20 |            361
                                        Executing single-partition query on lcs_test [SharedPool-Worker-2] | 2017-01-26 11:43:34.079000 | 192.168.34.20 |            660
                                                        Acquiring sstable references [SharedPool-Worker-2] | 2017-01-26 11:43:34.079000 | 192.168.34.20 |            684
                                           Bloom filter allows skipping sstable 1406 [SharedPool-Worker-2] | 2017-01-26 11:43:34.079000 | 192.168.34.20 |            732
                                                       Key cache hit for sstable 993 [SharedPool-Worker-2] | 2017-01-26 11:43:34.079000 | 192.168.34.20 |            755
           Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2017-01-26 11:43:34.079000 | 192.168.34.20 |            771
                                          Merging data from memtables and 2 sstables [SharedPool-Worker-2] | 2017-01-26 11:43:34.079001 | 192.168.34.20 |            785
                                                   Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | 2017-01-26 11:43:34.079001 | 192.168.34.20 |            849
                                                                                          Request complete | 2017-01-26 11:43:34.082327 | 192.168.34.20 |           4327

```

So again 2 sstables are merged while bloom filter says one can be skipped.

Next think that I checked is access time of files in linux system with `ls -l --time=atime` command:
```
vagrant@cassandra:/var/lib/cassandra/data/rts/lcs_test-2be45680e3b111e6b03e5fb4fc6cee63$ ls -l --time=atime mb-993-big-*
-rw-r--r-- 1 cassandra cassandra      76 Jan 26 11:42 mb-993-big-CRC.db
-rw-r--r-- 1 cassandra cassandra 1048687 Jan 26 11:42 mb-993-big-Data.db
-rw-r--r-- 1 cassandra cassandra       9 Jan 26 11:42 mb-993-big-Digest.crc32
-rw-r--r-- 1 cassandra cassandra    5744 Jan 26 11:42 mb-993-big-Filter.db
-rw-r--r-- 1 cassandra cassandra  100769 Jan 26 11:42 mb-993-big-Index.db
-rw-r--r-- 1 cassandra cassandra   17703 Jan 26 11:42 mb-993-big-Statistics.db
-rw-r--r-- 1 cassandra cassandra    1072 Jan 26 11:42 mb-993-big-Summary.db
-rw-r--r-- 1 cassandra cassandra      80 Jan 26 11:42 mb-993-big-TOC.txt
```

This timestamp of 11:42 is not changing when I do select on CQLSH and it is timestamp when Cassandra probably read file first time and placed it in memory. This actually proves that Cassandra is not accessing SSTables each time query is done, it is using memory so probably logging is misleading in this case. I would expect if bloom filter allowed skipping of some sstable to log "Merging data from memtables and 1 sstables" instead of 2.

Why [this line in SinglePartitionReadCommand](https://github.com/apache/cassandra/blob/554d6beb0920cba9bcc0124fa9f33c580012b761/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L509) returns 2 sstables when it applies PK filter instead of 1?


was (Author: nbozicns):
I wanted to share some more details from digging and testing. First thing that I tried is LCS on clean node, to rule out TWCS as a problem. After some data loaded via `cassandra-stress` and some manually I managed to have same behavior.

```
vagrant@cassandra:/var/lib/cassandra/data/rts/lcs_test-2be45680e3b111e6b03e5fb4fc6cee63$ nodetool getsstables rts lcs_test 1eea2a12-e3ad-11e6-bf01-fe55135034f3;
/var/lib/cassandra/data/rts/lcs_test-2be45680e3b111e6b03e5fb4fc6cee63/mb-993-big-Data.db
```

So UUID is in single SSTable. Than I went to `cqlsh` and tried tracing and select and here is outcome:
```
cqlsh> SELECT * FROM rts.lcs_test WHERE id = 1eea2a12-e3ad-11e6-bf01-fe55135034f3;

 id                                   | tp_id | ex_uuid
--------------------------------------+-------+----------
 1eea2a12-e3ad-11e6-bf01-fe55135034f3 |     3 | asddddad

(1 rows)

Tracing session: aad45de0-e3bc-11e6-8de0-89a4196d20ae

 activity                                                                                                  | timestamp                  | source        | source_elapsed
-----------------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------
                                                                                        Execute CQL3 query | 2017-01-26 11:43:34.078000 | 192.168.34.20 |              0
 Parsing SELECT * FROM rts.lcs_test WHERE id = 1eea2a12-e3ad-11e6-bf01-fe55135034f3; [SharedPool-Worker-1] | 2017-01-26 11:43:34.078000 | 192.168.34.20 |            199
                                                                 Preparing statement [SharedPool-Worker-1] | 2017-01-26 11:43:34.079000 | 192.168.34.20 |            361
                                        Executing single-partition query on lcs_test [SharedPool-Worker-2] | 2017-01-26 11:43:34.079000 | 192.168.34.20 |            660
                                                        Acquiring sstable references [SharedPool-Worker-2] | 2017-01-26 11:43:34.079000 | 192.168.34.20 |            684
                                           Bloom filter allows skipping sstable 1406 [SharedPool-Worker-2] | 2017-01-26 11:43:34.079000 | 192.168.34.20 |            732
                                                       Key cache hit for sstable 993 [SharedPool-Worker-2] | 2017-01-26 11:43:34.079000 | 192.168.34.20 |            755
           Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2017-01-26 11:43:34.079000 | 192.168.34.20 |            771
                                          Merging data from memtables and 2 sstables [SharedPool-Worker-2] | 2017-01-26 11:43:34.079001 | 192.168.34.20 |            785
                                                   Read 1 live and 0 tombstone cells [SharedPool-Worker-2] | 2017-01-26 11:43:34.079001 | 192.168.34.20 |            849
                                                                                          Request complete | 2017-01-26 11:43:34.082327 | 192.168.34.20 |           4327

```

So again 2 sstables are merged while bloom filter says one can be skipped.

Next think that I checked is access time of files in linux system with `ls -l --time=atime` command:
```
vagrant@cassandra:/var/lib/cassandra/data/rts/lcs_test-2be45680e3b111e6b03e5fb4fc6cee63$ ls -l --time=atime mb-993-big-*
-rw-r--r-- 1 cassandra cassandra      76 Jan 26 11:42 mb-993-big-CRC.db
-rw-r--r-- 1 cassandra cassandra 1048687 Jan 26 11:42 mb-993-big-Data.db
-rw-r--r-- 1 cassandra cassandra       9 Jan 26 11:42 mb-993-big-Digest.crc32
-rw-r--r-- 1 cassandra cassandra    5744 Jan 26 11:42 mb-993-big-Filter.db
-rw-r--r-- 1 cassandra cassandra  100769 Jan 26 11:42 mb-993-big-Index.db
-rw-r--r-- 1 cassandra cassandra   17703 Jan 26 11:42 mb-993-big-Statistics.db
-rw-r--r-- 1 cassandra cassandra    1072 Jan 26 11:42 mb-993-big-Summary.db
-rw-r--r-- 1 cassandra cassandra      80 Jan 26 11:42 mb-993-big-TOC.txt
```

This timestamp of 11:42 is not changing when I do select on CQLSH and it is timestamp when Cassandra probably read file first time and placed it in memory. This actually proves that Cassandra is not accessing SSTables each time query is done, it is using memory so probably logging is misleading in this case. I would expect if bloom filter allowed skipping of some sstable to log "Merging data from memtables and 1 sstables" instead of 2.

Why [this line in SinglePartitionReadCommand](https://github.com/apache/cassandra/blob/554d6beb0920cba9bcc0124fa9f33c580012b761/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L509) returns 2 sstables when it applies PK filter instead of 1?

> Trace and Histogram output misleading
> -------------------------------------
>
>                 Key: CASSANDRA-13120
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13120
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Adam Hattrell
>            Priority: Minor
>
> If we look at the following output:
> {noformat}
> [centos@cassandra-c-3]$ nodetool getsstables -- keyspace table 60ea4399-6b9f-4419-9ccb-ff2e6742de10
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647146-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647147-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647145-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647152-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647157-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-648137-big-Data.db
> {noformat}
> We can see that this key value appears in just 6 sstables.  However, when we run a select against the table and key we get:
> {noformat}
> Tracing session: a6c81330-d670-11e6-b00b-c1d403fd6e84
>  activity                                                                                                          | timestamp                  | source         | source_elapsed
> -------------------------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
>                                                                                                 Execute CQL3 query | 2017-01-09 13:36:40.419000 | 10.200.254.141 |              0
>  Parsing SELECT * FROM keyspace.table WHERE id = 60ea4399-6b9f-4419-9ccb-ff2e6742de10; [SharedPool-Worker-2]       | 2017-01-09 13:36:40.419000 | 10.200.254.141 |            104
>                                                                          Preparing statement [SharedPool-Worker-2] | 2017-01-09 13:36:40.419000 | 10.200.254.141 |            220
>                                         Executing single-partition query on table [SharedPool-Worker-1]            | 2017-01-09 13:36:40.419000 | 10.200.254.141 |            450
>                                                                 Acquiring sstable references [SharedPool-Worker-1] | 2017-01-09 13:36:40.419000 | 10.200.254.141 |            477
>                                                  Bloom filter allows skipping sstable 648146 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419000 | 10.200.254.141 |            496
>                                                  Bloom filter allows skipping sstable 648145 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 10.200.254.141 |            503
>                                                             Key cache hit for sstable 648140 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 10.200.254.141 |            513
>                                                  Bloom filter allows skipping sstable 648135 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 10.200.254.141 |            520
>                                                  Bloom filter allows skipping sstable 648130 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 10.200.254.141 |            526
>                                                  Bloom filter allows skipping sstable 648048 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 10.200.254.141 |            530
>                                                  Bloom filter allows skipping sstable 647749 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 10.200.254.141 |            535
>                                                  Bloom filter allows skipping sstable 647404 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 10.200.254.141 |            540
>                                                             Key cache hit for sstable 647145 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 10.200.254.141 |            548
>                                                             Key cache hit for sstable 647146 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 10.200.254.141 |            556
>                                                             Key cache hit for sstable 647147 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 10.200.254.141 |            564
>                                                  Bloom filter allows skipping sstable 647148 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 10.200.254.141 |            570
>                                                  Bloom filter allows skipping sstable 647149 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 10.200.254.141 |            575
>                                                  Bloom filter allows skipping sstable 647150 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 10.200.254.141 |            580
>                                                  Bloom filter allows skipping sstable 647151 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 10.200.254.141 |            585
>                                                             Key cache hit for sstable 647152 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 10.200.254.141 |            591
>                                                  Bloom filter allows skipping sstable 647153 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 10.200.254.141 |            597
>                                                  Bloom filter allows skipping sstable 647154 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 10.200.254.141 |            601
>                                                  Bloom filter allows skipping sstable 647155 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 10.200.254.141 |            606
>                                                  Bloom filter allows skipping sstable 647156 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 10.200.254.141 |            611
>                                                             Key cache hit for sstable 647157 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419002 | 10.200.254.141 |            617
>                                                  Bloom filter allows skipping sstable 647158 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 | 10.200.254.141 |            623
>                   Skipped 0/22 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 | 10.200.254.141 |            644
>                                                  Merging data from memtables and 22 sstables [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 | 10.200.254.141 |            654
>                                                            Read 9 live and 0 tombstone cells [SharedPool-Worker-1] | 2017-01-09 13:36:40.419003 | 10.200.254.141 |            732
>                                                                                                   Request complete | 2017-01-09 13:36:40.419808 | 10.200.254.141 |            808
> {noformat}
> You'll note we claim not to have skipped any files due to bloom filters - even though we know the data is only in 6 files.
> CFHistograms also report that we're hitting every sstables:
> {noformat}
> Percentile SSTables Write Latency Read Latency Partition Size Cell Count 
> (micros) (micros) (bytes) 
> 50% 24.00 14.24 182.79 103 1 
> 75% 24.00 17.08 315.85 149 2 
> 95% 24.00 20.50 7007.51 372 7 
> 98% 24.00 24.60 10090.81 642 12 
> 99% 24.00 29.52 12108.97 770 14 
> Min 21.00 3.31 29.52 43 0 
> Max 29.00 1358.10 62479.63 1597 35
> {noformat}
> Code for the read is here:
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L561
> We seem to iterate over all the sstables and increment the metric as part of that iteration.
> Either the reporting is incorrect - or we should maybe check the bloom filters first and then iterate the tombstones after?  
> In this particular case we were using TWCS which makes the problem more apparent.  TWCS guarantees that we'll keep more sstables in an un-merged state.  With STCS we have to search them all, but most of them should be merged together if Compaction is keeping up.  LCS the read path is restricted which will disguise the impact.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)