Posted to commits@cassandra.apache.org by "Brice Figureau (JIRA)" <ji...@apache.org> on 2015/11/13 11:22:11 UTC

[jira] [Created] (CASSANDRA-10698) Static column performance with DISTINCT

Brice Figureau created CASSANDRA-10698:
------------------------------------------

             Summary: Static column performance with DISTINCT
                 Key: CASSANDRA-10698
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10698
             Project: Cassandra
          Issue Type: Bug
          Components: core
         Environment: Linux, cassandra 2.1.11
            Reporter: Brice Figureau
         Attachments: bug-slow-distinct.tar.gz

As described on the mailing list, with the following schema:

{code:sql}
CREATE TABLE akka.messages (
    persistence_id text,
    partition_nr bigint,
    sequence_nr bigint,
    message blob,
    used boolean static,
    PRIMARY KEY ((persistence_id, partition_nr), sequence_nr)
) WITH CLUSTERING ORDER BY (sequence_nr ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 216000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
{code}
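
For context (an illustrative aside, not part of the original report): {{used}} is a *static* column, so a single value is stored per partition rather than per clustering row. Writing it needs only the partition key, which is why reading it back shouldn't require touching the clustering rows:

{code:sql}
-- Hypothetical illustration: a static column is keyed only by the
-- partition key (persistence_id, partition_nr), not by sequence_nr.
INSERT INTO akka.messages (persistence_id, partition_nr, used)
VALUES ('player-SW11f03e20b8802000', 0, true);
{code}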

The following query:
{code:sql}
SELECT used from akka.messages WHERE
      persistence_id = 'player-SW11f03e20b8802000' AND
      partition_nr = 0;
{code}

is quite slow, and just as slow as the {{DISTINCT}} version:
{code:sql}
SELECT DISTINCT used from akka.messages WHERE
      persistence_id = 'player-SW11f03e20b8802000' AND
      partition_nr = 0;
{code}

As shown by this trace from a small, unloaded 3-node cluster:
{noformat}
 activity                                                                                          | timestamp                  | source         | source_elapsed
---------------------------------------------------------------------------------------------------+----------------------------+----------------+----------------
                                                                                Execute CQL3 query | 2015-11-13 11:04:41.771000 | 192.168.168.12 |              0
 Parsing SELECT DISTINCT used ....                                           [SharedPool-Worker-1] | 2015-11-13 11:04:41.771000 | 192.168.168.12 |             78
            READ message received from /192.168.168.12 [MessagingService-Incoming-/192.168.168.12] | 2015-11-13 11:04:41.772000 | 192.168.168.29 |             22
                                                         Preparing statement [SharedPool-Worker-1] | 2015-11-13 11:04:41.772000 | 192.168.168.12 |            271
                                Executing single-partition query on messages [SharedPool-Worker-4] | 2015-11-13 11:04:41.772000 | 192.168.168.29 |            424
                                           reading data from /192.168.168.29 [SharedPool-Worker-1] | 2015-11-13 11:04:41.772000 | 192.168.168.12 |            642
                                                Acquiring sstable references [SharedPool-Worker-4] | 2015-11-13 11:04:41.772000 | 192.168.168.29 |            445
               Sending READ message to /192.168.168.29 [MessagingService-Outgoing-/192.168.168.29] | 2015-11-13 11:04:41.772000 | 192.168.168.12 |            738
                                                 Merging memtable tombstones [SharedPool-Worker-4] | 2015-11-13 11:04:41.772000 | 192.168.168.29 |            476
                                              Key cache hit for sstable 1126 [SharedPool-Worker-4] | 2015-11-13 11:04:41.773000 | 192.168.168.29 |            560
                                 Seeking to partition beginning in data file [SharedPool-Worker-4] | 2015-11-13 11:04:41.773000 | 192.168.168.29 |            592
   Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-4] | 2015-11-13 11:04:41.773000 | 192.168.168.29 |            960
                                  Merging data from memtables and 1 sstables [SharedPool-Worker-4] | 2015-11-13 11:04:41.774000 | 192.168.168.29 |            971
                                           Read 1 live and 0 tombstone cells [SharedPool-Worker-4] | 2015-11-13 11:04:47.301000 | 192.168.168.29 |         529270
                                       Enqueuing response to /192.168.168.12 [SharedPool-Worker-4] | 2015-11-13 11:04:47.316000 | 192.168.168.29 |         544885
   Sending REQUEST_RESPONSE message to /192.168.168.12 [MessagingService-Outgoing-/192.168.168.12] | 2015-11-13 11:04:47.317000 | 192.168.168.29 |         545042
REQUEST_RESPONSE message received from /192.168.168.29 [MessagingService-Incoming-/192.168.168.29] | 2015-11-13 11:04:47.429000 | 192.168.168.12 |         657918
                                    Processing response from /192.168.168.29 [SharedPool-Worker-2] | 2015-11-13 11:04:47.429000 | 192.168.168.12 |         658224
                                                                                  Request complete | 2015-11-13 11:04:47.525892 | 192.168.168.12 |         754892

{noformat}

As you can see, there is a delay of roughly 5.5 seconds between the "Merging data from memtables and 1 sstables" and "Read 1 live and 0 tombstone cells" steps.
Note that this particular row is quite large and almost certainly contains a lot of tombstones. But my understanding is that when querying only a static column, Cassandra shouldn't have to read the whole row, since the static column value is stored in a single place.
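
To make that expectation concrete (a sketch added for comparison, not from the original report): one might expect the static-column read to cost about as much as a {{LIMIT 1}} read of the partition, rather than a full scan:

{code:sql}
-- Hypothetical comparison query: since 'used' is static, its value
-- should be retrievable without materializing every clustering row.
SELECT used FROM akka.messages
WHERE persistence_id = 'player-SW11f03e20b8802000'
  AND partition_nr = 0
LIMIT 1;
{code}

(Even this can be slowed by leading tombstones that must be skipped before the first live row, so it only approximates the expected cost.)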

Note that if the row cache is enabled, the row gets cached and subsequent requests are then very fast.
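
For reference, the row-cache workaround amounts to something like the following sketch (the {{rows_per_partition}} value here is my assumption; the row cache must also be sized via {{row_cache_size_in_mb}} in cassandra.yaml for it to take effect):

{code:sql}
-- Hedged example (Cassandra 2.1 syntax): cache the head of each
-- partition, the idea being that the static value stays hot once cached.
ALTER TABLE akka.messages
WITH caching = '{"keys":"ALL", "rows_per_partition":"1"}';
{code}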

I've attached the sstable in question, which exhibits the issue for this specific large row.
Note that I didn't run any compaction on the sstable, since I'm not sure whether the problem comes from the tombstones on that record.
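
For completeness, the trace above was presumably captured with cqlsh tracing; a minimal session to reproduce it would look like:

{code:sql}
-- In cqlsh (TRACING is a cqlsh command, not CQL proper):
TRACING ON;
SELECT DISTINCT used FROM akka.messages
WHERE persistence_id = 'player-SW11f03e20b8802000'
  AND partition_nr = 0;
-- cqlsh then prints a request trace like the one shown above.
{code}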
