Posted to user@cassandra.apache.org by Dave Galbraith <da...@gmail.com> on 2015/03/23 05:56:00 UTC

Really high read latency

Hi! So I've got a table like this:

CREATE TABLE "default".metrics (row_time int,attrs varchar,offset int,value
double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND
dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND
index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true'
AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND
speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND
compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
AND compression={'sstable_compression':'LZ4Compressor'};

and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4 GB
of heap space. It's timeseries data, so I increment "row_time" each day,
"attrs" is additional identifying information about
each series, and "offset" is the number of milliseconds into the day for
each data point. So for the past 5 days, I've been inserting 3k
points/second distributed across 100k distinct "attrs"es. And now when I
try to run queries on this data that look like

"SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs =
'potatoes_and_jam'"

it takes an absurdly long time and sometimes just times out. I did
"nodetool cftsats default" and here's what I get:

Keyspace: default
    Read Count: 59
    Read Latency: 397.12523728813557 ms.
    Write Count: 155128
    Write Latency: 0.3675690719921613 ms.
    Pending Flushes: 0
        Table: metrics
        SSTable count: 26
        Space used (live): 35146349027
        Space used (total): 35146349027
        Space used by snapshots (total): 0
        SSTable Compression Ratio: 0.10386468749216264
        Memtable cell count: 141800
        Memtable data size: 31071290
        Memtable switch count: 41
        Local read count: 59
        Local read latency: 397.126 ms
        Local write count: 155128
        Local write latency: 0.368 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 2856
        Compacted partition minimum bytes: 104
        Compacted partition maximum bytes: 36904729268
        Compacted partition mean bytes: 986530969
        Average live cells per slice (last five minutes): 501.66101694915255
        Maximum live cells per slice (last five minutes): 502.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

Ouch! 400ms of read latency, orders of magnitude higher than it has any
right to be. How could this have happened? Is there something fundamentally
broken about my data model? Thanks!

Re: Really high read latency

Posted by Duncan Sands <du...@gmail.com>.
Hi Dave,

On 23/03/15 05:56, Dave Galbraith wrote:
> Hi! So I've got a table like this:
>
> CREATE TABLE "default".metrics (row_time int,attrs varchar,offset int,value
> double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT STORAGE AND
> bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND
> dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND index_interval=128
> AND read_repair_chance=1 AND replicate_on_write='true' AND
> populate_io_cache_on_flush='false' AND default_time_to_live=0 AND
> speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND
> compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
> AND compression={'sstable_compression':'LZ4Compressor'};

does it work better with
   PRIMARY KEY((row_time, attrs), offset)
?
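
e.g. something along these lines (just an untested sketch -- same columns as
your table, keeping only the compaction/compression options and letting the
rest default):

   CREATE TABLE "default".metrics (
     row_time int,
     attrs varchar,
     offset int,
     value double,
     PRIMARY KEY ((row_time, attrs), offset)
   ) WITH COMPACT STORAGE
     AND compaction = {'class': 'DateTieredCompactionStrategy',
                       'timestamp_resolution': 'MILLISECONDS'}
     AND compression = {'sstable_compression': 'LZ4Compressor'};

That bounds each partition to one day of one series instead of one day of all
100k series.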

Ciao, Duncan.

>
> and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4 GB of
> heap space. So it's timeseries data that I'm doing so I increment "row_time"
> each day, "attrs" is additional identifying information about each series, and
> "offset" is the number of milliseconds into the day for each data point. So for
> the past 5 days, I've been inserting 3k points/second distributed across 100k
> distinct "attrs"es. And now when I try to run queries on this data that look like
>
> "SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs = 'potatoes_and_jam'"
>
> it takes an absurdly long time and sometimes just times out. I did "nodetool
> cfstats default" and here's what I get:
>
> Keyspace: default
>      Read Count: 59
>      Read Latency: 397.12523728813557 ms.
>      Write Count: 155128
>      Write Latency: 0.3675690719921613 ms.
>      Pending Flushes: 0
>          Table: metrics
>          SSTable count: 26
>          Space used (live): 35146349027
>          Space used (total): 35146349027
>          Space used by snapshots (total): 0
>          SSTable Compression Ratio: 0.10386468749216264
>          Memtable cell count: 141800
>          Memtable data size: 31071290
>          Memtable switch count: 41
>          Local read count: 59
>          Local read latency: 397.126 ms
>          Local write count: 155128
>          Local write latency: 0.368 ms
>          Pending flushes: 0
>          Bloom filter false positives: 0
>          Bloom filter false ratio: 0.00000
>          Bloom filter space used: 2856
>          Compacted partition minimum bytes: 104
>          Compacted partition maximum bytes: 36904729268
>          Compacted partition mean bytes: 986530969
>          Average live cells per slice (last five minutes): 501.66101694915255
>          Maximum live cells per slice (last five minutes): 502.0
>          Average tombstones per slice (last five minutes): 0.0
>          Maximum tombstones per slice (last five minutes): 0.0
>
> Ouch! 400ms of read latency, orders of magnitude higher than it has any right to
> be. How could this have happened? Is there something fundamentally broken about
> my data model? Thanks!
>


Re: Really high read latency

Posted by Ben Bromhead <be...@instaclustr.com>.
nodetool cfhistograms is also very helpful in diagnosing these kinds of
data modelling issues.

On 23 March 2015 at 14:43, Chris Lohfink <cl...@gmail.com> wrote:

>
> >>  Compacted partition maximum bytes: 36904729268
>
> that's huge... 36 GB partitions are going to cause a lot of problems. Even when
> you ask for a precise cell underneath it, Cassandra still has to deserialize an
> enormous column index on every read of that partition. As mentioned above, you
> should include your attrs column in the partition key ((row_time, attrs)) to
> spread this out... I'd call that critical
>
> Chris
>
> On Mon, Mar 23, 2015 at 4:13 PM, Dave Galbraith <
> david92galbraith@gmail.com> wrote:
>
>> I haven't deleted anything. Here's output from a traced cqlsh query (I
>> tried to make the spaces line up, hope it's legible):
>>
>> activity | timestamp | source | source_elapsed (us)
>> Execute CQL3 query | 2015-03-23 21:04:37.422000 | 172.31.32.211 | 0
>> Parsing select * from default.metrics where row_time = 16511 and attrs = '[redacted]' limit 100; [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 172.31.32.211 | 93
>> Preparing statement [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 172.31.32.211 | 696
>> Executing single-partition query on metrics [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 172.31.32.211 | 2807
>> Acquiring sstable references [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 172.31.32.211 | 2993
>> Merging memtable tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:37.426000 | 172.31.32.211 | 3049
>> Partition index with 484338 entries found for sstable 15966 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202304
>> Seeking to partition indexed section in data file [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202354
>> Bloom filter allows skipping sstable 5613 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202445
>> Bloom filter allows skipping sstable 5582 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202478
>> Bloom filter allows skipping sstable 5611 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202508
>> Bloom filter allows skipping sstable 5610 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202539
>> Bloom filter allows skipping sstable 5549 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202678
>> Bloom filter allows skipping sstable 5544 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202720
>> Bloom filter allows skipping sstable 5237 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202752
>> Bloom filter allows skipping sstable 2516 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202782
>> Bloom filter allows skipping sstable 2632 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202812
>> Bloom filter allows skipping sstable 3015 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202852
>> Skipped 0/11 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202882
>> Merging data from memtables and 1 sstables [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202902
>> Read 101 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-03-23 21:04:38.626000 | 172.31.32.211 | 203752
>> Request complete | 2015-03-23 21:04:38.628253 | 172.31.32.211 | 206253
>>
>> On Mon, Mar 23, 2015 at 11:53 AM, Eric Stevens <mi...@gmail.com> wrote:
>>
>>> Enable tracing in cqlsh and see how many sstables are being lifted to
>>> satisfy the query (are you repeatedly writing to the same partition
>>> [row_time] over time?).
>>>
>>> Also watch for whether you're hitting a lot of tombstones (are you
>>> deleting lots of values in the same partition over time?).
>>>
>>> On Mon, Mar 23, 2015 at 4:01 AM, Dave Galbraith <
>>> david92galbraith@gmail.com> wrote:
>>>
>>>> Duncan: I'm thinking it might be something like that. I'm also seeing
>>>> just a ton of garbage collection on the box, could it be pulling rows for
>>>> all 100k attrs for a given row_time into memory since only row_time is the
>>>> partition key?
>>>>
>>>> Jens: I'm not using EBS (although I used to until I read up on how
>>>> useless it is). I'm not sure what constitutes proper paging but my client
>>>> has a pretty small amount of available memory so I'm doing pages of size 5k
>>>> using the C++ Datastax driver.
>>>>
>>>> Thanks for the replies!
>>>>
>>>> -Dave
>>>>
>>>> On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil <je...@tink.se>
>>>> wrote:
>>>>
>>>>> Also, two control questions:
>>>>>
>>>>>    - Are you using EBS for data storage? It might introduce
>>>>>    additional latencies.
>>>>>    - Are you doing proper paging when querying the keyspace?
>>>>>
>>>>> Cheers,
>>>>> Jens
>>>>>
>>>>> On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith <
>>>>> david92galbraith@gmail.com> wrote:
>>>>>
>>>>>> Hi! So I've got a table like this:
>>>>>>
>>>>>> CREATE TABLE "default".metrics (row_time int,attrs varchar,offset
>>>>>> int,value double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT
>>>>>> STORAGE AND bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND
>>>>>> comment='' AND dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND
>>>>>> index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true'
>>>>>> AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND
>>>>>> speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND
>>>>>> compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
>>>>>> AND compression={'sstable_compression':'LZ4Compressor'};
>>>>>>
>>>>>> and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with
>>>>>> 4 GB of heap space. So it's timeseries data that I'm doing so I increment
>>>>>> "row_time" each day, "attrs" is additional identifying information about
>>>>>> each series, and "offset" is the number of milliseconds into the day for
>>>>>> each data point. So for the past 5 days, I've been inserting 3k
>>>>>> points/second distributed across 100k distinct "attrs"es. And now when I
>>>>>> try to run queries on this data that look like
>>>>>>
>>>>>> "SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs =
>>>>>> 'potatoes_and_jam'"
>>>>>>
>>>>>> it takes an absurdly long time and sometimes just times out. I did
>>>>>> "nodetool cftsats default" and here's what I get:
>>>>>>
>>>>>> Keyspace: default
>>>>>>     Read Count: 59
>>>>>>     Read Latency: 397.12523728813557 ms.
>>>>>>     Write Count: 155128
>>>>>>     Write Latency: 0.3675690719921613 ms.
>>>>>>     Pending Flushes: 0
>>>>>>         Table: metrics
>>>>>>         SSTable count: 26
>>>>>>         Space used (live): 35146349027
>>>>>>         Space used (total): 35146349027
>>>>>>         Space used by snapshots (total): 0
>>>>>>         SSTable Compression Ratio: 0.10386468749216264
>>>>>>         Memtable cell count: 141800
>>>>>>         Memtable data size: 31071290
>>>>>>         Memtable switch count: 41
>>>>>>         Local read count: 59
>>>>>>         Local read latency: 397.126 ms
>>>>>>         Local write count: 155128
>>>>>>         Local write latency: 0.368 ms
>>>>>>         Pending flushes: 0
>>>>>>         Bloom filter false positives: 0
>>>>>>         Bloom filter false ratio: 0.00000
>>>>>>         Bloom filter space used: 2856
>>>>>>         Compacted partition minimum bytes: 104
>>>>>>         Compacted partition maximum bytes: 36904729268
>>>>>>         Compacted partition mean bytes: 986530969
>>>>>>         Average live cells per slice (last five minutes):
>>>>>> 501.66101694915255
>>>>>>         Maximum live cells per slice (last five minutes): 502.0
>>>>>>         Average tombstones per slice (last five minutes): 0.0
>>>>>>         Maximum tombstones per slice (last five minutes): 0.0
>>>>>>
>>>>>> Ouch! 400ms of read latency, orders of magnitude higher than it has
>>>>>> any right to be. How could this have happened? Is there something
>>>>>> fundamentally broken about my data model? Thanks!
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jens Rantil
>>>>> Backend engineer
>>>>> Tink AB
>>>>>
>>>>> Email: jens.rantil@tink.se
>>>>> Phone: +46 708 84 18 32
>>>>> Web: www.tink.se
>>>>>
>>>>> Facebook <https://www.facebook.com/#!/tink.se> Linkedin
>>>>> <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
>>>>>  Twitter <https://twitter.com/tink>
>>>>>
>>>>
>>>>
>>>
>>
>


-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr
<http://twitter.com/instaclustr> | (650) 284 9692

Re: Really high read latency

Posted by Chris Lohfink <cl...@gmail.com>.
>>  Compacted partition maximum bytes: 36904729268

that's huge... 36 GB partitions are going to cause a lot of problems. Even when
you ask for a precise cell underneath it, Cassandra still has to deserialize an
enormous column index on every read of that partition. As mentioned above, you
should include your attrs column in the partition key ((row_time, attrs)) to
spread this out... I'd call that critical
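
Reads keep the same shape once attrs is in the partition key, you just have to
supply both parts, e.g. (sketch only, the literal values are placeholders):

SELECT *
  FROM "default".metrics
 WHERE row_time = 16511
   AND attrs = 'some_series'              -- both partition key components
   AND offset >= 0 AND offset < 3600000   -- optional slice on the clustering column
 LIMIT 5000;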

Chris

On Mon, Mar 23, 2015 at 4:13 PM, Dave Galbraith <da...@gmail.com>
wrote:

> I haven't deleted anything. Here's output from a traced cqlsh query (I
> tried to make the spaces line up, hope it's legible):
>
> activity | timestamp | source | source_elapsed (us)
> Execute CQL3 query | 2015-03-23 21:04:37.422000 | 172.31.32.211 | 0
> Parsing select * from default.metrics where row_time = 16511 and attrs = '[redacted]' limit 100; [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 172.31.32.211 | 93
> Preparing statement [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 172.31.32.211 | 696
> Executing single-partition query on metrics [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 172.31.32.211 | 2807
> Acquiring sstable references [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 172.31.32.211 | 2993
> Merging memtable tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:37.426000 | 172.31.32.211 | 3049
> Partition index with 484338 entries found for sstable 15966 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202304
> Seeking to partition indexed section in data file [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202354
> Bloom filter allows skipping sstable 5613 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202445
> Bloom filter allows skipping sstable 5582 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202478
> Bloom filter allows skipping sstable 5611 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202508
> Bloom filter allows skipping sstable 5610 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202539
> Bloom filter allows skipping sstable 5549 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202678
> Bloom filter allows skipping sstable 5544 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202720
> Bloom filter allows skipping sstable 5237 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202752
> Bloom filter allows skipping sstable 2516 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202782
> Bloom filter allows skipping sstable 2632 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202812
> Bloom filter allows skipping sstable 3015 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202852
> Skipped 0/11 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202882
> Merging data from memtables and 1 sstables [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202902
> Read 101 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-03-23 21:04:38.626000 | 172.31.32.211 | 203752
> Request complete | 2015-03-23 21:04:38.628253 | 172.31.32.211 | 206253
>
> On Mon, Mar 23, 2015 at 11:53 AM, Eric Stevens <mi...@gmail.com> wrote:
>
>> Enable tracing in cqlsh and see how many sstables are being lifted to
>> satisfy the query (are you repeatedly writing to the same partition
>> [row_time] over time?).
>>
>> Also watch for whether you're hitting a lot of tombstones (are you
>> deleting lots of values in the same partition over time?).
>>
>> On Mon, Mar 23, 2015 at 4:01 AM, Dave Galbraith <
>> david92galbraith@gmail.com> wrote:
>>
>>> Duncan: I'm thinking it might be something like that. I'm also seeing
>>> just a ton of garbage collection on the box, could it be pulling rows for
>>> all 100k attrs for a given row_time into memory since only row_time is the
>>> partition key?
>>>
>>> Jens: I'm not using EBS (although I used to until I read up on how
>>> useless it is). I'm not sure what constitutes proper paging but my client
>>> has a pretty small amount of available memory so I'm doing pages of size 5k
>>> using the C++ Datastax driver.
>>>
>>> Thanks for the replies!
>>>
>>> -Dave
>>>
>>> On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil <je...@tink.se>
>>> wrote:
>>>
>>>> Also, two control questions:
>>>>
>>>>    - Are you using EBS for data storage? It might introduce additional
>>>>    latencies.
>>>>    - Are you doing proper paging when querying the keyspace?
>>>>
>>>> Cheers,
>>>> Jens
>>>>
>>>> On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith <
>>>> david92galbraith@gmail.com> wrote:
>>>>
>>>>> Hi! So I've got a table like this:
>>>>>
>>>>> CREATE TABLE "default".metrics (row_time int,attrs varchar,offset
>>>>> int,value double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT
>>>>> STORAGE AND bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND
>>>>> comment='' AND dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND
>>>>> index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true'
>>>>> AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND
>>>>> speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND
>>>>> compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
>>>>> AND compression={'sstable_compression':'LZ4Compressor'};
>>>>>
>>>>> and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with
>>>>> 4 GB of heap space. So it's timeseries data that I'm doing so I increment
>>>>> "row_time" each day, "attrs" is additional identifying information about
>>>>> each series, and "offset" is the number of milliseconds into the day for
>>>>> each data point. So for the past 5 days, I've been inserting 3k
>>>>> points/second distributed across 100k distinct "attrs"es. And now when I
>>>>> try to run queries on this data that look like
>>>>>
>>>>> "SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs =
>>>>> 'potatoes_and_jam'"
>>>>>
>>>>> it takes an absurdly long time and sometimes just times out. I did
>>>>> "nodetool cftsats default" and here's what I get:
>>>>>
>>>>> Keyspace: default
>>>>>     Read Count: 59
>>>>>     Read Latency: 397.12523728813557 ms.
>>>>>     Write Count: 155128
>>>>>     Write Latency: 0.3675690719921613 ms.
>>>>>     Pending Flushes: 0
>>>>>         Table: metrics
>>>>>         SSTable count: 26
>>>>>         Space used (live): 35146349027
>>>>>         Space used (total): 35146349027
>>>>>         Space used by snapshots (total): 0
>>>>>         SSTable Compression Ratio: 0.10386468749216264
>>>>>         Memtable cell count: 141800
>>>>>         Memtable data size: 31071290
>>>>>         Memtable switch count: 41
>>>>>         Local read count: 59
>>>>>         Local read latency: 397.126 ms
>>>>>         Local write count: 155128
>>>>>         Local write latency: 0.368 ms
>>>>>         Pending flushes: 0
>>>>>         Bloom filter false positives: 0
>>>>>         Bloom filter false ratio: 0.00000
>>>>>         Bloom filter space used: 2856
>>>>>         Compacted partition minimum bytes: 104
>>>>>         Compacted partition maximum bytes: 36904729268
>>>>>         Compacted partition mean bytes: 986530969
>>>>>         Average live cells per slice (last five minutes):
>>>>> 501.66101694915255
>>>>>         Maximum live cells per slice (last five minutes): 502.0
>>>>>         Average tombstones per slice (last five minutes): 0.0
>>>>>         Maximum tombstones per slice (last five minutes): 0.0
>>>>>
>>>>> Ouch! 400ms of read latency, orders of magnitude higher than it has
>>>>> any right to be. How could this have happened? Is there something
>>>>> fundamentally broken about my data model? Thanks!
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Jens Rantil
>>>> Backend engineer
>>>> Tink AB
>>>>
>>>> Email: jens.rantil@tink.se
>>>> Phone: +46 708 84 18 32
>>>> Web: www.tink.se
>>>>
>>>> Facebook <https://www.facebook.com/#!/tink.se> Linkedin
>>>> <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
>>>>  Twitter <https://twitter.com/tink>
>>>>
>>>
>>>
>>
>

Re: Really high read latency

Posted by Dave Galbraith <da...@gmail.com>.
I haven't deleted anything. Here's output from a traced cqlsh query (I
tried to make the spaces line up, hope it's legible):

activity | timestamp | source | source_elapsed (us)
Execute CQL3 query | 2015-03-23 21:04:37.422000 | 172.31.32.211 | 0
Parsing select * from default.metrics where row_time = 16511 and attrs = '[redacted]' limit 100; [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 172.31.32.211 | 93
Preparing statement [SharedPool-Worker-2] | 2015-03-23 21:04:37.423000 | 172.31.32.211 | 696
Executing single-partition query on metrics [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 172.31.32.211 | 2807
Acquiring sstable references [SharedPool-Worker-1] | 2015-03-23 21:04:37.425000 | 172.31.32.211 | 2993
Merging memtable tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:37.426000 | 172.31.32.211 | 3049
Partition index with 484338 entries found for sstable 15966 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202304
Seeking to partition indexed section in data file [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202354
Bloom filter allows skipping sstable 5613 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202445
Bloom filter allows skipping sstable 5582 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202478
Bloom filter allows skipping sstable 5611 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202508
Bloom filter allows skipping sstable 5610 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625000 | 172.31.32.211 | 202539
Bloom filter allows skipping sstable 5549 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202678
Bloom filter allows skipping sstable 5544 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202720
Bloom filter allows skipping sstable 5237 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202752
Bloom filter allows skipping sstable 2516 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202782
Bloom filter allows skipping sstable 2632 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202812
Bloom filter allows skipping sstable 3015 [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202852
Skipped 0/11 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202882
Merging data from memtables and 1 sstables [SharedPool-Worker-1] | 2015-03-23 21:04:38.625001 | 172.31.32.211 | 202902
Read 101 live and 0 tombstoned cells [SharedPool-Worker-1] | 2015-03-23 21:04:38.626000 | 172.31.32.211 | 203752
Request complete | 2015-03-23 21:04:38.628253 | 172.31.32.211 | 206253

On Mon, Mar 23, 2015 at 11:53 AM, Eric Stevens <mi...@gmail.com> wrote:

> Enable tracing in cqlsh and see how many sstables are being lifted to
> satisfy the query (are you repeatedly writing to the same partition
> [row_time] over time?).
>
> Also watch for whether you're hitting a lot of tombstones (are you
> deleting lots of values in the same partition over time?).
>
> On Mon, Mar 23, 2015 at 4:01 AM, Dave Galbraith <
> david92galbraith@gmail.com> wrote:
>
>> Duncan: I'm thinking it might be something like that. I'm also seeing
>> just a ton of garbage collection on the box, could it be pulling rows for
>> all 100k attrs for a given row_time into memory since only row_time is the
>> partition key?
>>
>> Jens: I'm not using EBS (although I used to until I read up on how
>> useless it is). I'm not sure what constitutes proper paging but my client
>> has a pretty small amount of available memory so I'm doing pages of size 5k
>> using the C++ Datastax driver.
>>
>> Thanks for the replies!
>>
>> -Dave
>>
>> On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil <je...@tink.se> wrote:
>>
>>> Also, two control questions:
>>>
>>>    - Are you using EBS for data storage? It might introduce additional
>>>    latencies.
>>>    - Are you doing proper paging when querying the keyspace?
>>>
>>> Cheers,
>>> Jens
>>>
>>> On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith <
>>> david92galbraith@gmail.com> wrote:
>>>
>>>> Hi! So I've got a table like this:
>>>>
>>>> CREATE TABLE "default".metrics (row_time int,attrs varchar,offset
>>>> int,value double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT
>>>> STORAGE AND bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND
>>>> comment='' AND dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND
>>>> index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true'
>>>> AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND
>>>> speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND
>>>> compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
>>>> AND compression={'sstable_compression':'LZ4Compressor'};
>>>>
>>>> and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4
>>>> GB of heap space. So it's timeseries data that I'm doing so I increment
>>>> "row_time" each day, "attrs" is additional identifying information about
>>>> each series, and "offset" is the number of milliseconds into the day for
>>>> each data point. So for the past 5 days, I've been inserting 3k
>>>> points/second distributed across 100k distinct "attrs"es. And now when I
>>>> try to run queries on this data that look like
>>>>
>>>> "SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs =
>>>> 'potatoes_and_jam'"
>>>>
>>>> it takes an absurdly long time and sometimes just times out. I did
>>>> "nodetool cftsats default" and here's what I get:
>>>>
>>>> Keyspace: default
>>>>     Read Count: 59
>>>>     Read Latency: 397.12523728813557 ms.
>>>>     Write Count: 155128
>>>>     Write Latency: 0.3675690719921613 ms.
>>>>     Pending Flushes: 0
>>>>         Table: metrics
>>>>         SSTable count: 26
>>>>         Space used (live): 35146349027
>>>>         Space used (total): 35146349027
>>>>         Space used by snapshots (total): 0
>>>>         SSTable Compression Ratio: 0.10386468749216264
>>>>         Memtable cell count: 141800
>>>>         Memtable data size: 31071290
>>>>         Memtable switch count: 41
>>>>         Local read count: 59
>>>>         Local read latency: 397.126 ms
>>>>         Local write count: 155128
>>>>         Local write latency: 0.368 ms
>>>>         Pending flushes: 0
>>>>         Bloom filter false positives: 0
>>>>         Bloom filter false ratio: 0.00000
>>>>         Bloom filter space used: 2856
>>>>         Compacted partition minimum bytes: 104
>>>>         Compacted partition maximum bytes: 36904729268
>>>>         Compacted partition mean bytes: 986530969
>>>>         Average live cells per slice (last five minutes):
>>>> 501.66101694915255
>>>>         Maximum live cells per slice (last five minutes): 502.0
>>>>         Average tombstones per slice (last five minutes): 0.0
>>>>         Maximum tombstones per slice (last five minutes): 0.0
>>>>
>>>> Ouch! 400ms of read latency, orders of magnitude higher than it has any
>>>> right to be. How could this have happened? Is there something fundamentally
>>>> broken about my data model? Thanks!
>>>>
>>>>
>>>
>>>
>>> --
>>> Jens Rantil
>>> Backend engineer
>>> Tink AB
>>>
>>> Email: jens.rantil@tink.se
>>> Phone: +46 708 84 18 32
>>> Web: www.tink.se
>>>
>>> Facebook <https://www.facebook.com/#!/tink.se> Linkedin
>>> <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
>>>  Twitter <https://twitter.com/tink>
>>>
>>
>>
>

Re: Really high read latency

Posted by Eric Stevens <mi...@gmail.com>.
Enable tracing in cqlsh and see how many sstables are being lifted to
satisfy the query (are you repeatedly writing to the same partition
[row_time] over time?).

Also watch for whether you're hitting a lot of tombstones (are you deleting
lots of values in the same partition over time?).
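
To trace from cqlsh, something like this (sketch; plug in your own attrs value):

TRACING ON;
SELECT * FROM "default".metrics
 WHERE row_time = 5 AND attrs = 'potatoes_and_jam'
 LIMIT 100;
TRACING OFF;

The trace (one line per step, with a per-step elapsed time in microseconds) is
printed right after the result rows.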

On Mon, Mar 23, 2015 at 4:01 AM, Dave Galbraith <da...@gmail.com>
wrote:

> Duncan: I'm thinking it might be something like that. I'm also seeing just
> a ton of garbage collection on the box, could it be pulling rows for all
> 100k attrs for a given row_time into memory since only row_time is the
> partition key?
>
> Jens: I'm not using EBS (although I used to until I read up on how useless
> it is). I'm not sure what constitutes proper paging but my client has a
> pretty small amount of available memory so I'm doing pages of size 5k using
> the C++ Datastax driver.
>
> Thanks for the replies!
>
> -Dave
>
> On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil <je...@tink.se> wrote:
>
>> Also, two control questions:
>>
>>    - Are you using EBS for data storage? It might introduce additional
>>    latencies.
>>    - Are you doing proper paging when querying the keyspace?
>>
>> Cheers,
>> Jens
>>
>> On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith <
>> david92galbraith@gmail.com> wrote:
>>
>>> Hi! So I've got a table like this:
>>>
>>> CREATE TABLE "default".metrics (row_time int,attrs varchar,offset
>>> int,value double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT
>>> STORAGE AND bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND
>>> comment='' AND dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND
>>> index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true'
>>> AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND
>>> speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND
>>> compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
>>> AND compression={'sstable_compression':'LZ4Compressor'};
>>>
>>> and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4
>>> GB of heap space. So it's timeseries data that I'm doing so I increment
>>> "row_time" each day, "attrs" is additional identifying information about
>>> each series, and "offset" is the number of milliseconds into the day for
>>> each data point. So for the past 5 days, I've been inserting 3k
>>> points/second distributed across 100k distinct "attrs"es. And now when I
>>> try to run queries on this data that look like
>>>
>>> "SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs =
>>> 'potatoes_and_jam'"
>>>
>>> it takes an absurdly long time and sometimes just times out. I did
>>> "nodetool cftsats default" and here's what I get:
>>>
>>> Keyspace: default
>>>     Read Count: 59
>>>     Read Latency: 397.12523728813557 ms.
>>>     Write Count: 155128
>>>     Write Latency: 0.3675690719921613 ms.
>>>     Pending Flushes: 0
>>>         Table: metrics
>>>         SSTable count: 26
>>>         Space used (live): 35146349027
>>>         Space used (total): 35146349027
>>>         Space used by snapshots (total): 0
>>>         SSTable Compression Ratio: 0.10386468749216264
>>>         Memtable cell count: 141800
>>>         Memtable data size: 31071290
>>>         Memtable switch count: 41
>>>         Local read count: 59
>>>         Local read latency: 397.126 ms
>>>         Local write count: 155128
>>>         Local write latency: 0.368 ms
>>>         Pending flushes: 0
>>>         Bloom filter false positives: 0
>>>         Bloom filter false ratio: 0.00000
>>>         Bloom filter space used: 2856
>>>         Compacted partition minimum bytes: 104
>>>         Compacted partition maximum bytes: 36904729268
>>>         Compacted partition mean bytes: 986530969
>>>         Average live cells per slice (last five minutes):
>>> 501.66101694915255
>>>         Maximum live cells per slice (last five minutes): 502.0
>>>         Average tombstones per slice (last five minutes): 0.0
>>>         Maximum tombstones per slice (last five minutes): 0.0
>>>
>>> Ouch! 400ms of read latency, orders of magnitude higher than it has any
>>> right to be. How could this have happened? Is there something fundamentally
>>> broken about my data model? Thanks!
>>>
>>>
>>
>>
>> --
>> Jens Rantil
>> Backend engineer
>> Tink AB
>>
>> Email: jens.rantil@tink.se
>> Phone: +46 708 84 18 32
>> Web: www.tink.se
>>
>> Facebook <https://www.facebook.com/#!/tink.se> Linkedin
>> <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
>>  Twitter <https://twitter.com/tink>
>>
>
>

Re: Really high read latency

Posted by Dave Galbraith <da...@gmail.com>.
Duncan: I'm thinking it might be something like that. I'm also seeing just
a ton of garbage collection on the box, could it be pulling rows for all
100k attrs for a given row_time into memory since only row_time is the
partition key?

Jens: I'm not using EBS (although I used to until I read up on how useless
it is). I'm not sure what constitutes proper paging but my client has a
pretty small amount of available memory so I'm doing pages of size 5k using
the C++ Datastax driver.

Thanks for the replies!

-Dave

On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil <je...@tink.se> wrote:

> Also, two control questions:
>
>    - Are you using EBS for data storage? It might introduce additional
>    latencies.
>    - Are you doing proper paging when querying the keyspace?
>
> Cheers,
> Jens
>
> On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith <
> david92galbraith@gmail.com> wrote:
>
>> Hi! So I've got a table like this:
>>
>> CREATE TABLE "default".metrics (row_time int,attrs varchar,offset
>> int,value double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT
>> STORAGE AND bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND
>> comment='' AND dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND
>> index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true'
>> AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND
>> speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND
>> compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
>> AND compression={'sstable_compression':'LZ4Compressor'};
>>
>> and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4
>> GB of heap space. So it's timeseries data that I'm doing so I increment
>> "row_time" each day, "attrs" is additional identifying information about
>> each series, and "offset" is the number of milliseconds into the day for
>> each data point. So for the past 5 days, I've been inserting 3k
>> points/second distributed across 100k distinct "attrs"es. And now when I
>> try to run queries on this data that look like
>>
>> "SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs =
>> 'potatoes_and_jam'"
>>
>> it takes an absurdly long time and sometimes just times out. I did
>> "nodetool cftsats default" and here's what I get:
>>
>> Keyspace: default
>>     Read Count: 59
>>     Read Latency: 397.12523728813557 ms.
>>     Write Count: 155128
>>     Write Latency: 0.3675690719921613 ms.
>>     Pending Flushes: 0
>>         Table: metrics
>>         SSTable count: 26
>>         Space used (live): 35146349027
>>         Space used (total): 35146349027
>>         Space used by snapshots (total): 0
>>         SSTable Compression Ratio: 0.10386468749216264
>>         Memtable cell count: 141800
>>         Memtable data size: 31071290
>>         Memtable switch count: 41
>>         Local read count: 59
>>         Local read latency: 397.126 ms
>>         Local write count: 155128
>>         Local write latency: 0.368 ms
>>         Pending flushes: 0
>>         Bloom filter false positives: 0
>>         Bloom filter false ratio: 0.00000
>>         Bloom filter space used: 2856
>>         Compacted partition minimum bytes: 104
>>         Compacted partition maximum bytes: 36904729268
>>         Compacted partition mean bytes: 986530969
>>         Average live cells per slice (last five minutes):
>> 501.66101694915255
>>         Maximum live cells per slice (last five minutes): 502.0
>>         Average tombstones per slice (last five minutes): 0.0
>>         Maximum tombstones per slice (last five minutes): 0.0
>>
>> Ouch! 400ms of read latency, orders of magnitude higher than it has any
>> right to be. How could this have happened? Is there something fundamentally
>> broken about my data model? Thanks!
>>
>>
>
>
> --
> Jens Rantil
> Backend engineer
> Tink AB
>
> Email: jens.rantil@tink.se
> Phone: +46 708 84 18 32
> Web: www.tink.se
>
> Facebook <https://www.facebook.com/#!/tink.se> Linkedin
> <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
>  Twitter <https://twitter.com/tink>
>

Re: Really high read latency

Posted by Jens Rantil <je...@tink.se>.
Also, two control questions:

   - Are you using EBS for data storage? It might introduce additional
   latencies.
   - Are you doing proper paging when querying the keyspace?

Cheers,
Jens

On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith <da...@gmail.com>
wrote:

> Hi! So I've got a table like this:
>
> CREATE TABLE "default".metrics (row_time int,attrs varchar,offset
> int,value double, PRIMARY KEY(row_time, attrs, offset)) WITH COMPACT
> STORAGE AND bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND
> comment='' AND dclocal_read_repair_chance=0 AND gc_grace_seconds=864000 AND
> index_interval=128 AND read_repair_chance=1 AND replicate_on_write='true'
> AND populate_io_cache_on_flush='false' AND default_time_to_live=0 AND
> speculative_retry='NONE' AND memtable_flush_period_in_ms=0 AND
> compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
> AND compression={'sstable_compression':'LZ4Compressor'};
>
> and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4 GB
> of heap space. So it's timeseries data that I'm doing so I increment
> "row_time" each day, "attrs" is additional identifying information about
> each series, and "offset" is the number of milliseconds into the day for
> each data point. So for the past 5 days, I've been inserting 3k
> points/second distributed across 100k distinct "attrs"es. And now when I
> try to run queries on this data that look like
>
> "SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs =
> 'potatoes_and_jam'"
>
> it takes an absurdly long time and sometimes just times out. I did
> "nodetool cftsats default" and here's what I get:
>
> Keyspace: default
>     Read Count: 59
>     Read Latency: 397.12523728813557 ms.
>     Write Count: 155128
>     Write Latency: 0.3675690719921613 ms.
>     Pending Flushes: 0
>         Table: metrics
>         SSTable count: 26
>         Space used (live): 35146349027
>         Space used (total): 35146349027
>         Space used by snapshots (total): 0
>         SSTable Compression Ratio: 0.10386468749216264
>         Memtable cell count: 141800
>         Memtable data size: 31071290
>         Memtable switch count: 41
>         Local read count: 59
>         Local read latency: 397.126 ms
>         Local write count: 155128
>         Local write latency: 0.368 ms
>         Pending flushes: 0
>         Bloom filter false positives: 0
>         Bloom filter false ratio: 0.00000
>         Bloom filter space used: 2856
>         Compacted partition minimum bytes: 104
>         Compacted partition maximum bytes: 36904729268
>         Compacted partition mean bytes: 986530969
>         Average live cells per slice (last five minutes):
> 501.66101694915255
>         Maximum live cells per slice (last five minutes): 502.0
>         Average tombstones per slice (last five minutes): 0.0
>         Maximum tombstones per slice (last five minutes): 0.0
>
> Ouch! 400ms of read latency, orders of magnitude higher than it has any
> right to be. How could this have happened? Is there something fundamentally
> broken about my data model? Thanks!
>
>


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.rantil@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook <https://www.facebook.com/#!/tink.se> Linkedin
<http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
 Twitter <https://twitter.com/tink>