You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@cassandra.apache.org by "Dominic Letz (JIRA)" <ji...@apache.org> on 2015/01/05 07:05:34 UTC

[jira] [Comment Edited] (CASSANDRA-8546) RangeTombstoneList becoming bottleneck on tombstone heavy tasks

    [ https://issues.apache.org/jira/browse/CASSANDRA-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264185#comment-14264185 ] 

Dominic Letz edited comment on CASSANDRA-8546 at 1/5/15 6:05 AM:
-----------------------------------------------------------------

Seems I'm the only one having fun with this one. I've done some more profiling and found that when middle-inserts and searching is alternating the GapLists own binarySearch is not performing well as it "normalizes" the GapList on each search. 
In this update I'm using the built-in Collections.binarySearch instead.


was (Author: dominicletz):
Seems I'm the only one having fun with this one. I've done some more profiling and found that when there are is a pendulum between middle-inserts and searching the GapLists own binarySearch is not performing well as it "normalizes" the GapList on each search. 
In this update I'm using the built-in Collections.binarySearch instead.

> RangeTombstoneList becoming bottleneck on tombstone heavy tasks
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-8546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8546
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: 2.0.11 / 2.1
>            Reporter: Dominic Letz
>            Assignee: Dominic Letz
>             Fix For: 2.1.3
>
>         Attachments: cassandra-2.0.11-8546.txt, cassandra-2.1-8546.txt, tombstone_test.tgz
>
>
> I would like to propose a change of the data structure used in the RangeTombstoneList to store and insert tombstone ranges to something with at least O(log N) insert in the middle and at near O(1) and start AND end. Here is why:
> When having tombstone heavy work-loads the current implementation of RangeTombstoneList becomes a bottleneck with slice queries.
> Scanning the number of tombstones up to the default maximum (100k) can take up to 3 minutes of how addInternal() scales on insertion of middle and start elements.
> The attached test shows that with 50k deletes from both sides of a range.
> INSERT 1...110000
> flush()
> DELETE 1...50000
> DELETE 110000...60000
> While one direction performs ok (~400ms on my notebook):
> {code}
> SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp DESC LIMIT 1
> {code}
> The other direction underperforms (~7seconds on my notebook)
> {code}
> SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp ASC LIMIT 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)