Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2015/06/01 15:08:18 UTC
[jira] [Commented] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster

    [ https://issues.apache.org/jira/browse/CASSANDRA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567278#comment-14567278 ] 

Sylvain Lebresne commented on CASSANDRA-8547:
---------------------------------------------

Maybe Tyler meant CASSANDRA-9486?

And I do think the actual problem is the one pointed out in CASSANDRA-9486. The idea behind {{RangeTombstone.Tracker}} is that it only tracks tombstones that are actually useful, i.e. those that still cover something. As such, the linear scan in {{isDeleted}} shouldn't be a problem: it shouldn't scan anything uselessly. However, and that's what CASSANDRA-9486 is about, the tracker is not always used properly, and there are cases where its {{update}} method is not called, resulting in an unexpectedly high cost in {{isDeleted}}. In practice, I'm sure the attached patch does improve things, but that's not really the right fix. And as the right fix is already being discussed on CASSANDRA-9486, I'm going to mark this as a duplicate.
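
To illustrate why the scan stays cheap only when {{update}} is called, here is a minimal sketch of the idea. It is not the actual Cassandra code: the class name {{OpenTombstoneTracker}}, the use of plain String names and the {{openCount}} helper are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a tracker that keeps only "open" tombstones.
// Names are plain Strings compared lexicographically; real Cassandra types differ.
final class OpenTombstoneTracker {
    static final class Tombstone {
        final String min;
        final String max;
        final long timestamp;

        Tombstone(String min, String max, long timestamp) {
            this.min = min;
            this.max = max;
            this.timestamp = timestamp;
        }
    }

    private final List<Tombstone> open = new ArrayList<>();

    void add(Tombstone tombstone) {
        open.add(tombstone);
    }

    // Meant to be called for every name as the iteration moves forward in sorted
    // order: tombstones ending before the current name cannot cover anything else.
    void update(String currentName) {
        open.removeIf(t -> t.max.compareTo(currentName) < 0);
    }

    // Linear scan, but only over tombstones that are still open.
    boolean isDeleted(String name, long timestamp) {
        for (Tombstone t : open) {
            if (t.min.compareTo(name) <= 0
                && name.compareTo(t.max) <= 0
                && t.timestamp >= timestamp) {
                return true;
            }
        }
        return false;
    }

    int openCount() {
        return open.size();
    }
}
```

If {{update}} is skipped in this sketch, nothing is ever pruned, so every {{isDeleted}} call scans all tombstones added so far. That is the kind of unexpected cost described above.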

> Make RangeTombstone.Tracker.isDeleted() faster
> ----------------------------------------------
>
>                 Key: CASSANDRA-8547
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8547
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: 2.0.11
>            Reporter: Dominic Letz
>            Assignee: Dominic Letz
>              Labels: tombstone
>             Fix For: 2.1.x
>
>         Attachments: Selection_044.png, cassandra-2.0.11-8547.txt, cassandra-2.1-8547.txt, rangetombstone.tracker.txt
>
>
> During compaction and repairs with many tombstones an exorbitant amount of time is spent in RangeTombstone.Tracker.isDeleted().
> The amount of time spent there can be so large that compactions and repairs look "stalled", with the estimated time remaining frozen at the same value for days.
> Using visualvm I've been sample profiling the code during execution, and found this both during compaction and during repairs (point-in-time backtraces attached).
> Looking at the code, the problem is obviously the linear scanning:
> {code}
>         public boolean isDeleted(Column column)
>         {
>             for (RangeTombstone tombstone : ranges)
>             {
>                 if (comparator.compare(column.name(), tombstone.min) >= 0
>                     && comparator.compare(column.name(), tombstone.max) <= 0
>                     && tombstone.maxTimestamp() >= column.timestamp())
>                 {
>                     return true;
>                 }
>             }
>             return false;
>         }
> {code}
> I would like to propose changing this to use a sorted list (e.g. RangeTombstoneList) instead.
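
As a rough illustration of the sorted-lookup idea proposed in the description, the sketch below keeps ranges ordered by their start and finds the single candidate range in logarithmic time. It is not how {{RangeTombstoneList}} is implemented: the class {{SortedTombstones}}, the String names and the assumption that ranges never overlap are all made up for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: tombstone ranges sorted by start, assumed NON-overlapping.
// Overlapping ranges would have to be merged or split on insert, which is omitted.
final class SortedTombstones {
    private static final class Range {
        final String max;
        final long timestamp;

        Range(String max, long timestamp) {
            this.max = max;
            this.timestamp = timestamp;
        }
    }

    // Key is the inclusive start of the range.
    private final TreeMap<String, Range> byMin = new TreeMap<>();

    void add(String min, String max, long timestamp) {
        byMin.put(min, new Range(max, timestamp));
    }

    // With non-overlapping ranges, only the range with the greatest start <= name
    // can cover the name, so one floorEntry lookup replaces the linear scan.
    boolean isDeleted(String name, long timestamp) {
        Map.Entry<String, Range> candidate = byMin.floorEntry(name);
        return candidate != null
            && name.compareTo(candidate.getValue().max) <= 0
            && candidate.getValue().timestamp >= timestamp;
    }
}
```

The bounds are inclusive and a tombstone shadows columns with an equal or older timestamp, matching the quoted {{isDeleted}} loop.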



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)