
Posted to commits@cassandra.apache.org by "Dominic Letz (JIRA)" <ji...@apache.org> on 2014/12/30 03:04:13 UTC

[jira] [Created] (CASSANDRA-8547) Make RangeTombstone.Tracker.isDeleted() faster

Dominic Letz created CASSANDRA-8547:
---------------------------------------

             Summary: Make RangeTombstone.Tracker.isDeleted() faster
                 Key: CASSANDRA-8547
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8547
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
         Environment: 2.0.11
            Reporter: Dominic Letz
         Attachments: rangetombstone.tracker.txt

During compactions and repairs with many tombstones, an exorbitant amount of time is spent in RangeTombstone.Tracker.isDeleted().
The time spent there can be so large that compactions and repairs look "stalled", with the estimated time remaining frozen at the same value for days.

Using VisualVM, I sample-profiled the code during execution and found this hotspot both in compactions and in repairs (point-in-time backtraces attached).

Looking at the code, the problem is the linear scan over all tracked ranges on every call:
{code}
        public boolean isDeleted(Column column)
        {
            for (RangeTombstone tombstone : ranges)
            {
                if (comparator.compare(column.name(), tombstone.min) >= 0
                    && comparator.compare(column.name(), tombstone.max) <= 0
                    && tombstone.maxTimestamp() >= column.timestamp())
                {
                    return true;
                }
            }
            return false;
        }
{code}

I propose replacing this linear scan with a lookup in a sorted list (e.g. RangeTombstoneList).
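
To illustrate the idea, here is a rough, standalone sketch of a sorted lookup. It is not Cassandra code: the class SortedTombstoneTracker, its Range type, and the use of String names and long timestamps are all hypothetical simplifications. It also assumes the tracked ranges never overlap, so that only the range with the greatest start at or before the column name can cover it; overlapping ranges would need to be merged or split on insert first.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the tracker; names and types are illustrative only.
class SortedTombstoneTracker {
    // One deleted range [min, max] (both inclusive) and its deletion timestamp.
    private static final class Range {
        final String max;
        final long timestamp;

        Range(String max, long timestamp) {
            this.max = max;
            this.timestamp = timestamp;
        }
    }

    // Ranges keyed by their start bound. Assumption: ranges never overlap.
    private final TreeMap<String, Range> ranges = new TreeMap<>();

    public void add(String min, String max, long timestamp) {
        ranges.put(min, new Range(max, timestamp));
    }

    // O(log n) lookup instead of scanning every range.
    public boolean isDeleted(String name, long timestamp) {
        // The only candidate is the range with the greatest start <= name.
        Map.Entry<String, Range> candidate = ranges.floorEntry(name);
        if (candidate == null) {
            return false; // name sorts before every range start
        }
        Range range = candidate.getValue();
        return name.compareTo(range.max) <= 0 && range.timestamp >= timestamp;
    }
}
```

With n tracked ranges this turns each isDeleted() call from O(n) into O(log n), which should matter most when a single partition carries many range tombstones.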




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)