Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2009/07/09 16:26:14 UTC

[jira] Created: (CASSANDRA-286) slice offset breaks read repair

slice offset breaks read repair
-------------------------------

                 Key: CASSANDRA-286
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-286
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jonathan Ellis


[code]
        int liveColumns = 0;
        int limit = offset + count;

        for (IColumn column : reducedColumns)
        {
            // stop once offset + count live columns have been seen
            if (liveColumns >= limit)
                break;
            // stop at the end of the requested range, if one was given
            if (!finish.isEmpty()
                && ((isAscending && column.name().compareTo(finish) > 0)
                    || (!isAscending && column.name().compareTo(finish) < 0)))
                break;
            if (!column.isMarkedForDelete())
                liveColumns++;

            // tombstones scanned while liveColumns <= offset are dropped here
            if (liveColumns > offset)
                returnCF.addColumn(column);
        }
[code]

The problem is that for offset to return the correct "live" columns, it has to ignore tombstones it scans before the first live one post-offset. Those tombstones never make it into the result, so read repair has nothing to compare and cannot pass them on to a replica that missed the delete.

This means that instead of being corrected within a few ms of a read, a node can continue returning deleted data indefinitely (until the next anti-entropy pass).

Coupled with offset's inherent inefficiency (see CASSANDRA-261), I think this means we should take it out and leave offset to be computed client-side (which, for datasets where it was reasonable server-side, will still be reasonable).
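
To make the failure concrete, here is a minimal sketch that runs the same counting logic over two replicas, one of which has missed a delete. The Col class and the slice helper are hypothetical stand-ins for IColumn and the loop above (the finish check is left out); they are not the actual Cassandra classes.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetSliceDemo {
    // Hypothetical stand-in for IColumn: a name plus a tombstone flag.
    static final class Col {
        final String name;
        final boolean deleted;
        Col(String name, boolean deleted) { this.name = name; this.deleted = deleted; }
        @Override public String toString() { return deleted ? name + "(tombstone)" : name; }
    }

    // Same counting logic as the loop in the report, minus the finish check.
    static List<Col> slice(List<Col> columns, int offset, int count) {
        List<Col> result = new ArrayList<>();
        int liveColumns = 0;
        int limit = offset + count;
        for (Col column : columns) {
            if (liveColumns >= limit)
                break;
            if (!column.deleted)
                liveColumns++;
            // A tombstone seen while liveColumns <= offset is silently skipped.
            if (liveColumns > offset)
                result.add(column);
        }
        return result;
    }

    public static void main(String[] args) {
        // Replica A has applied the delete of "b"; replica B has not.
        List<Col> replicaA = List.of(new Col("a", false), new Col("b", true),
                                     new Col("c", false), new Col("d", false));
        List<Col> replicaB = List.of(new Col("a", false), new Col("b", false),
                                     new Col("c", false), new Col("d", false));
        System.out.println("A: " + slice(replicaA, 1, 1));
        System.out.println("B: " + slice(replicaB, 1, 1));
    }
}
```

With offset 1 and count 1, replica A's result contains only "c" and replica B's contains only the stale "b". The tombstone for "b" appears in neither result, so comparing the two gives replica B no way to learn that "b" was deleted.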

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-286) slice offset breaks read repair

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731715#action_12731715 ] 

Jonathan Ellis commented on CASSANDRA-286:
------------------------------------------

patch 2 fixes the bug with tombstone handling in get_key_range that Evan noticed in CASSANDRA-139

> slice offset breaks read repair
> -------------------------------
>
>                 Key: CASSANDRA-286
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-286
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>         Attachments: 0001-CASSANDRA-286-r-m-offset-from-slice-api-we-could-live.txt, 0002-fix-not-including-tombstone-only-keys-in-keyRange.txt


[jira] Commented: (CASSANDRA-286) slice offset breaks read repair

Posted by "Evan Weaver (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731731#action_12731731 ] 

Evan Weaver commented on CASSANDRA-286:
---------------------------------------

Fixes my bugs. Code looks fine; ship it!



[jira] Commented: (CASSANDRA-286) slice offset breaks read repair

Posted by "Evan Weaver (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731728#action_12731728 ] 

Evan Weaver commented on CASSANDRA-286:
---------------------------------------

Twitter collective is in favor of killing offset.



[jira] Commented: (CASSANDRA-286) slice offset breaks read repair

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731932#action_12731932 ] 

Hudson commented on CASSANDRA-286:
----------------------------------

Integrated in Cassandra #139 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/139/])
    fix not including tombstone-only keys in keyRange.
patch by jbellis; reviewed by Evan Weaver for 
r/m offset from slice api; we could live with being inefficient but not with breaking read repair.
patch by jbellis; reviewed by Evan Weaver for 




[jira] Commented: (CASSANDRA-286) slice offset breaks read repair

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729336#action_12729336 ] 

Sandeep Tata commented on CASSANDRA-286:
----------------------------------------

I agree ... offset makes it hard to understand the cost of a get_slice. While it is very convenient for pagination, dropping it from the API is probably the right choice.
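
For pagination without a server-side offset, one possible client-side approach is to ask for offset + count columns and discard the first offset of them. The sketch below assumes a hypothetical SliceClient interface that returns up to n live column names from the start of the row; it is not the real Thrift client API.

```java
import java.util.ArrayList;
import java.util.List;

class ClientSideOffset {
    // Hypothetical client call: up to `count` live column names from the start of the row.
    interface SliceClient {
        List<String> getSlice(int count);
    }

    // Emulates offset on the client: over-fetch, then drop the leading `offset` columns.
    static List<String> page(SliceClient client, int offset, int count) {
        List<String> fetched = client.getSlice(offset + count);
        if (fetched.size() <= offset)
            return new ArrayList<>();
        int end = Math.min(fetched.size(), offset + count);
        return new ArrayList<>(fetched.subList(offset, end));
    }
}
```

The cost is the same scan the server was doing plus the transfer of the skipped columns, but the cost is now visible to the caller and the server returns everything it scanned.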




[jira] Commented: (CASSANDRA-286) slice offset breaks read repair

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729306#action_12729306 ] 

Jonathan Ellis commented on CASSANDRA-286:
------------------------------------------

Yes, that's the brute force fix, but it means that in the case of mass deletes in a given CF we could very possibly OOM collecting all the tombstones for a large offset.

Again, my rule of thumb is: features that allow the user to do something that slows things down are ok; features that allow the user to crash the server are not. :)



[jira] Resolved: (CASSANDRA-286) slice offset breaks read repair

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-286.
--------------------------------------

    Resolution: Fixed

committed



[jira] Issue Comment Edited: (CASSANDRA-286) slice offset breaks read repair

Posted by "Evan Weaver (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731731#action_12731731 ] 

Evan Weaver edited comment on CASSANDRA-286 at 7/15/09 4:03 PM:
----------------------------------------------------------------

I meant idea. But now I also mean patch.

Fixes my bugs. Code looks fine; ship it!

      was (Author: eweaver):
    Fixes my bugs. Code looks fine; ship it!
  


[jira] Commented: (CASSANDRA-286) slice offset breaks read repair

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729295#action_12729295 ] 

Jun Rao commented on CASSANDRA-286:
-----------------------------------

This is a problem with any API that relies on offset instead of value. All columns before the offset affect the outcome. So if any incorrect column before the offset (whether from a missing delete or a missing insert) doesn't get fixed immediately, the outcome will be incorrect.

One potential fix is to include all columns before the offset in the repair logic, but not in the thrift return. This won't affect performance much since we already have to scan those columns. It may complicate the overall logic a bit, though.
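
A rough sketch of that idea, under the assumption that the slice can hand back two lists: one for the client and one for repair. The Col and SliceResult classes are hypothetical and only illustrate the shape of the change, not the actual read path.

```java
import java.util.ArrayList;
import java.util.List;

class RepairAwareSlice {
    // Hypothetical stand-in for IColumn: a name plus a tombstone flag.
    static final class Col {
        final String name;
        final boolean deleted;
        Col(String name, boolean deleted) { this.name = name; this.deleted = deleted; }
    }

    // Hypothetical result holder: what goes to the client vs. what repair compares.
    static final class SliceResult {
        final List<Col> forClient = new ArrayList<>();
        final List<Col> forRepair = new ArrayList<>();
    }

    static SliceResult slice(List<Col> columns, int offset, int count) {
        SliceResult result = new SliceResult();
        int liveColumns = 0;
        int limit = offset + count;
        for (Col column : columns) {
            if (liveColumns >= limit)
                break;
            if (!column.deleted)
                liveColumns++;
            // Every scanned column, tombstones included, is kept for repair.
            result.forRepair.add(column);
            // Only columns past the offset are returned to the client.
            if (liveColumns > offset)
                result.forClient.add(column);
        }
        return result;
    }
}
```

Note that forRepair grows with the offset and with the number of tombstones scanned, so a large offset over a heavily deleted row means holding all of those columns in memory at once.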
 



[jira] Assigned: (CASSANDRA-286) slice offset breaks read repair

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-286:
----------------------------------------

    Assignee: Jonathan Ellis



[jira] Updated: (CASSANDRA-286) slice offset breaks read repair

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-286:
-------------------------------------

    Attachment: 0002-fix-not-including-tombstone-only-keys-in-keyRange.txt
                0001-CASSANDRA-286-r-m-offset-from-slice-api-we-could-live.txt



[jira] Commented: (CASSANDRA-286) slice offset breaks read repair

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731729#action_12731729 ] 

Jonathan Ellis commented on CASSANDRA-286:
------------------------------------------

does that mean you're +1 on the patch, or the idea? :)
