Posted to commits@cassandra.apache.org by "Tyler Hobbs (JIRA)" <ji...@apache.org> on 2015/08/06 22:38:06 UTC

[jira] [Commented] (CASSANDRA-10010) Paging on DISTINCT queries repeats result when first row in partition changes

    [ https://issues.apache.org/jira/browse/CASSANDRA-10010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660758#comment-14660758 ] 

Tyler Hobbs commented on CASSANDRA-10010:
-----------------------------------------

I've created a dtest that reproduces the issue through a deletion here: https://github.com/thobbs/cassandra-dtest/tree/CASSANDRA-10010-repro.  This seems to affect only 2.1 and 2.2, not 3.0.

> Paging on DISTINCT queries repeats result when first row in partition changes
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10010
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10010
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Tyler Hobbs
>            Assignee: Benjamin Lerer
>            Priority: Minor
>             Fix For: 2.1.x, 2.2.x
>
>
> When paging, we always check new pages to see if they start with the same row that the previous page ended with, and if so, we trim that row to avoid duplicates.  With {{DISTINCT}} queries, we only fetch the first row in each partition.  If that row happens to change (it's deleted, or another row is inserted at the front of the partition) in between fetching the two pages, our check for a matching row will fail, resulting in a duplicate row being returned.
> It seems like the correct fix is to handle {{DISTINCT}} queries specially: check only whether the partition key matches the last returned one, instead of checking that the rows match.
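
The behaviour described above can be illustrated with a small in-memory simulation. This is only a sketch under simplifying assumptions, not Cassandra's actual paging code: the table is a plain dict, partitions are ordered by key rather than by token, and every name here (fetch_distinct_page, run_scenario, match_on_partition_only) is hypothetical. Each new page resumes at the partition where the previous page ended and trims its first row if it is judged a duplicate of the last row returned.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, int]  # (partition key, clustering key); hypothetical model


def fetch_distinct_page(
    table: Dict[int, List[int]],
    page_size: int,
    last: Optional[Row],
    match_on_partition_only: bool,
) -> List[Row]:
    """Return the first row of each partition, resuming at last's partition."""
    candidates: List[Row] = []
    for pk in sorted(table):
        # Resume inclusively at the partition where the previous page ended.
        if last is not None and pk < last[0]:
            continue
        if table[pk]:
            candidates.append((pk, min(table[pk])))

    # Trim the first row if it duplicates the last row of the previous page.
    if last is not None and candidates:
        first = candidates[0]
        if match_on_partition_only:
            is_duplicate = first[0] == last[0]  # proposed check
        else:
            is_duplicate = first == last  # whole-row check described in the ticket
        if is_duplicate:
            candidates = candidates[1:]
    return candidates[:page_size]


def run_scenario(match_on_partition_only: bool) -> List[int]:
    """Page through three partitions, deleting a first row between pages."""
    table = {1: [0, 1], 2: [0, 1], 3: [0, 1]}
    page1 = fetch_distinct_page(table, 2, None, match_on_partition_only)
    # The first row of partition 2 is deleted between the two page fetches.
    table[2].remove(0)
    page2 = fetch_distinct_page(table, 2, page1[-1], match_on_partition_only)
    return [pk for pk, _ in page1 + page2]


if __name__ == "__main__":
    print("row match:      ", run_scenario(False))
    print("partition match:", run_scenario(True))
```

In this model the whole-row comparison returns partition 2 twice, because the row now at the front of that partition no longer equals the row that ended the first page, while the partition-key comparison returns each partition once.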


