Posted to commits@cassandra.apache.org by "Benjamin Lerer (JIRA)" <ji...@apache.org> on 2016/03/15 19:18:33 UTC

[jira] [Commented] (CASSANDRA-11223) Queries with LIMIT filtering on clustering columns can return less row than expected

    [ https://issues.apache.org/jira/browse/CASSANDRA-11223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195854#comment-15195854 ] 

Benjamin Lerer commented on CASSANDRA-11223:
--------------------------------------------

My initial idea was to filter out the partitions containing only static columns earlier in the read path, in the cases where they should not be returned. Unfortunately, that was the wrong approach. The filtering cannot be done before we have reconciled the data and removed the tombstoned rows, as we do not know until that point whether a partition contains any rows. This means that we can end up with fewer rows than requested, because the limit has already been applied on the replicas with the static rows counted.
I now think that this problem should probably be solved at the paging level. In the case where partitions without rows should not be returned, the static rows should not be counted in {{DataLimits}}.
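
To make the counting problem concrete, here is a minimal, self-contained sketch. It is not Cassandra code: {{StaticRowLimitSketch}}, {{FilteredPartition}}, {{query}} and the {{countStaticOnlyPartitions}} flag are hypothetical names used only for illustration, and the model assumes partitions are visited in the order 1, 0, 2 as in the unit test of this ticket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified model of LIMIT counting; all names here are hypothetical.
class StaticRowLimitSketch
{
    // A partition after filtering on b = 1: its key and the clustering
    // values of the rows that matched. An empty list means that only the
    // static row is left.
    static final class FilteredPartition
    {
        final int key;
        final List<Integer> matchingRows;

        FilteredPartition(int key, List<Integer> matchingRows)
        {
            this.key = key;
            this.matchingRows = matchingRows;
        }
    }

    // Same layout as the unit test: partition 0 has no row with b = 1.
    static List<FilteredPartition> sampleData()
    {
        return Arrays.asList(new FilteredPartition(1, Arrays.asList(1)),
                             new FilteredPartition(0, Collections.<Integer>emptyList()),
                             new FilteredPartition(2, Arrays.asList(1)));
    }

    // Applies the limit while scanning. If countStaticOnlyPartitions is true,
    // a partition left with only its static row still consumes one unit of
    // the limit, even though it is never returned to the client.
    static List<String> query(List<FilteredPartition> partitions, int limit, boolean countStaticOnlyPartitions)
    {
        List<String> results = new ArrayList<>();
        int counted = 0;
        for (FilteredPartition partition : partitions)
        {
            if (counted >= limit)
                break;

            if (partition.matchingRows.isEmpty())
            {
                if (countStaticOnlyPartitions)
                    counted++;
                continue; // dropped later anyway, so never added to the results
            }

            for (int b : partition.matchingRows)
            {
                if (counted >= limit)
                    break;
                counted++;
                results.add("a=" + partition.key + ", b=" + b);
            }
        }
        return results;
    }

    public static void main(String[] args)
    {
        System.out.println("static rows counted:     " + query(sampleData(), 2, true));
        System.out.println("static rows not counted: " + query(sampleData(), 2, false));
    }
}
```

With the static-only partition counted, the limit of 2 is exhausted after partition 0 and a single row comes back. With it excluded from the count, both matching rows are returned, which is the behaviour proposed above for {{DataLimits}}.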


> Queries with LIMIT filtering on clustering columns can return less row than expected
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11223
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11223
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>
> A query like {{SELECT * FROM %s WHERE b = 1 LIMIT 2 ALLOW FILTERING}} can return fewer rows than expected if the table has static columns and some of the partitions have no rows matching b = 1.
> The problem can be reproduced with the following unit test:
> {code}
>     @Test
>     public void testFilteringOnClusteringColumnsWithLimitAndStaticColumns() throws Throwable
>     {
>         createTable("CREATE TABLE %s (a int, b int, s int static, c int, primary key (a, b))");
>         for (int i = 0; i < 3; i++)
>         {
>             execute("INSERT INTO %s (a, s) VALUES (?, ?)", i, i);
>             for (int j = 0; j < 3; j++)
>                 if (!(i == 0 && j == 1))
>                     execute("INSERT INTO %s (a, b, c) VALUES (?, ?, ?)", i, j, i + j);
>         }
>         assertRows(execute("SELECT * FROM %s"),
>                    row(1, 0, 1, 1),
>                    row(1, 1, 1, 2),
>                    row(1, 2, 1, 3),
>                    row(0, 0, 0, 0),
>                    row(0, 2, 0, 2),
>                    row(2, 0, 2, 2),
>                    row(2, 1, 2, 3),
>                    row(2, 2, 2, 4));
>         assertRows(execute("SELECT * FROM %s WHERE b = 1 ALLOW FILTERING"),
>                    row(1, 1, 1, 2),
>                    row(2, 1, 2, 3));
>         assertRows(execute("SELECT * FROM %s WHERE b = 1 LIMIT 2 ALLOW FILTERING"),
>                    row(1, 1, 1, 2),
>                    row(2, 1, 2, 3)); // <-------- FAIL: only one row is returned because the static row of partition 0 is counted towards the limit and then filtered out in the SELECT statement
>     }
> {code}


