You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2014/02/19 16:36:19 UTC

[jira] [Commented] (CASSANDRA-6722) cross-partition ordering should have warning or be disallowed when paging

    [ https://issues.apache.org/jira/browse/CASSANDRA-6722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905561#comment-13905561 ] 

Sylvain Lebresne commented on CASSANDRA-6722:
---------------------------------------------

Good catch. We can't indeed properly do post-query reordering in general if we page. Note that currently, there is 2 cases where we do post-query reordering:
# with a IN on the partition key and an ORDER BY (the example above).
# if there is an IN on the last clustering column of a compact table.

The 2nd case is actually a bit weird. The reason we order post-query is so the resultset follows the order of the IN in the query, i.e. if you have the table above but COMPACT and do
{noformat}
SELECT value FROM paging_test WHERE id=1 AND value IN ('b', 'a', 'c')
{noformat}
you will get [ 'b', 'a', 'c' ] in that order as a result set. So far, why not, but there is 2 problems in practice:
* we only do that for compact tables. For non-compact ones, we just return results in clustering order (we don't do the post-query ordering). That inconsistency is an historical accident.
* this actually take precedence over ORDER BY, which is completely broken. I.e. even if you add 'ORDER BY value' in the query above, the results will not be properly ordered.
* this is even more broken than that: CASSANDRA-6701.
This case is a mess. So because it's a problem in the context of this ticket and because there is no reason to guarantee any special ordering of the result set if there is no ORDER BY, I suggest we just remove the behavior for compact storage (it was an implementation detail so far) to make it in line with the non-compact case, thus avoiding us to have to deal with it here.

Back to the more interesting case of the example in the description: a IN on the partition key with ORDER BY. In that case, we could almost support paging properly: the reason it's broken is that the pager queries partitions of the IN one after the other.  But the pager could, in theory, page over each partition simultaneously, querying them all little by little and doing a merge sort.

It is however not all that easy in practice. If you query a full page of each partition and there is many partitions in the IN, you'll load tons of data in memory, defeating in large parts the goal of paging. If you instead query less than the page size of each partition, you now may need to re-query some of the partitions depending on what the merge sort yield on those first pages. Not only would that require serious refactor of the current code to properly handle, but it's rather unclear how efficient this would really be in general. I'm not sure it's really worth it in the end. 

But in any case, such solution is way out of scope for 2.0 (and probably even for 2.1 at this point). So we need a quick solution for now and it probably mean that we should throw an IRE if the query requires post-query paging and paging is on.

Taking a step back, I wonder if allowing post-query reordering by default was a good idea, and if the same way we have ALLOW FILTERING, we shouldn't have required an ALLOW IN-MEMORY REORDERING for such queries. Of course, it's harder to change that now, but maybe we could still do it by adding such flag, deprecate queries that don't use it by logging a warning like we've done in CASSANDRA-6649 for 1 or 2 versions, and forbid it completely afterwards?


> cross-partition ordering should have warning or be disallowed when paging
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6722
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6722
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Russ Hatch
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 2.0.6
>
>         Attachments: 6722.txt
>
>
> consider this schema/data/query:
> {noformat}
> CREATE TABLE paging_test (
>     id int,
>     value text,
>     PRIMARY KEY (id, value)
> ) WITH CLUSTERING ORDER BY (value ASC)
>             |id|value|
>             |1 |a    |
>             |2 |b    |
>             |1 |c    |
>             |2 |d    | 
>             |1 |e    | 
>             |2 |f    | 
>             |1 |g    | 
>             |2 |h    |
>             |1 |i    |
>             |2 |j    |
> select * from paging_test where id in (1,2) order by value asc;
> {noformat}
> When paging the above query I get the sorted results from id=1 first, then the sorted results from id=2 after that. I was testing this because I was curious if the paging system could somehow globally sort the results but it makes sense that we can't do that, since that would require all results to be collated up front.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)