Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2013/05/30 21:14:20 UTC

[jira] [Commented] (CASSANDRA-4415) Add cursor API/auto paging to the native CQL protocol

    [ https://issues.apache.org/jira/browse/CASSANDRA-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13670615#comment-13670615 ] 

Sylvain Lebresne commented on CASSANDRA-4415:
---------------------------------------------

Pushed patches for this to https://github.com/pcmanus/cassandra/compare/4415.

There are 5 commits:
# the first one adds a new (more general) internal version of get_paged_slice.
The problem with get_paged_slice is that it's only made to page ranges of full rows: it always restarts at the beginning of the row when it starts a new row. So what we need is to be able to page from some (key, column) start pair (to continue paging where we stopped), but still only ever return columns that match our column filter. That's what the new PagedRangeCommand does.
I will note that the old get_paged_slice is made a special case of the new command, and strictly speaking this changes its behavior. But I'm pretty sure this is fixing a bug more than anything else. More precisely, if you do:
{noformat}
get_paged_slice("cf", KeyRange("a", "", 1000), "c4", CL.ONE)
{noformat}
and it happens that row "a" doesn't exist, then the command will still start from "c4" for whatever the first row returned is. That feels broken to me, so after this patch it only starts from "c4" for row "a". If "a" doesn't exist, it starts from the beginning of whatever is the first row following "a".
# the 2nd commit adds query pagers to page any type of internal query.
# the 3rd commit modifies the binary protocol to add paging support. Basically, it adds a pageSize to query messages that defines how big the next page should be. A NEXT message then allows fetching the following pages one by one, and result sets have a flag that says "there are more pages to get".
# the 4th commit replaces 2 existing uses of paging by the new pagers:
** in CassandraServer.get_count(). I'll note that this does re-introduce CASSANDRA-5099 until CASSANDRA-5149 is fixed. But as said on the latter ticket, I just think we should fix CASSANDRA-5149 once and for all since all the paging done by those patches is potentially buggy otherwise.
** and we had an existing SliceQueryPager in org.apache.cassandra.db that was used to incrementally index a wide row. It's replaced by the new (equivalent) one.
It also probably wouldn't be too hard to add paging to multi_get_count() but there's a slight refactoring to do in CassandraServer and I got lazy. Not sure anyone uses that method anyway since no one complained about the lack of paging.
# the 5th and last commit uses the new pagers to page 'SELECT count(1)' queries internally (so we stop OOMing on those).
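
To make the (key, column) start-pair semantics of the first commit concrete, here is a toy sketch in Python. It is not the PagedRangeCommand code: the paged_range function, its parameters and the in-memory row model are all hypothetical, rows are ordered lexically by key rather than by token, and treating the start column as inclusive is an assumption of the sketch.

```python
from typing import Callable, Dict, List, Tuple

# Toy model (assumption): row key -> column names, already sorted.
Rows = Dict[str, List[str]]


def paged_range(
    rows: Rows,
    start_key: str,
    start_column: str,
    column_filter: Callable[[str], bool],
    page_size: int,
) -> List[Tuple[str, str]]:
    """Return up to page_size (key, column) pairs starting at the given pair."""
    page: List[Tuple[str, str]] = []
    for key in sorted(rows):
        if key < start_key:
            continue
        for column in rows[key]:
            # The start column only applies to the row we stopped in;
            # every later row is read from its beginning.
            if key == start_key and column < start_column:
                continue
            # Only ever return columns matching the column filter.
            if not column_filter(column):
                continue
            page.append((key, column))
            if len(page) == page_size:
                return page
    return page
```

With rows "b" and "c" only, paged_range(rows, "a", "c4", ...) reads "b" from its first column, which is the post-patch behavior described for the get_paged_slice example. When row "a" does exist, its columns before "c4" are skipped and the following rows are read in full.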

The patches add a few unit tests for the pagers (arguably not a lot) and I've tested the protocol bits manually (using the debug-cql toy client).

                
> Add cursor API/auto paging to the native CQL protocol
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4415
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4415
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: cql, protocol
>             Fix For: 2.0
>
>
> The goal here would be to add a query paging mechanism to the CQL native protocol. Typically the client/server exchange with that would look something like this:
> {noformat}
> C sends query to S.
> S sends N first rows matching the query + flag saying the response is not complete
> C requests the next N rows
> S sends N next rows + flag saying whether there is more
> C requests the next N rows
> ...
> S sends last rows + flag saying there is no more result
> {noformat}
> The clear goal is for users not to have to worry about limiting queries and doing manual paging.
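
The exchange described above can be simulated in a few lines of Python. This is only a sketch of the flow (a page size on the query, a request for the next page, and a "there is more" flag on each result). ToyServer, query, next_page and fetch_all are made-up names for illustration and do not correspond to the actual protocol messages or to any driver API.

```python
from typing import Iterable, Iterator, List, Tuple


class ToyServer:
    """In-memory stand-in for S: holds one result set and serves it in pages."""

    def __init__(self, rows: Iterable[int]) -> None:
        self._rows = list(rows)
        self._pos = 0
        self._page_size = 0

    def query(self, page_size: int) -> Tuple[List[int], bool]:
        # The query carries the page size; the first page comes back directly.
        self._pos = 0
        self._page_size = page_size
        return self.next_page()

    def next_page(self) -> Tuple[List[int], bool]:
        # Returns (rows, has_more); has_more plays the role of the result flag.
        page = self._rows[self._pos:self._pos + self._page_size]
        self._pos += len(page)
        return page, self._pos < len(self._rows)


def fetch_all(server: ToyServer, page_size: int) -> Iterator[int]:
    """Stand-in for C: keeps requesting pages until the flag says there are no more."""
    rows, has_more = server.query(page_size)
    yield from rows
    while has_more:
        rows, has_more = server.next_page()
        yield from rows
```

A caller just iterates over fetch_all(server, page_size) and never deals with limits or manual paging, which is the point of the feature.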
