Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2016/08/02 14:11:20 UTC

[jira] [Commented] (CASSANDRA-10707) Add support for Group By to Select statement

    [ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404044#comment-15404044 ] 

Sylvain Lebresne commented on CASSANDRA-10707:
----------------------------------------------

The last version mostly looks good. The main thing I still don't like is the {{filterOnReplica}} method: I feel it's easy to misuse and doesn't feel particularly natural. Thinking about it, I feel the underlying issue we're trying to solve is more general: {{DataLimits}} holds state (for paging and grouping) which somewhat assumes things are queried sequentially (and in order). However, when we do range queries and send queries in parallel to nodes, that's no longer true (except maybe for the first range sent), at least not for the queries sent to replicas (we still process them in order on the coordinator). So I think a better way to handle this is to acknowledge that fact in {{StorageProxy.getRangeSlice}} and drop any state from the sub-range commands sent in parallel. I've tried such a change in the branch linked below (which is also rebased).
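To make the "queried sequentially" argument concrete, here is a minimal Java sketch under simplifying assumptions. The {{Limits}} record, its {{rowsAlreadyInFirstPartition}} paging state and {{rangeQuery}} are hypothetical stand-ins, not the real {{DataLimits}} or {{StorageProxy.getRangeSlice}} API. The state is deliberately modelled as applying to whichever partition a command sees first, which is exactly the sequential assumption. Each replica applies the limits it is sent, and the coordinator then re-applies the full, stateful limits to the concatenated replies in range order.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model (not Cassandra's real DataLimits): a per-partition
 * row limit plus paging state that is interpreted positionally, i.e. it assumes the
 * first partition a command sees is the one the previous page stopped in.
 */
public class StatelessSubRangeSketch {

    /** perPartitionLimit: max rows per partition; rowsAlreadyInFirstPartition: paging state. */
    record Limits(int perPartitionLimit, int rowsAlreadyInFirstPartition) {

        /** Same limit, with the paging state dropped. */
        Limits withoutState() {
            return new Limits(perPartitionLimit, 0);
        }

        /** Applies the limit to partitions (each a list of rows) in the order given. */
        List<List<Integer>> apply(List<List<Integer>> partitions) {
            List<List<Integer>> out = new ArrayList<>();
            for (int i = 0; i < partitions.size(); i++) {
                List<Integer> rows = partitions.get(i);
                // The state is only charged to the first partition seen: the sequential assumption.
                int used = (i == 0) ? rowsAlreadyInFirstPartition : 0;
                int keep = Math.max(0, Math.min(rows.size(), perPartitionLimit - used));
                out.add(new ArrayList<>(rows.subList(0, keep)));
            }
            return out;
        }
    }

    /**
     * Simulates a range query split into sub-ranges sent in parallel: each replica filters
     * with the limits it receives, then the coordinator concatenates the replies in range
     * order and applies the real, stateful limits.
     */
    static List<List<Integer>> rangeQuery(Limits limits, List<List<List<Integer>>> subRanges,
                                          boolean stripStateForReplicas) {
        Limits replicaLimits = stripStateForReplicas ? limits.withoutState() : limits;
        List<List<Integer>> merged = new ArrayList<>();
        for (List<List<Integer>> subRange : subRanges) {
            merged.addAll(replicaLimits.apply(subRange)); // replica-side filtering
        }
        return limits.apply(merged); // coordinator-side, in order, with the state
    }

    public static void main(String[] args) {
        // Resuming a page: 2 rows of the first partition were already returned; limit is 3.
        Limits limits = new Limits(3, 2);
        List<List<List<Integer>>> subRanges = List.of(
                List.of(List.of(12, 13)),          // sub-range A: rest of the resumed partition
                List.of(List.of(20, 21, 22, 23))); // sub-range B: a partition not yet read
        System.out.println(rangeQuery(limits, subRanges, false)); // [[12], [20]]: B wrongly truncated
        System.out.println(rangeQuery(limits, subRanges, true));  // [[12], [20, 21, 22]]
    }
}
```

Stripping the state only lets replicas return slightly more than needed; the coordinator's in-order pass still trims the result correctly. The sketch strips state from every sub-range for simplicity, although the first range sent could in principle keep it.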

The branch also includes a commit with a few nits, mostly around comments. Feel free to ignore any of it you don't like.

| [10707-trunk|https://github.com/pcmanus/cassandra/commits/10707-trunk] | [utests|http://cassci.datastax.com/job/pcmanus-10707-trunk-testall] | [dtests|http://cassci.datastax.com/job/pcmanus-10707-trunk-dtest] |

I'll note that the dtest run has failures, but this is an ongoing problem with CI today. Random tests fail with {{Host has been marked down or removed}}, but you get that on today's trunk run as well: http://cassci.datastax.com/view/trunk/job/trunk_dtest/1322/

Anyway, if we can agree on those 2 small commits, then I'm +1 (though we might want to wait for CI to stabilize on dtests to make sure).

> Add support for Group By to Select statement
> --------------------------------------------
>
>                 Key: CASSANDRA-10707
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10707
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>             Fix For: 3.x
>
>
> Now that Cassandra supports aggregate functions, it makes sense to support {{GROUP BY}} in {{SELECT}} statements.
> It should be possible to group either at the partition level or at the clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP BY partitionKey, clustering0, clustering1; 
> {code}
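For reference, the two grouping levels in the description can be illustrated with a small in-memory Java sketch (hypothetical, not Cassandra's implementation). It assumes the grouping columns form a prefix of the primary key, as in both example queries, and that rows arrive grouped by partition and in clustering order within each partition, as a Cassandra read returns them. Every group is then a contiguous run of rows, so {{max(value)}} can be computed in one streaming pass. A {{prefixLength}} of 1 corresponds to {{GROUP BY partitionKey}}; 3 corresponds to the second example query.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of GROUP BY over a primary-key prefix. Rows are assumed to
 * arrive grouped by partition and in clustering order, so each group is contiguous.
 */
public class GroupByPrefixSketch {

    record Row(String partitionKey, int clustering0, int clustering1, int value) {
        /** The first prefixLength primary-key columns of this row. */
        List<Object> prefix(int prefixLength) {
            List<Object> key = List.of(partitionKey, clustering0, clustering1);
            return key.subList(0, prefixLength);
        }
    }

    record Group(List<Object> key, int max) {}

    /** Streams over the rows, emitting max(value) for each run of rows sharing a prefix. */
    static List<Group> maxByPrefix(List<Row> rows, int prefixLength) {
        List<Group> groups = new ArrayList<>();
        List<Object> currentKey = null;
        int currentMax = Integer.MIN_VALUE;
        for (Row row : rows) {
            List<Object> key = row.prefix(prefixLength);
            if (!key.equals(currentKey)) {
                // A new prefix closes the previous group.
                if (currentKey != null) groups.add(new Group(currentKey, currentMax));
                currentKey = key;
                currentMax = row.value();
            } else {
                currentMax = Math.max(currentMax, row.value());
            }
        }
        if (currentKey != null) groups.add(new Group(currentKey, currentMax));
        return groups;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("a", 1, 1, 5), new Row("a", 1, 2, 9), new Row("a", 2, 1, 4),
                new Row("b", 1, 1, 7));
        // SELECT partitionKey, max(value) ... GROUP BY partitionKey
        System.out.println(maxByPrefix(rows, 1));
        // ... GROUP BY partitionKey, clustering0, clustering1
        System.out.println(maxByPrefix(rows, 3));
    }
}
```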



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)