Posted to commits@cassandra.apache.org by "Benjamin Lerer (JIRA)" <ji...@apache.org> on 2016/01/01 22:36:39 UTC

[jira] [Comment Edited] (CASSANDRA-10707) Add support for Group By to Select statement

    [ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076378#comment-15076378 ] 

Benjamin Lerer edited comment on CASSANDRA-10707 at 1/1/16 9:36 PM:
--------------------------------------------------------------------

Both will be supported.
What will not be supported is a {{group by}} clause in which only part of the partition key is specified. For example, if a table has a primary key like {{PRIMARY KEY((partitionKey1, partitionKey2), clustering1, clustering2)}}, the following query will not be supported:
{{SELECT partitionKey1, MAX(value) FROM myTable GROUP BY partitionKey1}}

As for the aggregates, the grouping will be performed on the coordinator node. Consequently, if the driver uses the token-aware policy, a query containing a partition key predicate will be more efficient, as the aggregates will be built on the node where the data are located.
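
To make the expected result of such a grouping concrete, here is a minimal Python sketch of what the node building the aggregates would compute for a {{MAX(value)}} query. It is only an illustration, not Cassandra code: the function name {{max_per_group}}, the representation of rows as dictionaries, and the {{value}} column default are all assumptions.

```python
from typing import Any, Dict, Iterable, Sequence, Tuple


def max_per_group(
    rows: Iterable[Dict[str, Any]],
    group_columns: Sequence[str],
    value_column: str = "value",
) -> Dict[Tuple[Any, ...], Any]:
    """Hypothetical helper: return the maximum of value_column per group.

    Each group is identified by the tuple of its group_columns values.
    This models the result of the query only, not how Cassandra builds it.
    """
    result: Dict[Tuple[Any, ...], Any] = {}
    for row in rows:
        # The group key is the tuple of values of the GROUP BY columns.
        key = tuple(row[column] for column in group_columns)
        value = row[value_column]
        # Keep the largest value seen so far for this group.
        if key not in result or value > result[key]:
            result[key] = value
    return result


# Illustrative rows, as if read from a single partition (partitionKey = 5).
rows = [
    {"partitionKey": 5, "clusteringColumn1": 1, "value": 10},
    {"partitionKey": 5, "clusteringColumn1": 1, "value": 30},
    {"partitionKey": 5, "clusteringColumn1": 2, "value": 7},
]
grouped = max_per_group(rows, ["partitionKey", "clusteringColumn1"])
```

With a partition key predicate and a token-aware driver, all of these rows would come from the node that owns the partition, which is why the comment expects that case to be cheaper.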

From the syntax point of view, the queries:
{{SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable WHERE partitionKey=5 GROUP BY partitionKey, clusteringColumn1;}}
and {{SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable WHERE partitionKey=5 GROUP BY clusteringColumn1;}} will both be supported because the {{partitionKey}} column is restricted by an {{=}} operator.
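
The rule described in this comment can be summarised with a small Python sketch. It is one possible reading of the comment, not the validation code of the patch: the function {{group_by_is_supported}} and its parameters are hypothetical, and the handling of a {{GROUP BY}} that names no partition key column when the partition key is not fully restricted by {{=}} is an assumption, since the comment does not cover that case.

```python
from typing import Iterable, Sequence


def group_by_is_supported(
    partition_key: Sequence[str],
    group_by: Iterable[str],
    eq_restricted: Iterable[str] = (),
) -> bool:
    """Hypothetical check of a GROUP BY clause against the partition key.

    partition_key: partition key columns of the table.
    group_by:      columns listed in the GROUP BY clause.
    eq_restricted: columns restricted by an '=' operator in the WHERE clause.
    """
    grouped = set(group_by)
    restricted = set(eq_restricted)
    # Partition key columns that the GROUP BY clause names explicitly.
    named = [column for column in partition_key if column in grouped]
    if not named:
        # ASSUMPTION: the partition key may be left out of GROUP BY only
        # when every partition key column is restricted by '='.
        return all(column in restricted for column in partition_key)
    # Naming only a part of the partition key is the unsupported case.
    return len(named) == len(partition_key)


# The unsupported example: only partitionKey1 of a two-column partition key.
partial = group_by_is_supported(
    ["partitionKey1", "partitionKey2"], ["partitionKey1"]
)
# The two queries above, both with WHERE partitionKey=5.
with_key = group_by_is_supported(
    ["partitionKey"], ["partitionKey", "clusteringColumn1"], ["partitionKey"]
)
without_key = group_by_is_supported(
    ["partitionKey"], ["clusteringColumn1"], ["partitionKey"]
)
```

The sketch only looks at the partition key; any constraint on the order of the clustering columns in the {{GROUP BY}} clause is outside its scope.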


was (Author: blerer):
Both will be supported.
What will not be supported is a {{group by}} clause were only a part of the partition key will be specified. For example, if a table has a primary key like {{PRIMARY KEY((partitionKey1, partitionKey2) clustering1, clustering2)}}, the following query will not be supported:
{{SELECT partitionKey1, MAX(value) FROM myTable GROUP BY partitionKey1}}

As for the aggregates, the grouping will be performed on the coordinator node. By consequence, if the driver use the Token aware policy, a query containing a partition key predicate will be more efficient as the aggregates will be built on the node where the data are located.

From the syntax point of view, the queries:
{{SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable WHERE partitionKey=5 GROUP BY partitionKey, clusteringColumn1;}}
and  {{SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable WHERE partitionKey=5 GROUP BY clusteringColumn1;}} will be both supported due to the fact that the {{partitionKey}} column is restricted by an {{=}} operator.

> Add support for Group By to Select statement
> --------------------------------------------
>
>                 Key: CASSANDRA-10707
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10707
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>
> Now that Cassandra supports aggregate functions, it makes sense to support {{GROUP BY}} on the {{SELECT}} statements.
> It should be possible to group either at the partition level or at the clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP BY partitionKey, clustering0, clustering1; 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)