Posted to commits@cassandra.apache.org by "Benjamin Lerer (JIRA)" <ji...@apache.org> on 2016/01/01 22:36:39 UTC
[jira] [Comment Edited] (CASSANDRA-10707) Add support for Group By to Select statement
[ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076378#comment-15076378 ]
Benjamin Lerer edited comment on CASSANDRA-10707 at 1/1/16 9:36 PM:
--------------------------------------------------------------------
Both will be supported.
What will not be supported is a {{GROUP BY}} clause that specifies only part of the partition key. For example, if a table has a primary key like {{PRIMARY KEY((partitionKey1, partitionKey2), clustering1, clustering2)}}, the following query will not be supported:
{{SELECT partitionKey1, MAX(value) FROM myTable GROUP BY partitionKey1}}
As with the aggregates, the grouping will be performed on the coordinator node. Consequently, if the driver uses the token-aware policy, a query containing a partition key predicate will be more efficient, because the aggregates will be built on the node where the data is located.
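To illustrate the kind of grouping meant here, the following Python sketch folds rows into groups and keeps the maximum value per group. It is only an illustration, not Cassandra code: the row tuples, the {{group_max}} helper and the assumption that rows arrive already ordered by primary key are all hypothetical.

```python
from itertools import groupby

# Hypothetical rows as (partitionKey, clusteringColumn1, value), assumed to be
# already ordered by primary key so that each group forms a contiguous run.
rows = [
    (5, "a", 10),
    (5, "a", 30),
    (5, "b", 7),
    (6, "a", 2),
]


def group_max(rows, key_len):
    """Return (group key, max(value)) for each run of rows that share
    the same first key_len columns."""
    result = []
    for key, group in groupby(rows, key=lambda r: r[:key_len]):
        result.append((key, max(r[-1] for r in group)))
    return result


# GROUP BY partitionKey
by_partition = group_max(rows, 1)
# GROUP BY partitionKey, clusteringColumn1
by_clustering = group_max(rows, 2)
```

Because the rows are assumed to be ordered, each group can be closed as soon as the key changes, so only one running aggregate needs to be held at a time.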
From a syntax point of view, the queries:
{{SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable WHERE partitionKey=5 GROUP BY partitionKey, clusteringColumn1;}}
and {{SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable WHERE partitionKey=5 GROUP BY clusteringColumn1;}} will both be supported because the {{partitionKey}} column is restricted by an {{=}} operator.
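The rules described in this comment can be summarised with a small Python sketch. It is a hypothetical helper written for illustration, not the actual Cassandra validation code: {{check_group_by}} and its parameters are assumptions, and it only encodes what is stated here (grouping follows the primary key order, a column restricted by {{=}} may be left out, and a clause covering only part of the partition key is rejected).

```python
def check_group_by(partition_key, clustering, group_by, eq_restricted=frozenset()):
    """Return True if group_by is acceptable under the rules sketched above.

    partition_key / clustering: column names in declared order.
    eq_restricted: columns restricted by an = operator in the WHERE clause.
    """
    primary_key = list(partition_key) + list(clustering)
    position = 0  # index of the next primary key column not yet consumed
    for column in group_by:
        if column not in primary_key:
            return False  # only primary key columns can be grouped on
        # Walk forward to `column`; every primary key column skipped on the
        # way must be pinned to a single value by an = restriction.
        while True:
            if position == len(primary_key):
                return False  # column is out of declared order or repeated
            current = primary_key[position]
            position += 1
            if current == column:
                break
            if current not in eq_restricted:
                return False
    # Reject a clause that stops in the middle of the partition key
    # (an empty clause is rejected as well).
    return position >= len(partition_key)


composite = (("partitionKey1", "partitionKey2"), ("clustering1", "clustering2"))
simple = (("partitionKey",), ("clusteringColumn1",))

partial_partition_key = check_group_by(*composite, ["partitionKey1"])
full_partition_key = check_group_by(*composite, ["partitionKey1", "partitionKey2"])
explicit = check_group_by(*simple, ["partitionKey", "clusteringColumn1"])
implicit = check_group_by(*simple, ["clusteringColumn1"], {"partitionKey"})
unrestricted = check_group_by(*simple, ["clusteringColumn1"])
```

In this sketch the two queries above correspond to {{explicit}} and {{implicit}}, and the unsupported example corresponds to {{partial_partition_key}}.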
was (Author: blerer):
Both will be supported.
What will not be supported is a {{group by}} clause were only a part of the partition key will be specified. For example, if a table has a primary key like {{PRIMARY KEY((partitionKey1, partitionKey2) clustering1, clustering2)}}, the following query will not be supported:
{{SELECT partitionKey1, MAX(value) FROM myTable GROUP BY partitionKey1}}
As for the aggregates, the grouping will be performed on the coordinator node. By consequence, if the driver use the Token aware policy, a query containing a partition key predicate will be more efficient as the aggregates will be built on the node where the data are located.
From the syntax point of view, the queries:
{{SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable WHERE partitionKey=5 GROUP BY partitionKey, clusteringColumn1;}}
and {{SELECT partitionKey, clusteringColumn1, Max(value) FROM myTable WHERE partitionKey=5 GROUP BY clusteringColumn1;}} will be both supported due to the fact that the {{partitionKey}} column is restricted by an {{=}} operator.
> Add support for Group By to Select statement
> --------------------------------------------
>
> Key: CASSANDRA-10707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10707
> Project: Cassandra
> Issue Type: Improvement
> Components: CQL
> Reporter: Benjamin Lerer
> Assignee: Benjamin Lerer
>
> Now that Cassandra supports aggregate functions, it makes sense to support {{GROUP BY}} in {{SELECT}} statements.
> It should be possible to group either at the partition level or at the clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP BY partitionKey, clustering0, clustering1;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)