You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Alexander Ivakov (JIRA)" <ji...@apache.org> on 2018/06/21 23:42:00 UTC

[jira] [Commented] (CASSANDRA-14376) Limiting a clustering column with a range not allowed when using "group by"

    [ https://issues.apache.org/jira/browse/CASSANDRA-14376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519847#comment-16519847 ] 

Alexander Ivakov commented on CASSANDRA-14376:
----------------------------------------------

Cassandra currently doesn't implement group by on arbitrary columns. When grouping by multiple columns, you can only group by primary key columns in the order they are declared in the primary key, starting from the first. 

 

The only exception allowed is if you have restricted the first n columns with an equality restriction. Note, restricting a column with an "=" is selecting one group, so there is nothing to group in that column, that's why Cassandra allows this. You can then group by the remaining columns, in order and starting from the next column (you can’t skip columns in between).

 

So to group by a column, all preceding primary key columns have to be either restricted by "=" or be in the group by clause. 

 

The range query like the one above fails because the sample column is being restricted by a range, thus still having multiple groups, but is not in the group by clause. LIKE and IN restrictions will also not work in this case. 

 

This query will work:

 

select city, state, sum(count) from samples where name='bob' and partition=1 and sample>=1 and sample<=3 group by sample, city, state; 

 

The partition key columns are restricted by "=" and all the clustering key columns are in the group by clause in the same order as the schema. “Group by sample” and “group by sample, city” will also work.

> Limiting a clustering column with a range not allowed when using "group by"
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14376
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14376
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: Cassandra 3.11.1
>            Reporter: Chris mildebrandt
>            Priority: Major
>
> I’m trying to use a range to limit a clustering column while at the same time using `group by` and running into issues. Here’s a sample table:
> {{create table if not exists samples (name text, partition int, sample int, city text, state text, count counter, primary key ((name, partition), sample, city, state)) with clustering order by (sample desc);}}
> When I filter `sample` by a range, I get an error:
> {{select city, state, sum(count) from samples where name='bob' and partition=1 and sample>=1 and sample<=3 group by city, state;}}
>  {{{color:#ff0000}InvalidRequest: Error from server: code=2200 [Invalid query] message="Group by currently only support groups of columns following their declared order in the PRIMARY KEY"{color}}}
> However, it allows the query when I change from a range to an equals:
> {{select city, state, sum(count) from samples where name='bob' and partition=1 and sample=1 group by city, state;}}
> {{city | state | system.sum(count)}}
> {{--------++--------------------------}}
> {{ Austin | TX | 2}}
> {{ Denver | CO | 1}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org