You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2016/02/24 20:52:18 UTC

[jira] [Commented] (CASSANDRA-10707) Add support for Group By to Select statement

    [ https://issues.apache.org/jira/browse/CASSANDRA-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163608#comment-15163608 ] 

Sylvain Lebresne commented on CASSANDRA-10707:
----------------------------------------------

The first point that needs discussing is that this patch changes the inter-node protocol and this without bumping the protocol version, which is something we've never done.

On the one side bumping the protocol version is something we've kept for major releases and so far planned not to do before 4.0. But on the other side, not being able to add anything to the inter-node messages limits a lot new features (like this one) and doesn't make sense in a tick-tock world.

We could accept this kind of change without bumping the version on the argument that it's only "additive" to the serialization format and that it breaks nothing if the new feature is not used, and we'd document that "you shall not use GROUP BY until all nodes in the cluster are updated to a version that supports it". But it's not something to treat lightly because if a user don't respect that rule, older node will fail deserialization of the message, which will drop the connection, which can impact other on-ongoing queries, which is not good.

Alternatively, we could simply bump the protocol version, forgetting the "not before 4.0" rule. The advantage is that if we do so we can have the sending node (instead of the receiving one) throw if a GROUP BY is used but not all nodes are up to date, which is cleaner and avoid the connection dropped problem.  

So something we need to decide first, and maybe we need a separate ticket to have that discussion.

Anyway, outside of this question, some general remarks on the patch:
* In the parser, we have the {{GROUP BY}} before the {{ORDER BY}}. On the one side it's more consistent with {{SQL}} but on the other side, we do happen to order before we group. And we kind of know that post-query ordering is not really something we can do due to paging (we do happen to do it in a few case when paging is not involved out of backward compatibility but given the limitation, I don't think we'll want to ever extend that) so it does feel the {{ORDER BY}} should come before {{GROUP BY}}.
* The {{forPagingByQueryPager}} and {{forGroupByInternalPaging}} methods are not very intuitive: it's unclear when just looking at {{DataLimits}} why they exist and why you should use them or not. I'm also not entirely sure we need the difference between the two. If I understand correctly, the problem is that when a replica ends a page in the middle of a group, the coordinator don't know if said replica has stopped because it reaches the end of a group which was the last one to return (in which case we're done), or if the replica stopped because of the page limit (but the group itself is not finished). But the coordinator should be able to figure that out by checking it's row counts: if after having consumed the page from the replica we've reached the group limit _or_ there was less row in the page than the page limit, then we're done, otherwise we should ask for more.  Granted, that does mean in the rare case where we've exhausted all data exactly on the page limit, we'll do an additional query (that will be empty), but pretty sure you have that problem in one form or another whatever you do.  Am I missing something here?
* Not sure how I feel about {{GroupSelector}}. It's a bit confusing as of this patch and while I get that its for allowing functions later, it might not be as generic as we may need. For instance, if we ever want to support function on multiple columns (which would be possible, assuming those are consecutive PK columns), this won't work, at least not as implemented. I think I'd slightly prefer removing {{GroupSelector}} for now and just rely on extending {{GroupBySpecification}} for future extension (typically we'll add a {{withFunctionGroupBySpec}} which would have all liberty as to how it is implemented).  Doing so also mean we can simplify the probably common and only currently supported case as we just need to keep the length of the prefix we group by rather than a list of (dumb) selector.
* I feel keeping/passing the {{lastPartitionKey}} and {{lastClustering}} in {{DataLimits.CQLGroupByLimits}} and {{GroupMaker}} is not as future proof/generic as the patch is otherwise trying to be. Typically, if we allow functions, the {{lastClustering}} won't really be a clustering. Granted, it'll still fit into a {{ByteBuffer[]}} but it'll be a bit of an abuse. So I would suggest adding {{GroupMaker.State}} concept whose implementation would be specific to the {{GroupMaker/GroupBySpecification}} but would be transparent to other code. Imo that would make some code cleaner (for instance, {{GroupMaker.GROUP_EVERYTHING}} wouldn't need to store anything, which makes sense).
* {{GroupBySpeccification.NO_GROUPING}} feels confusing because it really means the query is not an aggregate, while the naming makes it sound like it has no {{GROUP BY}}.  Similarly, the {{isGrouping}} method really mean {{hasAggregate}}. In fact, I think it would be more clear to just have no {{GroupBySpecification}} in that case, keeping it {{null}} in SelectStatement (and in {{Selection}}). Having {{groupBySpec != null}} to mean no aggregate is imo more clear than {{!groupBySpec.isGrouping()}}. I'm also confused as to why {{GroupMaker.NO_GROUPING}} even exists, we shouldn't ever create a {{GroupMaker}} in the first place if we're not aggregating at all.
* In {{DataLimits}}, the naming of {{noGroupByLimits}} (and {{distinctNoGroupByLimits}}) is confusing (I mean, the both create a {{CQLGroupByLimits}}!). I think we could remove then and just pass {{cqlRowLimit}} directly as first argument in {{SelectStatement.getDataLimits}}.
* Related to the previous point, some phrasing is a tad confusing, like a comment saying {{GroupByPartitionIterator for queries without Group By}} in {{GroupByQueryPager}}. I think that's because {{GroupSpecification}}, {{GroupByLimits}} and {{GroupByQueryPager}} classes are really about handling aggregates in general, not query having an explicit {{GROUP BY}}.  So maybe renaming to {{AggregationSpecification}}, {{AggregationLimits}} and {{AggregationQueryPager}} would be more clear?
* The logic in {{SelectStatement.RawStatement.getGroupBySpecification}} breaks (doesn't throw a "proper" exception) if someone write {{SELECT * FROM t GROUP BY a, a}} where {{a}} is the sole PK column (the query is stupid, but it should throw a proper exception). More generally, I feel that the logic don't make it too intuitive that we're indeed checking  proper ordering of the column. As a nit, instead of copying both partition and clustering columns in {{CFMetaData.primaryKeyColumns}}, I'd use {{Iterables.concat}} instead (and use an iterator in getGroupBySpecification, which would make it even more apparent than not checking {{hasNext}} is a bad idea).
* Having {{CQLGroupByLimits}} inherit {{estimateTotalResults}} from {{CQLLimits}} is theoretically wrong as we should return an estimate of groups. Now, properly doing that is hard and so for now I'm fine reusing the one of {{CQLLimits}} for simplicity but we should call this out clearly by overriding the method (having it just call {{super.estimateTotalResults()}}) with a comment explaining we know it's off and should ideally be fixed someday. In fact, I wonder if having {{CQLGroupByLimits}} inherit {{CQLLimits}} is a good idea: they work sufficiently differently that it feels risky to have method inherited silently. The only other method not overriden is {{hasEnoughLiveData}} and we could easily use a {{private static}} helper for code reuse in that case. Anyway, I don't mind the inheritence terribly if you prefer but figured I'd mention it.
* Not sure about {{CQLGroupByLimits.forShortReadRetry()}}. I believe putting no limit on the number of rows (and only on the group) might lead to OOM.  In fact, I need to think more carefully about this but I'm not 100% sure that the short read logic isn't throw off by the fact that {{counted()}} returns a number of groups not rows.
* In {{GroupByQueryPager}}, both iterator implementation have a {{correctPageSize()}} method that is unused. And in the case of {{AggregationPartitionIterator}}, it seems fishy that it's not used.
* Shouldn't {{GroupByQueryPager.GoupByPartitionIterator.close()}} set {{closed}} if it isn't set?
* I think we lack some validation/handling of composite partition keys. If I have {{PRIMARY KEY ((a, b), c)}} and I do {{SELECT count( *) FROM t GROUP BY a}}, I believe this silently do the grouping by both {{a}} and {{b}}. We should reject this since we have no way to handle it.


Also, a bunch of minor nits:
* {{GroupByQueryPager.GroupByPartitionIterator.GroupByRowIterator}} could call {{rowIterator.partitionKey()}} rather than stores the partition key separately. Regarding that class, we also should remove all the {{@Override}} according to the [code style|https://wiki.apache.org/cassandra/CodeStyle].
* Why do we have {{rowPageSize}} in {{CQLGroupByLimits}}. The ctor seems to always make it equal to {{rowLimit}} and both fields are {{final}} (so will stay equal).
* In {{Selection.resultSetBuilder}}, the {{isJons}} parameter is a typo for {{isJson}}. I know this is not new to this patch but lets fix it rather than duplicate the typo.
* Your IDE is not respecting the import order of the code-style. It moved {{com.google}} imports after the {{org.apache}} ones in {{Selection.java}} for instance (see the [code style|https://wiki.apache.org/cassandra/CodeStyle]). I also don't love that it expands {{*}} as this adds noise to patches but that might be more personal.
* We can now remove the {{// Note that if there are some nodes in the cluster with a version less than 2.0, we can't use paging (CASSANDRA-6707)}} line (in {{SelectStatement.getDataLimits}}).
* Could add a comment in {{SelectStatement.RawStatement.getGroupBySpecification()}} as to why we don't do anything with partition key columns.
* Was there a reason to change {{s/cqlRowLimit/userLimit}} in the last 2 lines of {{SelectStatement.getDataLimits()}}? Pretty sure it doesn't matter but since I don't find the change particularly better I'm left wondering if I'm missing the reason for the change.
* Could be wrong, but it doesn't seem we need to add {{withUpdatedLimit()}} to {{ReadQuery}} (that is, we only use it on a {{ReadCommand}}). If true, I'd rather remove it from {{ReadQuery}} which allow to remove the unused implementation in {{SinglePartitionReadCommand.Group}}.
* Indentation for the parameters of {{GroupMaker.newInstance}} is off.
* It would make more sense for {{QueryPager.EMPTY.withUpdatedLimit()}} to throw {{UnsupportedOperationException}}.
* In {{DataLimits.CQLLimits.hasEnoughLiveData}}, a comment has been butchered.
* Some indentation is off in {{DataLimits.Serializer.deserialize}}.
* In {{GroupByQueryPager.handlePagingOff}}, the comment is incomplete.
* Unecessary import of {{Optional}} in {{Restrictions.java}}.


> Add support for Group By to Select statement
> --------------------------------------------
>
>                 Key: CASSANDRA-10707
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10707
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: Benjamin Lerer
>            Assignee: Benjamin Lerer
>
> Now that Cassandra support aggregate functions, it makes sense to support {{GROUP BY}} on the {{SELECT}} statements.
> It should be possible to group either at the partition level or at the clustering column level.
> {code}
> SELECT partitionKey, max(value) FROM myTable GROUP BY partitionKey;
> SELECT partitionKey, clustering0, clustering1, max(value) FROM myTable GROUP BY partitionKey, clustering0, clustering1; 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)