Posted to issues@drill.apache.org by "Jacques Nadeau (JIRA)" <ji...@apache.org> on 2015/10/07 22:37:26 UTC
[jira] [Created] (DRILL-3910) Leverage Calcite's Clustered Collation
Jacques Nadeau created DRILL-3910:
-------------------------------------
Summary: Leverage Calcite's Clustered Collation
Key: DRILL-3910
URL: https://issues.apache.org/jira/browse/DRILL-3910
Project: Apache Drill
Issue Type: Improvement
Components: Query Planning & Optimization
Reporter: Jacques Nadeau
Right now, streaming aggregate requires a full collation. I was just talking to [~julianhyde], and he pointed out that Calcite has a version of collation that is clustered (similar to what MSSQL calls Segment). Realistically, streaming aggregate only requires a clustered collation, and we should switch to requiring that instead. We should also go through the existing operators and make sure we track whether or not each operator maintains a clustered collation.

We should then be able to have flatten produce output that is clustered on the carry-through fields. This will allow us to do a better job of taking advantage of the clustered-ness of data for additional operations. Flatten should also produce data that exposes the distribution trait on the carry-through fields. This means that a query like this:
select a, count(b) from (
  select a, flatten(x) as b from t
) x
group by a
should be executed without redistributing the data.
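
To illustrate why a clustered collation is enough, here is a minimal Python sketch of the query above. It is not Drill or Calcite code: the function names `flatten` and `streaming_count` and the sample table `t` are hypothetical, and it assumes that rows of `t` with the same `a` are already adjacent (clustered) but not sorted.

```python
from itertools import groupby


def flatten(rows):
    """Emit one (a, b) row per element of x.

    All rows produced from one input row are adjacent, so if the input
    is clustered on the carry-through field a, the output is too.
    """
    for a, xs in rows:
        for b in xs:
            yield a, b


def streaming_count(rows):
    """Count non-null b per a in a single pass.

    Only requires that equal values of a are contiguous (clustered);
    the groups themselves may arrive in any order.
    """
    for a, group in groupby(rows, key=lambda row: row[0]):
        yield a, sum(1 for _, b in group if b is not None)


# Hypothetical table t(a, x): clustered on a, but not sorted on a.
t = [
    (3, ["p", "q"]),
    (3, ["v"]),
    (1, ["r"]),
    (2, ["s", "t", "u"]),
]

result = list(streaming_count(flatten(t)))
print(result)
```

The aggregate never sorts or buffers more than one group, and it still produces one row per distinct `a` because the keys arrive in contiguous runs. If the input were not clustered on `a`, the same key could show up in more than one output row, which is the case a sort (or a hash aggregate) would otherwise be needed for.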
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)