Posted to issues@calcite.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2017/03/17 18:35:41 UTC

[jira] [Created] (CALCITE-1706) DruidAggregateFilterTransposeRule causes very fine-grained aggregations to be pushed to Druid

Julian Hyde created CALCITE-1706:
------------------------------------

             Summary: DruidAggregateFilterTransposeRule causes very fine-grained aggregations to be pushed to Druid
                 Key: CALCITE-1706
                 URL: https://issues.apache.org/jira/browse/CALCITE-1706
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde
            Assignee: Julian Hyde


Enabling {{DruidAggregateFilterTransposeRule}} may cause very fine-grained aggregations to be pushed to Druid.
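
For context, the query in that test is roughly of the following shape (my reconstruction from the filter conditions in the plans below; the exact SQL in {{DruidAdapterIT}} may differ slightly):

{noformat}
SELECT COUNT(*) AS "C"
FROM "foodmart"
WHERE EXTRACT(YEAR FROM "timestamp") = 1997
AND EXTRACT(MONTH FROM "timestamp") IN (4, 6)
{noformat}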

Running {{DruidAdapterIT.testFilterTimestamp}}, here is the previous plan (with {{DruidAggregateFilterTransposeRule}} disabled):

{noformat}
EnumerableInterpreter
  BindableAggregate(group=[{}], C=[COUNT()])
    BindableFilter(condition=[AND(>=(/INT(Reinterpret($0), 86400000), 1997-01-01), <(/INT(Reinterpret($0), 86400000), 1998-01-01), OR(AND(>=(/INT(Reinterpret($0), 86400000), 1997-04-01), <(/INT(Reinterpret($0), 86400000), 1997-05-01)), AND(>=(/INT(Reinterpret($0), 86400000), 1997-06-01), <(/INT(Reinterpret($0), 86400000), 1997-07-01))))])
      DruidQuery(table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000/2992-01-10T00:00:00.000]], projects=[[$0]])
{noformat}

Here is the (in my opinion inferior) plan with {{DruidAggregateFilterTransposeRule}} enabled:

{noformat}
EnumerableInterpreter
  BindableAggregate(group=[{}], C=[$SUM0($1)])
    BindableFilter(condition=[AND(=(EXTRACT_DATE(FLAG(YEAR), /INT(Reinterpret($0), 86400000)), 1997), OR(=(EXTRACT_DATE(FLAG(MONTH), /INT(Reinterpret($0), 86400000)), 4), =(EXTRACT_DATE(FLAG(MONTH), /INT(Reinterpret($0), 86400000)), 6)))])
      DruidQuery(table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000/2992-01-10T00:00:00.000]], groups=[{0}], aggs=[[COUNT()]])
{noformat}

Note that the DruidQuery is now grouping on __timestamp. Given that __timestamp has very high cardinality, is this an efficient operation for Druid?

For this particular query, the ideal would be to push the filter into the {{intervals}} clause. Then we would not need to group by __timestamp. I am not sure why this is not happening.
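
For illustration, the ideal plan might look something like this (a hand-written sketch, not the output of any current rule; the exact {{intervals}} formatting is approximate):

{noformat}
EnumerableInterpreter
  DruidQuery(table=[[foodmart, foodmart]], intervals=[[1997-04-01T00:00:00.000/1997-05-01T00:00:00.000, 1997-06-01T00:00:00.000/1997-07-01T00:00:00.000]], groups=[{}], aggs=[[COUNT()]])
{noformat}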

[~nishantbangarwa], [~bslim], how bad is the query with {{DruidAggregateFilterTransposeRule}} enabled, in your opinion? Is this a show-stopper for Calcite 1.12?


