Posted to issues@calcite.apache.org by "Nishant Bangarwa (JIRA)" <ji...@apache.org> on 2018/01/02 20:21:00 UTC
[jira] [Commented] (CALCITE-2113) Push column pruning to druid when Aggregate cannot be pushed

    [ https://issues.apache.org/jira/browse/CALCITE-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308651#comment-16308651 ] 

Nishant Bangarwa commented on CALCITE-2113:
-------------------------------------------

[~bslim] the patch attempts to extract a Project from the Aggregate and push it to Druid for cases where we have an Aggregate on top of a DruidQuery and that Aggregate cannot be pushed to Druid. For example, an aggregation with grouping sets cannot be pushed to Druid:
??{{select empno, deptno, sum(empno) from emp group by grouping sets ((empno, deptno),(deptno),(empno))
}}??
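
To illustrate the idea for the query above, here is a minimal sketch of how the pruned column list could be derived. It is not the code from the patch and does not use Calcite classes; the class name, method names and the column layout of emp are assumptions made up for the example. It collects every input column referenced by the grouping sets and the aggregate calls, then maps old column indexes to their positions in the extracted Project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical, self-contained illustration; not part of Calcite or the patch.
final class ColumnPruningSketch {

  /** Returns, in ascending order, the input columns read by the grouping sets or aggregate calls. */
  static List<Integer> usedColumns(List<List<Integer>> groupSets,
      List<List<Integer>> aggArgs) {
    TreeSet<Integer> used = new TreeSet<>();
    for (List<Integer> groupSet : groupSets) {
      used.addAll(groupSet);
    }
    for (List<Integer> args : aggArgs) {
      used.addAll(args);
    }
    return new ArrayList<>(used);
  }

  /** Maps each original column index to its position in the pruned Project. */
  static Map<Integer, Integer> remap(List<Integer> used) {
    Map<Integer, Integer> mapping = new LinkedHashMap<>();
    for (int i = 0; i < used.size(); i++) {
      mapping.put(used.get(i), i);
    }
    return mapping;
  }

  public static void main(String[] args) {
    // Assumed layout of emp: 0=empno, 1=ename, 2=deptno, 3=sal
    List<List<Integer>> groupSets = Arrays.asList(
        Arrays.asList(0, 2), Arrays.asList(2), Arrays.asList(0));
    List<List<Integer>> aggArgs = Arrays.asList(Arrays.asList(0)); // sum(empno)

    List<Integer> used = usedColumns(groupSets, aggArgs);
    System.out.println("project columns: " + used);    // [0, 2]
    System.out.println("index mapping: " + remap(used)); // {0=0, 2=1}
  }
}
```

With this layout only empno and deptno would need to be requested from Druid, and the Aggregate would be rewritten against the two-column Project using the index mapping.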

[~julianhyde] I have updated the PR based on your review comments and added tests.
- For the inefficiency you pointed out, I am not sure I understood the fix correctly. Do you mean replacing the second parameter {{Class<? extends RelNode> inputClass}} with {{RelSubset}}? (I tried that, but the rule did not match in HepPlanner. Is that expected?)
- I hit a case where the rule fired repeatedly in one unit test, so I added a Predicate to prevent matching an Aggregate when it is already on top of a Project. Let me know if there is a better way to handle this.
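
The repeated-firing problem and the guard can be shown with a toy model. This is only a sketch under stated assumptions: Node, NOT_ON_PROJECT, extractProject and fireUntilStable are hypothetical names invented for the example, and none of it is Calcite's planner API or the Predicate from the PR. The rule turns Aggregate(x) into Aggregate(Project(x)); without a guard the result matches the rule again, so it keeps firing.

```java
import java.util.function.Predicate;

// Hypothetical toy model of a planner rule; not Calcite code.
final class RuleGuardSketch {

  /** Toy plan node: a kind ("Aggregate", "Project", "DruidQuery") and at most one input. */
  static final class Node {
    final String kind;
    final Node input;

    Node(String kind, Node input) {
      this.kind = kind;
      this.input = input;
    }
  }

  /** Guard: match an Aggregate only when its input is not already a Project. */
  static final Predicate<Node> NOT_ON_PROJECT =
      n -> n.kind.equals("Aggregate")
          && (n.input == null || !n.input.kind.equals("Project"));

  /** The rewrite: Aggregate(x) becomes Aggregate(Project(x)). */
  static Node extractProject(Node aggregate) {
    return new Node("Aggregate", new Node("Project", aggregate.input));
  }

  /** Fires the rule while the guard matches, up to maxFirings; returns the number of firings. */
  static int fireUntilStable(Node root, Predicate<Node> guard, int maxFirings) {
    int firings = 0;
    Node current = root;
    while (firings < maxFirings && guard.test(current)) {
      current = extractProject(current);
      firings++;
    }
    return firings;
  }

  public static void main(String[] args) {
    Node plan = new Node("Aggregate", new Node("DruidQuery", null));
    // With the guard the rule fires once and then stops matching.
    System.out.println(fireUntilStable(plan, NOT_ON_PROJECT, 10)); // 1
    // Without the guard it fires until the cap is reached.
    System.out.println(fireUntilStable(plan, n -> n.kind.equals("Aggregate"), 10)); // 10
  }
}
```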

> Push column pruning to druid when Aggregate cannot be pushed
> ------------------------------------------------------------
>
>                 Key: CALCITE-2113
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2113
>             Project: Calcite
>          Issue Type: Bug
>          Components: druid
>            Reporter: Nishant Bangarwa
>            Assignee: Nishant Bangarwa
>
> Column pruning does not work when an Aggregate sits on top of a DruidQuery and the Aggregate cannot be pushed to Druid (one such case is a count on a metric).
> To fix this, we can introduce a new rule that extracts a Project from the Aggregate and pushes it into the DruidQuery before the Aggregate is pushed.
> {code} 
> INFO  : Executing command(queryId=hive_20171020180303_09fd3ab2-6e4a-42a1-9e85-4bca0e13460b): explain SELECT COUNT(`__time`)
>                           FROM tpcds_denormalized_druid_table_300M
>                           WHERE `__time` >= '1999-11-01 00:00:00'
>                                 AND `__time` <= '1999-11-10 00:00:00'
>                                 AND `__time` < '1999-11-05 00:00:00'
> INFO  : Starting task [Stage-3:EXPLAIN] in serial mode
> INFO  : Resetting the caller context to HIVE_SSN_ID:a5e1f82e-6d6c-405c-a6da-0d74f2248603
> INFO  : Completed executing command(queryId=hive_20171020180303_09fd3ab2-6e4a-42a1-9e85-4bca0e13460b); Time taken: 0.011 seconds
> INFO  : OK
> tpcds_real_bin_partitioned_orc_1000@tpcds_denormalized_druid_table_300m,tpcds_denormalized_druid_table_300m,Tbl:COMPLETE,Col:NONE,Output:["__time"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"tpcds_real_bin_partitioned_orc_1000.tpcds_denormalized_druid_table_300M\",\"descending\":false,\"intervals\":[\"1999-11-01T00:00:00.000/1999-11-05T00:00:00.000\"],\"dimensions\":[\"i_item_id\",\"i_rec_start_date\",\"i_rec_end_date\",\"i_item_desc\",\"i_brand_id\",\"i_brand\",\"i_class_id\",\"i_class\",\"i_category_id\",\"i_category\",\"i_manufact_id\",\"i_manufact\",\"i_size\",\"i_formulation\",\"i_color\",\"i_units\",\"i_container\",\"i_manager_id\",\"i_product_name\",\"c_customer_id\",\"c_salutation\",\"c_first_name\",\"c_last_name\",\"c_preferred_cust_flag\",\"c_birth_day\",\"c_birth_month\",\"c_birth_year\",\"c_birth_country\",\"c_login\",\"c_email_address\",\"c_last_review_date\",\"ca_address_id\",\"ca_street_number\",\"ca_street_name\",\"ca_street_type\",\"ca_suite_number\",\"ca_city\",\"ca_county\",\"ca_state\",\"ca_zip\",\"ca_country\",\"ca_gmt_offset\",\"s_rec_end_date\",\"s_store_name\",\"s_hours\",\"s_manager\",\"s_market_id\",\"s_geography_class\",\"s_market_desc\",\"s_market_manager\",\"s_division_id\",\"s_division_name\",\"s_company_id\",\"s_company_name\",\"s_street_number\",\"s_street_name\",\"s_street_type\",\"s_suite_number\",\"s_city\",\"s_county\",\"s_state\",\"s_zip\",\"s_country\",\"s_gmt_offset\"],\"metrics\":[\"ss_ticket_number\",\"ss_quantity\",\"ss_wholesale_cost\",\"ss_list_price\",\"ss_sales_price\",\"ss_ext_discount_amt\",\"ss_ext_sales_price\",\"ss_ext_wholesale_cost\",\"ss_ext_list_price\",\"ss_ext_tax\",\"ss_coupon_amt\",\"ss_net_paid\",\"ss_net_paid_inc_tax\",\"ss_net_profit\",\"i_current_price\",\"i_wholesale_cost\",\"s_number_employees\",\"s_floor_space\",\"s_tax_precentage\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384,\"fromNext\":true},\"context\":{\"druid.query.fetch\":false}}","druid.query.
type":"select"}  |
> {code} 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)