You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2016/11/29 16:48:58 UTC

[jira] [Comment Edited] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release

    [ https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705810#comment-15705810 ] 

Aman Sinha edited comment on DRILL-4347 at 11/29/16 4:48 PM:
-------------------------------------------------------------

The jstack is long because of the complex query.  It shows that the planner is stuck during Calcite's {{ReflectiveRelMetadataProvider.apply()}}  call during the post-processing phase of Drill planning.  At this phase, the logical and physical planning are done and planner is in SwapHashJoin phase.  During this, it calls getRows() on the inputs of all the hash joins to makes its decisions.  The getRows() eventually calls {{RelMdDistinctRowCount.getDistinctRowCount()}} since there is a GROUP-BY and the row count of a grouped aggregate is determined by the number of distinct rows for its group-by columns.  Note that Calcite needs the distinct row count also from the Join operators (not just Aggregates) if the output of the Join is feeding into an Aggregate.

Note that the stack trace is different from a similar (but not same) issue reported in CALCITE-1053. It is unclear what is the root cause of the deeply nested reflexive calls getting stuck, but one important observation is that Drill is needlessly doing this computation twice - once during logical planning phase and once during physical planning.  The distinct row count of all the Joins can be computed during logical planning and cached for future use during physical planning because this value is not going to change.   For complex queries such as these with many joins, it also saves planning time.  I am proposing to fix the issue by doing this caching of distinct row count for Joins.  



was (Author: amansinha100):
The jstack is long because of the complex query.  It shows that the planner is stuck during Calcite's {bq} ReflectiveRelMetadataProvider.apply() {bq} call during the post-processing phase of Drill planning.  At this phase, the logical and physical planning are done and planner is in SwapHashJoin phase.  During this, it calls getRows() on the inputs of all the hash joins to makes its decisions.  The getRows() eventually calls {bq}RelMdDistinctRowCount.getDistinctRowCount(){bq} since there is a GROUP-BY and the row count of a grouped aggregate is determined by the number of distinct rows for its group-by columns.  Note that Calcite needs the distinct row count also from the Join operators (not just Aggregates) if the output of the Join is feeding into an Aggregate.

It is unclear what is the root cause of the Calcite call either stuck or taking too long (there could be some issues with the deeply nested reflexive calls), but one important observation is that Drill is needlessly doing this computation twice - once during logical planning phase and once during physical planning.  The distinct row count of all the Joins can be computed during logical planning and cached for future use during physical planning because this value is not going to change.   For complex queries such as these with many joins, it also saves planning time.  I am proposing to fix the issue by doing this caching of distinct row count for Joins.  


> Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release
> ----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4347
>                 URL: https://issues.apache.org/jira/browse/DRILL-4347
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.5.0
>            Reporter: Victoria Markman
>            Assignee: Aman Sinha
>             Fix For: Future
>
>         Attachments: 294e9fb9-cdda-a89f-d1a7-b852878926a1.sys.drill_1.4.0, 294ea418-9fb8-3082-1725-74e3cfe38fe9.sys.drill_1.5.0, drill4347_jstack.txt
>
>
> mapr-drill-1.5.0.201602012001-1.noarch.rpm
> {code}
> 0: jdbc:drill:schema=dfs> WITH cs_ui
> . . . . . . . . . . . . >      AS (SELECT cs_item_sk,
> . . . . . . . . . . . . >                 Sum(cs_ext_list_price) AS sale,
> . . . . . . . . . . . . >                 Sum(cr_refunded_cash + cr_reversed_charge
> . . . . . . . . . . . . >                     + cr_store_credit) AS refund
> . . . . . . . . . . . . >          FROM   catalog_sales,
> . . . . . . . . . . . . >                 catalog_returns
> . . . . . . . . . . . . >          WHERE  cs_item_sk = cr_item_sk
> . . . . . . . . . . . . >                 AND cs_order_number = cr_order_number
> . . . . . . . . . . . . >          GROUP  BY cs_item_sk
> . . . . . . . . . . . . >          HAVING Sum(cs_ext_list_price) > 2 * Sum(
> . . . . . . . . . . . . >                 cr_refunded_cash + cr_reversed_charge
> . . . . . . . . . . . . >                 + cr_store_credit)),
> . . . . . . . . . . . . >      cross_sales
> . . . . . . . . . . . . >      AS (SELECT i_product_name         product_name,
> . . . . . . . . . . . . >                 i_item_sk              item_sk,
> . . . . . . . . . . . . >                 s_store_name           store_name,
> . . . . . . . . . . . . >                 s_zip                  store_zip,
> . . . . . . . . . . . . >                 ad1.ca_street_number   b_street_number,
> . . . . . . . . . . . . >                 ad1.ca_street_name     b_streen_name,
> . . . . . . . . . . . . >                 ad1.ca_city            b_city,
> . . . . . . . . . . . . >                 ad1.ca_zip             b_zip,
> . . . . . . . . . . . . >                 ad2.ca_street_number   c_street_number,
> . . . . . . . . . . . . >                 ad2.ca_street_name     c_street_name,
> . . . . . . . . . . . . >                 ad2.ca_city            c_city,
> . . . . . . . . . . . . >                 ad2.ca_zip             c_zip,
> . . . . . . . . . . . . >                 d1.d_year              AS syear,
> . . . . . . . . . . . . >                 d2.d_year              AS fsyear,
> . . . . . . . . . . . . >                 d3.d_year              s2year,
> . . . . . . . . . . . . >                 Count(*)               cnt,
> . . . . . . . . . . . . >                 Sum(ss_wholesale_cost) s1,
> . . . . . . . . . . . . >                 Sum(ss_list_price)     s2,
> . . . . . . . . . . . . >                 Sum(ss_coupon_amt)     s3
> . . . . . . . . . . . . >          FROM   store_sales,
> . . . . . . . . . . . . >                 store_returns,
> . . . . . . . . . . . . >                 cs_ui,
> . . . . . . . . . . . . >                 date_dim d1,
> . . . . . . . . . . . . >                 date_dim d2,
> . . . . . . . . . . . . >                 date_dim d3,
> . . . . . . . . . . . . >                 store,
> . . . . . . . . . . . . >                 customer,
> . . . . . . . . . . . . >                 customer_demographics cd1,
> . . . . . . . . . . . . >                 customer_demographics cd2,
> . . . . . . . . . . . . >                 promotion,
> . . . . . . . . . . . . >                 household_demographics hd1,
> . . . . . . . . . . . . >                 household_demographics hd2,
> . . . . . . . . . . . . >                 customer_address ad1,
> . . . . . . . . . . . . >                 customer_address ad2,
> . . . . . . . . . . . . >                 income_band ib1,
> . . . . . . . . . . . . >                 income_band ib2,
> . . . . . . . . . . . . >                 item
> . . . . . . . . . . . . >          WHERE  ss_store_sk = s_store_sk
> . . . . . . . . . . . . >                 AND ss_sold_date_sk = d1.d_date_sk
> . . . . . . . . . . . . >                 AND ss_customer_sk = c_customer_sk
> . . . . . . . . . . . . >                 AND ss_cdemo_sk = cd1.cd_demo_sk
> . . . . . . . . . . . . >                 AND ss_hdemo_sk = hd1.hd_demo_sk
> . . . . . . . . . . . . >                 AND ss_addr_sk = ad1.ca_address_sk
> . . . . . . . . . . . . >                 AND ss_item_sk = i_item_sk
> . . . . . . . . . . . . >                 AND ss_item_sk = sr_item_sk
> . . . . . . . . . . . . >                 AND ss_ticket_number = sr_ticket_number
> . . . . . . . . . . . . >                 AND ss_item_sk = cs_ui.cs_item_sk
> . . . . . . . . . . . . >                 AND c_current_cdemo_sk = cd2.cd_demo_sk
> . . . . . . . . . . . . >                 AND c_current_hdemo_sk = hd2.hd_demo_sk
> . . . . . . . . . . . . >                 AND c_current_addr_sk = ad2.ca_address_sk
> . . . . . . . . . . . . >                 AND c_first_sales_date_sk = d2.d_date_sk
> . . . . . . . . . . . . >                 AND c_first_shipto_date_sk = d3.d_date_sk
> . . . . . . . . . . . . >                 AND ss_promo_sk = p_promo_sk
> . . . . . . . . . . . . >                 AND hd1.hd_income_band_sk = ib1.ib_income_band_sk
> . . . . . . . . . . . . >                 AND hd2.hd_income_band_sk = ib2.ib_income_band_sk
> . . . . . . . . . . . . >                 AND cd1.cd_marital_status <> cd2.cd_marital_status
> . . . . . . . . . . . . >                 AND i_color IN ( 'cyan', 'peach', 'blush', 'frosted',
> . . . . . . . . . . . . >                                  'powder', 'orange' )
> . . . . . . . . . . . . >                 AND i_current_price BETWEEN 58 AND 58 + 10
> . . . . . . . . . . . . >                 AND i_current_price BETWEEN 58 + 1 AND 58 + 15
> . . . . . . . . . . . . >          GROUP  BY i_product_name,
> . . . . . . . . . . . . >                    i_item_sk,
> . . . . . . . . . . . . >                    s_store_name,
> . . . . . . . . . . . . >                    s_zip,
> . . . . . . . . . . . . >                    ad1.ca_street_number,
> . . . . . . . . . . . . >                    ad1.ca_street_name,
> . . . . . . . . . . . . >                    ad1.ca_city,
> . . . . . . . . . . . . >                    ad1.ca_zip,
> . . . . . . . . . . . . >                    ad2.ca_street_number,
> . . . . . . . . . . . . >                    ad2.ca_street_name,
> . . . . . . . . . . . . >                    ad2.ca_city,
> . . . . . . . . . . . . >                    ad2.ca_zip,
> . . . . . . . . . . . . >                    d1.d_year,
> . . . . . . . . . . . . >                    d2.d_year,
> . . . . . . . . . . . . >                    d3.d_year)
> . . . . . . . . . . . . > SELECT cs1.product_name,
> . . . . . . . . . . . . >        cs1.store_name,
> . . . . . . . . . . . . >        cs1.store_zip,
> . . . . . . . . . . . . >        cs1.b_street_number,
> . . . . . . . . . . . . >        cs1.b_streen_name,
> . . . . . . . . . . . . >        cs1.b_city,
> . . . . . . . . . . . . >        cs1.b_zip,
> . . . . . . . . . . . . >        cs1.c_street_number,
> . . . . . . . . . . . . >        cs1.c_street_name,
> . . . . . . . . . . . . >        cs1.c_city,
> . . . . . . . . . . . . >        cs1.c_zip,
> . . . . . . . . . . . . >        cs1.syear,
> . . . . . . . . . . . . >        cs1.cnt,
> . . . . . . . . . . . . >        cs1.s1,
> . . . . . . . . . . . . >        cs1.s2,
> . . . . . . . . . . . . >        cs1.s3,
> . . . . . . . . . . . . >        cs2.s1,
> . . . . . . . . . . . . >        cs2.s2,
> . . . . . . . . . . . . >        cs2.s3,
> . . . . . . . . . . . . >        cs2.syear,
> . . . . . . . . . . . . >        cs2.cnt
> . . . . . . . . . . . . > FROM   cross_sales cs1,
> . . . . . . . . . . . . >        cross_sales cs2
> . . . . . . . . . . . . > WHERE  cs1.item_sk = cs2.item_sk
> . . . . . . . . . . . . >        AND cs1.syear = 2001
> . . . . . . . . . . . . >        AND cs2.syear = 2001 + 1
> . . . . . . . . . . . . >        AND cs2.cnt <= cs1.cnt
> . . . . . . . . . . . . >        AND cs1.store_name = cs2.store_name
> . . . . . . . . . . . . >        AND cs1.store_zip = cs2.store_zip
> . . . . . . . . . . . . > ORDER  BY cs1.product_name,
> . . . . . . . . . . . . >           cs1.store_name,
> . . . . . . . . . . . . >           cs2.cnt;
> +----------------+-------------+------------+------------------+----------------+------------+--------+------------------+----------------+----------------+--------+--------+------+--------+-------+------+--------+---------+--------+---------+-------+
> |  product_name  | store_name  | store_zip  | b_street_number  | b_streen_name  |   b_city   | b_zip  | c_street_number  | c_street_name  |     c_city     | c_zip  | syear  | cnt  |   s1   |  s2   |  s3  |  s10   |   s20   |  s30   | syear0  | cnt0  |
> +----------------+-------------+------------+------------------+----------------+------------+--------+------------------+----------------+----------------+--------+--------+------+--------+-------+------+--------+---------+--------+---------+-------+
> | antin station  | ation       | 31904      | 483              | Maple          | Woodville  | 14289  | 600              | 13th Highland  | Spring Valley  | 36060  | 2001   | 1    | 33.13  | 63.6  | 0.0  | 97.04  | 112.56  | 15.11  | 2002    | 1     |
> +----------------+-------------+------------+------------------+----------------+------------+--------+------------------+----------------+----------------+--------+--------+------+--------+-------+------+--------+---------+--------+---------+-------+
> 1 row selected (387.503 seconds)
> {code}
> Compared to mapr-drill-1.4.0.201601071151-1.noarch.rpm
> {code}
> 0: jdbc:drill:schema=dfs> WITH cs_ui
> . . . . . . . . . . . . >      AS (SELECT cs_item_sk,
> . . . . . . . . . . . . >                 Sum(cs_ext_list_price) AS sale,
> . . . . . . . . . . . . >                 Sum(cr_refunded_cash + cr_reversed_charge
> . . . . . . . . . . . . >                     + cr_store_credit) AS refund
> . . . . . . . . . . . . >          FROM   catalog_sales,
> . . . . . . . . . . . . >                 catalog_returns
> . . . . . . . . . . . . >          WHERE  cs_item_sk = cr_item_sk
> . . . . . . . . . . . . >                 AND cs_order_number = cr_order_number
> . . . . . . . . . . . . >          GROUP  BY cs_item_sk
> . . . . . . . . . . . . >          HAVING Sum(cs_ext_list_price) > 2 * Sum(
> . . . . . . . . . . . . >                 cr_refunded_cash + cr_reversed_charge
> . . . . . . . . . . . . >                 + cr_store_credit)),
> . . . . . . . . . . . . >      cross_sales
> . . . . . . . . . . . . >      AS (SELECT i_product_name         product_name,
> . . . . . . . . . . . . >                 i_item_sk              item_sk,
> . . . . . . . . . . . . >                 s_store_name           store_name,
> . . . . . . . . . . . . >                 s_zip                  store_zip,
> . . . . . . . . . . . . >                 ad1.ca_street_number   b_street_number,
> . . . . . . . . . . . . >                 ad1.ca_street_name     b_streen_name,
> . . . . . . . . . . . . >                 ad1.ca_city            b_city,
> . . . . . . . . . . . . >                 ad1.ca_zip             b_zip,
> . . . . . . . . . . . . >                 ad2.ca_street_number   c_street_number,
> . . . . . . . . . . . . >                 ad2.ca_street_name     c_street_name,
> . . . . . . . . . . . . >                 ad2.ca_city            c_city,
> . . . . . . . . . . . . >                 ad2.ca_zip             c_zip,
> . . . . . . . . . . . . >                 d1.d_year              AS syear,
> . . . . . . . . . . . . >                 d2.d_year              AS fsyear,
> . . . . . . . . . . . . >                 d3.d_year              s2year,
> . . . . . . . . . . . . >                 Count(*)               cnt,
> . . . . . . . . . . . . >                 Sum(ss_wholesale_cost) s1,
> . . . . . . . . . . . . >                 Sum(ss_list_price)     s2,
> . . . . . . . . . . . . >                 Sum(ss_coupon_amt)     s3
> . . . . . . . . . . . . >          FROM   store_sales,
> . . . . . . . . . . . . >                 store_returns,
> . . . . . . . . . . . . >                 cs_ui,
> . . . . . . . . . . . . >                 date_dim d1,
> . . . . . . . . . . . . >                 date_dim d2,
> . . . . . . . . . . . . >                 date_dim d3,
> . . . . . . . . . . . . >                 store,
> . . . . . . . . . . . . >                 customer,
> . . . . . . . . . . . . >                 customer_demographics cd1,
> . . . . . . . . . . . . >                 customer_demographics cd2,
> . . . . . . . . . . . . >                 promotion,
> . . . . . . . . . . . . >                 household_demographics hd1,
> . . . . . . . . . . . . >                 household_demographics hd2,
> . . . . . . . . . . . . >                 customer_address ad1,
> . . . . . . . . . . . . >                 customer_address ad2,
> . . . . . . . . . . . . >                 income_band ib1,
> . . . . . . . . . . . . >                 income_band ib2,
> . . . . . . . . . . . . >                 item
> . . . . . . . . . . . . >          WHERE  ss_store_sk = s_store_sk
> . . . . . . . . . . . . >                 AND ss_sold_date_sk = d1.d_date_sk
> . . . . . . . . . . . . >                 AND ss_customer_sk = c_customer_sk
> . . . . . . . . . . . . >                 AND ss_cdemo_sk = cd1.cd_demo_sk
> . . . . . . . . . . . . >                 AND ss_hdemo_sk = hd1.hd_demo_sk
> . . . . . . . . . . . . >                 AND ss_addr_sk = ad1.ca_address_sk
> . . . . . . . . . . . . >                 AND ss_item_sk = i_item_sk
> . . . . . . . . . . . . >                 AND ss_item_sk = sr_item_sk
> . . . . . . . . . . . . >                 AND ss_ticket_number = sr_ticket_number
> . . . . . . . . . . . . >                 AND ss_item_sk = cs_ui.cs_item_sk
> . . . . . . . . . . . . >                 AND c_current_cdemo_sk = cd2.cd_demo_sk
> . . . . . . . . . . . . >                 AND c_current_hdemo_sk = hd2.hd_demo_sk
> . . . . . . . . . . . . >                 AND c_current_addr_sk = ad2.ca_address_sk
> . . . . . . . . . . . . >                 AND c_first_sales_date_sk = d2.d_date_sk
> . . . . . . . . . . . . >                 AND c_first_shipto_date_sk = d3.d_date_sk
> . . . . . . . . . . . . >                 AND ss_promo_sk = p_promo_sk
> . . . . . . . . . . . . >                 AND hd1.hd_income_band_sk = ib1.ib_income_band_sk
> . . . . . . . . . . . . >                 AND hd2.hd_income_band_sk = ib2.ib_income_band_sk
> . . . . . . . . . . . . >                 AND cd1.cd_marital_status <> cd2.cd_marital_status
> . . . . . . . . . . . . >                 AND i_color IN ( 'cyan', 'peach', 'blush', 'frosted',
> . . . . . . . . . . . . >                                  'powder', 'orange' )
> . . . . . . . . . . . . >                 AND i_current_price BETWEEN 58 AND 58 + 10
> . . . . . . . . . . . . >                 AND i_current_price BETWEEN 58 + 1 AND 58 + 15
> . . . . . . . . . . . . >          GROUP  BY i_product_name,
> . . . . . . . . . . . . >                    i_item_sk,
> . . . . . . . . . . . . >                    s_store_name,
> . . . . . . . . . . . . >                    s_zip,
> . . . . . . . . . . . . >                    ad1.ca_street_number,
> . . . . . . . . . . . . >                    ad1.ca_street_name,
> . . . . . . . . . . . . >                    ad1.ca_city,
> . . . . . . . . . . . . >                    ad1.ca_zip,
> . . . . . . . . . . . . >                    ad2.ca_street_number,
> . . . . . . . . . . . . >                    ad2.ca_street_name,
> . . . . . . . . . . . . >                    ad2.ca_city,
> . . . . . . . . . . . . >                    ad2.ca_zip,
> . . . . . . . . . . . . >                    d1.d_year,
> . . . . . . . . . . . . >                    d2.d_year,
> . . . . . . . . . . . . >                    d3.d_year)
> . . . . . . . . . . . . > SELECT cs1.product_name,
> . . . . . . . . . . . . >        cs1.store_name,
> . . . . . . . . . . . . >        cs1.store_zip,
> . . . . . . . . . . . . >        cs1.b_street_number,
> . . . . . . . . . . . . >        cs1.b_streen_name,
> . . . . . . . . . . . . >        cs1.b_city,
> . . . . . . . . . . . . >        cs1.b_zip,
> . . . . . . . . . . . . >        cs1.c_street_number,
> . . . . . . . . . . . . >        cs1.c_street_name,
> . . . . . . . . . . . . >        cs1.c_city,
> . . . . . . . . . . . . >        cs1.c_zip,
> . . . . . . . . . . . . >        cs1.syear,
> . . . . . . . . . . . . >        cs1.cnt,
> . . . . . . . . . . . . >        cs1.s1,
> . . . . . . . . . . . . >        cs1.s2,
> . . . . . . . . . . . . >        cs1.s3,
> . . . . . . . . . . . . >        cs2.s1,
> . . . . . . . . . . . . >        cs2.s2,
> . . . . . . . . . . . . >        cs2.s3,
> . . . . . . . . . . . . >        cs2.syear,
> . . . . . . . . . . . . >        cs2.cnt
> . . . . . . . . . . . . > FROM   cross_sales cs1,
> . . . . . . . . . . . . >        cross_sales cs2
> . . . . . . . . . . . . > WHERE  cs1.item_sk = cs2.item_sk
> . . . . . . . . . . . . >        AND cs1.syear = 2001
> . . . . . . . . . . . . >        AND cs2.syear = 2001 + 1
> . . . . . . . . . . . . >        AND cs2.cnt <= cs1.cnt
> . . . . . . . . . . . . >        AND cs1.store_name = cs2.store_name
> . . . . . . . . . . . . >        AND cs1.store_zip = cs2.store_zip
> . . . . . . . . . . . . > ORDER  BY cs1.product_name,
> . . . . . . . . . . . . >           cs1.store_name,
> . . . . . . . . . . . . >           cs2.cnt;
> +----------------+-------------+------------+------------------+----------------+------------+--------+------------------+----------------+----------------+--------+--------+------+--------+-------+------+--------+---------+--------+---------+-------+
> |  product_name  | store_name  | store_zip  | b_street_number  | b_streen_name  |   b_city   | b_zip  | c_street_number  | c_street_name  |     c_city     | c_zip  | syear  | cnt  |   s1   |  s2   |  s3  |  s10   |   s20   |  s30   | syear0  | cnt0  |
> +----------------+-------------+------------+------------------+----------------+------------+--------+------------------+----------------+----------------+--------+--------+------+--------+-------+------+--------+---------+--------+---------+-------+
> | antin station  | ation       | 31904      | 483              | Maple          | Woodville  | 14289  | 600              | 13th Highland  | Spring Valley  | 36060  | 2001   | 1    | 33.13  | 63.6  | 0.0  | 97.04  | 112.56  | 15.11  | 2002    | 1     |
> +----------------+-------------+------------+------------------+----------------+------------+--------+------------------+----------------+----------------+--------+--------+------+--------+-------+------+--------+---------+--------+---------+-------+
> 1 row selected (46.24 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)