You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/05/08 22:50:00 UTC

[jira] [Commented] (DRILL-7245) TPCDS queries 1, 45, 65, 97 are 3x slower when Statistics is enabled at sf 100

    [ https://issues.apache.org/jira/browse/DRILL-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835950#comment-16835950 ] 

ASF GitHub Bot commented on DRILL-7245:
---------------------------------------

gparai commented on pull request #1786:  DRILL-7245: Cap NDV at row count after applying filters
URL: https://github.com/apache/drill/pull/1786
 
 
   @amansinha100 can you please review the commit 5f7eee6? Thanks!
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> TPCDS queries 1, 45, 65, 97 are 3x slower when Statistics is enabled at sf 100
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-7245
>                 URL: https://issues.apache.org/jira/browse/DRILL-7245
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: Gautam Parai
>            Assignee: Gautam Parai
>            Priority: Major
>             Fix For: 1.17.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Here is query 65:
> {noformat}
> SELECT s_store_name, 
>                i_item_desc, 
>                sc.revenue, 
>                i_current_price, 
>                i_wholesale_cost, 
>                i_brand 
> FROM   store, 
>        item, 
>        (SELECT ss_store_sk, 
>                Avg(revenue) AS ave 
>         FROM   (SELECT ss_store_sk, 
>                        ss_item_sk, 
>                        Sum(ss_sales_price) AS revenue 
>                 FROM   store_sales, 
>                        date_dim 
>                 WHERE  ss_sold_date_sk = d_date_sk 
>                        AND d_month_seq BETWEEN 1199 AND 1199 + 11 
>                 GROUP  BY ss_store_sk, 
>                           ss_item_sk) sa 
>         GROUP  BY ss_store_sk) sb, 
>        (SELECT ss_store_sk, 
>                ss_item_sk, 
>                Sum(ss_sales_price) AS revenue 
>         FROM   store_sales, 
>                date_dim 
>         WHERE  ss_sold_date_sk = d_date_sk 
>                AND d_month_seq BETWEEN 1199 AND 1199 + 11 
>         GROUP  BY ss_store_sk, 
>                   ss_item_sk) sc 
> WHERE  sb.ss_store_sk = sc.ss_store_sk 
>        AND sc.revenue <= 0.1 * sb.ave 
>        AND s_store_sk = sc.ss_store_sk 
>        AND i_item_sk = sc.ss_item_sk 
> ORDER  BY s_store_name, 
>           i_item_desc
> LIMIT 100; 
> {noformat}
> Here is the new plan.
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402776556728E9 rows, 9.49229008144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45433
> 00-01      Project(s_store_name=[$0], i_item_desc=[$1], revenue=[$2], i_current_price=[$3], i_wholesale_cost=[$4], i_brand=[$5]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402766556728E9 rows, 9.49229007144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45432
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402666556728E9 rows, 9.49228947144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45431
> 00-03          Limit(fetch=[100]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402566556728E9 rows, 9.49228937144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45430
> 00-04            SingleMergeExchange(sort0=[0], sort1=[1]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402466556728E9 rows, 9.49228897144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45429
> 01-01              OrderedMuxExchange(sort0=[0], sort1=[1]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402366556728E9 rows, 9.492286842675482E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45428
> 02-01                SelectionVectorRemover : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402266556728E9 rows, 9.492286742675482E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45427
> 02-02                  Limit(fetch=[100]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402166556728E9 rows, 9.492286642675482E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45426
> 02-03                    SelectionVectorRemover : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 416478.4359081372, cumulative cost = {1.170402066556728E9 rows, 9.492286242675482E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45425
> 02-04                      TopN(limit=[100]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 416478.4359081372, cumulative cost = {1.1699855881208198E9 rows, 9.491869764239574E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45424
> 02-05                        HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 416478.4359081372, cumulative cost = {1.1695691096849117E9 rows, 9.469733581565046E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45423
> 03-01                          Project(s_store_name=[$1], i_item_desc=[$3], revenue=[$11], i_current_price=[$4], i_wholesale_cost=[$5], i_brand=[$6]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 416478.4359081372, cumulative cost = {1.1691526312490036E9 rows, 9.464735840334148E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network, 1.3320421632282749E8 memory}, id = 45422
> 03-02                            SelectionVectorRemover : rowType = RecordType(ANY s_store_sk, ANY s_store_name, ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand, ANY ss_store_sk, ANY ave, ANY ss_store_sk0, ANY ss_item_sk, ANY revenue): rowcount = 416478.4359081372, cumulative cost = {1.1687361528130956E9 rows, 9.4622369697187E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network, 1.3320421632282749E8 memory}, id = 45421
> 03-03                              Filter(condition=[<=($11, *(0.1, $8))]) : rowType = RecordType(ANY s_store_sk, ANY s_store_name, ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand, ANY ss_store_sk, ANY ave, ANY ss_store_sk0, ANY ss_item_sk, ANY revenue): rowcount = 416478.4359081372, cumulative cost = {1.1683196743771875E9 rows, 9.461820491282793E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network, 1.3320421632282749E8 memory}, id = 45420
> 03-04                                Project(s_store_sk=[$5], s_store_name=[$6], i_item_sk=[$7], i_item_desc=[$8], i_current_price=[$9], i_wholesale_cost=[$10], i_brand=[$11], ss_store_sk=[$3], ave=[$4], ss_store_sk0=[$0], ss_item_sk=[$1], revenue=[$2]) : rowType = RecordType(ANY s_store_sk, ANY s_store_name, ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand, ANY ss_store_sk, ANY ave, ANY ss_store_sk0, ANY ss_item_sk, ANY revenue): rowcount = 832956.8718162744, cumulative cost = {1.1674867175053713E9 rows, 9.456822750051895E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network, 1.3320421632282749E8 memory}, id = 45419
> 03-05                                  HashJoin(condition=[=($7, $1)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue, ANY ss_store_sk0, ANY ave, ANY s_store_sk, ANY s_store_name, ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 832956.8718162744, cumulative cost = {1.1666537606335552E9 rows, 9.4468272675901E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network, 1.3320421632282749E8 memory}, id = 45418
> 03-07                                    HashToRandomExchange(dist0=[[$1]]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue, ANY ss_store_sk0, ANY ave, ANY s_store_sk, ANY s_store_name): rowcount = 832377.0685035395, cumulative cost = {1.1652093835650516E9 rows, 9.430922742768057E9 cpu, 2.304269192E9 io, 4.544006312385362E10 network, 1.2961381632282749E8 memory}, id = 45415
> 04-01                                      HashJoin(condition=[=($5, $0)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue, ANY ss_store_sk0, ANY ave, ANY s_store_sk, ANY s_store_name): rowcount = 832377.0685035395, cumulative cost = {1.164377006496548E9 rows, 9.417604709672E9 cpu, 2.304269192E9 io, 2.157414781572014E10 network, 1.2961381632282749E8 memory}, id = 45414
> 04-03                                        HashJoin(condition=[=($3, $0)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue, ANY ss_store_sk0, ANY ave): rowcount = 832377.0685035397, cumulative cost = {1.1635434234280443E9 rows, 9.407608948849958E9 cpu, 2.304268388E9 io, 2.1544179841320137E10 network, 1.2960674112282749E8 memory}, id = 45411
> 04-05                                          HashAgg(group=[{0, 1}], revenue=[SUM($2)]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397, cumulative cost = {5.804395044044166E8 rows, 4.68265977388499E9 cpu, 1.152134194E9 io, 1.0253148182571495E10 network, 5.674418293829948E7 memory}, id = 45393
> 04-07                                            HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397, cumulative cost = {5.796071273359131E8 rows, 4.65935321596689E9 cpu, 1.152134194E9 io, 1.0253148182571495E10 network, 3.476942832980602E7 memory}, id = 45392
> 06-01                                              HashAgg(group=[{0, 1}], revenue=[SUM($2)]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397, cumulative cost = {5.787747502674096E8 rows, 4.649364691144848E9 cpu, 1.152134194E9 io, 2.48987648E7 network, 3.476942832980602E7 memory}, id = 45391
> 06-02                                                Project(ss_store_sk=[$2], ss_item_sk=[$1], ss_sales_price=[$3]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY ss_sales_price): rowcount = 1316801.1337047734, cumulative cost = {5.774579491337048E8 rows, 4.612494259401114E9 cpu, 1.152134194E9 io, 2.48987648E7 network, 5878.400000000001 memory}, id = 45390
> 06-03                                                  HashJoin(condition=[=($0, $4)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY ss_sales_price, ANY d_date_sk, ANY d_month_seq): rowcount = 1316801.1337047734, cumulative cost = {5.76141148E8 rows, 4.608543856E9 cpu, 1.152134194E9 io, 2.48987648E7 network, 5878.400000000001 memory}, id = 45389
> 06-05                                                    Scan(table=[[dfs, /tpcdsParquet10/SF100/store_sales]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/store_sales]], selectionRoot=maprfs:/tpcdsParquet10/SF100/store_sales, numFiles=1, numRowGroups=166, usedMetadataFile=false, columns=[`ss_sold_date_sk`, `ss_item_sk`, `ss_store_sk`, `ss_sales_price`]]]) : rowType = RecordType(ANY ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY ss_sales_price): rowcount = 2.87997024E8, cumulative cost = {2.87997024E8 rows, 1.151988096E9 cpu, 1.151988096E9 io, 0.0 network, 0.0 memory}, id = 45384
> 06-04                                                    BroadcastExchange : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146766.0 rows, 588800.0 cpu, 146098.0 io, 2.48987648E7 network, 0.0 memory}, id = 45388
> 09-01                                                      SelectionVectorRemover : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146432.0 rows, 586128.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45387
> 09-02                                                        Filter(condition=[AND(>=($1, 1199), <=($1, 1210))]) : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146098.0 rows, 585794.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45386
> 09-03                                                          Scan(table=[[dfs, /tpcdsParquet10/SF100/date_dim]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/date_dim]], selectionRoot=maprfs:/tpcdsParquet10/SF100/date_dim, numFiles=1, numRowGroups=1, usedMetadataFile=false, filter=booleanAnd(greater_than_or_equal_to(`d_month_seq`, 1199) , less_than_or_equal_to(`d_month_seq`, 1210) ) , columns=[`d_date_sk`, `d_month_seq`]]]) : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 73049.0, cumulative cost = {73049.0 rows, 146098.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45385
> 04-04                                          Project(ss_store_sk0=[$0], ave=[$1]) : rowType = RecordType(ANY ss_store_sk0, ANY ave): rowcount = 202.0, cumulative cost = {5.822713399551244E8 rows, 4.714959034142926E9 cpu, 1.152134194E9 io, 1.1291031658748644E10 network, 7.2859002984528E7 memory}, id = 45410
> 04-06                                            BroadcastExchange : rowType = RecordType(ANY ss_store_sk, ANY ave): rowcount = 202.0, cumulative cost = {5.822711379551244E8 rows, 4.714958630142926E9 cpu, 1.152134194E9 io, 1.1291031658748644E10 network, 7.2859002984528E7 memory}, id = 45409
> 07-01                                              Project(ss_store_sk=[$0], ave=[divide(CastHigh(CASE(=($2, 0), null, $1)), $2)]) : rowType = RecordType(ANY ss_store_sk, ANY ave): rowcount = 202.0, cumulative cost = {5.822709359551244E8 rows, 4.714957014142926E9 cpu, 1.152134194E9 io, 1.1275973124348644E10 network, 7.2859002984528E7 memory}, id = 45408
> 07-02                                                HashAgg(group=[{0}], agg#0=[$SUM0($1)], agg#1=[$SUM0($2)]) : rowType = RecordType(ANY ss_store_sk, ANY $f1, BIGINT $f2): rowcount = 202.0, cumulative cost = {5.822707339551244E8 rows, 4.714956004142926E9 cpu, 1.152134194E9 io, 1.1275973124348644E10 network, 7.2859002984528E7 memory}, id = 45407
> 07-03                                                  HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY ss_store_sk, ANY $f1, BIGINT $f2): rowcount = 83237.70685035396, cumulative cost = {5.82187496248274E8 rows, 4.712292397523715E9 cpu, 1.152134194E9 io, 1.1275973124348644E10 network, 7.139401934396178E7 memory}, id = 45406
> 10-01                                                    HashAgg(group=[{0}], agg#0=[$SUM0($1)], agg#1=[COUNT($1)]) : rowType = RecordType(ANY ss_store_sk, ANY $f1, BIGINT $f2): rowcount = 83237.70685035396, cumulative cost = {5.821042585414236E8 rows, 4.710960594214109E9 cpu, 1.152134194E9 io, 1.0253148182571495E10 network, 7.139401934396178E7 memory}, id = 45405
> 10-02                                                      Project(ss_store_sk=[$0], revenue=[$2]) : rowType = RecordType(ANY ss_store_sk, ANY revenue): rowcount = 832377.0685035397, cumulative cost = {5.812718814729201E8 rows, 4.6843245280219965E9 cpu, 1.152134194E9 io, 1.0253148182571495E10 network, 5.674418293829948E7 memory}, id = 45404
> 10-03                                                        HashAgg(group=[{0, 1}], revenue=[SUM($2)]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397, cumulative cost = {5.804395044044166E8 rows, 4.68265977388499E9 cpu, 1.152134194E9 io, 1.0253148182571495E10 network, 5.674418293829948E7 memory}, id = 45403
> 10-04                                                          HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397, cumulative cost = {5.796071273359131E8 rows, 4.65935321596689E9 cpu, 1.152134194E9 io, 1.0253148182571495E10 network, 3.476942832980602E7 memory}, id = 45402
> 11-01                                                            HashAgg(group=[{0, 1}], revenue=[SUM($2)]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397, cumulative cost = {5.787747502674096E8 rows, 4.649364691144848E9 cpu, 1.152134194E9 io, 2.48987648E7 network, 3.476942832980602E7 memory}, id = 45401
> 11-02                                                              Project(ss_store_sk=[$2], ss_item_sk=[$1], ss_sales_price=[$3]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY ss_sales_price): rowcount = 1316801.1337047734, cumulative cost = {5.774579491337048E8 rows, 4.612494259401114E9 cpu, 1.152134194E9 io, 2.48987648E7 network, 5878.400000000001 memory}, id = 45400
> 11-03                                                                HashJoin(condition=[=($0, $4)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY ss_sales_price, ANY d_date_sk, ANY d_month_seq): rowcount = 1316801.1337047734, cumulative cost = {5.76141148E8 rows, 4.608543856E9 cpu, 1.152134194E9 io, 2.48987648E7 network, 5878.400000000001 memory}, id = 45399
> 11-05                                                                  Scan(table=[[dfs, /tpcdsParquet10/SF100/store_sales]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/store_sales]], selectionRoot=maprfs:/tpcdsParquet10/SF100/store_sales, numFiles=1, numRowGroups=166, usedMetadataFile=false, columns=[`ss_sold_date_sk`, `ss_item_sk`, `ss_store_sk`, `ss_sales_price`]]]) : rowType = RecordType(ANY ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY ss_sales_price): rowcount = 2.87997024E8, cumulative cost = {2.87997024E8 rows, 1.151988096E9 cpu, 1.151988096E9 io, 0.0 network, 0.0 memory}, id = 45394
> 11-04                                                                  BroadcastExchange : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146766.0 rows, 588800.0 cpu, 146098.0 io, 2.48987648E7 network, 0.0 memory}, id = 45398
> 12-01                                                                    SelectionVectorRemover : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146432.0 rows, 586128.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45397
> 12-02                                                                      Filter(condition=[AND(>=($1, 1199), <=($1, 1210))]) : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146098.0 rows, 585794.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45396
> 12-03                                                                        Scan(table=[[dfs, /tpcdsParquet10/SF100/date_dim]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/date_dim]], selectionRoot=maprfs:/tpcdsParquet10/SF100/date_dim, numFiles=1, numRowGroups=1, usedMetadataFile=false, filter=booleanAnd(greater_than_or_equal_to(`d_month_seq`, 1199) , less_than_or_equal_to(`d_month_seq`, 1210) ) , columns=[`d_date_sk`, `d_month_seq`]]]) : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 73049.0, cumulative cost = {73049.0 rows, 146098.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45395
> 04-02                                        BroadcastExchange : rowType = RecordType(ANY s_store_sk, ANY s_store_name): rowcount = 402.0, cumulative cost = {804.0 rows, 4020.0 cpu, 804.0 io, 2.99679744E7 network, 0.0 memory}, id = 45413
> 08-01                                          Scan(table=[[dfs, /tpcdsParquet10/SF100/store]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/store]], selectionRoot=maprfs:/tpcdsParquet10/SF100/store, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`s_store_sk`, `s_store_name`]]]) : rowType = RecordType(ANY s_store_sk, ANY s_store_name): rowcount = 402.0, cumulative cost = {402.0 rows, 804.0 cpu, 804.0 io, 0.0 network, 0.0 memory}, id = 45412
> 03-06                                    HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 204000.0, cumulative cost = {408000.0 rows, 4284000.0 cpu, 1020000.0 io, 4.17792E9 network, 0.0 memory}, id = 45417
> 05-01                                      Scan(table=[[dfs, /tpcdsParquet10/SF100/item]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/item]], selectionRoot=maprfs:/tpcdsParquet10/SF100/item, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`i_item_sk`, `i_item_desc`, `i_current_price`, `i_wholesale_cost`, `i_brand`]]]) : rowType = RecordType(ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 204000.0, cumulative cost = {204000.0 rows, 1020000.0 cpu, 1020000.0 io, 0.0 network, 0.0 memory}, id = 45416
> {noformat}
> For query 65, there are two HashAggs (operator 04-05 and 10-03) that are taking a long time. Their rowcounts are incorrect. Operator 04-05 is estimated to be 800K and is actually 55M. Operator 10-03 is estimated to be 800K and is also 55M. As a result, these two operators only have 9 minor fragments in the new plan. This may be caused by the HashJoin rowcount for operator (06-03). The estimated rowcount for the HashJoin is 1.3M rows but the actual rowcount is 55M rows.
> For query 1, hash join operator 03-04 may have a similar problem.
> For query 45, hash join operator 16-02 may have a similar problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)