Posted to issues@drill.apache.org by "Robert Hou (JIRA)" <ji...@apache.org> on 2018/12/13 02:05:00 UTC

[jira] [Created] (DRILL-6902) Extra limit operator is not needed

Robert Hou created DRILL-6902:
---------------------------------

             Summary: Extra limit operator is not needed
                 Key: DRILL-6902
                 URL: https://issues.apache.org/jira/browse/DRILL-6902
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 1.15.0
            Reporter: Robert Hou
            Assignee: Pritesh Maker


The plan for TPC-DS query 49 contains an extra Limit operator that is not needed.

Here is the query:
{noformat}
SELECT 'web' AS channel,
       web.item,
       web.return_ratio,
       web.return_rank,
       web.currency_rank
FROM   (SELECT item, 
               return_ratio, 
               currency_ratio, 
               Rank() 
                 OVER ( 
                   ORDER BY return_ratio)   AS return_rank, 
               Rank() 
                 OVER ( 
                   ORDER BY currency_ratio) AS currency_rank 
        FROM   (SELECT ws.ws_item_sk                                       AS 
                       item, 
                       ( Cast(Sum(COALESCE(wr.wr_return_quantity, 0)) AS DEC(15, 
                              4)) / 
                         Cast( 
                         Sum(COALESCE(ws.ws_quantity, 0)) AS DEC(15, 4)) ) AS 
                       return_ratio, 
                       ( Cast(Sum(COALESCE(wr.wr_return_amt, 0)) AS DEC(15, 4)) 
                         / Cast( 
                         Sum( 
                         COALESCE(ws.ws_net_paid, 0)) AS DEC(15, 
                         4)) )                                             AS 
                       currency_ratio 
                FROM   web_sales ws 
                       LEFT OUTER JOIN web_returns wr 
                                    ON ( ws.ws_order_number = wr.wr_order_number 
                                         AND ws.ws_item_sk = wr.wr_item_sk ), 
                       date_dim 
                WHERE  wr.wr_return_amt > 10000 
                       AND ws.ws_net_profit > 1 
                       AND ws.ws_net_paid > 0 
                       AND ws.ws_quantity > 0 
                       AND ws_sold_date_sk = d_date_sk 
                       AND d_year = 1999 
                       AND d_moy = 12 
                GROUP  BY ws.ws_item_sk) in_web) web 
WHERE  ( web.return_rank <= 10 
          OR web.currency_rank <= 10 ) 
UNION 
SELECT 'catalog' AS channel, 
       catalog.item, 
       catalog.return_ratio, 
       catalog.return_rank, 
       catalog.currency_rank 
FROM   (SELECT item, 
               return_ratio, 
               currency_ratio, 
               Rank() 
                 OVER ( 
                   ORDER BY return_ratio)   AS return_rank, 
               Rank() 
                 OVER ( 
                   ORDER BY currency_ratio) AS currency_rank 
        FROM   (SELECT cs.cs_item_sk                                       AS 
                       item, 
                       ( Cast(Sum(COALESCE(cr.cr_return_quantity, 0)) AS DEC(15, 
                              4)) / 
                         Cast( 
                         Sum(COALESCE(cs.cs_quantity, 0)) AS DEC(15, 4)) ) AS 
                       return_ratio, 
                       ( Cast(Sum(COALESCE(cr.cr_return_amount, 0)) AS DEC(15, 4 
                              )) / 
                         Cast(Sum( 
                         COALESCE(cs.cs_net_paid, 0)) AS DEC( 
                         15, 4)) )                                         AS 
                       currency_ratio 
                FROM   catalog_sales cs 
                       LEFT OUTER JOIN catalog_returns cr 
                                    ON ( cs.cs_order_number = cr.cr_order_number 
                                         AND cs.cs_item_sk = cr.cr_item_sk ), 
                       date_dim 
                WHERE  cr.cr_return_amount > 10000 
                       AND cs.cs_net_profit > 1 
                       AND cs.cs_net_paid > 0 
                       AND cs.cs_quantity > 0 
                       AND cs_sold_date_sk = d_date_sk 
                       AND d_year = 1999 
                       AND d_moy = 12 
                GROUP  BY cs.cs_item_sk) in_cat) catalog 
WHERE  ( catalog.return_rank <= 10 
          OR catalog.currency_rank <= 10 ) 
UNION 
SELECT 'store' AS channel, 
       store.item, 
       store.return_ratio, 
       store.return_rank, 
       store.currency_rank 
FROM   (SELECT item, 
               return_ratio, 
               currency_ratio, 
               Rank() 
                 OVER ( 
                   ORDER BY return_ratio)   AS return_rank, 
               Rank() 
                 OVER ( 
                   ORDER BY currency_ratio) AS currency_rank 
        FROM   (SELECT sts.ss_item_sk                                       AS 
                       item, 
                       ( Cast(Sum(COALESCE(sr.sr_return_quantity, 0)) AS DEC(15, 
                              4)) / 
                         Cast( 
                         Sum(COALESCE(sts.ss_quantity, 0)) AS DEC(15, 4)) ) AS 
                       return_ratio, 
                       ( Cast(Sum(COALESCE(sr.sr_return_amt, 0)) AS DEC(15, 4)) 
                         / Cast( 
                         Sum( 
                         COALESCE(sts.ss_net_paid, 0)) AS DEC(15, 4)) )     AS 
                       currency_ratio 
                FROM   store_sales sts 
                       LEFT OUTER JOIN store_returns sr 
                                    ON ( sts.ss_ticket_number = 
                                         sr.sr_ticket_number 
                                         AND sts.ss_item_sk = sr.sr_item_sk ), 
                       date_dim 
                WHERE  sr.sr_return_amt > 10000 
                       AND sts.ss_net_profit > 1 
                       AND sts.ss_net_paid > 0 
                       AND sts.ss_quantity > 0 
                       AND ss_sold_date_sk = d_date_sk 
                       AND d_year = 1999 
                       AND d_moy = 12 
                GROUP  BY sts.ss_item_sk) in_store) store 
WHERE  ( store.return_rank <= 10 
          OR store.currency_rank <= 10 ) 
ORDER  BY 1, 
          4, 
          5
LIMIT 100; 

{noformat}

Here is the top of the plan:
{noformat}
00-00    Screen : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 100.0, cumulative cost = {1.5587382656934813E10 rows, 1.6644370208245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33692
00-01      Project(channel=[$0], item=[$1], return_ratio=[$2], return_rank=[$3], currency_rank=[$4]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 100.0, cumulative cost = {1.5587382646934813E10 rows, 1.6644370207245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33691
00-02        SelectionVectorRemover : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 100.0, cumulative cost = {1.5587382546934813E10 rows, 1.6644370157245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33690
00-03          Limit(fetch=[100]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 100.0, cumulative cost = {1.5587382446934813E10 rows, 1.6644370147245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33689
00-04            Limit(fetch=[100]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 100.0, cumulative cost = {1.5587382346934813E10 rows, 1.6644370107245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33688
00-05              SelectionVectorRemover : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 9067.461896625, cumulative cost = {1.5587382246934813E10 rows, 1.6644370067245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33687
00-06                TopN(limit=[100]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 9067.461896625, cumulative cost = {1.5587373179472916E10 rows, 1.664436916049882E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33686
00-07                  HashAgg(group=[{0, 1, 2, 3, 4}]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 9067.461896625, cumulative cost = {1.5587364112011019E10 rows, 1.6644296869003403E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33685
00-08                    Project(channel=[$0], item=[$1], return_ratio=[$2], return_rank=[$3], currency_rank=[$4]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 90674.61896625, cumulative cost = {1.5587273437392052E10 rows, 1.664393417052754E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9289410276390976E10 memory}, id = 33684
00-09                      HashToRandomExchange(dist0=[[$0]], dist1=[[$1]], dist2=[[$2]], dist3=[[$3]], dist4=[[$4]]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 90674.61896625, cumulative cost = {1.5587182762773085E10 rows, 1.6643888833218057E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9289410276390976E10 memory}, id = 33683
{noformat}

The plan contains two consecutive Limit operators, 00-03 and 00-04. Both specify fetch=[100] and 00-03 reads directly from 00-04, so only one of them should be needed.
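
To check other plans for the same pattern, a small script along these lines could flag a Limit operator stacked directly on an identical Limit. This is only a sketch and not part of Drill: the function names `parse_plan` and `find_stacked_limits` are made up for illustration, and it assumes the plan is in the text format shown above, where each line starts with an operator id such as 00-03 followed by the operator and " : ". Because a Limit always has exactly one input, the line following a Limit in this listing is its child.

```python
import re

# Matches the start of a plan line such as
#   "00-03          Limit(fetch=[100]) : rowType = ..."
# capturing the operator id and the operator text before the " : " separator.
PLAN_LINE = re.compile(r"^(\d{2}-\d{2})\s+(.*?)\s+:\s")


def parse_plan(plan_text):
    """Return a list of (operator_id, operator) tuples in plan order."""
    ops = []
    for line in plan_text.splitlines():
        match = PLAN_LINE.match(line.strip())
        if match:
            ops.append((match.group(1), match.group(2)))
    return ops


def find_stacked_limits(plan_text):
    """Return (parent_id, child_id) pairs where a Limit sits directly on an identical Limit."""
    ops = parse_plan(plan_text)
    pairs = []
    for (id_a, op_a), (id_b, op_b) in zip(ops, ops[1:]):
        if op_a.startswith("Limit(") and op_a == op_b:
            pairs.append((id_a, id_b))
    return pairs


if __name__ == "__main__":
    # Abbreviated copy of the plan above; the rowType and cost details are elided.
    sample = "\n".join([
        "00-02        SelectionVectorRemover : rowType = ..., id = 33690",
        "00-03          Limit(fetch=[100]) : rowType = ..., id = 33689",
        "00-04            Limit(fetch=[100]) : rowType = ..., id = 33688",
        "00-05              SelectionVectorRemover : rowType = ..., id = 33687",
        "00-06                TopN(limit=[100]) : rowType = ..., id = 33686",
    ])
    print(find_stacked_limits(sample))
```

The check only compares the operator text, so it reports two Limits with the same fetch value and ignores a Limit followed by a TopN or by a Limit with a different fetch.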
