Posted to issues@drill.apache.org by "Robert Hou (JIRA)" <ji...@apache.org> on 2018/12/13 02:05:00 UTC
[jira] [Created] (DRILL-6902) Extra limit operator is not needed
Robert Hou created DRILL-6902:
---------------------------------
Summary: Extra limit operator is not needed
Key: DRILL-6902
URL: https://issues.apache.org/jira/browse/DRILL-6902
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Robert Hou
Assignee: Pritesh Maker
For TPC-DS query 49, the query plan contains an extra Limit operator that is not needed.
Here is the query:
{noformat}
SELECT 'web' AS channel,
web.item,
web.return_ratio,
web.return_rank,
web.currency_rank
FROM (SELECT item,
return_ratio,
currency_ratio,
Rank()
OVER (
ORDER BY return_ratio) AS return_rank,
Rank()
OVER (
ORDER BY currency_ratio) AS currency_rank
FROM (SELECT ws.ws_item_sk AS
item,
( Cast(Sum(COALESCE(wr.wr_return_quantity, 0)) AS DEC(15,
4)) /
Cast(
Sum(COALESCE(ws.ws_quantity, 0)) AS DEC(15, 4)) ) AS
return_ratio,
( Cast(Sum(COALESCE(wr.wr_return_amt, 0)) AS DEC(15, 4))
/ Cast(
Sum(
COALESCE(ws.ws_net_paid, 0)) AS DEC(15,
4)) ) AS
currency_ratio
FROM web_sales ws
LEFT OUTER JOIN web_returns wr
ON ( ws.ws_order_number = wr.wr_order_number
AND ws.ws_item_sk = wr.wr_item_sk ),
date_dim
WHERE wr.wr_return_amt > 10000
AND ws.ws_net_profit > 1
AND ws.ws_net_paid > 0
AND ws.ws_quantity > 0
AND ws_sold_date_sk = d_date_sk
AND d_year = 1999
AND d_moy = 12
GROUP BY ws.ws_item_sk) in_web) web
WHERE ( web.return_rank <= 10
OR web.currency_rank <= 10 )
UNION
SELECT 'catalog' AS channel,
catalog.item,
catalog.return_ratio,
catalog.return_rank,
catalog.currency_rank
FROM (SELECT item,
return_ratio,
currency_ratio,
Rank()
OVER (
ORDER BY return_ratio) AS return_rank,
Rank()
OVER (
ORDER BY currency_ratio) AS currency_rank
FROM (SELECT cs.cs_item_sk AS
item,
( Cast(Sum(COALESCE(cr.cr_return_quantity, 0)) AS DEC(15,
4)) /
Cast(
Sum(COALESCE(cs.cs_quantity, 0)) AS DEC(15, 4)) ) AS
return_ratio,
( Cast(Sum(COALESCE(cr.cr_return_amount, 0)) AS DEC(15, 4
)) /
Cast(Sum(
COALESCE(cs.cs_net_paid, 0)) AS DEC(
15, 4)) ) AS
currency_ratio
FROM catalog_sales cs
LEFT OUTER JOIN catalog_returns cr
ON ( cs.cs_order_number = cr.cr_order_number
AND cs.cs_item_sk = cr.cr_item_sk ),
date_dim
WHERE cr.cr_return_amount > 10000
AND cs.cs_net_profit > 1
AND cs.cs_net_paid > 0
AND cs.cs_quantity > 0
AND cs_sold_date_sk = d_date_sk
AND d_year = 1999
AND d_moy = 12
GROUP BY cs.cs_item_sk) in_cat) catalog
WHERE ( catalog.return_rank <= 10
OR catalog.currency_rank <= 10 )
UNION
SELECT 'store' AS channel,
store.item,
store.return_ratio,
store.return_rank,
store.currency_rank
FROM (SELECT item,
return_ratio,
currency_ratio,
Rank()
OVER (
ORDER BY return_ratio) AS return_rank,
Rank()
OVER (
ORDER BY currency_ratio) AS currency_rank
FROM (SELECT sts.ss_item_sk AS
item,
( Cast(Sum(COALESCE(sr.sr_return_quantity, 0)) AS DEC(15,
4)) /
Cast(
Sum(COALESCE(sts.ss_quantity, 0)) AS DEC(15, 4)) ) AS
return_ratio,
( Cast(Sum(COALESCE(sr.sr_return_amt, 0)) AS DEC(15, 4))
/ Cast(
Sum(
COALESCE(sts.ss_net_paid, 0)) AS DEC(15, 4)) ) AS
currency_ratio
FROM store_sales sts
LEFT OUTER JOIN store_returns sr
ON ( sts.ss_ticket_number =
sr.sr_ticket_number
AND sts.ss_item_sk = sr.sr_item_sk ),
date_dim
WHERE sr.sr_return_amt > 10000
AND sts.ss_net_profit > 1
AND sts.ss_net_paid > 0
AND sts.ss_quantity > 0
AND ss_sold_date_sk = d_date_sk
AND d_year = 1999
AND d_moy = 12
GROUP BY sts.ss_item_sk) in_store) store
WHERE ( store.return_rank <= 10
OR store.currency_rank <= 10 )
ORDER BY 1,
4,
5
LIMIT 100;
{noformat}
Here is the top of the plan:
{noformat}
00-00 Screen : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 100.0, cumulative cost = {1.5587382656934813E10 rows, 1.6644370208245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33692
00-01 Project(channel=[$0], item=[$1], return_ratio=[$2], return_rank=[$3], currency_rank=[$4]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 100.0, cumulative cost = {1.5587382646934813E10 rows, 1.6644370207245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33691
00-02 SelectionVectorRemover : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 100.0, cumulative cost = {1.5587382546934813E10 rows, 1.6644370157245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33690
00-03 Limit(fetch=[100]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 100.0, cumulative cost = {1.5587382446934813E10 rows, 1.6644370147245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33689
00-04 Limit(fetch=[100]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 100.0, cumulative cost = {1.5587382346934813E10 rows, 1.6644370107245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33688
00-05 SelectionVectorRemover : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 9067.461896625, cumulative cost = {1.5587382246934813E10 rows, 1.6644370067245007E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33687
00-06 TopN(limit=[100]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 9067.461896625, cumulative cost = {1.5587373179472916E10 rows, 1.664436916049882E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33686
00-07 HashAgg(group=[{0, 1, 2, 3, 4}]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 9067.461896625, cumulative cost = {1.5587364112011019E10 rows, 1.6644296869003403E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9294197896272392E10 memory}, id = 33685
00-08 Project(channel=[$0], item=[$1], return_ratio=[$2], return_rank=[$3], currency_rank=[$4]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank): rowcount = 90674.61896625, cumulative cost = {1.5587273437392052E10 rows, 1.664393417052754E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9289410276390976E10 memory}, id = 33684
00-09 HashToRandomExchange(dist0=[[$0]], dist1=[[$1]], dist2=[[$2]], dist3=[[$3]], dist4=[[$4]]) : rowType = RecordType(CHAR(7) channel, ANY item, DECIMAL(35, 20) return_ratio, BIGINT return_rank, BIGINT currency_rank, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 90674.61896625, cumulative cost = {1.5587182762773085E10 rows, 1.6643888833218057E11 cpu, 3.2256446355E10 io, 2.126707136508128E13 network, 1.9289410276390976E10 memory}, id = 33683
{noformat}
The plan contains two Limit operators, 00-03 and 00-04, both with fetch=[100] and the same row type. Only one should be needed.
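As a rough way to spot this pattern in other plans, the sketch below scans plan text for consecutive Limit operators with identical arguments. It is only an illustration, not part of Drill: the function names and the shortened sample plan are hypothetical, and it assumes each operator is on one line in the "MM-NN Name(args) : ..." form shown above. Adjacency in the text is treated as parent/child, which holds for a single-input chain like 00-00 through 00-09 but not necessarily for branching plans.

```python
import re

# Matches plan lines such as "00-03 Limit(fetch=[100]) : rowType = ..."
# Group 1: operator id, group 2: operator name, group 3: arguments (may be absent).
OPERATOR_RE = re.compile(r"^\s*(\d+-\d+)\s+(\w+)(?:\((.*?)\))?\s*:")


def parse_operators(plan_text):
    """Return (id, name, args) tuples for every operator line in the plan text."""
    ops = []
    for line in plan_text.splitlines():
        m = OPERATOR_RE.match(line)
        if m:
            ops.append((m.group(1), m.group(2), m.group(3) or ""))
    return ops


def find_adjacent_duplicate_limits(plan_text):
    """Return id pairs of consecutive Limit operators with identical arguments."""
    ops = parse_operators(plan_text)
    pairs = []
    for (id_a, name_a, args_a), (id_b, name_b, args_b) in zip(ops, ops[1:]):
        if name_a == name_b == "Limit" and args_a == args_b:
            pairs.append((id_a, id_b))
    return pairs


# Shortened, hypothetical excerpt modeled on the plan above (row types elided).
SAMPLE_PLAN = """\
00-02 SelectionVectorRemover : rowType = RecordType(...): rowcount = 100.0
00-03 Limit(fetch=[100]) : rowType = RecordType(...): rowcount = 100.0
00-04 Limit(fetch=[100]) : rowType = RecordType(...): rowcount = 100.0
00-05 SelectionVectorRemover : rowType = RecordType(...): rowcount = 9067.461896625
00-06 TopN(limit=[100]) : rowType = RecordType(...): rowcount = 9067.461896625
"""

print(find_adjacent_duplicate_limits(SAMPLE_PLAN))
```

Run against the excerpt, this reports the 00-03/00-04 pair. It does not consider the TopN at 00-06, since that is a different operator type.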
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)