You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/12/08 19:16:37 UTC
[GitHub] [pinot] walterddr opened a new issue, #9944: [multistage][opt] optimizer not correctly prune out columns
walterddr opened a new issue, #9944:
URL: https://github.com/apache/pinot/issues/9944
see:
```
SELECT
count(LO_ORDERKEY)
FROM
lineorder JOIN customer ON LO_CUSTKEY = C_CUSTKEY
WHERE
LO_QUANTITY < 10 AND C_REGION = 'AFRICA'
LogicalAggregate(group=[{}], EXPR$0=[$SUM0($0)]): rowcount = 1.0, cumulative cost = 17819.155154440043, id = 44195
LogicalExchange(distribution=[hash]): rowcount = 1.0, cumulative cost = 17818.030154440043, id = 44193
LogicalAggregate(group=[{}], EXPR$0=[COUNT()]): rowcount = 1.0, cumulative cost = 17814.030154440043, id = 44188
LogicalJoin(condition=[=($7, $30)], joinType=[inner]): rowcount = 112.5, cumulative cost = 17812.905154440043, id = 44186
LogicalExchange(distribution=[hash[7]]): rowcount = 50.0, cumulative cost = 15798.092021712584, id = 44181
LogicalFilter(condition=[<($2, 10)]): rowcount = 50.0, cumulative cost = 150.0, id = 44173
LogicalTableScan(table=[[lineorder]]): rowcount = 100.0, cumulative cost = 100.0, id = 44157
LogicalExchange(distribution=[hash[10]]): rowcount = 15.0, cumulative cost = 1902.3131327274587, id = 44182
LogicalFilter(condition=[=($7, 'AFRICA')]): rowcount = 15.0, cumulative cost = 115.0, id = 44174
LogicalTableScan(table=[[customer]]): rowcount = 100.0, cumulative cost = 100.0, id = 44159
```
The explained plan requires leaf stages to return the entire collection of columns
where as:
```
SELECT count(LO_ORDERKEY)
FROM (SELECT LO_ORDERKEY, LO_CUSTKEY FROM lineorder WHERE LO_QUANTITY < 10) AS l JOIN
(SELECT C_CUSTKEY FROM customer WHERE C_REGION = 'AFRICA') AS c ON l.LO_CUSTKEY = c.C_CUSTKEY
LogicalAggregate(group=[{}], EXPR$0=[$SUM0($0)]): rowcount = 1.0, cumulative cost = 2176.042214237391, id = 44268
LogicalExchange(distribution=[hash]): rowcount = 1.0, cumulative cost = 2174.917214237391, id = 44266
LogicalAggregate(group=[{}], EXPR$0=[COUNT()]): rowcount = 1.0, cumulative cost = 2170.917214237391, id = 44261
LogicalJoin(condition=[=($1, $2)], joinType=[inner]): rowcount = 112.5, cumulative cost = 2169.792214237391, id = 44259 <-- NOTE: this is condition =($1, $2)
LogicalExchange(distribution=[hash[1]]): rowcount = 50.0, cumulative cost = 1764.8092021712584, id = 44254
LogicalProject(LO_ORDERKEY=[$0], LO_CUSTKEY=[$7]): rowcount = 50.0, cumulative cost = 200.0, id = 44249
LogicalFilter(condition=[<($2, 10)]): rowcount = 50.0, cumulative cost = 150.0, id = 44239
LogicalTableScan(table=[[lineorder]]): rowcount = 100.0, cumulative cost = 100.0, id = 44198
LogicalExchange(distribution=[hash[0]]): rowcount = 15.0, cumulative cost = 292.4830120661326, id = 44255
LogicalProject(C_CUSTKEY=[$10]): rowcount = 15.0, cumulative cost = 130.0, id = 44251
LogicalFilter(condition=[=($7, 'AFRICA')]): rowcount = 15.0, cumulative cost = 115.0, id = 44244
LogicalTableScan(table=[[customer]]): rowcount = 100.0, cumulative cost = 100.0, id = 44202
```
if we wrapped in sub-query, it properly put the LogicalProject in-place to only return the needed columns.
This should be done automatically - if a count is only operating on one of the columns, no others should be returned via the exchange.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org