You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/12/08 19:16:37 UTC

[GitHub] [pinot] walterddr opened a new issue, #9944: [multistage][opt] optimizer not correctly prune out columns

walterddr opened a new issue, #9944:
URL: https://github.com/apache/pinot/issues/9944

   see:
   ```
   SELECT 
     count(LO_ORDERKEY)
   FROM 
     lineorder JOIN customer ON LO_CUSTKEY = C_CUSTKEY
   WHERE 
     LO_QUANTITY < 10 AND C_REGION = 'AFRICA'
   
   LogicalAggregate(group=[{}], EXPR$0=[$SUM0($0)]): rowcount = 1.0, cumulative cost = 17819.155154440043, id = 44195 
     LogicalExchange(distribution=[hash]): rowcount = 1.0, cumulative cost = 17818.030154440043, id = 44193 
       LogicalAggregate(group=[{}], EXPR$0=[COUNT()]): rowcount = 1.0, cumulative cost = 17814.030154440043, id = 44188 
         LogicalJoin(condition=[=($7, $30)], joinType=[inner]): rowcount = 112.5, cumulative cost = 17812.905154440043, id = 44186  
           LogicalExchange(distribution=[hash[7]]): rowcount = 50.0, cumulative cost = 15798.092021712584, id = 44181 
             LogicalFilter(condition=[<($2, 10)]): rowcount = 50.0, cumulative cost = 150.0, id = 44173 
               LogicalTableScan(table=[[lineorder]]): rowcount = 100.0, cumulative cost = 100.0, id = 44157 
           LogicalExchange(distribution=[hash[10]]): rowcount = 15.0, cumulative cost = 1902.3131327274587, id = 44182 
             LogicalFilter(condition=[=($7, 'AFRICA')]): rowcount = 15.0, cumulative cost = 115.0, id = 44174 
               LogicalTableScan(table=[[customer]]): rowcount = 100.0, cumulative cost = 100.0, id = 44159
   ```
   The explained plan requires leaf stages to return the entire collection of columns 
   
   where as:
   ```
   SELECT count(LO_ORDERKEY)
   FROM (SELECT LO_ORDERKEY, LO_CUSTKEY FROM lineorder WHERE LO_QUANTITY < 10) AS l JOIN 
        (SELECT C_CUSTKEY FROM customer WHERE  C_REGION = 'AFRICA') AS c ON l.LO_CUSTKEY = c.C_CUSTKEY
   
   
   LogicalAggregate(group=[{}], EXPR$0=[$SUM0($0)]): rowcount = 1.0, cumulative cost = 2176.042214237391, id = 44268 
     LogicalExchange(distribution=[hash]): rowcount = 1.0, cumulative cost = 2174.917214237391, id = 44266 
       LogicalAggregate(group=[{}], EXPR$0=[COUNT()]): rowcount = 1.0, cumulative cost = 2170.917214237391, id = 44261 
         LogicalJoin(condition=[=($1, $2)], joinType=[inner]): rowcount = 112.5, cumulative cost = 2169.792214237391, id = 44259  <-- NOTE: this is condition =($1, $2)
           LogicalExchange(distribution=[hash[1]]): rowcount = 50.0, cumulative cost = 1764.8092021712584, id = 44254 
             LogicalProject(LO_ORDERKEY=[$0], LO_CUSTKEY=[$7]): rowcount = 50.0, cumulative cost = 200.0, id = 44249 
               LogicalFilter(condition=[<($2, 10)]): rowcount = 50.0, cumulative cost = 150.0, id = 44239 
                 LogicalTableScan(table=[[lineorder]]): rowcount = 100.0, cumulative cost = 100.0, id = 44198 
         	LogicalExchange(distribution=[hash[0]]): rowcount = 15.0, cumulative cost = 292.4830120661326, id = 44255 
         	  LogicalProject(C_CUSTKEY=[$10]): rowcount = 15.0, cumulative cost = 130.0, id = 44251 
         	    LogicalFilter(condition=[=($7, 'AFRICA')]): rowcount = 15.0, cumulative cost = 115.0, id = 44244 
         	      LogicalTableScan(table=[[customer]]): rowcount = 100.0, cumulative cost = 100.0, id = 44202
   ```
   if we wrapped in sub-query, it properly put the LogicalProject in-place to only return the needed columns.
   
   This should be done automatically - if a count is only operating on one of the columns, no others should be returned via the exchange.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org