You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "walterddr (via GitHub)" <gi...@apache.org> on 2023/11/16 17:23:56 UTC

[I] [multistage][bug] nested semi-join not applied [pinot]

walterddr opened a new issue, #12014:
URL: https://github.com/apache/pinot/issues/12014

   one of the TPC-DS complex query seems to have semi-join pushdown issues.
   see: 
   ```
   WITH sr_tmp AS
   (
          SELECT sr_item_sk,
                 sr_ticket_number,
                 sr_return_quantity
          FROM   store_returns
          WHERE  sr_reason_sk IN
                 (
                        SELECT r_reason_sk
                        FROM   reason
                        WHERE  r_reason_desc = 'reason 50') 
   ),
   
   sr_tmp2 AS
   (
          SELECT sr_ticket_number
          FROM   store_returns
          WHERE  sr_reason_sk IN
                 (
                        SELECT r_reason_sk
                        FROM   reason
                        WHERE  r_reason_desc = 'reason 50') 
   ),
   
   ss_tmp AS (
     SELECT ss_customer_sk, ss_item_sk, ss_ticket_number, ss_quantity, ss_sales_price
     FROM store_sales 
     WHERE ss_ticket_number IN (SELECT sr_ticket_number FROM sr_tmp2)
   )
   
   SELECT
                   /*+ aggOptions(num_groups_limit='1000000000') */
                   ss_customer_sk,
                   Sum(
                   CASE
                                   WHEN sr_return_quantity IS NOT NULL THEN ( ss_quantity - sr_return_quantity ) * ss_sales_price
                                   ELSE ( ss_quantity                                     * ss_sales_price )
                   END) act_sales
   FROM            ss_tmp
   LEFT OUTER JOIN sr_tmp
   ON              (
                                   sr_item_sk = ss_item_sk
                   AND             sr_ticket_number = ss_ticket_number)
   GROUP BY        ss_customer_sk
   limit 100;
   ```
   the issue was at the inner most join:
   ```
   Execution Plan
   LogicalSort(sort0=[$1], sort1=[$0], dir0=[ASC], dir1=[ASC], offset=[0], fetch=[100])
     PinotLogicalSortExchange(distribution=[hash], collation=[[1, 0]], isSortOnSender=[false], isSortOnReceiver=[true])
       LogicalSort(sort0=[$1], sort1=[$0], dir0=[ASC], dir1=[ASC], fetch=[100])
         LogicalProject(ss_customer_sk=[$0], act_sales=[CASE(=($2, 0), null:DOUBLE, $1)])
           LogicalAggregate(group=[{0}], agg#0=[$SUM0($1)], agg#1=[COUNT($2)])
             PinotLogicalExchange(distribution=[hash[0]])
               LogicalAggregate(group=[{0}], agg#0=[$SUM0($1)], agg#1=[COUNT($1)])
                 LogicalProject(ss_customer_sk=[$0], $f1=[CASE(IS NOT NULL($8), *(-($3, $8), $4), $5)])
                   LogicalJoin(condition=[AND(=($6, $1), =($7, $2))], joinType=[left])
                     PinotLogicalExchange(distribution=[hash[1, 2]])
                       LogicalProject(ss_customer_sk=[$0], ss_item_sk=[$1], ss_ticket_number=[$4], ss_quantity=[$2], ss_sales_price=[$3], EXPR$0=[*($2, $3)])
                         PinotLogicalExchange(distribution=[hash[4]])
                           LogicalJoin(condition=[=($4, $6)], joinType=[semi])
                             LogicalProject(ss_customer_sk=[$6], ss_item_sk=[$13], ss_quantity=[$19], ss_sales_price=[$20], ss_ticket_number=[$24])
                               LogicalTableScan(table=[[store_sales]])
                             PinotLogicalExchange(distribution=[broadcast], relExchangeType=[PIPELINE_BREAKER])
   ------------ THIS JOIN NODE BELOW SHOULD BE SEMI ------------
                               LogicalJoin(condition=[=($0, $2)], joinType=[inner]) 
                                 PinotLogicalExchange(distribution=[hash[0]])
                                   LogicalProject(sr_reason_sk=[$10], sr_ticket_number=[$22])
                                     LogicalTableScan(table=[[store_returns]])
                                 PinotLogicalExchange(distribution=[hash[0]])
                                   LogicalAggregate(group=[{0}])
                                     PinotLogicalExchange(distribution=[hash[0]])
                                       LogicalAggregate(group=[{1}])
                                         LogicalProject(r_reason_desc=[$3], r_reason_sk=[$5])
                                           LogicalFilter(condition=[=($3, 'reason 50')])
                                             LogicalTableScan(table=[[reason]])
                     PinotLogicalExchange(distribution=[hash[0, 1]])
                       LogicalProject(sr_item_sk=[$0], sr_ticket_number=[$3], sr_return_quantity=[$2])
                         PinotLogicalExchange(distribution=[hash[1]])
                           LogicalJoin(condition=[=($1, $5)], joinType=[semi])
                             LogicalProject(sr_item_sk=[$8], sr_reason_sk=[$10], sr_return_quantity=[$14], sr_ticket_number=[$22])
                               LogicalTableScan(table=[[store_returns]])
                             PinotLogicalExchange(distribution=[broadcast], relExchangeType=[PIPELINE_BREAKER])
                               LogicalProject(r_reason_desc=[$3], r_reason_sk=[$5])
                                 LogicalFilter(condition=[=($3, 'reason 50')])
                                   LogicalTableScan(table=[[reason]])
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] [multistage][bug] nested multi semi-join not applied [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr closed issue #12014: [multistage][bug] nested multi semi-join not applied 
URL: https://github.com/apache/pinot/issues/12014


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org