You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "ankitsultana (via GitHub)" <gi...@apache.org> on 2023/04/18 16:51:41 UTC

[GitHub] [pinot] ankitsultana commented on issue #10628: [multistage] Query with Anti Semi-Join Fails

ankitsultana commented on issue #10628:
URL: https://github.com/apache/pinot/issues/10628#issuecomment-1513495617

   > what prompts the aggregate function to be used here as MIN(bool)?
   
   As an example take this query:
   
   ```
   explain plan for select userUUID, deviceOS from userAttributes_OFFLINE where userUUID NOT IN (SELECT userUUID FROM userGroups_OFFLINE WHERE groupUUID = 'group-1')
   ```
   
   This will have this plan:
   
   ```
   Execution Plan
   LogicalProject(userUUID=[$1], deviceOS=[$0])
     LogicalFilter(condition=[IS NOT TRUE($4)])
       LogicalJoin(condition=[=($2, $3)], joinType=[left])
         LogicalExchange(distribution=[hash[2]])
           LogicalProject(deviceOS=[$4], userUUID=[$6], userUUID0=[$6])
             LogicalTableScan(table=[[userAttributes_OFFLINE]])
         LogicalExchange(distribution=[hash[0]])
           LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
             LogicalExchange(distribution=[hash[0]])
               LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
                 LogicalProject(userUUID=[$4], $f1=[true])
                   LogicalFilter(condition=[=($3, 'group-1')])
                     LogicalTableScan(table=[[userGroups_OFFLINE]])
   ```
   
   For NOT IN Calcite creates a left-join and adds a virtual column called f0 to the right-side which has all values as true. After the join, only those rows are kept that have this column value as false. In this particular case since the columns are assumed to be non-nullable, if the left-side has a userUUID which is not there in the result-set of the right side of the join, the value of this virtual column is assumed to be false.
   
   I think nullability might also have a role to play here. One could ask why should the default value be false and not-null (in this case it could be because all columns are non-nullable on master right now).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org