Posted to commits@pinot.apache.org by "walterddr (via GitHub)" <gi...@apache.org> on 2023/11/16 17:21:15 UTC

[I] [multistage][bug] multi semi-join hint not applied in weird conditions [pinot]

walterddr opened a new issue, #12013:
URL: https://github.com/apache/pinot/issues/12013

   #11937 introduced multi-join capability; however, there are several scenarios where it is not applied:
   ```
   select /*+ joinOptions(join_strategy = 'dynamic_broadcast') */
   --[work]      key, COUNT(*)
   --[work]      *
   --[work]      sum(val)
   --[work]      sum(1)
   --[not work]  count(val)
   --[not work]  count(*)
   from tbl
   where col1 IN (select col from  dim1 where  ...)
       and col2 IN (select col from dim2 where ...)
       and col3 IN (select col from dim3 where ...)
   group by key
   ```
   
   Specifically, when COUNT(*) is used without agg-group, the multi-semi-join rules don't apply and 2 of the 3 IN clauses are still modeled as shuffled inner joins.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] [multistage][bug] multi semi-join hint not applied in weird conditions [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on issue #12013:
URL: https://github.com/apache/pinot/issues/12013#issuecomment-1823213803

   After #12014 was fixed, this issue reduces to the following: when `COUNT(*)` or `COUNT(col)` is applied to a non-nullable column, the plan does not generate a definitive SEMI-JOIN indicator above the join node.
   
   The solution can be either:
   1. propagate the semi-join indicator (e.g. when nothing in the subtree accesses any of the RHS table), or
   2. add a project above the COUNT(*) to indicate the semi-join.




Re: [I] [multistage][bug] multi semi-join hint not applied in weird conditions [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr closed issue #12013: [multistage][bug] multi semi-join hint not applied in weird conditions
URL: https://github.com/apache/pinot/issues/12013




Re: [I] [multistage][bug] multi semi-join hint not applied in weird conditions [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on issue #12013:
URL: https://github.com/apache/pinot/issues/12013#issuecomment-1828372664

   Dug a bit deeper and realized that the cases that work do so because the PROJECT_JOIN_TRANSPOSE rule splits the project into 2 parts, one above and one below each join, so all the SEMI-JOIN rules are applied properly.
   
   However, there's no AGGREGATE_JOIN_TRANSPOSE rule doing the same thing (more precisely, it would be an AGGREGATE_PROJECT_JOIN_TRANSPOSE).
   
   So solution 1 above is no longer valid (or not needed);
   solution 2 is the right way to go, with a new rule.
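
To make the reasoning concrete, here is a toy model of the planner behavior described in this comment. It is only a sketch and does not use Calcite or Pinot APIs. The node classes and the functions `project_to_semi_join` and `add_project_under_aggregate` are hypothetical names invented for illustration. The assumption is that the semi-join conversion only fires on a project sitting directly over a join, and that the new rule is read as "place a project between the aggregate and the join".

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    table: str
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"
    kind: str = "inner"  # "inner" or "semi"


@dataclass(frozen=True)
class Project:
    input: "Plan"
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Aggregate:
    input: "Plan"
    group_keys: Tuple[str, ...]
    agg_args: Tuple[str, ...]  # columns read by the agg calls; COUNT(*) reads none


Plan = Union[Scan, Join, Project, Aggregate]


def output_columns(plan: Plan) -> set:
    """Columns visible above a node (aggregate outputs are omitted in this toy)."""
    if isinstance(plan, Scan):
        return set(plan.columns)
    if isinstance(plan, Join):
        left = output_columns(plan.left)
        return left if plan.kind == "semi" else left | output_columns(plan.right)
    if isinstance(plan, Project):
        return set(plan.columns)
    return set(plan.group_keys)


def project_to_semi_join(plan: Plan) -> Plan:
    """Toy semi-join rule: fires only on a Project directly over an inner Join
    when the Project reads nothing from the right-hand side."""
    if isinstance(plan, Project) and isinstance(plan.input, Join) and plan.input.kind == "inner":
        join = plan.input
        if set(plan.columns) <= output_columns(join.left):
            return Project(Join(join.left, join.right, "semi"), plan.columns)
    return plan


def add_project_under_aggregate(plan: Plan) -> Plan:
    """Hypothetical new rule: insert a Project between Aggregate and Join that
    lists only the columns the Aggregate actually reads."""
    if isinstance(plan, Aggregate) and isinstance(plan.input, Join):
        needed = tuple(dict.fromkeys(plan.group_keys + plan.agg_args))
        return Aggregate(Project(plan.input, needed), plan.group_keys, plan.agg_args)
    return plan


tbl = Scan("tbl", ("key", "col1", "val"))
dim1 = Scan("dim1", ("col",))

# COUNT(*) GROUP BY key sitting directly on the join: no Project to match on.
count_star = Aggregate(Join(tbl, dim1), ("key",), ())
before = project_to_semi_join(count_star.input)

# With the extra Project in place, the same toy rule can fire.
with_project = add_project_under_aggregate(count_star)
after = project_to_semi_join(with_project.input)

print(before.kind)       # join kind without the extra project
print(after.input.kind)  # join kind once the project is inserted
```

In this model the join stays an inner join while the aggregate sits directly on it, and becomes a semi-join once a project that reads only left-side columns is placed in between. The real rule set has more moving parts, so treat this purely as an illustration of the idea.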

