Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/11/07 20:44:50 UTC

[GitHub] [pinot] ankitsultana opened a new issue, #9748: [multistage] ShuffleRewriteVisitor::canSkipShuffle Needs to Take Into Account Whether Data is on the Same Servers

ankitsultana opened a new issue, #9748:
URL: https://github.com/apache/pinot/issues/9748

   During the shuffle rewrite phase, we currently look only at the partitioning keys to determine whether the shuffle between two stages can be skipped. Reference: https://github.com/apache/pinot/blob/master/pinot-query-planner/src/main/java/org/apache/pinot/query/planner/logical/ShuffleRewriteVisitor.java#L185
   
   However, even when the partitioning keys are the same, the data may actually be on different servers. Things work fine right now only because we don't have partitioning keys in the TableScan node.
   
   Once we add partitioning keys to the TableScan node, we can easily run into this issue if the two tables involved in a join are on different servers but their partitioning keys and the join key are the same.
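
To illustrate the concern, here is a minimal sketch of a skip-shuffle check that considers server placement as well as partitioning keys. It is not the actual `ShuffleRewriteVisitor` code: the class name, method signature, and the representation of servers as a set of strings are all assumptions made for illustration. Comparing server sets is also only a coarse condition; a real check would likely need to confirm that each partition lives on the same server on both sides.

```java
import java.util.Set;

// Hypothetical sketch: names and signatures are illustrative only and are
// not taken from the Pinot query planner.
final class ShuffleSkipSketch {
  private ShuffleSkipSketch() {}

  /**
   * Returns true only if both stages are partitioned on the same non-empty
   * set of key indexes AND are assigned to the same set of servers.
   */
  static boolean canSkipShuffle(Set<Integer> senderKeys, Set<Integer> receiverKeys,
      Set<String> senderServers, Set<String> receiverServers) {
    // An unpartitioned stage gives no guarantee about where rows live.
    if (senderKeys.isEmpty() || receiverKeys.isEmpty()) {
      return false;
    }
    boolean samePartitioning = senderKeys.equals(receiverKeys);
    // Assumption: identical server sets stand in for "data is co-located".
    boolean sameServers = senderServers.equals(receiverServers);
    return samePartitioning && sameServers;
  }
}
```

With a check like this, two tables partitioned on the join key but hosted on disjoint servers would still be shuffled, which is the case described above.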
   
   cc: @walterddr 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] 61yao commented on issue #9748: [multistage] ShuffleRewriteVisitor Can Allow Shuffle to be Skipped if Data is on Different Servers

Posted by GitBox <gi...@apache.org>.
61yao commented on issue #9748:
URL: https://github.com/apache/pinot/issues/9748#issuecomment-1311350001

   Can these two assumptions be checked somewhere in the code, so that we return an error for this case for now?




[GitHub] [pinot] walterddr commented on issue #9748: [multistage] ShuffleRewriteVisitor Can Allow Shuffle to be Skipped if Data is on Different Servers

Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #9748:
URL: https://github.com/apache/pinot/issues/9748#issuecomment-1306172169

   +1. The only reason this util is so simple is that it relies on 2 preconditions:
   
   1. All leaf stages are assumed to be unpartitioned.
   2. All intermediate stages are assigned the same set of servers (in fact, right now, ALL servers within the DEFAULT tenant).
   
   If either of these 2 assumptions breaks, it could cause a skipShuffle issue.
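
One possible way to make the two preconditions explicit, along the lines raised in this thread, is to fail fast when either one does not hold. The sketch below is only an illustration: `StageInfo` and `validate` are hypothetical names, and a stage is reduced to a leaf flag, its partitioning key indexes, and its assigned server names.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: StageInfo and validate() are illustrative names,
// not classes or methods from the Pinot code base.
final class ShufflePreconditionsSketch {
  private ShufflePreconditionsSketch() {}

  /** Minimal, assumed view of a stage for the purpose of this check. */
  static final class StageInfo {
    final boolean leaf;
    final Set<Integer> partitionKeys;
    final Set<String> servers;

    StageInfo(boolean leaf, Set<Integer> partitionKeys, Set<String> servers) {
      this.leaf = leaf;
      this.partitionKeys = partitionKeys;
      this.servers = servers;
    }
  }

  /** Throws IllegalStateException if either precondition is violated. */
  static void validate(List<StageInfo> stages) {
    Set<String> expectedServers = null;
    for (StageInfo stage : stages) {
      if (stage.leaf) {
        // Precondition 1: leaf stages are assumed to be unpartitioned.
        if (!stage.partitionKeys.isEmpty()) {
          throw new IllegalStateException("Leaf stage carries partitioning keys");
        }
      } else if (expectedServers == null) {
        // The first intermediate stage seen defines the expected server set.
        expectedServers = stage.servers;
      } else if (!expectedServers.equals(stage.servers)) {
        // Precondition 2: all intermediate stages share one set of servers.
        throw new IllegalStateException("Intermediate stages use different servers");
      }
    }
  }
}
```

Running such a check before the shuffle rewrite would turn a silent wrong result into an explicit error until the rewrite itself accounts for server placement.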
   
   CC @agavra 

