Posted to commits@pinot.apache.org by "walterddr (via GitHub)" <gi...@apache.org> on 2023/05/10 22:59:32 UTC

[GitHub] [pinot] walterddr opened a new issue, #10755: [multistage] proposal for a new executor scheduling model

walterddr opened a new issue, #10755:
URL: https://github.com/apache/pinot/issues/10755

   Hi All,
   We have been discussing the threading model of the current V2 engine execution runtime and have observed several issues.
   
   Issues
   ===
   1. Under high QPS, we see heavy thread contention caused by threads waiting on the mailbox sender.
   2. We have some issues with a single opChain waiting on multiple receiving mailboxes.
   3. We also want to add a "pipeline breaker" concept to the executing opChain, where some operators do not produce any data until all input blocks have been consumed (for example, the sort operator or the aggregate/group-by operator).
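
To make the "pipeline breaker" idea in item 3 concrete, here is a minimal sketch of a sort operator that emits nothing until its upstream is fully drained. The `Operator` interface, the class names, and the use of `null` as the end-of-stream marker are all assumptions made for illustration; they are not Pinot's actual operator API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch; these types are NOT Pinot's actual operator API. */
final class PipelineBreakerSketch {
  /** Pull-based operator: returns the next block, or null once the stream has ended. */
  interface Operator {
    List<Integer> nextBlock();
  }

  /** Streaming source that hands out pre-built blocks one at a time. */
  static final class SourceOperator implements Operator {
    private final Iterator<List<Integer>> _blocks;

    SourceOperator(List<List<Integer>> blocks) {
      _blocks = blocks.iterator();
    }

    @Override
    public List<Integer> nextBlock() {
      return _blocks.hasNext() ? _blocks.next() : null;
    }
  }

  /** Pipeline breaker: produces no output until every upstream block has been consumed. */
  static final class SortOperator implements Operator {
    private final Operator _upstream;
    private boolean _emitted = false;

    SortOperator(Operator upstream) {
      _upstream = upstream;
    }

    @Override
    public List<Integer> nextBlock() {
      if (_emitted) {
        return null; // the single sorted block has already been returned
      }
      List<Integer> all = new ArrayList<>();
      List<Integer> block;
      // Drain the upstream completely before producing anything.
      while ((block = _upstream.nextBlock()) != null) {
        all.addAll(block);
      }
      all.sort(Comparator.naturalOrder());
      _emitted = true;
      return all;
    }
  }

  /** Convenience helper: sorts all values contained in the given input blocks. */
  static List<Integer> runSort(List<List<Integer>> inputBlocks) {
    return new SortOperator(new SourceOperator(inputBlocks)).nextBlock();
  }
}
```

In this sketch the drain loop is where the operator would have to wait for more input, which is the point at which the blocking strategy matters.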
   
   Status Quo
   ===
   The current execution model requires each of these runtime operators to return a no-op block and yield the opChain out of the execution thread pool, because blocking inside an operator could otherwise lead to a distributed deadlock of platform threads (see more detail in https://docs.google.com/document/d/1Vh_UAaY9WWB3dfRUTAoNzU4BrPAQEni7Xd6w84SH_Ow/edit# and https://docs.google.com/document/d/1XAMHAlhFbINvX-kK1ANlzbRz4_RkS0map4qhqs1yDtE/edit?usp=drivesdk).
   
   This is extremely efficient at managing opChains entering and leaving the scheduler service, so we are able to use the thread pool with less OS thread context switching. However, it also poses several problems:
   - There are many problems maintaining the no-op block population logic, and each operator seems to handle no-op population a bit differently.
   - It is hard to implement multiple wake-up points, because the system cannot resume from a particular parked operator (the entry point is always the root of the opChain).
   - It makes the operator chain complex, as it needs to handle the blocking context.
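
The yield-on-no-op behaviour described above can be sketched roughly as follows. This is a single-threaded toy model, not Pinot's scheduler: `OpChain`, `BlockType`, and the immediate re-queue on `NOOP` are hypothetical simplifications (a real scheduler would presumably re-register the chain when mailbox data arrives).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of cooperative yielding; NOT Pinot's actual scheduler. */
final class NoOpYieldSketch {
  enum BlockType { DATA, NOOP, EOS }

  /** Toy opChain whose root reports NOOP while its mailbox has no data yet. */
  static final class OpChain {
    final String _id;
    private int _emptyPolls; // number of polls that will find the mailbox empty
    private int _dataBlocks; // number of data blocks still to produce

    OpChain(String id, int emptyPolls, int dataBlocks) {
      _id = id;
      _emptyPolls = emptyPolls;
      _dataBlocks = dataBlocks;
    }

    /** Always entered from the root; there is no way to resume at a parked operator. */
    BlockType nextBlock() {
      if (_emptyPolls > 0) {
        _emptyPolls--;
        return BlockType.NOOP;
      }
      if (_dataBlocks > 0) {
        _dataBlocks--;
        return BlockType.DATA;
      }
      return BlockType.EOS;
    }
  }

  /** Runs all chains on one worker and returns a trace of what happened. */
  static List<String> run(List<OpChain> chains) {
    Deque<OpChain> ready = new ArrayDeque<>(chains);
    List<String> trace = new ArrayList<>();
    while (!ready.isEmpty()) {
      OpChain chain = ready.poll();
      BlockType block = chain.nextBlock();
      while (block == BlockType.DATA) {
        trace.add(chain._id + ":DATA");
        block = chain.nextBlock();
      }
      if (block == BlockType.NOOP) {
        // Yield: give the worker back instead of blocking it, and retry later.
        trace.add(chain._id + ":YIELD");
        ready.add(chain);
      } else {
        trace.add(chain._id + ":DONE");
      }
    }
    return trace;
  }

  /** Chain A finds its mailbox empty once; chain B has two blocks ready. */
  static List<String> runDemo() {
    List<OpChain> chains = new ArrayList<>();
    chains.add(new OpChain("A", 1, 1));
    chains.add(new OpChain("B", 0, 2));
    return run(chains);
  }
}
```

In this toy model, `runDemo()` lets chain B finish while chain A is parked, without ever blocking the worker. The cost is that every operator in a chain has to know how to pass a `NOOP` upward.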
   
   Proposal
   ===
   Our goals are to:
   1. unblock opChain thread contention (where no-op blocks are not properly returned, or cannot be properly returned);
   2. make it easy to add new operators without handling the blocking context in each operator;
   3. add more complex blocking mechanisms without distributed deadlock (such as the pipeline breaker mentioned above).
   
   Candidate Solutions
   ===
   1. We can handle no-op blocks solely in the multistage operator base class. This would still be complex, because each operator needs to call the base-class logic, and the operator logic would still be somewhat aware of the blocking context.
   2. We can also remove the no-op block and return the blocking signal directly from the operator (the operator has access to opChainExecutionContext, so this does not require a bottom-up return through the opChain).
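
As a rough illustration of candidate (2), the sketch below lets the mailbox receive operator park on a signal owned by a shared context, so no other operator needs to know about blocking. `ExecutionContext` is a hypothetical stand-in for opChainExecutionContext; its `offer` and `awaitBlock` methods, the string-based blocks, and the 5-second timeout are assumptions for this example and are not taken from the POC.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch; ExecutionContext only stands in for opChainExecutionContext. */
final class BlockingSignalSketch {
  static final String EOS = "EOS";

  static final class ExecutionContext {
    private final ReentrantLock _lock = new ReentrantLock();
    private final Condition _dataAvailable = _lock.newCondition();
    private final Deque<String> _mailbox = new ArrayDeque<>();

    /** Sender side: enqueue a block and wake up any parked receiver. */
    void offer(String block) {
      _lock.lock();
      try {
        _mailbox.add(block);
        _dataAvailable.signalAll();
      } finally {
        _lock.unlock();
      }
    }

    /** Receiver side: park until a block arrives or the timeout elapses. */
    String awaitBlock(long timeoutMs) {
      long nanosLeft = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
      _lock.lock();
      try {
        while (_mailbox.isEmpty()) {
          if (nanosLeft <= 0L) {
            throw new IllegalStateException("timed out waiting for mailbox data");
          }
          nanosLeft = _dataAvailable.awaitNanos(nanosLeft);
        }
        return _mailbox.poll();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for mailbox data", e);
      } finally {
        _lock.unlock();
      }
    }
  }

  /** The only blocking operator; its callers never see a no-op block. */
  static final class MailboxReceiveOperator {
    private final ExecutionContext _context;

    MailboxReceiveOperator(ExecutionContext context) {
      _context = context;
    }

    String nextBlock() {
      return _context.awaitBlock(5_000L);
    }
  }

  /** Pulls blocks until end-of-stream, with no NOOP handling anywhere. */
  static List<String> drain(MailboxReceiveOperator receiver) {
    List<String> blocks = new ArrayList<>();
    String block;
    while (!EOS.equals(block = receiver.nextBlock())) {
      blocks.add(block);
    }
    return blocks;
  }

  /** A separate sender thread feeds two blocks and then end-of-stream. */
  static List<String> runDemo() {
    ExecutionContext context = new ExecutionContext();
    Thread sender = new Thread(() -> {
      context.offer("block-1");
      context.offer("block-2");
      context.offer(EOS);
    });
    sender.start();
    return drain(new MailboxReceiveOperator(context));
  }
}
```

The trade-off shown here is that the receiving thread is held while parked, so the simplification in operator code is paid for with more threads and more context switching.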
   
   POC Status
   ===
   We have implemented candidate solution (2) above in https://github.com/walterddr/pinot/pull/58, with the following trade-offs:
   
   PRO
   ----
   * It is vastly simpler in terms of the operators and the opChain, since nothing other than the mailbox operator is blocking (see the lines added vs. lines removed in the main classes).
   * It makes different types of wait-notify possible (e.g. parking on an operator or on the opChain root, and waiting on different wake-up signals), because all signaling goes through opChainExecutionContext.
   * With https://openjdk.org/projects/panama/, we think it is best to let the JVM handle thread pool resource management.
       * Bear in mind that the current model is still memory heavy, as the OpChain/Context bears the entirety of the operator chain's memory footprint.
   
   CON
   ----
   * In our basic tests, we observed roughly 10-20% thread context-switching overhead under high QPS load.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org