You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/05/26 01:03:39 UTC

[GitHub] [incubator-pinot] siddharthteotia opened a new issue #6978: Support SQL EXPLAIN PLAN FOR

siddharthteotia opened a new issue #6978:
URL: https://github.com/apache/incubator-pinot/issues/6978


   **Background**
   
   Standard SQL supports EXPLAIN plan functionality. The output of EXPLAIN is the physical execution plan (in a tree format) giving overview of the physical relational operators that will be used to execute the query. The query plan is always interpreted bottom-up (industry standard in databases). 
    
   In most databases (both OLTP and OLAP), the EXPLAIN plan also outputs the cost of the query (for each operator in the tree). The cost is mostly in terms of CPU, I/O that the operator might consume during the course of query execution. The numbers are based on estimates and statistics (rowCount etc) that databases commonly maintain per table for query optimizers to generate the best execution plan during the query planning process. This is commonly referred to as **Estimated execution plan** in database parlance. 
    
   Next is **Actual execution plan** which annotates the operator tree in estimated execution plan with profiling information (wall-clock time, cpu, I/O waits, numRows from one operator to another etc) collected during the execution of the query.
   
   **Proposal**
   
   Add support for `EXPLAIN PLAN FOR <SQL QUERY>`
   
   The output of the command should be the physical execution plan 
   
   - The output format is a design detail yet to be explored but should ideally be a human readable/parseable tree format
   - We can also leverage some nice UI/visualization library to feed the tree into that to generate a nice looking flowchart
   
   Physical execution plan must include the following
   
   - Physical operators that are actually going to execute the query if the query were to be executed.
   - The flow of data from one operator to another (this is where hierarchical / tree view becomes important)
   - Pinot specific information 
   -- Servers that will be queried (info available at the broker)
   -- Segments that will be queried (info available at the broker)
   -- Indexes that will be used (table config)
   
   Note that SQL QUERY inside FOR clause won’t be executed.
   
   Can consider to enhance the cluster manager controller UI that runs the EXPLAIN query to allow for visualizing and saving/downloading the plan (to XML, JSON etc)
   
   We can optionally configure to run EXPLAIN PLAN for displaying the **calcite logical plan**. This may not be feasible for the following reasons
   
   - Calcite planning code needs us to supply planning rules, traits, schema, cost factory for optimization
   - Pinot currently doesn’t have a plan optimizer since our query plans are straightforward. When we do add cost/rule based optimization support (must for N way joins distributed joins), we will have a cost factory, planner rules etc. We can then consider showing a logical plan. 
   - Will require reasonable integration with calcite planner and optimizer. 
   
   **Additional Goal**
   
   We will add support for profiling (timing) the different stages (scan, filter, group by, sort, project, reduce) of the query execution. 
   
   How does this fit in?
   
   When a user runs the SQL query (without EXPLAIN)
   
   - If profiling is turned on we can collect all these statistics and return them as part of the response statistics we return today (RequestStatistics object in broker query API)
   - If profiling is not turned on, we do whatever is done today
   
   When a user runs the EXPLAIN FOR <SQL query>
   
   - If profiling is turned on (or calcite explain plan is run with ALL ATTRIBUTES), we execute the query as well
   - Collect profiling info and return them as part of query response
   - Annotate the physical execution plan with the profiling information for each operator in the tree to get actual physical execution plan
   - So essentially we give the user an option to run EXPLAIN with profiling
   
   This is also the behavior supported by SQL Server for example. 
   
   The key thing here is profiling info. We recently added support for measuring query execution cpu time on servers. It accounts for everything under Combine operator. We should continue to use this over wall-clock/elapsed time. But this requires some performance experiments to see if pushing down cpu time based instrumentation at the operator level will introduce any overhead or not. In any case, we have the cpu time info for Combine operator so that can be put in the actual execution plan. 
   
   We have started working on this. Will be sharing design/details etc with the community. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on issue #6978: Support SQL EXPLAIN PLAN FOR

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on issue #6978:
URL: https://github.com/apache/incubator-pinot/issues/6978#issuecomment-848380852


   We have started working on this. Will be sharing design/details etc with the community.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org